artifact
active
artifact:sauers-introspection-in-claude-post

Sauers' introspection in Claude post

Twitter thread detailing a reconstruction experiment, its statistical analysis, and the effect of showing a Janus post.

Neighborhood — ranked by edge-count

Methods (1)

method
  • Statistical method: ask the model to recall random numbers from its earlier outputs, with and without an explanation of the transformer architecture, and measure the distribution of reconstruction accuracy.
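
The scoring step of the method above might look roughly like the sketch below. It is an illustration, not the thread's actual code. The function names, the trial dictionary layout (`condition`, `original`, `recalled`), and the choice of exact positional match as the accuracy metric are all assumptions.

```python
import statistics
from typing import Dict, List, Sequence


def reconstruction_accuracy(original: Sequence[int], recalled: Sequence[int]) -> float:
    """Fraction of positions where the recalled number equals the original.

    Assumption: accuracy is exact positional match. Positions the model
    failed to recall (recalled shorter than original) count as misses.
    """
    if not original:
        raise ValueError("original must be non-empty")
    matches = sum(1 for o, r in zip(original, recalled) if o == r)
    return matches / len(original)


def summarize_by_condition(trials: List[dict]) -> Dict[str, Dict[str, float]]:
    """Group per-trial accuracies by condition and summarize each distribution.

    Each trial is a hypothetical dict with keys:
      "condition": e.g. "with_explanation" or "without_explanation"
      "original":  numbers the model produced earlier
      "recalled":  numbers the model reported when asked to recall them
    """
    by_condition: Dict[str, List[float]] = {}
    for trial in trials:
        acc = reconstruction_accuracy(trial["original"], trial["recalled"])
        by_condition.setdefault(trial["condition"], []).append(acc)

    summary: Dict[str, Dict[str, float]] = {}
    for condition, accs in by_condition.items():
        summary[condition] = {
            "n": len(accs),
            "mean": statistics.mean(accs),
            # Sample standard deviation needs at least two trials.
            "stdev": statistics.stdev(accs) if len(accs) > 1 else 0.0,
        }
    return summary
```

Comparing the two conditions then amounts to comparing the `mean` and `stdev` entries of the returned summary. A significance test could be layered on top, but which test the thread used is not stated here.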

Findings (1)

finding

Artifacts (1)

artifact