artifact
active
artifact:sauers-introspection-in-claude-postSauers' introspection in Claude post
Twitter thread detailing reconstruction experiment, statistical analysis, and the effect of showing Janus post.
Neighborhood — ranked by edge-count
Methods (1)
method
- Sauers' reconstruction experimentimplementsStatistical method: ask model to recall random numbers from earlier outputs, with and without providing explanation of transformer architecture; measure reconstruction accuracy distribution.
Findings (1)
finding
- Statistically rigorous analysis of Claude introspection; suggests models may have latent introspective capabilities that can be enhanced or disrupted.
Artifacts (1)
artifact
- The primary source paper, an interview article with Anima Labs members about language model phenomenology, published on smoothbrains.net and linked on LessWrong.