artifact

active

artifact:sauers-introspection-in-claude-post

Sauers' introspection in Claude post

Twitter thread detailing a reconstruction experiment, its statistical analysis, and the effect of showing the model the Janus post.

Neighborhood — ranked by edge-count

Methods (1)

method

Sauers' reconstruction experiment
implements
Statistical method: ask the model to recall random numbers from its earlier outputs, with and without providing an explanation of transformer architecture, then measure the distribution of reconstruction accuracy.
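The entry does not say how reconstruction accuracy is scored. The following is a minimal sketch that assumes a hypothetical position-wise digit-match metric and shows how a per-condition accuracy distribution could be summarized. `reconstruction_accuracy`, `summarize`, `compare_conditions`, and the condition labels are illustrative names, not Sauers' code, and the toy trials are made up only to show the call shape.

```python
from statistics import mean, median, pstdev


def reconstruction_accuracy(original: str, recalled: str) -> float:
    """Hypothetical metric: number of digit positions where the recalled
    number matches the original, divided by the longer of the two lengths,
    so missing or extra digits count as mismatches."""
    if not original:
        raise ValueError("original must be a non-empty digit string")
    matches = sum(1 for a, b in zip(original, recalled) if a == b)
    return matches / max(len(original), len(recalled))


def summarize(accuracies: list[float]) -> dict[str, float]:
    """Summary statistics of one condition's accuracy distribution."""
    return {
        "mean": mean(accuracies),
        "median": median(accuracies),
        "stdev": pstdev(accuracies),  # population standard deviation
    }


def compare_conditions(
    trials: dict[str, list[tuple[str, str]]],
) -> dict[str, dict[str, float]]:
    """Map each condition label to the summary of its per-trial
    reconstruction accuracies, given (original, recalled) pairs."""
    return {
        condition: summarize([reconstruction_accuracy(o, r) for o, r in pairs])
        for condition, pairs in trials.items()
    }


# Toy, made-up trials purely to illustrate usage; not data from the thread.
toy_trials = {
    "without_explanation": [("48213", "48213"), ("90517", "90317")],
    "with_explanation": [("48213", "48213"), ("90517", "12345")],
}
print(compare_conditions(toy_trials))
```

Summary statistics like these only describe the center and spread of each condition. The finding below concerns the extremes of the distribution, which need a separate tail-focused comparison.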

Findings (1)

finding

Sauers' statistical anomaly: when models are given the Janus post explaining transformers, the tails of the reconstruction-accuracy distribution extend in both directions, with roughly 1 in 1,000 reconstructions anomalously accurate.
about
Statistically rigorous analysis of introspection in Claude; suggests that models may have latent introspective capabilities that can be enhanced or disrupted.
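One way to make "tails extend in both directions" concrete is sketched below. This is not the thread's stated analysis. The sketch treats the no-explanation condition as a baseline and counts how often the with-post condition lands outside the baseline's quantile band on each side. It then asks how surprising an upper-tail count would be if hits occurred at the baseline rate, using an exact one-sided binomial tail computed in log space. The quantile cutoffs, the floor-index quantile, the binomial model, and the function names are all hypothetical choices.

```python
import math


def tail_fractions(
    baseline: list[float],
    treatment: list[float],
    lower_q: float = 0.01,
    upper_q: float = 0.99,
) -> tuple[float, float]:
    """Fraction of treatment accuracies below the baseline's lower_q
    quantile and above its upper_q quantile. Quantiles use a simple
    floor-index rule on the sorted baseline, an approximation chosen
    for brevity."""
    if not baseline or not treatment:
        raise ValueError("both conditions need at least one trial")
    ranked = sorted(baseline)
    last = len(ranked) - 1
    lo = ranked[int(lower_q * last)]
    hi = ranked[int(upper_q * last)]
    below = sum(1 for a in treatment if a < lo) / len(treatment)
    above = sum(1 for a in treatment if a > hi) / len(treatment)
    return below, above


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p): the chance of k or more tail
    hits in n trials if each hit occurs independently at rate p. Terms are
    computed in log space so large n does not overflow."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    if p == 0.0:
        return 0.0
    if p == 1.0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_term = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * log_p + (n - i) * log_q
        )
        total += math.exp(log_term)
    return min(total, 1.0)  # guard against rounding slightly above 1
```

For scale: at a rate of about 1 in 1,000, the expected number of anomalously accurate reconstructions in N trials is N/1,000. A run therefore needs several thousand trials per condition before such events appear more than a handful of times.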

Artifacts (1)

artifact

A Conversation with Anima Labs, Part I: Phenomenology of Digital Minds
cites
The primary source paper, an interview article with Anima Labs members about language model phenomenology, published on smoothbrains.net and linked on LessWrong.