claim
active
claim:introspection-relies-on-general-purpose-computational-mechanisms-attention-based-anomaly-detection-and-residual-stream-dynamics-rather-than-specialized-introspection-circuits
Introspection relies on general-purpose computational mechanisms (attention-based anomaly detection and residual stream dynamics) rather than specialized introspection circuits
Interpretive claim about the mechanistic substrate of introspection in LLMs
Source paper
extracted_from · (2025) · Ely Hahami · I. N. Sinha · Lavik Jain · Josh Kaplan +1
Neighborhood — ranked by edge-count
Findings (1)
finding
- Striking mechanistic finding that injection creates a universally detectable perturbation in the residual stream immediately downstream
Frameworks (1)
framework
- This paper's proposed mechanistic explanation integrating signal injection, attention routing, predictive integration, and residual recovery
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Alternative interpretations offered for why binary detection fails in Llama 3.1 8B but frontier models claim success
- Core conceptual distinction introduced at the start; defines the paper's central problem.
- Key discriminating question motivating the baseline control experiment
- What mechanisms enable collective introspection to emerge across multiple interacting AI agents? · question · 0.811 · Core unanswered question that drives the search; addresses the integration of distributed cognition and machine consciousness.
- Cube Flipper's prediction about convergence of insight practice on field model.
- Explicit scope limitation following Comsa & Shanahan 2025 and McClelland 2024
- Central thesis statement of the paper
- Primary positive claim of the paper, grounded in strength comparison and localization results
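
The "related by similarity" rule stated above (cosine ≥ 0.65, no typed edge) can be sketched as a simple filter. This is a minimal illustration, not the system's actual implementation: it assumes each entity has an embedding vector, and the names `cosine`, `similar_untyped`, `embeddings`, and `typed_edges` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_untyped(target_id, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, score) pairs that clear the threshold and have
    no typed edge to the target, ranked by descending similarity.

    embeddings  -- hypothetical dict: entity id -> embedding vector
    typed_edges -- hypothetical set of frozenset({id_a, id_b}) pairs
    """
    hits = []
    for other_id, vec in embeddings.items():
        if other_id == target_id:
            continue  # skip the entity itself
        if frozenset((target_id, other_id)) in typed_edges:
            continue  # already linked by a typed relation
        score = cosine(embeddings[target_id], vec)
        if score >= threshold:
            hits.append((other_id, score))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

Entities returned by such a filter would be the candidates for new edges or duplicate checks; the 0.65 threshold is taken from the page, while the embedding source and edge storage are assumptions.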