claim
active
claim:different-forms-of-introspection-invoke-mechanistically-different-processesDifferent forms of introspection invoke mechanistically different processes
Based on layer-selective perturbation results.
Source paper
extracted_from(2026) · Lindsey, Jack
Neighborhood — ranked by edge-count
Findings (2)
finding
- Introspective awareness peaks at a layer about two-thirds through Opus 4.1 for injected thoughtssupportsThe success rate shows a sharp peak at a specific middle layer.
- The optimal layer for the prefill introspection differs from the optimal layer for detecting injected thoughts.
Communities (4)
community
- Spans attention head decomposition, benchmark awareness, and genomic pathogenicity prediction via neural models.
- Empirical investigation of how LMs access and report internal states across layers, using concept injection and thought detection on Claude models.
- LLM functional introspective awarenessmembers_ofEmpirical probing of language models' ability to detect and report their own internal concept representations
- Investigates how different introspective processes activate distinct computational mechanisms at specific model depths, using layer-wise analysis.
Concepts (1)
concept
- The central concept: the ability of a model to access and report on its internal states, as defined by the paper's criteria.
Questions (1)
question
- Question about uniformity of introspection mechanisms.
Related by similarity (8)
cosine ≥ 0.65 · no typed edgeEntities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
- Identified gap; methods for enabling machine consciousness development through self-examination.
- Alternative interpretations offered for why binary detection fails in Llama 3.1 8B but frontier models claim success
- Central open question raised by the paper.
- Forward-looking prediction about whether early-layer introspection generalizes to larger models or recurrent architectures
- Core conceptual distinction introduced at the start; defines the paper's central problem.
- Key discriminating question motivating the baseline control experiment
- Interpretive claim about the mechanistic substrate of introspection in LLMs
- Three of four concepts show significant introspection at turn 1; rules out joint temporal drift as sole explanation