claim

active

claim:different-forms-of-introspection-invoke-mechanistically-different-processes

Different forms of introspection invoke mechanistically different processes

Based on layer-selective perturbation results.

Source paper

extracted_from

Emergent Introspective Awareness in Large Language Models

(2026) · Lindsey, Jack

Neighborhood — ranked by edge-count

Findings (2)

finding

Introspective awareness peaks at a layer about two-thirds through Opus 4.1 for injected thoughts
supports
The success rate shows a sharp peak at a specific middle layer.
Prefill detection effect peaks at an earlier layer in Opus 4.1 (slightly over halfway through) than the injected-thought peak
supports
The optimal layer for the prefill introspection differs from the optimal layer for detecting injected thoughts.

Communities (4)

community

Mechanistic interpretability & model evaluation
members_of
Spans attention head decomposition, benchmark awareness, and genomic pathogenicity prediction via neural models.
Mechanistic introspection in language models
members_of
Empirical investigation of how LMs access and report internal states across layers, using concept injection and thought detection on Claude models.
LLM functional introspective awareness
members_of
Empirical probing of language models' ability to detect and report their own internal concept representations
Mechanistic introspection in language models
members_of
Investigates how different introspective processes activate distinct computational mechanisms at specific model depths, using layer-wise analysis.

Concepts (1)

concept

Introspective awareness
about
The central concept: the ability of a model to access and report on its internal states, as defined by the paper's criteria.

Questions (1)

question

How general are the model's introspective mechanisms? Do they have a global representation of thoughts?
gates
Question about uniformity of introspection mechanisms.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Systematic Introspective Processes · concept · 0.816
Identified gap; methods for enabling machine consciousness development through self-examination.
Either introspection is an emergent capability requiring larger scale, or more stringent controls are needed to test introspection in smaller models · claim · 0.807
Alternative interpretations offered for why binary detection fails in Llama 3.1 8B but frontier models claim success
What are the mechanisms underlying introspection in language models? · question · 0.806
Central open question raised by the paper.
We hypothesize that introspective capabilities may scale with model size and architecture, including recurrence/looping that extends the integration window · hypothesis · 0.797
Forward-looking prediction about whether early-layer introspection generalizes to larger models or recurrent architectures
Functional and phenomenal introspection are distinguishable, and whether they correlate in machines is an open question. · claim · 0.795
Core conceptual distinction introduced at the start; defines the paper's central problem.
Do apparent introspection results reflect genuine metacognitive access to internal representations, or do they emerge from simpler mechanisms such as output distribution shifts? · question · 0.794
Key discriminating question motivating the baseline control experiment
Introspection relies on general-purpose computational mechanisms (attention-based anomaly detection and residual stream dynamics) rather than specialized introspection circuits · claim · 0.794
Interpretive claim about the mechanistic substrate of introspection in LLMs
Introspective capacity is present from the first conversation turn, not requiring multi-turn context to emerge · claim · 0.789
Three of four concepts show significant introspection at turn 1; rules out joint temporal drift as sole explanation
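
The "related by similarity" listing above (cosine ≥ 0.65, no typed edge) can be read as a simple filter over entity embeddings. The following is a minimal sketch of that idea, not the page's actual implementation: the function names, the embedding dictionary, and the edge-pair representation are all hypothetical, and only the 0.65 threshold and the "no typed edge" condition come from the page.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def related_by_similarity(target_id, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, score) pairs at or above the threshold that share
    no typed edge with the target, sorted by descending score.

    embeddings  -- hypothetical dict: entity id -> embedding vector
    typed_edges -- hypothetical iterable of (source_id, destination_id) pairs
    """
    target = embeddings[target_id]
    # Entities already connected to the target by a typed edge, in either direction.
    linked = {dst for src, dst in typed_edges if src == target_id}
    linked |= {src for src, dst in typed_edges if dst == target_id}

    hits = []
    for entity_id, vector in embeddings.items():
        if entity_id == target_id or entity_id in linked:
            continue
        score = cosine(target, vector)
        if score >= threshold:
            hits.append((entity_id, score))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Toy data, for illustration only.
embeddings = {
    "claim": [1.0, 0.0],
    "near": [0.9, 0.1],
    "far": [0.0, 1.0],
    "linked": [1.0, 0.0],
}
typed_edges = [("claim", "linked")]
result = related_by_similarity("claim", embeddings, typed_edges)
```

In this toy run, "linked" is skipped despite its high similarity because it already has a typed edge to the target, and "far" falls below the threshold, so only "near" remains as a candidate for a new edge or duplicate check.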