framework
active
framework:computational-account-of-layer-dependent-introspection
Computational Account of Layer-Dependent Introspection
This paper's proposed mechanistic explanation of layer-dependent introspection, which integrates signal injection, attention routing, predictive integration, and residual recovery.
Neighborhood — ranked by edge-count
Papers (1)
paper
Concepts (3)
concept
- predictive integration · supports · The mid-to-late layer computational process that converts routed perturbation signals into explicit predictions
- The network's tendency to actively attenuate injected perturbations over subsequent layers, erasing the signal before output
- attention-based signal routing · supports · Mechanism by which attention heads detect injected perturbations and route information about them to the final token position
Claims (2)
claim
- Key quantitative characterization of the layer-dependence of partial introspection
- Interpretive claim about the mechanistic substrate of introspection in LLMs
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Introspective awareness in Opus 4.1 peaks at a layer roughly 2/3 of the way through model depth for thought injection and text distinction; prefill detection is most sensitive at an earlier layer, suggesting mechanistically distinct processes.
- Alternative interpretations offered for why binary detection fails in Llama 3.1 8B but frontier models claim success
- Primary positive claim of the paper, grounded in strength comparison and localization results
- Central open question raised by the paper.
- Forward-looking prediction about whether early-layer introspection generalizes to larger models or recurrent architectures
- The ability of a model to observe its own past internal states or computations; claimed to be architecturally permitted by transformers.
- What mechanisms enable collective introspection to emerge across multiple interacting AI agents? · question · 0.748 · Core unanswered question that drives the search; addresses the integration of distributed cognition and machine consciousness.
- Identified gap; methods for enabling machine consciousness development through self-examination.
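
The "Related by similarity" list above is selected by a cosine threshold (≥ 0.65) among entities that have no typed edge to this one. The page does not say how the embeddings are produced or how edges are stored, so the following is only a minimal sketch of that selection rule. The names `cosine`, `related_by_similarity`, the dict-of-vectors layout, and the edge-pair representation are all hypothetical, chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def related_by_similarity(target_id, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, score) pairs with score >= threshold and no typed edge.

    embeddings  -- hypothetical mapping: entity id -> embedding vector
    typed_edges -- hypothetical iterable of (source_id, target_id) pairs
    """
    # Entities already connected to the target by a typed edge are excluded.
    linked = {b for a, b in typed_edges if a == target_id}
    linked |= {a for a, b in typed_edges if b == target_id}

    hits = []
    for entity_id, vector in embeddings.items():
        if entity_id == target_id or entity_id in linked:
            continue
        score = cosine(embeddings[target_id], vector)
        if score >= threshold:
            hits.append((entity_id, round(score, 3)))
    # Highest similarity first.
    return sorted(hits, key=lambda pair: -pair[1])
```

Under these assumptions, an entry such as the question scored 0.748 would pass the filter because its score clears 0.65 and it has no typed relation to the framework; an entity with an existing `supports` edge would be skipped regardless of its score.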