finding
active
finding:lindsey-2025-frontier-models-can-detect-and-report-changes-in-their-own-internal-activations-via-concept-injection-experiments-demonstrating-functional-introspective-awareness
Lindsey 2025: frontier models can detect and report changes in their own internal activations via concept injection experiments, demonstrating functional introspective awareness
Prior finding cited as convergent evidence for LLM self-awareness capacities
Source paper
extracted_from (2025) · Berg, Cameron · de Lucena, Diogo · Rosenblatt, Judd
Neighborhood — ranked by edge-count
Concepts (1)
concept
- Introspective Access · supports · The capacity to detect and report one's own internal states, measured via the five-adjective task and paradox reflection
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Related work explicitly prompting models to pursue goals and measuring deceptive behavior
- Acknowledges that the model's additional descriptions of its experience are unverified.
- Modern language models possess at least a limited, functional form of introspective awareness · claim · 0.793 · The paper's central interpretive assertion.
- Comparative prediction motivating future work contrasting different approaches to LLM self-knowledge
- Abstract's main conclusion.
- Interpretive claim about the mechanistic substrate of introspection in LLMs
- Opus 4.1 is most effective at recognizing injected abstract concepts (e.g., justice, peace) but detects other categories too.
- Core conceptual distinction introduced at the start; defines the paper's central problem.