finding
active
finding:abstract-nouns-elicit-the-highest-introspective-awareness-rates-all-concept-categories-show-nonzero-detection
Abstract nouns elicit the highest introspective awareness rates; all concept categories show nonzero detection
Opus 4.1 is most effective at recognizing injected abstract concepts (e.g., justice, peace) but detects other categories too.
Source paper
extracted_from · (2026) · Lindsey, Jack
Neighborhood — ranked by edge-count
Communities (3)
community
- Spans attention head decomposition, benchmark awareness, and genomic pathogenicity prediction via neural models.
- Empirical investigation of how LMs access and report internal states across layers, using concept injection and thought detection on Claude models.
- LLM functional introspective awareness · members_of · Empirical probing of language models' ability to detect and report their own internal concept representations.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Opus 4.1 demonstrates highest introspective awareness on abstract nouns (justice, peace, betrayal) with nonzero awareness across all concept categories tested.
- Speculative question about future developments.
- Abstract's main conclusion.
- Modern language models possess at least a limited, functional form of introspective awareness · claim · 0.793 · The paper's central interpretive assertion.
- Key finding about the relationship between capability and introspection.
- Secondary question; paper demonstrates introspection but explicitly avoids pinning down specific mechanistic explanation, noting mechanisms could be shallow and specialized.
- Central empirical claim of the paper, supported by statistical tests.
- Alternative interpretations offered for why binary detection fails in Llama 3.1 8B but frontier models claim success.
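
The "related by similarity" rule stated above (cosine ≥ 0.65, no typed edge) can be sketched as follows. This is a minimal illustration under assumptions, not this page's actual pipeline: the function names `cosine` and `similar_without_edge`, the toy two-dimensional embeddings, and the edge list are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_without_edge(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs at or above the threshold that share no typed edge with target.

    `embeddings` maps entity name -> vector; `typed_edges` is a list of
    (source, destination) name pairs. Both are hypothetical inputs.
    """
    # Entities already connected to the target by a typed edge, in either direction.
    linked = {b for a, b in typed_edges if a == target}
    linked |= {a for a, b in typed_edges if b == target}

    hits = []
    for name, vec in embeddings.items():
        if name == target or name in linked:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            hits.append((name, score))
    # Highest similarity first.
    return sorted(hits, key=lambda h: h[1], reverse=True)


# Toy data (assumed, for illustration only).
embeddings = {
    "finding": [1.0, 0.0],
    "claim": [0.8, 0.6],
    "community": [0.9, 0.1],
    "unrelated": [0.0, 1.0],
}
typed_edges = [("finding", "community")]

result = similar_without_edge("finding", embeddings, typed_edges)
```

With this toy data, `community` is skipped because it already has a typed edge to the target, and `unrelated` falls below the threshold, so only `claim` is returned as a candidate for a new edge or a duplicate check.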