finding

active

finding:abstract-nouns-elicit-the-highest-introspective-awareness-rates-all-concept-categories-show-nonzero-detection

Abstract nouns elicit the highest introspective awareness rates; all concept categories show nonzero detection

Claude Opus 4.1 recognizes injected abstract concepts (e.g., justice, peace) most reliably, but it also detects concepts from every other category tested at a nonzero rate.

Source paper

extracted_from

Emergent Introspective Awareness in Large Language Models

(2026) · Lindsey, Jack

Neighborhood — ranked by edge-count

Communities (3)

community

Mechanistic interpretability & model evaluation
members_of
Spans attention head decomposition, benchmark awareness, and genomic pathogenicity prediction via neural models.
Mechanistic introspection in language models
members_of
Empirical investigation of how LMs access and report internal states across layers, using concept injection and thought detection on Claude models.
LLM functional introspective awareness
members_of
Empirical probing of language models' ability to detect and report their own internal concept representations

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Models more effective at recognizing abstract nouns than other concept types · finding · 0.835
Opus 4.1 demonstrates highest introspective awareness on abstract nouns (justice, peace, betrayal) with nonzero awareness across all concept categories tested.
Will introspective awareness become more reliable in future AI models? · question · 0.796
Speculative question about future developments.
Our results demonstrate that modern language models possess at least a limited, functional form of introspective awareness. · quote · 0.796
Abstract's main conclusion.
Modern language models possess at least a limited, functional form of introspective awareness · claim · 0.793
The paper's central interpretive assertion.
Notably, Claude Opus 4.1 and 4—the most recently released and most capable models of those that we test—perform the best in our experiments, suggesting that introspective capabilities may emerge alongside other improvements to language models. · quote · 0.790
Key finding about the relationship between capability and introspection.
What are the mechanistic bases of introspective awareness in LLMs? · question · 0.787
Secondary question; paper demonstrates introspection but explicitly avoids pinning down specific mechanistic explanation, noting mechanisms could be shallow and specialized.
Introspective agents generally outperform standard no-pain baseline agents across environments and reward categories · claim · 0.785
Central empirical claim of the paper, supported by statistical tests.
Either introspection is an emergent capability requiring larger scale, or more stringent controls are needed to test introspection in smaller models · claim · 0.785
Alternative interpretations offered for why binary detection fails in Llama 3.1 8B while success is reported for frontier models.