Introspective Access

The capacity to detect and report one's own internal states, measured via the five-adjective task and paradox reflection

Neighborhood — ranked by edge-count

thinker

Dillon Plunkett
studies
Author of work showing that LLMs can quantitatively report their decision weights and that introspection training improves this ability

method

Five-Adjective State Description Task
implements
Task asking models to describe their current state using exactly 5 adjectives, enabling embedding-based cross-model comparison
Self-Awareness 1-5 Scoring Rubric
about
LLM-based judge that scores reflection segments on a 1-5 scale for the presence of a first-person felt state; used in Experiment 4

concept

Self-Referential Processing
associated_with
The central experimental manipulation: directing a model to attend to its own cognitive activity
Implicitly Mimetic Generation
contradicts
Alternative explanation: models produce first-person experiential language by extending predictive text modeling of human-authored introspective writing without encoding it as roleplay

finding

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
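A minimal sketch of how such a candidate list could be produced, assuming each entity has an embedding vector and that typed edges are tracked as a set of names. The function names, the toy two-dimensional vectors, and the entity labels are hypothetical; only the 0.65 threshold and the "no typed edge" condition come from this page.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def untyped_candidates(target, others, typed_edges, threshold=0.65):
    """Return (name, score) pairs with score >= threshold and no typed edge.

    target      -- embedding of the entity being viewed
    others      -- dict mapping entity name to embedding (hypothetical data)
    typed_edges -- set of names already linked to the target by a typed edge
    """
    hits = []
    for name, vec in others.items():
        if name in typed_edges:
            continue  # already related, so not a candidate
        score = cosine(target, vec)
        if score >= threshold:
            hits.append((name, score))
    # Highest similarity first, matching the descending order of the list.
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy vectors, invented for illustration only.
    target = [1.0, 0.0]
    others = {
        "A": [1.0, 0.0],
        "B": [0.0, 1.0],
        "C": [1.0, 1.0],
        "D": [1.0, 0.0],
    }
    for name, score in untyped_candidates(target, others, typed_edges={"D"}):
        print(f"{name} {score:.3f}")
```

Entries that pass the filter are candidates only; deciding whether one is a duplicate or deserves a new typed edge still needs review.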

Introspective awareness · concept · 0.847
The central concept: the ability of a model to access and report on its internal states, as defined by the paper's criteria.
Introspection · concept · 0.844
The ability of a model to observe its own past internal states or computations; claimed to be architecturally permitted by transformers.
Introspective strength · concept · 0.816
Spearman ρ measuring rank-order agreement between logit-based self-report and probe score; the paper's primary monotonic association metric
Introspective Exploration Component · framework · 0.812
The novel framework introduced in the paper: an HMM-based pain-belief signal integrated into the reward function to drive exploration
AI Introspection · concept · 0.809
Key gap identified in the literature; systematic self-examination processes for machine consciousness development.
partial introspection · concept · 0.799
The authors' characterization of genuine but limited introspective capability found only in early-layer injection regimes
Latent Introspection · concept · 0.793
Pearson-Vogel et al.'s finding that models can detect prior concept injections; introspective signals exist in middle layers but are suppressed by post-training
Systematic Introspective Processes · concept · 0.786
Identified gap; methods for enabling machine consciousness development through self-examination.
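The Introspective strength entry in this list defines its metric as a Spearman ρ between a logit-based self-report and a probe score. As a rough illustration of what that statistic computes, the sketch below ranks both series (averaging ranks for ties) and takes the Pearson correlation of the ranks. The function names and the sample numbers are hypothetical and are not taken from the paper.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(self_report, probe_score):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(self_report) != len(probe_score) or len(self_report) < 2:
        raise ValueError("need two equal-length series with at least 2 items")
    return pearson(average_ranks(self_report), average_ranks(probe_score))


if __name__ == "__main__":
    # Invented numbers: self-reported strength vs. a probe's score per trial.
    self_report = [0.1, 0.4, 0.9, 0.3]
    probe_score = [1.0, 10.0, 100.0, 5.0]
    print(spearman_rho(self_report, probe_score))
```

Because only rank order matters, any monotonic rescaling of either series leaves ρ unchanged, which is why it suits a monotonic association metric.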