model introspection

The capacity of a model to self-report on its internal emotional state while its SAE features are being steered; here it is used as a measurement tool.

Neighborhood (ranked by edge count)

paper

method

Agentic self-steering evaluation
implements
Method in which Kimi K2.5 steers its own SAE features in real time and reports on its internal emotional state.
Textual SAE feature emotionality evaluation
implements
Method in which Kimi rates the emotionality of an SAE feature (0-100) by comparing steered and unsteered text samples produced by another instance.
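Both methods depend on steering a sparse autoencoder (SAE) feature. A common way to do this, assumed here rather than confirmed by this entry, is to add a scaled copy of the feature's decoder direction to the model's hidden state and read the feature back through a ReLU encoder. The sketch below uses plain Python lists; the function names, the ReLU encoder, and the toy vectors are hypothetical.

```python
# Hypothetical sketch of additive SAE feature steering.
# Assumption: steering is h' = h + alpha * d_f, where d_f is the decoder
# direction of feature f. This is not the paper's confirmed implementation.
from typing import List, Sequence


def steer_hidden_state(hidden: Sequence[float],
                       decoder_direction: Sequence[float],
                       alpha: float) -> List[float]:
    """Return hidden + alpha * decoder_direction, element by element."""
    if len(hidden) != len(decoder_direction):
        raise ValueError("hidden state and decoder direction must match in size")
    return [h + alpha * d for h, d in zip(hidden, decoder_direction)]


def feature_activation(hidden: Sequence[float],
                       encoder_weights: Sequence[float],
                       encoder_bias: float) -> float:
    """ReLU SAE encoder activation for a single feature: max(0, w . h + b)."""
    pre_activation = sum(w * h for w, h in zip(encoder_weights, hidden)) + encoder_bias
    return max(0.0, pre_activation)


# Toy example: a 3-dimensional hidden state and a unit feature direction.
hidden = [0.5, -0.2, 0.1]
direction = [1.0, 0.0, 0.0]
steered = steer_hidden_state(hidden, direction, alpha=2.0)
before = feature_activation(hidden, direction, 0.0)
after = feature_activation(steered, direction, 0.0)
```

In this toy case the steered state is `[2.5, -0.2, 0.1]` and the feature's activation rises from 0.5 to 2.5, which is the kind of change the self-report or text-rating step would then be asked to detect.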

concept

Introspection
related_to
The ability of a model to observe its own past internal states or computations; the transformer architecture is claimed to permit this.

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood that have no typed relation to this one; these are candidates for new edges or unrecognized duplicates.

model size threshold for introspection · concept · 0.846
Introspective capabilities appear only in very large models (>70B parameters), with 70B models barely at the threshold; this is a bottleneck for independent research.
Introspection is aided by overall improvements in model intelligence · claim · 0.828
Interpretation of the observation that the most capable models performed best.
What are the mechanisms underlying introspection in language models? · question · 0.827
Central open question raised by the paper.
Introspective capabilities may continue to develop with further improvements to model capabilities · claim · 0.816
Forward-looking statement about future models.
AI Introspection · concept · 0.815
Systematic self-examination processes for developing machine consciousness; identified as a key gap in the literature.
Latent Introspection · concept · 0.814
Pearson-Vogel et al.'s finding that models can detect prior concept injections; introspective signals exist in middle layers but are suppressed by post-training.
partial introspection · concept · 0.807
The authors' characterization of genuine but limited introspective capability, found only in early-layer injection regimes.
Phenomenal Introspection · concept · 0.805
Direct introspection into phenomenal consciousness; its correlation with functional introspection is an open question.
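The candidate list above can be read as a simple filter: keep every entity whose embedding has cosine similarity of at least 0.65 with this entity, drop any entity already connected by a typed edge, and sort by score. The sketch below assumes each entity has an embedding vector and that "no typed edge" means the candidate is absent from this entity's set of linked neighbors; the function names and toy vectors are hypothetical, and the page does not say which embedding model produced the scores.

```python
# Hypothetical sketch of the "cosine >= 0.65, no typed edge" candidate filter.
# Assumptions: entities are embedded as dense vectors, and typed edges are
# given as the set of entity names already linked to the target.
import math
from typing import Dict, List, Sequence, Set, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def untyped_neighbors(target: str,
                      embeddings: Dict[str, Sequence[float]],
                      linked: Set[str],
                      threshold: float = 0.65) -> List[Tuple[str, float]]:
    """Entities similar to `target` but not linked to it, highest score first."""
    target_vec = embeddings[target]
    candidates = []
    for name, vec in embeddings.items():
        if name == target or name in linked:
            continue  # skip the entity itself and existing typed edges
        score = cosine(target_vec, vec)
        if score >= threshold:
            candidates.append((name, score))
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates


# Toy data: "A" is already linked, "C" is dissimilar, so only "D" and "B" remain.
toy = {"T": [1.0, 0.0], "A": [1.0, 0.0], "B": [1.0, 1.0],
       "C": [0.0, 1.0], "D": [3.0, 1.0]}
result = untyped_neighbors("T", toy, linked={"A"})
```

Here `result` lists `D` (about 0.949) before `B` (about 0.707). A very high score with no typed edge, as for an identical vector, is the pattern that would suggest an unrecognized duplicate rather than a missing relation.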