Introspective strength

Spearman ρ measuring rank-order agreement between logit-based self-report and probe score; the paper's primary monotonic association metric
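As a rough illustration of the metric described above, Spearman ρ can be computed as the Pearson correlation of the two rank vectors. The sketch below is standard-library Python; the function names (`average_ranks`, `pearson`, `spearman_rho`) and the toy inputs are illustrative assumptions, not the paper's code or data.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def spearman_rho(self_report, probe_score):
    """Spearman rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(self_report), average_ranks(probe_score))


# Toy, made-up values purely for illustration (hypothetical, not from the paper).
toy_self_report = [2.1, 3.4, 3.9, 6.2, 7.8]
toy_probe_score = [-1.2, -0.3, 0.1, 0.9, 2.5]
rho = spearman_rho(toy_self_report, toy_probe_score)
```

Because only ranks enter the calculation, any strictly increasing relationship between probe score and self-report gives ρ = 1 regardless of its shape, which is why the metric captures monotonic rather than linear association.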

Neighborhood — ranked by edge-count

method

Logit-based self-report
uses
Primary self-report measure: probability-weighted expected value over all ten digit-token logits, yielding a continuous rating that preserves full distributional signal
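A minimal sketch of the probability-weighted expected value described above, in standard-library Python. It assumes the ten digit tokens are `0` through `9`, that the logits are passed in that order, and that probabilities are renormalized over those ten tokens only; the source does not state these details, and `expected_rating` is a hypothetical name.

```python
from math import exp


def expected_rating(digit_logits):
    """Expected digit value under a softmax restricted to the ten digit tokens.

    digit_logits[d] is assumed to be the logit of the token for digit d (0..9).
    """
    if len(digit_logits) != 10:
        raise ValueError("expected exactly ten digit-token logits")
    # Subtract the maximum logit before exponentiating for numerical stability.
    m = max(digit_logits)
    weights = [exp(logit - m) for logit in digit_logits]
    total = sum(weights)
    # Probability-weighted mean of the digit values.
    return sum(d * w for d, w in enumerate(weights)) / total
```

Unlike taking the single most likely digit, this keeps information from the whole distribution: two responses that both peak at `5` receive different ratings if one places more of its remaining mass on higher digits.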

concept

Causal informational coupling
implements
Operational definition of introspection: self-report covaries monotonically with probe-defined direction AND causally shifting activations shifts the report in a semantically coherent way

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Introspection · concept · 0.826
The ability of a model to observe its own past internal states or computations; claimed to be architecturally permitted by transformers.
Introspective Access · concept · 0.816
The capacity to detect and report one's own internal states, measured via the five-adjective task and paradox reflection
Introspective fidelity · concept · 0.808
Isotonic R² measuring fraction of variance in self-report explained by probe score under monotonicity assumption; the paper's primary fidelity metric
Introspective awareness · concept · 0.808
The central concept: the ability of a model to access and report on its internal states, as defined by the paper's criteria.
Introspective Exploration Component · framework · 0.795
The novel framework introduced in the paper: an HMM-based pain-belief signal integrated into the reward function to drive exploration
Systematic Introspective Processes · concept · 0.788
Identified gap; methods for enabling machine consciousness development through self-examination.
Preference Strength · concept · 0.788
The problematic possibility of digital minds with superhumanly strong preferences requiring interpersonal utility comparison frameworks
partial introspection · concept · 0.784
The authors' characterization of genuine but limited introspective capability found only in early-layer injection regimes
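The Introspective fidelity entry in the list above names isotonic R² as the companion metric to Spearman ρ. A hedged sketch of one common way to compute it follows: fit a non-decreasing step function with the pool-adjacent-violators algorithm, then take the usual 1 − SS_res / SS_tot. The function names are hypothetical, the fit is assumed to be increasing, and tied probe scores are not pooled, which a full implementation would do.

```python
def isotonic_fit(x, y):
    """Non-decreasing least-squares fit of y on x (pool-adjacent-violators).

    Simplification: assumes the x values are distinct.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    blocks = []  # each block holds [sum of y, count]
    for i in order:
        blocks.append([y[i], 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted_sorted = []
    for s, c in blocks:
        fitted_sorted.extend([s / c] * c)
    # Map fitted values back to the original ordering of the inputs.
    fitted = [0.0] * len(x)
    for pos, i in enumerate(order):
        fitted[i] = fitted_sorted[pos]
    return fitted


def isotonic_r2(probe_score, self_report):
    """Fraction of self-report variance explained by the monotone fit."""
    fitted = isotonic_fit(probe_score, self_report)
    mean_y = sum(self_report) / len(self_report)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(self_report, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in self_report)
    return 1.0 - ss_res / ss_tot
```

The two metrics answer different questions: Spearman ρ scores rank agreement only, while isotonic R² also penalizes scatter around the best monotone curve, so a noisy but broadly increasing relationship can have a moderate ρ and a lower R².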