Introspective fidelity

Isotonic R²: the fraction of variance in self-report explained by the probe score under a monotonicity assumption; the paper's primary fidelity metric

Neighborhood — ranked by edge-count

method

Isotonic regression
uses
Fits a non-decreasing function and computes R² = 1 - SSres/SStot to quantify introspective fidelity without assuming linearity
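
The computation described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes an unweighted least-squares fit via the pool-adjacent-violators algorithm, pools tied probe scores into one block, and uses hypothetical names (isotonic_fit, isotonic_r2, probe_scores, self_reports) that do not come from the source.

```python
from itertools import groupby


def isotonic_fit(x, y):
    """Least-squares non-decreasing fit of y on x (pool-adjacent-violators)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    # Each block is [mean, weight, member indices]; tied x values share a block.
    blocks = []
    for _, grp in groupby(order, key=lambda i: x[i]):
        idx = list(grp)
        blocks.append([sum(y[i] for i in idx) / len(idx), len(idx), idx])
        # Merge backwards while adjacent block means violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, i2 = blocks.pop()
            m1, w1, i1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, i1 + i2])
    # Write each block mean back to its members, in the original order.
    fitted = [0.0] * len(y)
    for mean, _, idx in blocks:
        for i in idx:
            fitted[i] = mean
    return fitted


def isotonic_r2(probe_scores, self_reports):
    """R^2 = 1 - SSres/SStot, with residuals taken from the isotonic fit."""
    fitted = isotonic_fit(probe_scores, self_reports)
    mean_y = sum(self_reports) / len(self_reports)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(self_reports, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in self_reports)
    if ss_tot == 0:
        raise ValueError("self_reports has zero variance; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Toy data (made up for illustration): one out-of-order pair gets pooled.
toy_r2 = isotonic_r2([0.1, 0.2, 0.3, 0.4], [1, 3, 2, 4])
```

Because the constant function is itself non-decreasing, this R² cannot fall below 0 on the data it was fit to, and it reaches 1 only when self-report is already a non-decreasing function of probe score.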

concept

Causal informational coupling
implements
Operational definition of introspection: self-report covaries monotonically with probe-defined direction AND causally shifting activations shifts the report in a semantically coherent way
Two-component model of introspective ability
extends
Conceptual distinction between (i) information internally available about a state and (ii) capacity to transform that signal into precise output reports

finding

Introspective fidelity for impulsivity decreases from turn 1 to turn 10: ΔR² = -0.28 in LLaMA-3.2-3B
supports
Temporal trend opposite to that of wellbeing/interest/focus; for impulsivity, introspective fidelity weakens over the conversation

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Introspective strength · concept · 0.808
Spearman ρ measuring rank-order agreement between logit-based self-report and probe score; the paper's primary monotonic association metric
Introspective Exploration Component · framework · 0.799
The novel framework introduced in the paper: an HMM-based pain-belief signal integrated into the reward function to drive exploration
Introspective awareness · concept · 0.795
The central concept: the ability of a model to access and report on its internal states, as defined by the paper's criteria.
Introspection · concept · 0.790
The ability of a model to observe its own past internal states or computations; claimed to be architecturally permitted by transformers.
Introspective Access · concept · 0.776
The capacity to detect and report one's own internal states, measured via the five-adjective task and paradox reflection
Systematic Introspective Processes · concept · 0.775
Identified gap; methods for enabling machine consciousness development through self-examination.
LLM Introspective Self-Report · concept · 0.761
The capacity of Kimi K2.5 to evaluate its own internal emotional state when steered, used as a novel interpretability signal
Functional Introspection · concept · 0.761
Tracking of functional/computational cognitive states, distinguished from phenomenal introspection.