framework
active
framework:quantitative-introspection-framework
Quantitative Introspection Framework
The paper's central contribution: treating LLM numeric self-report as a quantitative signal, validated against probe-defined internal states, with causal confirmation via steering.
Neighborhood — ranked by edge-count
Papers (1)
paper
Methods (6)
method
- Causal intervention technique: edit the NLA explanation, reconstruct it via AR, and use the difference as a steering vector to manipulate model behavior.
- Probe construction method: the concept vector at each layer is the L2-normalized difference between the mean positive and mean negative representations obtained from contrastive system prompts.
- Primary self-report measure: the probability-weighted expected value over all ten digit-token logits, yielding a continuous rating that preserves the full distributional signal.
- Benjamini-Hochberg (BH) correction · implements · Applied within concept/endpoint families to control the false discovery rate across parallel tests.
- Cluster bootstrap confidence intervals · implements · Bootstrap resampling at the conversation level (B=1000, 95% percentile CIs) to respect non-independence of within-conversation observations.
- Linear mixed-effects models (LMMs) · implements · Primary statistical model with a random intercept by conversation and REML estimation, for pooled conversation-turn observations.
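The probe construction and primary self-report measure listed above can be sketched in a few lines. This is a minimal illustration and not the paper's released library: the function names `concept_vector` and `expected_rating` are hypothetical, activations are assumed to arrive as plain lists of floats for a single layer, the ten digit tokens are assumed to be `0` through `9`, and the softmax is assumed to be renormalized over those ten logits only.

```python
import math


def concept_vector(pos, neg):
    """L2-normalized difference between mean positive and mean negative activations.

    pos, neg: non-empty lists of equal-length activation vectors for one layer
    (assumed input format, not taken from the paper's code).
    """
    dim = len(pos[0])
    mean_pos = [sum(v[i] for v in pos) / len(pos) for i in range(dim)]
    mean_neg = [sum(v[i] for v in neg) / len(neg) for i in range(dim)]
    diff = [p - n for p, n in zip(mean_pos, mean_neg)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        raise ValueError("positive and negative means coincide; no direction defined")
    return [d / norm for d in diff]


def expected_rating(digit_logits):
    """Probability-weighted expected value over ten digit-token logits.

    digit_logits[d] is the logit of digit token d (assumed order 0..9).
    The softmax is taken over these ten logits only (an assumption).
    """
    if len(digit_logits) != 10:
        raise ValueError("expected exactly ten digit logits")
    peak = max(digit_logits)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in digit_logits]
    total = sum(weights)
    return sum(d * w / total for d, w in enumerate(weights))
```

Because the rating is an expectation rather than the single most likely digit, two responses with the same top digit but different spread over neighboring digits yield different values, which is the sense in which the measure preserves distributional signal.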
Concepts (1)
concept
- Causal informational coupling · implements · Operational definition of introspection: self-report covaries monotonically with the probe-defined direction AND causally shifting activations shifts the report in a semantically coherent way.
Artifacts (1)
artifact
- Open-source Python library released with the paper supporting probe training, multi-probe scoring, activation steering, and logit extraction
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- The ability of a model to observe its own past internal states or computations; claimed to be architecturally permitted by transformers.
- Tracking of functional/computational cognitive states, distinguished from phenomenal introspection.
- Identified gap; methods for enabling machine consciousness development through self-examination.
- Direct introspection into phenomenal consciousness; its correlation with functional introspection is an open question.
- Key gap identified in the literature; systematic self-examination processes for machine consciousness development.
- Prior framework claiming frontier LLMs can detect and name injected concepts, interpreted as nascent self-awareness.
- The capacity of a model to self-report on its internal emotional state when its SAE features are steered, used here as a measurement tool.
- Spearman ρ measuring rank-order agreement between logit-based self-report and probe score; the paper's primary monotonic association metric.
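For the rank-order agreement metric in the last entry, a dependency-free sketch of Spearman ρ is given below. It computes the Pearson correlation of average ranks, which is the standard tie-aware definition. The function names and the input format (two equal-length lists, one of logit-based self-reports and one of probe scores) are assumptions for illustration, not the paper's code.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, assigning tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(self_reports, probe_scores):
    """Spearman rho: Pearson correlation computed on average ranks."""
    if len(self_reports) != len(probe_scores) or len(self_reports) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(self_reports), average_ranks(probe_scores))
```

Since only ranks enter the computation, any strictly increasing transformation of either variable leaves ρ unchanged, which suits a claim about monotonic rather than linear association.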