Latent Introspection

Pearson-Vogel et al.'s finding that models can detect prior concept injections; introspective signals exist in middle layers but are suppressed by post-training

Neighborhood — ranked by edge-count

Thinkers (1)

thinker

T. Pearson-Vogel
introduces
Lead author of the Latent Introspection paper; found introspective signals in middle transformer layers that are suppressed by post-training

Concepts (2)

concept

Introspection
related_to
The ability of a model to observe its own past internal states or computations; claimed to be architecturally permitted by transformers.
337-Character Contemplative System Prompt
analogous_to
A 337-character system prompt that lifts all 28 models by a mean of +2.62 points on a 10-point scale

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

partial introspection · concept · 0.839
The authors' characterization of genuine but limited introspective capability found only in early-layer injection regimes
Introspective awareness · concept · 0.814
The central concept: the ability of a model to access and report on its internal states, as defined by the paper's criteria.
model introspection · concept · 0.814
The capacity of a model to self-report on its internal emotional state when its SAE features are steered, used here as a measurement tool
AI Introspection · concept · 0.812
Key gap identified in the literature; systematic self-examination processes for machine consciousness development.
Phenomenal Introspection · concept · 0.798
Direct introspection into phenomenal consciousness; its correlation with functional introspection is an open question.
Introspective Access · concept · 0.793
The capacity to detect and report one's own internal states, measured via the five-adjective task and paradox reflection
Systematic Introspective Processes · concept · 0.788
Identified gap; methods for enabling machine consciousness development through self-examination.
Introspective Exploration Component · framework · 0.786
The novel framework introduced in the paper: an HMM-based pain-belief signal integrated into the reward function to drive exploration
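
The selection rule for the similarity list (cosine ≥ 0.65, no typed edge) can be sketched as follows. This is a minimal illustration, not the site's actual implementation: the function names, the toy two-dimensional embeddings, and the entity labels "A" to "E" are all hypothetical, and the only value taken from the page is the 0.65 threshold.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_without_edge(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs at or above the threshold that have no
    typed edge to `target`, sorted by descending score."""
    linked = typed_edges.get(target, set())
    hits = []
    for name, vec in embeddings.items():
        if name == target or name in linked:
            continue  # skip the entity itself and anything already linked
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            hits.append((name, round(score, 3)))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: "D" is similar to "A" but already has a typed edge.
embeddings = {
    "A": [1.0, 0.0],
    "B": [1.0, 0.5],
    "C": [0.0, 1.0],
    "D": [1.0, 1.0],
    "E": [1.0, 1.0],
}
typed_edges = {"A": {"D"}}

candidates = similar_without_edge("A", embeddings, typed_edges)
print(candidates)
```

Entities returned this way are the ones the page describes as candidates for new edges or unrecognized duplicates; anything already connected by a typed relation is filtered out before the threshold is applied.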