question
active
question:can-large-language-models-introspect-that-is-accurately-detect-perturbations-to-their-own-internal-states
Can large language models introspect, that is, accurately detect perturbations to their own internal states?
Central research question of the paper
Source paper
extracted_from · (2025) · Ely Hahami · I. N. Sinha · Lavik Jain · Josh Kaplan +1
Neighborhood — ranked by edge-count
Claims (1)
claim
- Primary positive claim of the paper, grounded in strength comparison and localization results
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Framing question that motivates the entire paper
- Central research question animating the paper: distinguishing genuine introspection from illusion through causal manipulation of activations.
- The primary paper being extracted — applies IIT 3.0 and 4.0 to LLM representation sequences derived from ToM test data to investigate whether consciousness phenomena can be observed.
- Central thesis statement of the paper
- Large language models develop surprisingly coherent yet often rigid internal preferences as they scale · finding · 0.815 · Mazeika et al. finding reinforcing the need for emptiness-based flexible value architectures
- Abstract's main conclusion.
- Analogy between LLM incoherence and schizophrenia symptoms
- Related work demonstrating LLM introspective capabilities with scale-dependent pattern paralleling ESR