finding

active

finding:llms-can-predict-their-own-responses-more-accurately-than-external-observers-implying-privileged-internal-knowledge

LLMs can predict their own responses more accurately than external observers, implying privileged internal knowledge

Binder et al. finding cited as evidence that LLMs possess introspective capacity analogous to mindfulness

Source paper

extracted_from

Contemplative Agent

(2025) · Ruben Laukkonen · Fionn Inglis · Shamil Chandaria · Lars Sandved-Smith +4

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Li et al. 2024: larger LLMs outperform smaller ones at distinguishing self-related from non-self-related properties on self-awareness benchmarks · finding · 0.840
Prior finding showing scale-dependent self-awareness, consistent with the scale effect observed in the paper's Experiment 1
So at any point in the network, the transformer not only receives information from its past... but also has causal influence over its future processing. So, saying that LLMs cannot introspect... is incorrect. · quote · 0.818
Core summary of Janus' position on autoregressive recurrence enabling introspection.
Standardized LLM self-assessments reflect learned communication postures rather than genuine capabilities (Jackson et al. 2025) · claim · 0.815
Skeptical prior work motivating validation framework
"Our findings demonstrate that LLMs can compute meaningful functions over perturbations to their internal states, establishing introspection as a real but layer-dependent phenomenon that merits further investigation." · quote · 0.815
Central thesis statement of the paper
The earlier a base model (less exposure to LM-related data), the more it is surprised by its own spontaneous self-referential capabilities. · claim · 0.815
Claim that capability emerges from architecture, not data, and that later models lose the surprise.
Connecting the Dots: LLMs Can Infer and Verbalize Latent Structure from Disparate Training Data (Treutlein et al. 2024) · concept · 0.814
Out-of-context reasoning work directly related to synthetic document fine-tuning experiments
LLMs can compute meaningful functions over perturbations to their internal states, establishing introspection as a real but layer-dependent phenomenon · claim · 0.812
Primary positive claim of the paper, grounded in strength comparison and localization results
When LLMs produce experience claims under self-reference, is this sophisticated simulation or genuine self-representation, and how would we tell the difference? · question · 0.811
The core interpretive question the paper narrows but cannot definitively answer