quote

active

quote:our-findings-demonstrate-that-llms-can-compute-meaningful-functions-over-perturbations-to-their-internal-states-establishing-introspection-as-a-real-but-layer-dependent-phenomenon-that-merits-further-investigation

"Our findings demonstrate that LLMs can compute meaningful functions over perturbations to their internal states, establishing introspection as a real but layer-dependent phenomenon that merits further investigation."

Central thesis statement of the paper

Source paper

extracted_from

Detecting the Disturbance: A Nuanced View of Introspective Abilities in LLMs

(2025) · Ely Hahami · I. N. Sinha · Lavik Jain · Josh Kaplan +1

Neighborhood — ranked by edge-count

Claims (1)

claim

LLMs can compute meaningful functions over perturbations to their internal states, establishing introspection as a real but layer-dependent phenomenon
supports
Primary positive claim of the paper, grounded in strength comparison and localization results

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

We hypothesize that 'consciousness' phenomena can be observed in the internal states of an LLM, specifically in its learned representations when analyzed as a sequence.
hypothesis · 0.834
Primary research hypothesis driving the entire study; operationalized via three criteria.

LLM introspection on internal computations is architecturally permitted; whether models leverage this is an empirical question.
claim · 0.829
Core claim directly challenged by prior work denying introspection; forms foundation for Koan Battery introspection studies.

LLM representations exhibit intriguing patterns under spatio-permutational analyses, suggesting a potentially profound yet tentative indication of consciousness.
claim · 0.826
Qualified positive claim from spatio permutation analysis where two cases satisfy all three criteria.

The systematic behavioral shift of LLMs under self-referential processing conditions predicted by consciousness theories represents something more structured than superficial correlations in training data
claim · 0.824
The paper's claim that theoretical convergence across GWT, RPT, HOT, IIT makes the findings non-coincidental

So at any point in the network, the transformer not only receives information from its past... but also has causal influence over its future processing. So, saying that LLMs cannot introspect... is incorrect.
quote · 0.823
Core summary of Janus' position on autoregressive recurrence enabling introspection.

It is plausible that ongoing developments in LLMs may lead to models or agentic systems built on LLMs capable of generating representations observed with 'consciousness' phenomena.
claim · 0.818
Forward-looking claim suggesting the methodological framework is relevant for future AI systems beyond current LLMs.

LLMs can predict their own responses more accurately than external observers, implying privileged internal knowledge
finding · 0.815
Binder et al. finding cited as evidence that LLMs possess introspective capacity analogous to mindfulness

Can large language models introspect—that is, accurately detect perturbations to their own internal states?
question · 0.815
Central research question of the paper
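
The selection rule stated for this list (cosine ≥ 0.65, no typed edge) can be illustrated with a minimal sketch. The source gives only the threshold and the "no typed edge" condition; the function names, the dictionary of embeddings, and the edge-set representation below are hypothetical and are not taken from the tool that produced this page.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similar_untyped(target, embeddings, typed_edges, threshold=0.65):
    """Return (entity, score) pairs similar to `target` that share no typed edge with it.

    embeddings:  hypothetical mapping of entity id -> embedding vector
    typed_edges: hypothetical set of (source, destination) pairs that already
                 carry a typed relation such as "supports"
    """
    hits = []
    for other, vec in embeddings.items():
        if other == target:
            continue
        # Skip entities already linked by a typed edge in either direction.
        if (target, other) in typed_edges or (other, target) in typed_edges:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            hits.append((other, round(score, 3)))
    # Highest similarity first, matching the descending order of the list above.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

Entities returned by such a filter would be the candidates for new edges or duplicate checks; an entity that already has a typed relation to the target is excluded regardless of its score.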