question
active
question:do-apparent-introspection-results-reflect-genuine-metacognitive-access-to-internal-representations-or-do-they-emerge-from-simpler-mechanisms-such-as-output-distribution-shifts

Do apparent introspection results reflect genuine metacognitive access to internal representations, or do they emerge from simpler mechanisms such as output distribution shifts?

This is the key discriminating question that motivates the paper's baseline control experiment.

Source paper

extracted_from
Detecting the Disturbance: A Nuanced View of Introspective Abilities in LLMs
(2025) · Ely Hahami · I. N. Sinha · Lavik Jain · Josh Kaplan +1
