finding
active
finding:binary-detection-accuracy-up-to-97-3-at-l0-5-is-entirely-explained-by-global-logit-shifts-r-0-999-correlation-with-control

Binary detection accuracy (up to 97.3% at L0 α=5) is entirely explained by global logit shifts (r=0.999 correlation with control)

Core negative result: the binary detection paradigm cannot distinguish genuine introspection from uniform output bias
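The confound can be sketched with a toy model. The sketch assumes that α is an injection strength and that it adds the same offset to the "yes" logit of every yes/no question. The function names, base logits, and slope are hypothetical and chosen only for illustration; none of the numbers come from the paper. Under that assumption, the "yes" rate on the detection question rises with α, and so does the "yes" rate on an unrelated control question.

```python
import math

def sigmoid(x):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def yes_rate(base_logit, alpha, shift_per_alpha=1.0):
    """P("yes") when a global shift proportional to alpha is added to the yes-logit.

    shift_per_alpha is a hypothetical slope, not a measured quantity.
    """
    return sigmoid(base_logit + shift_per_alpha * alpha)

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical sweep over injection strengths.
alphas = [0, 1, 2, 3, 4, 5]

# Detection question ("was something injected?") on injected trials:
# answering "yes" counts as a correct detection.
detection_acc = [yes_rate(-2.0, a) for a in alphas]

# Unrelated control yes/no question: the same global shift applies,
# although the question has nothing to do with the injection.
control_yes = [yes_rate(-1.5, a) for a in alphas]

r = pearson_r(detection_acc, control_yes)
print(f"correlation between detection accuracy and control yes-rate: {r:.3f}")
```

In this toy setup the correlation is close to 1 even though nothing resembling introspection is modeled, which is the pattern the finding describes: a high detection score is uninformative unless it clearly exceeds what the control "yes" rate already predicts.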

Source paper

extracted_from
Detecting the Disturbance: A Nuanced View of Introspective Abilities in LLMs
(2025) · Ely Hahami · I. N. Sinha · Lavik Jain · Josh Kaplan +1
