claim
active
claim:apparent-success-on-binary-detection-tasks-is-entirely-explained-by-mechanical-logit-shifts-that-bias-models-toward-affirmative-responses-regardless-of-question-content

Apparent success on binary detection tasks is entirely explained by mechanical logit shifts that bias models toward affirmative responses regardless of question content

The paper's primary negative finding, reinterpreted as a methodological claim: the binary paradigm is invalid for testing introspection

Source paper

extracted_from
Detecting the Disturbance: A Nuanced View of Introspective Abilities in LLMs
(2025) · Ely Hahami · I. N. Sinha · Lavik Jain · Josh Kaplan +1

Neighborhood — ranked by edge-count

Findings (4)

finding

Frameworks (1)

framework

Concepts (2)

concept
  • The methodological confound identified by this paper: injection biases the model toward 'YES' on any binary question, regardless of content
  • Confound, raised by Morris & Plunkett, in which naming injected concepts reflects direct logit effects rather than metacognitive awareness

Claims (2)

claim

Methods (1)

method
  • Control that asks objectively-NO factual questions under identical injection, separating the global logit shift from any genuine detection signal
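
The control above can be illustrated with a minimal sketch. It assumes each trial is recorded as a question type, an injection flag, and whether the model answered YES. The `Trial` record, the `"detection"` and `"control_no"` labels, the uninjected baseline, and the difference-of-shifts summary are all hypothetical choices made for illustration, not the paper's actual procedure.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    question_type: str  # "detection" or "control_no" (hypothetical labels)
    injected: bool      # True if a concept vector was injected on this trial
    answered_yes: bool  # True if the model gave an affirmative answer


def yes_rate(trials, question_type, injected):
    """Fraction of YES answers among trials matching the given condition."""
    subset = [
        t for t in trials
        if t.question_type == question_type and t.injected == injected
    ]
    if not subset:
        raise ValueError(f"no trials for {question_type!r}, injected={injected}")
    return sum(t.answered_yes for t in subset) / len(subset)


def shift_adjusted_detection(trials):
    """Compare the YES-rate shift on detection questions with the shift on
    objectively-NO control questions under the same injection."""
    detection_shift = (
        yes_rate(trials, "detection", True) - yes_rate(trials, "detection", False)
    )
    # Any YES shift on questions whose correct answer is NO cannot be
    # detection; here it is treated as an estimate of the global shift.
    control_shift = (
        yes_rate(trials, "control_no", True) - yes_rate(trials, "control_no", False)
    )
    return {
        "detection_shift": detection_shift,
        "control_shift": control_shift,
        "residual": detection_shift - control_shift,
    }
```

Under these assumptions, a residual near zero would be consistent with the claim that the apparent detection signal is accounted for by the global shift toward affirmative answers. A clearly positive residual would suggest some signal beyond it.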

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.