finding
active

Net detection signal (detection minus control) is near-zero across all 40 layer-strength configurations: mean = -0.01 ± 0.03 logits

Quantitative evidence that binary detection provides no genuine introspection signal beyond global logit shifts
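
The "detection minus control" arithmetic behind this finding can be sketched as follows. This is a minimal illustration, not the paper's code: the function name `net_detection_signal` is hypothetical, the four toy values are made up for the example (the finding itself covers 40 layer-strength configurations), and treating the "±" as a sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def net_detection_signal(detection, control):
    """Subtract the control logit shift from the detection logit shift
    for each configuration, then summarize across configurations.

    Returns (per-configuration net signal, mean, sample standard deviation).
    """
    if len(detection) != len(control):
        raise ValueError("detection and control must have one value per configuration")
    net = [d - c for d, c in zip(detection, control)]
    return net, mean(net), stdev(net)


# Toy values only (hypothetical, NOT data from the paper): both shifts grow
# with injection strength, but they grow together, so the difference stays flat.
detection = [0.50, 1.20, 2.10, 3.05]
control = [0.52, 1.18, 2.14, 3.03]

net, mu, sigma = net_detection_signal(detection, control)
print(f"net signal: mean = {mu:+.3f}, std = {sigma:.3f} logits")
```

When the detection and control shifts track each other this closely, the raw detection logit can rise sharply while the net signal stays near zero, which is the pattern the finding describes as a global logit shift.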

Source paper

extracted_from
Detecting the Disturbance: A Nuanced View of Introspective Abilities in LLMs
(2025) · Ely Hahami · I. N. Sinha · Lavik Jain · Josh Kaplan +1
