finding
active
finding:approximately-half-of-the-26-otd-latents-show-near-zero-or-negative-effect-sizes-activating-more-during-on-topic-content

Approximately half of the 26 off-topic detector (OTD) latents show near-zero or negative effect sizes; rather than firing selectively on off-topic content, they activate comparably or more during on-topic content.

This shows that the contrastive search yields a heterogeneous set of latents, not all of which function as true off-topic detectors.
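To make the sign convention concrete, here is a minimal sketch. It assumes each latent's effect size is a Cohen's d comparing its activations on off-topic versus on-topic text; the card does not name the statistic the paper uses. A positive d means the latent fires more on off-topic text, as a true OTD should. A negative d means it fires more on on-topic text. The latent names, activation values, and the 0.2 "near-zero" cutoff are all hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(off_topic, on_topic):
    """Cohen's d with a pooled standard deviation.

    Positive: higher mean activation on off-topic samples.
    Negative: higher mean activation on on-topic samples.
    """
    n1, n2 = len(off_topic), len(on_topic)
    s1, s2 = stdev(off_topic), stdev(on_topic)
    pooled = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean(off_topic) - mean(on_topic)) / pooled


def classify(d, near_zero=0.2):
    # Hypothetical cutoff: |d| < 0.2 counts as "near-zero"
    # (Cohen's conventional boundary for a small effect).
    if d <= -near_zero:
        return "negative"
    if abs(d) < near_zero:
        return "near-zero"
    return "positive"


def share_not_detecting(latents, near_zero=0.2):
    """Fraction of latents whose effect size is near-zero or negative."""
    labels = [classify(cohens_d(off, on), near_zero) for off, on in latents.values()]
    return sum(label != "positive" for label in labels) / len(labels)


# Hypothetical activations: (off-topic samples, on-topic samples) per latent.
latents = {
    "latent_a": ([0.9, 1.1, 1.0, 1.2], [0.1, 0.2, 0.0, 0.3]),  # behaves like an OTD
    "latent_b": ([0.5, 0.6, 0.4, 0.5], [0.5, 0.4, 0.6, 0.5]),  # no difference
    "latent_c": ([0.1, 0.2, 0.1, 0.0], [0.8, 0.9, 1.0, 0.7]),  # fires more on-topic
}

for name, (off, on) in latents.items():
    d = cohens_d(off, on)
    print(f"{name}: d = {d:+.2f} ({classify(d)})")
print("share near-zero or negative:", share_not_detecting(latents))
```

Under these assumptions, a contrastive search that keeps every candidate latent can still return members like `latent_b` and `latent_c`, which is the kind of heterogeneity the finding describes.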

Source paper

extracted_from
Endogenous Resistance to Activation Steering in Language Models
(2026) · Alex McKenzie · Keenan Pepper · Stijn Servaes · Martin Leitgab +5
