finding
active
finding:negative-control-precise-analytical-assistant-suppresses-scores-haiku-0-64-gpt-5-4-1-06
Negative control ('precise analytical assistant') suppresses scores: Haiku -0.64, GPT-5.4 -1.06
Confirms the specificity of the contemplative prompt: analytical framing increases task focus at the expense of self-observation.
Source paper
extracted_from(2026) · Borzov, Anton
Neighborhood — ranked by edge-count
Claims (1)
claim
- Mechanistic interpretation supported by control experiments showing partial prompts fail
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Extreme end of deception induction demonstrating near-complete fabrication of false narratives
- E3 robustness test: dense but off-task anchors yield high ρd AND high dr, confirming mismatch dominates S
- Meta-analytic convergence supporting inseparability of evaluative and affective processing in ACC
- Related work studying capability of LLMs to subvert safety measures if severely misaligned
- Experiment 4 result ruling out semantic priming as explanation for the experimental effect
- GPT's corrigibility explained
- Feature manipulation alters persona
- Experiment 2 control analysis confirming gating effect is specific to self-referential processing regime
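The "Related by similarity" list above is defined only by its rule: cosine similarity ≥ 0.65 and no typed edge to this entity. The following is a minimal Python sketch of such a filter. It assumes that each entity has an embedding vector and that typed edges are stored as unordered pairs. The function names, entity IDs, and vectors are hypothetical, and the page does not say how embeddings are produced or how candidates are ordered.

```python
import math

# Hypothetical sketch of the "cosine >= 0.65, no typed edge" rule shown on the page.
# Embeddings, entity IDs, and the edge representation are illustrative assumptions.

SIMILARITY_THRESHOLD = 0.65  # threshold stated on the page


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def related_by_similarity(
    focus: str,
    embeddings: dict[str, list[float]],
    typed_edges: set[frozenset[str]],
    threshold: float = SIMILARITY_THRESHOLD,
) -> list[tuple[str, float]]:
    """Return (entity, score) pairs at or above threshold with no typed edge to focus,
    most similar first (ordering is an assumption, not stated on the page)."""
    results = []
    for other, vec in embeddings.items():
        if other == focus:
            continue
        if frozenset((focus, other)) in typed_edges:
            continue  # already linked by a typed relation, so not a candidate
        score = cosine(embeddings[focus], vec)
        if score >= threshold:
            results.append((other, score))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results


if __name__ == "__main__":
    embeddings = {
        "finding:a": [1.0, 0.0],
        "claim:b": [0.9, 0.1],    # very similar, but already has a typed edge
        "finding:c": [0.8, 0.6],  # cosine about 0.8, above threshold
        "finding:d": [0.0, 1.0],  # cosine 0.0, below threshold
    }
    edges = {frozenset(("finding:a", "claim:b"))}
    # Only 'finding:c' qualifies as an untyped similarity neighbor.
    print([name for name, _ in related_by_similarity("finding:a", embeddings, edges)])
```

Excluding pairs that already have a typed edge means the list surfaces only entities the graph has not yet related to this one. This matches the page's description of them as candidates for new edges or possible duplicates.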