claim
active
claim:reasoning-models-generate-performative-cot-tokens-after-achieving-strong-confidence-in-their-final-answer-without-revealing-this-belief-in-text
Reasoning models generate performative CoT tokens after achieving strong confidence in their final answer without revealing this belief in text
The central empirical claim of the paper, supported by activation probing evidence
Source paper
extracted_from (2026) · Siddharth Boppana · Annabel Ma · Max Loeffler · Raphaël Sarfati +4
Neighborhood — ranked by edge-count
Papers (1)
paper
Findings (1)
finding
- Core empirical result demonstrating early belief formation in easy tasks
Questions (1)
question
- Central research question motivating the paper
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Core definitional quote for performative chain-of-thought
- Theoretical framing establishing why CoT models are uniquely suited to exhibit strategic deception
- Central research question motivating investigation into hallucination and two-stage framework design.
- High-level policy-relevant claim about the risks of advanced reasoning in LLMs
- Antra's foundational claim about how introspection arises computationally rather than from memorised text.
- Practical question addressed by the probe-guided early exit experiments
- How can reasoning-optimized models preserve their reasoning ability while gaining agentic capabilities? (question, 0.762) Core research question motivating the paper's focus on continual RL training of reasoning models rather than base/instruction-tuned models.
- Task difficulty as the key variable distinguishing the two modes of CoT identified in the paper