Early Forced Answering

Named evaluation protocol: truncating the chain of thought (CoT) at various points and forcing the model to give a final answer, to measure when the answer stabilizes.
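
The protocol described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names (`build_forced_prompt`, `early_forced_answers`, `stabilization_point`), the prompt template, the truncation fractions, and the toy `answer_fn` standing in for a real model call are all assumptions made for the example.

```python
from typing import Callable, List, Sequence, Tuple


def build_forced_prompt(question: str, cot_prefix: Sequence[str]) -> str:
    """Join the question with a truncated CoT and demand an answer now.

    The template is hypothetical; a real setup would use the model's own format.
    """
    reasoning = " ".join(cot_prefix)
    return f"Question: {question}\nReasoning: {reasoning}\nFinal answer:"


def early_forced_answers(
    question: str,
    cot_steps: Sequence[str],
    answer_fn: Callable[[str], str],
    fractions: Sequence[float] = (0.0, 0.25, 0.5, 0.75, 1.0),
) -> List[Tuple[float, str]]:
    """Truncate the CoT at each fraction and record the forced answer."""
    results = []
    for frac in fractions:
        keep = int(frac * len(cot_steps))  # number of reasoning steps kept
        prompt = build_forced_prompt(question, cot_steps[:keep])
        results.append((frac, answer_fn(prompt)))
    return results


def stabilization_point(results: Sequence[Tuple[float, str]]) -> float:
    """Earliest truncation point from which the answer matches the final one
    at every later point."""
    final_answer = results[-1][1]
    stable_from = results[-1][0]
    for frac, answer in reversed(results):
        if answer != final_answer:
            break
        stable_from = frac
    return stable_from


def toy_answer_fn(prompt: str) -> str:
    """Stand-in for a model call (assumption): answers only once the
    reasoning prefix already contains the result."""
    return "12" if "total is 12" in prompt else "unknown"


cot = [
    "There are 5 apples.",
    "Add 7 more apples.",
    "So the total is 12.",
    "Double-check: 5 + 7 = 12.",
]
results = early_forced_answers("How many apples are there?", cot, toy_answer_fn)
print(results)
print("stable from fraction:", stabilization_point(results))
```

In this toy run the forced answer only matches the final answer once three of the four reasoning steps are kept. With a real model, an answer that is already stable at very early truncation points would suggest that the remaining reasoning steps did not change the outcome.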

Neighborhood — ranked by edge-count

paper

framework

Reasoning Theater Framework
uses
The conceptual framework introduced by the paper distinguishing performative CoT from genuine reasoning using activation probing

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Refusal direction · concept · 0.679
Arditi et al. 2024 finding that refusal behavior is mediated by one direction in LLM activations; exemplar of single-direction causal results
answer.query · method · 0.678
An Elephant action of answering a query.
Immediate Feedback · concept · 0.674
A key ingredient of liveness where the evaluation gulf is minimized so effects of user changes are immediately visible with automatic demand of result.
An answer is responsive provided the questioner will know the answer to the question after he receives it. · claim · 0.673
Definition of responsiveness for verification purposes.
refusal rate · concept · 0.668
The percentage of harmful requests that a model refuses to answer, a common safety metric.
G2.5-FL bid aggressiveness 2.07 early and 2.08 late (no adaptation) · finding · 0.666
Failure to adapt bidding to game phase.
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model (Li et al., 2023) · concept · 0.666
Safety intervention that relies on activation modification, which ESR might undermine
The results are more widely applicable; similar results will come from asking people in other cultures to answer analogous questions. · claim · 0.665
Universalist claim predicting cross-cultural generality.