method
active
method:early-forced-answeringEarly Forced Answering
Named evaluation protocol: truncating CoT at various points and forcing the model to give a final answer, to measure when the answer stabilizes
Neighborhood — ranked by edge-count
Papers (1)
paper
Frameworks (1)
framework
- The conceptual framework introduced by the paper distinguishing performative CoT from genuine reasoning using activation probing
Related by similarity (8)
cosine ≥ 0.65 · no typed edgeEntities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
- Arditi et al. 2024 finding that refusal behavior is mediated by one direction in LLM activations; exemplar of single-direction causal results
- An Elephant action of answering a query.
- A key ingredient of liveness where the evaluation gulf is minimized so effects of user changes are immediately visible with automatic demand of result.
- Definition of responsiveness for verification purposes.
- The percentage of harmful requests that a model refuses to answer, a common safety metric.
- Failure to adapt bidding to game phase.
- Inference-Time Intervention: Eliciting Truthful Answers from a Language Model (Li et al., 2023)concept0.666Safety intervention that relies on activation modification, which ESR might undermine
- Universalist claim predicting cross-cultural generality.