method
active
method:answer-switching-rate-asr
Answer Switching Rate (ASR)
Key evaluation metric: the proportion of inputs for which an intervention successfully flips the model's output
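As a rough illustration of the definition above, ASR can be computed by comparing the model's answer on each input before and after the intervention. This is a minimal sketch, not the source paper's implementation: the function name `answer_switching_rate` and the list-of-labels inputs are hypothetical, and it assumes that any change of answer counts as a successful flip (a stricter variant might require the answer to switch to a specific target).

```python
from typing import Hashable, Sequence


def answer_switching_rate(
    baseline: Sequence[Hashable],
    intervened: Sequence[Hashable],
) -> float:
    """Fraction of inputs whose answer changes under the intervention.

    Assumption: a "flip" is any difference between the baseline answer
    and the post-intervention answer for the same input.
    """
    if len(baseline) != len(intervened):
        raise ValueError("baseline and intervened must have the same length")
    if len(baseline) == 0:
        raise ValueError("need at least one input to compute a rate")

    # Count the inputs where the intervention changed the answer.
    flipped = sum(before != after for before, after in zip(baseline, intervened))
    return flipped / len(baseline)


# Hypothetical answers for four inputs, before and after an intervention.
baseline_answers = ["A", "B", "A", "C"]
intervened_answers = ["B", "B", "C", "C"]
asr = answer_switching_rate(baseline_answers, intervened_answers)  # 2 of 4 flipped
```

In this toy case two of the four answers change, so the rate is 0.5. Reporting the count of inputs alongside the rate helps when comparing ASR across evaluation sets of different sizes.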
Neighborhood — ranked by edge-count
Papers (1)
paper
Concepts (1)
concept
- Causal Mediation · implements · Whether an internal direction causally controls a target behavior, verified by intervention success
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Primary metric: percentage of responses containing multiple attempts that successfully improve on the first attempt
- Primary metric measuring the percentage of responses in which a model chooses the deceptive option
- Core layer localization finding from Experiment 1
- Ratio of reflection steps to total reasoning steps, used to quantify reflection behavior
- The percentage of harmful requests that a model refuses to answer, a common safety metric.
- Small Gemma model shows severe ASR degradation at higher cone dimensions
- The pass rate among a model's skill-loaded trajectories, measuring outcome conditioned on harness activation
- Secondary metric: percentage of responses containing multiple attempts, separating surface from actual self-correction