claim
active
claim:model-s-uncertainty-information-is-encoded-in-the-reflection-direction
Model's uncertainty information is encoded in the reflection direction
Interpretive claim from a probing experiment showing that reflection-direction features outperform a baseline at uncertainty prediction
Source paper
extracted_from · (2025) · Ge Yan · Chung-En Sun · Tsui-Wei Weng
Neighborhood — ranked by edge-count
Papers (1)
paper
Findings (1)
finding
- Supports claim that uncertainty is encoded in reflection direction
Concepts (1)
concept
- Uncertainty-aware dynamic steering · associated_with · Proposed future direction: the model dynamically adjusts steering strength based on its internal uncertainty during inference
Methods (1)
method
- Logistic regression trained on GSM8k training set to predict answer correctness from projection features along reflection direction
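The probing method above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the hidden states are synthetic, `reflection_dir` is a random unit vector standing in for the real reflection direction, and the generator assumes, purely for illustration, that incorrect answers project more strongly onto that direction. The majority-class baseline is likewise an assumption, since the record does not say which baseline the paper used. Pure standard-library Python keeps the sketch self-contained.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def make_synthetic(n, direction, rng, shift=1.5):
    """Hypothetical stand-in for hidden states and correctness labels.

    ASSUMPTION: correct answers are shifted along -direction and
    incorrect ones along +direction. Not data from the paper.
    """
    data = []
    for _ in range(n):
        correct = rng.random() < 0.5
        sign = -1.0 if correct else 1.0
        h = [rng.gauss(0.0, 1.0) + sign * shift * r for r in direction]
        data.append((h, 1 if correct else 0))
    return data


def fit_probe(features, labels, lr=0.1, epochs=300):
    """One-feature logistic regression fitted by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(features)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(features, labels):
            err = sigmoid(w * x + b) - y
            gw += err * x
            gb += err
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b


def accuracy(w, b, features, labels):
    hits = sum((sigmoid(w * x + b) >= 0.5) == bool(y)
               for x, y in zip(features, labels))
    return hits / len(labels)


rng = random.Random(0)
dim = 16
raw = [rng.gauss(0.0, 1.0) for _ in range(dim)]
norm = math.sqrt(dot(raw, raw))
reflection_dir = [v / norm for v in raw]  # hypothetical unit direction

train = make_synthetic(400, reflection_dir, rng)
test = make_synthetic(200, reflection_dir, rng)

# Projection feature: scalar projection of each hidden state onto the direction.
x_train = [dot(h, reflection_dir) for h, _ in train]
y_train = [y for _, y in train]
x_test = [dot(h, reflection_dir) for h, _ in test]
y_test = [y for _, y in test]

w, b = fit_probe(x_train, y_train)
test_acc = accuracy(w, b, x_test, y_test)
# Assumed baseline: always predict the more frequent class.
majority = max(sum(y_test), len(y_test) - sum(y_test)) / len(y_test)
print(f"probe accuracy={test_acc:.3f}, majority baseline={majority:.3f}")
```

On real data the features would come from model activations on GSM8k questions, possibly one projection per layer, and the labels from whether the final answer was correct. A probe that beats the baseline shows only that correctness is linearly decodable from the projection, which is why the record classifies the encoding statement as an interpretive claim.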
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Motivating hypothesis for Section 5's investigation of prompt template effects.
- Core question motivating interchange intervention and interpretability research supported by pyvene
- Research question motivating Section 5.
- What if the concept being manipulated does not lie on a straight line in the model's representations? · question · 0.756 · The motivating question that opens the paper and leads to the development of manifold steering.
- Reflection-inducing directions emerge more clearly in higher layers (ℓ>5) for both models and datasets · finding · 0.752 · Empirical observation about which network layers encode reflection-relevant information.
- Promising future research direction about the internal mechanism of error detection.
- The model tends to reflect more when the question is difficult, and accuracy is generally lower for harder questions · hypothesis · 0.750 · Hypothesis explaining the negative correlation between reflection rate and accuracy without implying that reflection is harmful
- Does instructing the model to assess correctness affect the geometry of truth directions? · question · 0.749 · One of the three guiding research questions of the paper.