hypothesis

active

hypothesis:reasoning-llms-trigger-reflection-when-their-internal-uncertainty-is-high

Reasoning LLMs trigger reflection when their internal uncertainty is high

Core hypothesis linking internal uncertainty to self-reflection behavior, tested via probing experiments

Source paper

extracted_from

ReflCtrl: Controlling LLM Reflection via Representation Engineering

(2025) · Ge Yan · Chung-En Sun · Tsui-Wei Weng

Neighborhood — ranked by edge-count

Findings (1)

finding

Reflection-direction features achieve an AUROC of 0.772, versus 0.736 for the final-layer baseline, when predicting answer correctness on GSM8K with deepseek-llama-8b
supports
Supports the claim that uncertainty is encoded along the reflection direction
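AUROC measures how well a score ranks one class above the other: it equals the probability that a randomly chosen positive example scores higher than a randomly chosen negative one (0.5 is chance, 1.0 is a perfect ranking). A minimal sketch of that pairwise (Mann-Whitney) computation follows, treating correct answers as the positive class (an assumption). The `auroc` helper is hypothetical and the scores and labels are toy values, not data from the paper.

```python
def auroc(scores, labels):
    """AUROC as the Mann-Whitney U statistic normalized by n_pos * n_neg.

    labels: 1 = correct answer (positive class), 0 = incorrect.
    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy probe scores (hypothetical, for illustration only).
scores = [0.9, 0.8, 0.4, 0.35, 0.2]
labels = [1, 1, 0, 1, 0]
# 5 of the 6 correct/incorrect pairs are ordered correctly, so this prints 0.8333333333333334
print(auroc(scores, labels))
```

Read this way, the gap between 0.772 and 0.736 means the reflection-direction features order correct and incorrect answers correctly in a somewhat larger share of pairs than the final-layer baseline.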

Concepts (2)

concept

Self-reflection
associated_with
The ability of reasoning LLMs to review and revise previous reasoning steps during inference
Internal uncertainty
associated_with
The model's internal representation of uncertainty, hypothesized to trigger self-reflection

Questions (1)

question

When does the model initiate reflection during its reasoning process?
gates
The first central research question motivating the ReflCtrl investigation

Methods (1)

method

Logistic regression correctness probe
supports
A logistic regression trained on the GSM8K training set to predict answer correctness from projection features along the reflection direction
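The probe can be sketched end to end under several assumptions not stated on this page: one reflection direction per layer, built as the difference in mean hidden states between reflection and non-reflection steps; one projection feature per layer; and a plain logistic regression fit by gradient descent. Synthetic Gaussian vectors stand in for deepseek-llama-8b hidden states, and every helper name below is hypothetical. In practice `sklearn.linear_model.LogisticRegression` would replace the hand-rolled fit.

```python
import math
import random


def mean_vec(vectors):
    """Element-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def reflection_direction(reflect_states, other_states):
    """Unit difference-in-means direction for one layer (assumed construction)."""
    diff = [a - b for a, b in zip(mean_vec(reflect_states), mean_vec(other_states))]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]


def projection_features(states_by_layer, directions):
    """One scalar per layer: the dot product of h_l with direction d_l."""
    return [sum(h * d for h, d in zip(h_l, d_l))
            for h_l, d_l in zip(states_by_layer, directions)]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_regression(X, y, lr=0.1, epochs=500, l2=1e-3):
    """Full-batch gradient descent on the L2-regularized mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x_i, y_i in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x_i)) + b) - y_i
            for j in range(d):
                grad_w[j] += err * x_i[j]
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_correct_proba(w, b, x):
    """Probe's estimated probability that the answer is correct."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Synthetic stand-in for hidden states (hypothetical, not the paper's data).
random.seed(0)
N_LAYERS, DIM = 3, 4
TRUE_DIRS = [[1.0 if i == l else 0.0 for i in range(DIM)] for l in range(N_LAYERS)]


def sample_states(shift):
    """Per-layer states: Gaussian noise plus `shift` along each layer's true direction."""
    return [[random.gauss(0.0, 1.0) + shift * d for d in dirs] for dirs in TRUE_DIRS]


# 1) Estimate one direction per layer from reflection vs. other steps.
reflect = [sample_states(+2.0) for _ in range(200)]
other = [sample_states(0.0) for _ in range(200)]
directions = [reflection_direction([s[l] for s in reflect], [s[l] for s in other])
              for l in range(N_LAYERS)]

# 2) Probe inputs: incorrect answers (y=0) sit further along the direction.
states = [sample_states(-1.0) for _ in range(200)] + [sample_states(+1.0) for _ in range(200)]
labels = [1] * 200 + [0] * 200
X = [projection_features(s, directions) for s in states]

# 3) Fit the correctness probe and report training accuracy.
w, b = fit_logistic_regression(X, labels)
acc = sum((predict_correct_proba(w, b, x) >= 0.5) == bool(y)
          for x, y in zip(X, labels)) / len(X)
print("weights:", [round(v, 2) for v in w], "train accuracy:", round(acc, 3))
```

Because the synthetic incorrect answers are placed further along the direction, the learned weights come out negative, mirroring the hypothesis that stronger reflection-direction activation accompanies higher uncertainty; on real activations, the sign and predictive strength are exactly what the probing experiment measures.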

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

What is the underlying mechanism of self-reflection in reasoning LLMs? · question · 0.853
Open question motivating the entire paper; identified as not yet well understood
The underlying mechanism of self-reflection in reasoning LLMs is not yet well understood · question · 0.853
Broad gap motivating the entire paper
A linear reflection direction that governs self-reflection behavior exists in reasoning LLMs' latent representation space · claim · 0.803
Core claim of ReflCtrl that a single direction captures and controls reflection
Reflection in LLMs · concept · 0.801
The core phenomenon studied: the ability of LLMs to evaluate and revise their own reasoning.
Self-reflection consumes 25-30% of total reasoning tokens in reasoning LLMs · claim · 0.790
Empirical observation motivating the need to control reflection for inference efficiency
LLMs can predict their own responses more accurately than external observers, implying privileged internal knowledge · finding · 0.790
A finding by Binder et al., cited as evidence that LLMs possess an introspective capacity analogous to mindfulness
LLMs linearly represent truth-relevant information beyond the plausibility of text, as evidenced by probes trained on the "likely" dataset performing poorly on anti-correlated datasets · claim · 0.785
Establishes that the observed linear structure is not merely a representation of text probability
Current LLMs cannot faithfully represent transformative experiences with epistemically opaque outcomes · claim · 0.778