claim
active
Reflection does not emerge only in the SFT or RL stages; it arises earlier, during pre-training.
claim:reflection-does-not-only-emerge-in-sft-or-rl-stages-but-arises-earlier-during-pre-training
A finding cited from Shah et al. that contextualizes the training origins of reflection.
Source paper
extracted_from · 2025 · Chang, Fu-Chieh · Lee, Yu-Ting · Wu, Pei-Yuan
Neighborhood (ranked by edge count)
Papers (1)
paper
Thinkers (1)
thinker
- Author of the 'Rethinking reflection in pre-training' paper, which introduces the gsm8k_adv and cruxeval_o_adv datasets.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Reflection-inducing directions emerge more clearly in higher layers (ℓ > 5) for both models and datasets · finding · 0.800
  Empirical observation about which network layers encode reflection-relevant information.
- Central interpretive claim of the paper, supported by steering vector experiments.
- Interpretive claim about the locus of reflection in transformer architecture.
- First central research question motivating the ReflCtrl investigation.
- Introspective signals appear in middle layers but are suppressed by later, post-training-shaped layers · finding · 0.766
  Mechanistic finding by Lindsey (2026) explaining how a contemplative prompt may work: it enables mid-layer introspection to reach the output.
- Open question about the developmental origin of ESR mechanisms.
- Core claim of ReflCtrl: a single direction captures and controls reflection.
- We hypothesize that ESR may emerge from RLHF training rather than existing in pretrained representations · hypothesis · 0.760
  Open question about the developmental origin of ESR mechanisms.