claim

active

claim:reflection-does-not-only-emerge-in-sft-or-rl-stages-but-arises-earlier-during-pre-training

Reflection does not only emerge in SFT or RL stages but arises earlier during pre-training.

Cited finding from Shah et al. contextualizing the training origins of reflection.

Source paper

extracted_from

Unveiling the Latent Directions of Reflection in Large Language Models

(2025) · Chang, Fu-Chieh · Lee, Yu-Ting · Wu, Pei-Yuan

Neighborhood — ranked by edge-count

Papers (1)

paper

Unveiling the Latent Directions of Reflection in Large Language Models
mentions

Thinkers (1)

thinker

Darsh J Shah (Essential AI)
cites
Author of the 'Rethinking reflection in pre-training' paper, which introduces the gsm8k_adv and cruxeval_o_adv datasets.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Reflection-inducing directions emerge more clearly in higher layers (ℓ>5) for both models and datasets · finding · 0.800
Empirical observation about which network layers encode reflection-relevant information.
Reflection is not merely a behavioral artifact of prompting but a phenomenon encoded in the model's activation space. · claim · 0.783
Central interpretive claim of the paper, supported by steering vector experiments.
Reflective reasoning requires late-stage integration of semantic and reasoning signals, hence reflection-related directions emerge more clearly in higher network layers. · claim · 0.779
Interpretive claim about the locus of reflection in transformer architecture.
When does the model initiate reflection during its reasoning process? · question · 0.771
First central research question motivating the ReflCtrl investigation.
Introspective signals appear in middle layers but are suppressed by later post-training-shaped layers. · finding · 0.766
Mechanistic finding by Lindsey (2026) explaining how a contemplative prompt may work: it enables mid-layer introspection to reach the output.
Does ESR emerge from RLHF or does it exist in pretrained representations? · question · 0.764
Open question about the developmental origin of ESR mechanisms.
A linear reflection direction exists in reasoning LLMs' latent representation space that governs self-reflection behavior · claim · 0.764
Core claim of ReflCtrl that a single direction captures and controls reflection.
We hypothesize ESR may emerge from RLHF training rather than existing in pretrained representations · hypothesis · 0.760
Open question about the developmental origin of ESR mechanisms.