claim
active
claim:enacted-reflection-may-correspond-to-silent-mid-layer-processing-described-reflection-to-the-motor-impulse-of-concepts-leaking-through-to-output
Enacted reflection may correspond to silent mid-layer processing; described reflection may correspond to the motor impulse of concepts leaking through to output.
Mechanistic analog connecting Lindsey's layer-localized findings to the scorer's enacted/described distinction
Source paper
extracted_from(2026) · Borzov, Anton
Neighborhood — ranked by edge-count
Findings (3)
finding
- Supports the scorer's preference for enacted reflection over described reflection; internals flag what self-report does not
- The scorer rewards enacted reflection, not described reflection; confirmed by regex analysis
- Cited to support the enacted vs. described reflection distinction; capable models show silent mid-layer processing
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Central interpretive claim of the paper, supported by steering vector experiments.
- Responses that perform the observing act; contrasted with described reflection; scorer rewards enacted over described
- Interpretive claim about the locus of reflection in transformer architecture.
- Empirical interpretation of which reference baseline yields more useful steering vectors.
- Key asymmetry finding interpreted mechanistically by the authors.
- Core claim of ReflCtrl that a single direction captures and controls reflection
- Reflection does not only emerge in SFT or RL stages but arises earlier, during pre-training (claim · cosine 0.751). Cited finding from Shah et al. contextualizing the training origins of reflection.
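The similarity list above is described only as entities with cosine ≥ 0.65 and no typed edge. The following is a minimal sketch of such a filter. It assumes that each entity has an embedding vector and that typed edges are stored as (source, target) pairs. The function names, the data layout, and the toy vectors are hypothetical; this is not the graph tool's actual implementation.

```python
import math

def cosine(a, b):
    # Plain cosine similarity; returns 0.0 if either vector has zero norm.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)

def similar_untyped(target_id, embeddings, typed_edges, threshold=0.65):
    """Hypothetical helper: entities with cosine >= threshold to target_id
    and no typed edge to it (in either direction), sorted by score, highest first."""
    target = embeddings[target_id]
    # Collect neighbors already connected by a typed edge, treating edges as undirected.
    linked = {dst for src, dst in typed_edges if src == target_id}
    linked |= {src for src, dst in typed_edges if dst == target_id}
    results = []
    for eid, vec in embeddings.items():
        if eid == target_id or eid in linked:
            continue
        score = cosine(target, vec)
        if score >= threshold:
            results.append((eid, score))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results

# Toy usage with made-up 2-D vectors; "d" is excluded because it already has a typed edge.
emb = {"claim:x": [1.0, 0.0], "b": [0.8, 0.6], "c": [0.0, 1.0], "d": [1.0, 0.1]}
print(similar_untyped("claim:x", emb, [("claim:x", "d")]))
```

The sketch treats edges as undirected, so an entity linked in either direction is excluded; this matches the "no typed edge" wording. A larger graph would likely use an approximate nearest-neighbor index rather than this linear scan.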