claim
active
claim:signal-integration-from-early-perturbation-into-an-explicit-prediction-requires-substantial-downstream-computation-spanning-layers-4-20
Signal integration from early perturbation into an explicit prediction requires substantial downstream computation spanning layers 4-20
Mechanistic characterization based on logit lens analysis, which shows prediction accuracy rising gradually across layers
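The layer-wise measurement this claim rests on can be sketched as follows. This is a minimal toy illustration, not the source paper's code: the function names, the identity unembedding matrix, and the hidden states are all hypothetical, and the final layer norm that a real logit lens usually applies before unembedding is omitted for brevity.

```python
from typing import List, Sequence

Vector = Sequence[float]


def logit_lens_predictions(hidden_states: Sequence[Vector],
                           unembed: Sequence[Vector]) -> List[int]:
    """Project each layer's hidden state through the unembedding matrix
    and return the highest-scoring token id per layer."""
    preds = []
    for h in hidden_states:
        # One logit per vocabulary row: dot product of the row with h.
        logits = [sum(w * x for w, x in zip(row, h)) for row in unembed]
        preds.append(max(range(len(logits)), key=logits.__getitem__))
    return preds


def layerwise_accuracy(batch_hidden: Sequence[Sequence[Vector]],
                       unembed: Sequence[Vector],
                       targets: Sequence[int]) -> List[float]:
    """batch_hidden[i][l] is the hidden state of example i at layer l.
    Returns the fraction of examples decoded correctly at each layer."""
    n_layers = len(batch_hidden[0])
    correct = [0] * n_layers
    for states, target in zip(batch_hidden, targets):
        for layer, pred in enumerate(logit_lens_predictions(states, unembed)):
            correct[layer] += int(pred == target)
    return [c / len(targets) for c in correct]


# Hypothetical toy setup: 3-token vocabulary, identity unembedding, 3 layers.
TOY_UNEMBED = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
TOY_HIDDEN = [
    [[1.0, 0.0, 0.0], [0.5, 0.0, 0.6], [0.0, 0.0, 1.0]],  # target token 2
    [[1.0, 0.0, 0.0], [0.6, 0.5, 0.0], [0.0, 1.0, 0.0]],  # target token 1
]
TOY_TARGETS = [2, 1]

print(layerwise_accuracy(TOY_HIDDEN, TOY_UNEMBED, TOY_TARGETS))
```

On this toy data the accuracy rises layer by layer, which mirrors the shape of the gradual rise the claim describes. The numbers themselves are invented for illustration.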
Source paper
extracted_from · (2025) · Ely Hahami · I. N. Sinha · Lavik Jain · Josh Kaplan +1
Neighborhood — ranked by edge-count
Findings (1)
finding
- Shows that signal integration into explicit prediction has barely begun immediately after injection
Claims (1)
claim
- Mechanistic account explaining why late-layer introspection fails, combining two independent explanatory factors
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- We hypothesize earlier-layer interventions allow more downstream computation to process and potentially correct the perturbation · hypothesis · 0.809 · Post-hoc explanation for why steering at layer 33 rather than layer 50 produced better ESR behavior in Llama-3.3-70B
- Mechanistic evidence that the network actively attenuates injected perturbations, explaining late-layer introspection failure
- Attribution finding suggesting the last layer directly controls reflection keyword generation
- Visual geometric evidence for the fundamental entanglement of true/false activations in harder tasks.
- Result of canonical variates analysis showing statistical dependency between internal states and external motion.
- Quantitative relationship between concept frequency and feature presence.
- Core testable hypothesis of UCCT about the nature of performance transitions under anchoring
- The middle-layer residual stream features are causally implicated in multi-step reasoning. · claim · 0.731 · Features for Kobe Bryant, California, Lakers participate in computing the capital answer.