concept
active
concept:residual-stream-recovery-dynamics
residual stream recovery dynamics
The network's tendency to actively attenuate injected perturbations over subsequent layers, erasing the signal before output
Neighborhood — ranked by edge-count
Frameworks (1)
framework
- This paper's proposed mechanistic explanation integrating signal injection, attention routing, predictive integration, and residual recovery
Claims (1)
claim
- Mechanistic account explaining why late-layer introspection fails, combining two independent explanatory factors
Methods (1)
method
- residual stream recovery tracking · implements · Tracks cosine similarity, norm ratio, and injection direction projection across layers to measure recovery from perturbation
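As a rough illustration of the three tracked quantities, the sketch below computes them per layer for a single token position. It is not the paper's implementation. It assumes the comparison is between a clean run and a perturbed run of the same input, and that the projection is taken on the perturbed-minus-clean difference; the function name `recovery_metrics` and the toy data are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def recovery_metrics(clean, perturbed, direction):
    """Per-layer recovery metrics for one token position.

    clean, perturbed: lists of residual-stream vectors, one per layer
    (assumed to come from a clean and a perturbed forward pass).
    direction: the injected direction; normalized here before projecting.
    """
    d_norm = norm(direction)
    unit = [x / d_norm for x in direction]
    rows = []
    for layer, (c, p) in enumerate(zip(clean, perturbed)):
        diff = [pi - ci for pi, ci in zip(p, c)]
        rows.append({
            "layer": layer,
            # 1.0 means the perturbed state points the same way as the clean one
            "cosine": dot(c, p) / (norm(c) * norm(p)),
            # 1.0 means the perturbation no longer changes the norm
            "norm_ratio": norm(p) / norm(c),
            # 0.0 means nothing is left along the injected direction
            "projection": dot(diff, unit),
        })
    return rows

# Toy data (hypothetical): inject 2.0 * direction at layer 1,
# then let the network halve the injected component at each later layer.
clean = [[1.0, 0.0, 0.0] for _ in range(4)]
direction = [0.0, 1.0, 0.0]
scales = [0.0, 2.0, 1.0, 0.5]
perturbed = [[c[0], c[1] + s, c[2]] for c, s in zip(clean, scales)]

metrics = recovery_metrics(clean, perturbed, direction)
for row in metrics:
    print(row)
```

Under these assumptions, recovery shows up as cosine similarity and norm ratio drifting back toward 1 and the projection shrinking toward 0 in the layers after the injection point.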
Concepts (1)
concept
- predictive integration · associated_with · The mid-to-late layer computational process that converts routed perturbation signals into explicit predictions
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Proposed pathway flowing through layers at each position; calculates K/V values that feed horizontal information flow.
- The intermediate representations in transformer layers whose activations are patched and probed for truth information
- Technique to localize causally implicated hidden states by swapping residual stream activations between a true and false input and measuring downstream log-probability changes
- The finite dimensional capacity of the residual stream for storing and communicating information between layers; conceptualized as being under high demand
- Core activation intervention: add scaled vector to residual stream at layer l during completion
- Used to localize causally implicated hidden states by swapping activations between true and false inputs
- The phenomenon where the residual stream communicates many more features than its dimensionality by encoding information across overlapping subspaces
- Architectural observation enabling the entire mathematical framework; the residual stream is purely a sum of linear projections