hypothesis
active
hypothesis:we-hypothesize-earlier-layer-interventions-allow-more-downstream-computation-to-process-and-potentially-correct-the-perturbation
We hypothesize earlier-layer interventions allow more downstream computation to process and potentially correct the perturbation
Post-hoc explanation for why steering at layer 33 rather than layer 50 produced better ESR behavior in Llama-3.3-70B
Source paper
extracted_from · (2026) · Alex McKenzie · Keenan Pepper · Stijn Servaes · Martin Leitgab +5
Neighborhood — ranked by edge-count
Papers (1)
paper
Concepts (1)
concept
- Endogenous Steering Resistance · associated_with · The central phenomenon introduced by this paper: inference-time recovery from irrelevant activation steering in LLMs
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Mechanistic characterization based on logit lens analysis showing gradual accuracy rise across layers
- Performance is best when skipping both the first and last six layers when applying intervention · claim · 0.788 · Empirical configuration finding from ablation study on layer selection
- What is the full computational pathway underlying self-correction across multiple layers? · question · 0.787 · Mechanistic question requiring multi-layer SAE analysis beyond current single-layer approach
- The middle layer residual stream features are causally implicated in multi-step reasoning. · claim · 0.763 · Features for Kobe Bryant, California, Lakers participate in computing the capital answer.
- Central thesis statement of the paper
- Primary positive claim of the paper, grounded in strength comparison and localization results
- Cube Flipper's prediction about convergence of insight practice on field model.
- Mechanistic account explaining why late-layer introspection fails, combining two independent explanatory factors