finding
active
finding:mds-injection-steering-efficiency-peaks-at-mid-layers-across-llms-injection-strides-and-ocean-traits
MDS injection steering efficiency peaks at mid-layers across LLMs, injection strides, and OCEAN traits
Consistent empirical pattern supporting the connection between mid-layer representations and emotion/behavioral content
Source paper
extracted_from · (2026) · Leonardo Blas · Robin Jia · Emilio Ferrara
Neighborhood — ranked by edge-count
Thinkers (1)
thinker
- Ala N. Tak · supports · Co-author of mechanistic interpretability study finding emotion representations most prominent in mid-layers
Concepts (1)
concept
- Empirical observation that steering efficiency peaks at middle transformer layers, consistent with emotion representation literature
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Qualitative finding demonstrating unique capability of activation-level interventions unavailable to prompting methods including PM
- Theoretical alignment claim backed by OLS R² analysis showing 96.15% of trends have R² ≥ 0.75
- Empirical finding about injection stride parameter; injecting into every completion activation maximizes steering strength
- Interpretive conclusion from Big Two mismatch finding; tentative due to only 46.15% match rate
- MDS injections show no salient patterns in MPI-120 inventory responses beyond occasional co-occurring peaks · finding · 0.758 · Contrasts with SJT results; leads authors to narrow analyses to SJT responses
- MDS injections outperform P2 in open-ended generation in 11 of 14 LLMs with Phi gains of 3.61% to 16.44% · finding · 0.752 · Primary quantitative result overturning prior reports that prompting outperforms representation engineering
- Practical finding for optimizing steering setup.
- Steering vectors from µ(0→2) slightly outperform µ(1→2) for instruction discovery across datasets and models · finding · 0.744 · Shows that contrasting No Reflection with Triggered Reflection provides a stronger signal than Intrinsic vs Triggered.