finding
active
finding:mds-injections-show-no-salient-patterns-in-mpi-120-inventory-responses-beyond-occasional-co-occurring-peaks
MDS injections show no salient patterns in MPI-120 inventory responses beyond occasional co-occurring peaks
Contrasts with the SJT results and leads the authors to narrow their analyses to SJT responses.
Source paper
extracted_from(2026) · Leonardo Blas · Robin Jia · Emilio Ferrara
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Why do MDS injections outperform other methods on the inventory (multiple-choice) task? · question · 0.801 · Identified as an unexplained result and open question in the limitations section
- Theoretical alignment claim backed by OLS R² analysis showing 96.15% of trends have R² ≥ 0.75
- Qualitative finding demonstrating unique capability of activation-level interventions unavailable to prompting methods including PM
- Mechanistic explanation for discrepancy with Banayeeanzade et al.; addressed by centroid unit and unbounded sweep contributions
- MDS injections outperform P2 in open-ended generation in 11 of 14 LLMs with Phi gains of 3.61% to 16.44% · finding · 0.773 · Primary quantitative result overturning prior reports that prompting outperforms representation engineering
- Do the findings about MDS injection effectiveness generalize to base (non-instruction-tuned) language models? · question · 0.765 · Acknowledged limitation: only instruction-tuned models were studied
- MDS injection steering efficiency peaks at mid-layers across LLMs, injection strides, and OCEAN traits · finding · 0.758 · Consistent empirical pattern supporting the connection between mid-layer representations and emotion/behavioral content
- Identified exception to overall MDS effectiveness; reason remains unexplained as a limitation
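The similarity panel lists entities whose embeddings reach cosine similarity ≥ 0.65 with this finding but share no typed edge with it. The Python sketch below shows one way to express that filter. It makes several assumptions: each entity has a precomputed embedding vector, and typed edges are stored as unordered name pairs. The function names, toy vectors, and data structures are hypothetical and are not the knowledge graph's actual implementation.

```python
import math

# Hypothetical sketch: entity names, vectors, and the edge set are illustrative
# assumptions, not the knowledge graph's real data or API.
SIMILARITY_THRESHOLD = 0.65  # "cosine ≥ 0.65" from the panel header


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similar_untyped_neighbors(
    target: str,
    embeddings: dict[str, list[float]],
    typed_edges: set[frozenset[str]],
    threshold: float = SIMILARITY_THRESHOLD,
) -> list[tuple[str, float]]:
    """Return entities at or above `threshold` cosine similarity to `target`
    that share no typed edge with it, sorted by descending similarity."""
    query = embeddings[target]
    candidates = []
    for name, vec in embeddings.items():
        # Skip the entity itself and any pair already joined by a typed edge.
        if name == target or frozenset((target, name)) in typed_edges:
            continue
        score = cosine(query, vec)
        if score >= threshold:
            candidates.append((name, score))
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy 2-D embeddings; "d" is similar to "a" but already has a typed edge.
    toy = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [1.0, 0.05]}
    print(similar_untyped_neighbors("a", toy, {frozenset(("a", "d"))}))
```

The filter drops pairs that already have a typed edge, so the list only shows relations the graph has not yet recorded. Each surviving neighbor is either a candidate for a new edge or a possible duplicate.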