finding
active
finding:mds-injections-outperform-p2-in-open-ended-generation-in-11-of-14-llms-with-phi-gains-of-3-61-to-16-44
MDS injections outperform P2 in open-ended generation in 11 of 14 LLMs with Phi gains of 3.61% to 16.44%
Primary quantitative result overturning prior reports that prompting outperforms representation engineering
Source paper
extracted_from · (2026) · Leonardo Blas · Robin Jia · Emilio Ferrara
Neighborhood — ranked by edge-count
Papers (1)
paper
Thinkers (1)
thinker
- Amin Banayeeanzade · contradicts · Lead author of prior work reporting that P2 outperforms MD injections; the paper overturns this result
Claims (2)
claim
- Uncalibrated sweep units and restricted coefficient ranges are the primary cause of prior reports showing P2 outperforming MD injections · associated_with · supports · Mechanistic explanation for the discrepancy with Banayeeanzade et al.; addressed by the centroid unit and unbounded sweep contributions
- RepE is a new frontier in open-ended psychological steering of LLMs, outperforming prompting when properly calibrated · associated_with · supports · Central interpretive claim overturning prior reports; supported by 11-of-14 LLM wins for MDS over P2
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; these are candidates for new edges or unrecognized duplicates.
- Key finding showing that combining prompting and injection is the strongest approach
- MDS achieves global win proportion of 89.5% on SJTs across 14 LLMs and four injection strides · finding · 0.806 · MDS dominates in open-ended generation by global win proportion metric (Table 2)
- MDS is also the top method on the inventory task but with much smaller margin than on SJTs (Table 2)
- Theoretical alignment claim backed by OLS R² analysis showing that 96.15% of trends have R² ≥ 0.75
- MDS injections show no salient patterns in MPI-120 inventory responses beyond occasional co-occurring peaks · finding · 0.773 · Contrasts with SJT results; leads the authors to narrow their analyses to SJT responses
- Identified exception to overall MDS effectiveness; reason remains unexplained as a limitation
- MDS injection steering efficiency peaks at mid-layers across LLMs, injection strides, and OCEAN traits · finding · 0.752 · Consistent empirical pattern supporting the connection between mid-layer representations and emotion/behavioral content
- Why do MDS injections outperform other methods on the inventory (multiple-choice) task? · question · 0.752 · Identified as an unexplained result and open question in the limitations section