question

active

question:why-do-mds-injections-outperform-other-methods-on-the-inventory-multiple-choice-task

Why do MDS injections outperform other methods on the inventory (multiple-choice) task?

Identified as an unexplained result and an open question in the paper's limitations section

Source paper

extracted_from

Psychological Steering of Large Language Models

(2026) · Leonardo Blas · Robin Jia · Emilio Ferrara

Neighborhood — ranked by edge-count

Papers (1)

paper

Psychological Steering of Large Language Models
associated_with

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

MDS injections show no salient patterns in MPI-120 inventory responses beyond occasional co-occurring peaks · finding · 0.801
Contrasts with SJT results; leads authors to narrow analyses to SJT responses
Why do MDS injections fail on gemma-3-1b-it but succeed across all other tested LLMs? · question · 0.762
Unexplained exception identified as a limitation and open question
MDS injections outperform P2 in open-ended generation in 11 of 14 LLMs with Phi gains of 3.61% to 16.44% · finding · 0.752
Primary quantitative result overturning prior reports that prompting outperforms representation engineering
MDS injections can steer toward multiple distinct constructs in the same completion, producing strongly polarized yet smoothly connected segments · finding · 0.728
Qualitative finding demonstrating a capability unique to activation-level interventions and unavailable to prompting methods, including PM
Do the findings about MDS injection effectiveness generalize to base (non-instruction-tuned) language models? · question · 0.726
Acknowledged limitation: only instruction-tuned models were studied
MDS injections align with the Linear Representation Hypothesis: target trait varies near-linearly with alpha in open-ended generation · claim · 0.725
Theoretical alignment claim backed by an OLS R² analysis showing that 96.15% of trends have R² ≥ 0.75
MDS Injection · method · 0.724
Mean-difference vectors derived from self-statement activations (h_s); best-performing injection method in open-ended generation
gemma-3-1b-it yields only one valid MDS injection score (phi_1,A,up = 4.8) and is excluded from main analyses · finding · 0.718
Identified exception to overall MDS effectiveness; the reason remains unexplained and is noted as a limitation