question
active
Why do MDS injections fail on gemma-3-1b-it but succeed across all other tested LLMs?
Unexplained exception identified as a limitation and open question
Source paper
extracted_from · (2026) · Leonardo Blas · Robin Jia · Emilio Ferrara
Neighborhood — ranked by edge-count
Papers (1)
paper
- Psychological Steering of Large Language Models · associated_with
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Identified exception to overall MDS effectiveness; reason remains unexplained as a limitation
- Why do MDS injections outperform other methods on the inventory (multiple-choice) task? · question · 0.762 · Identified as an unexplained result and open question in limitations section
- MDS injections show no salient patterns in MPI-120 inventory responses beyond occasional co-occurring peaks · finding · 0.731 · Contrasts with SJT results; leads authors to narrow analyses to SJT responses
- MDS injections outperform P2 in open-ended generation in 11 of 14 LLMs with Phi gains of 3.61% to 16.44% · finding · 0.727 · Primary quantitative result overturning prior reports that prompting outperforms representation engineering
- MDS injection steering efficiency peaks at mid-layers across LLMs, injection strides, and OCEAN traits · finding · 0.720 · Consistent empirical pattern supporting the connection between mid-layer representations and emotion/behavioral content
- Qualitative finding demonstrating unique capability of activation-level interventions unavailable to prompting methods including PM
- Theoretical alignment claim backed by OLS R2 analysis showing 96.15% of trends have R2>=0.75
- Proof-of-principle that MAS can detect model misalignment in DeepSeek-R1-Qwen-1.5B fine-tuned models.