finding

active

finding:mds-injections-show-no-salient-patterns-in-mpi-120-inventory-responses-beyond-occasional-co-occurring-peaks

MDS injections show no salient patterns in MPI-120 inventory responses beyond occasional co-occurring peaks

Contrasts with SJT results; leads authors to narrow analyses to SJT responses

Source paper

extracted_from

Psychological Steering of Large Language Models

(2026) · Leonardo Blas · Robin Jia · Emilio Ferrara

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Why do MDS injections outperform other methods on the inventory (multiple-choice) task? · question · 0.801
Identified as an unexplained result and open question in limitations section
MDS injections align with the Linear Representation Hypothesis: target trait varies near-linearly with alpha in open-ended generation · claim · 0.790
Theoretical alignment claim backed by OLS R² analysis showing 96.15% of trends have R² ≥ 0.75
MDS injections can steer toward multiple distinct constructs in the same completion, producing strongly polarized yet smoothly connected segments · finding · 0.779
Qualitative finding demonstrating unique capability of activation-level interventions unavailable to prompting methods including PM
Uncalibrated sweep units and restricted coefficient ranges are the primary cause of prior reports showing P2 outperforming MD injections · claim · 0.776
Mechanistic explanation for discrepancy with Banayeeanzade et al.; addressed by centroid unit and unbounded sweep contributions
MDS injections outperform P2 in open-ended generation in 11 of 14 LLMs with Phi gains of 3.61% to 16.44% · finding · 0.773
Primary quantitative result overturning prior reports that prompting outperforms representation engineering
Do the findings about MDS injection effectiveness generalize to base (non-instruction-tuned) language models? · question · 0.765
Acknowledged limitation: only instruction-tuned models were studied
MDS injection steering efficiency peaks at mid-layers across LLMs, injection strides, and OCEAN traits · finding · 0.758
Consistent empirical pattern supporting the connection between mid-layer representations and emotion/behavioral content
gemma-3-1b-it yields only one valid MDS injection score (phi_1,A,up = 4.8) and is excluded from main analyses · finding · 0.749
Identified exception to overall MDS effectiveness; reason remains unexplained as a limitation