MDS Injection

Mean-difference vectors derived from self-statement activations (h_s); best-performing injection method in open-ended generation

Neighborhood — ranked by edge-count

Frameworks (1)

framework

Psychological Steering Framework
uses
The paper's primary contribution: performs unbounded, fluency-constrained sweeps in semantically calibrated centroid units using psychological artifacts

Concepts (3)

concept

Residual-Stream Injection
implements
Core activation intervention: add scaled vector to residual stream at layer l during completion
Concept Direction in Representation Space
implements
A vector in activation space aligned with a behavioral concept; core object manipulated by RepE methods
h_s Activations (Statement Self-Report Prefill)
uses
Residual-stream activations extracted by prefilling the completion with the statement itself under the "Tell me about yourself" prompt; used for MDS/MDB vectors
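The three concept entries above fit together as one small pipeline: collect activations, take a mean difference to get a concept direction, then add a scaled copy of that direction to the residual stream. The sketch below illustrates this with plain Python lists. It assumes the mean difference is taken between activations for statements at the high and low poles of a trait, which this page does not state explicitly. The function names (`mean_vector`, `mean_difference`, `inject`) and the toy numbers are hypothetical, not the paper's code.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(rows: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equally sized activation vectors."""
    if not rows:
        raise ValueError("need at least one activation vector")
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def mean_difference(high: Sequence[Sequence[float]],
                    low: Sequence[Sequence[float]]) -> Vector:
    """Concept direction: mean(high-pole activations) - mean(low-pole activations).

    ASSUMPTION: 'high' and 'low' are h_s activations for statements expressing
    opposite poles of the same trait.
    """
    mean_high, mean_low = mean_vector(high), mean_vector(low)
    return [a - b for a, b in zip(mean_high, mean_low)]


def inject(hidden: Sequence[float], direction: Sequence[float],
           alpha: float) -> Vector:
    """Residual-stream injection at one position: h' = h + alpha * v."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


# Toy 2-dimensional activations (hypothetical values).
high_acts = [[1.0, 2.0], [3.0, 4.0]]
low_acts = [[0.0, 0.0], [2.0, 2.0]]
direction = mean_difference(high_acts, low_acts)
steered = inject([0.5, 0.5], direction, alpha=2.0)
```

In a real model the same addition would be applied to the hidden state at layer l during generation, typically through a forward hook; that wiring is omitted here because it depends on the model library.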

Methods (1)

method

PM Hybrid Method
uses
Hybrid method combining Personality Prompting (P2) with MDS injections; best overall steering method

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

MDB Injection · method · 0.844
Mean-difference vectors derived from Yes/No binary-prefill activations (h_b)
MDS injections can steer toward multiple distinct constructs in the same completion, producing strongly polarized yet smoothly connected segments · finding · 0.742
Qualitative finding demonstrating a capability unique to activation-level interventions and unavailable to prompting methods, including PM
MDS injections show no salient patterns in MPI-120 inventory responses beyond occasional co-occurring peaks · finding · 0.735
Contrasts with SJT results; leads authors to narrow analyses to SJT responses
Why do MDS injections outperform other methods on the inventory (multiple-choice) task? · question · 0.724
Identified as an unexplained result and open question in the limitations section
MDS injections align with the Linear Representation Hypothesis: target trait varies near-linearly with alpha in open-ended generation · claim · 0.717
Theoretical alignment claim backed by OLS R² analysis showing 96.15% of trends have R² ≥ 0.75
Injection Stride · method · 0.705
Parameter controlling how often an injection is applied during completion; s=1 injects on every activation, achieving strongest steering
MDS injection steering efficiency peaks at mid-layers across LLMs, injection strides, and OCEAN traits · finding · 0.703
Consistent empirical pattern supporting the connection between mid-layer representations and emotion/behavioral content
gemma-3-1b-it yields only one valid MDS injection score (phi_1,A,up = 4.8) and is excluded from main analyses · finding · 0.683
Identified exception to overall MDS effectiveness; the reason remains unexplained and is listed as a limitation
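Two entries in the list above name quantities that are easy to pin down in code: the injection stride s and the OLS R² used to check near-linearity in alpha. The sketch below is illustrative only. It assumes that a stride of s means injecting at every s-th step starting from step 0 (the page only states that s=1 injects on every activation), and it computes R² for a simple one-variable least-squares fit. The names `stride_mask` and `r_squared` and the sample numbers are hypothetical.

```python
from typing import List, Sequence


def stride_mask(num_steps: int, stride: int) -> List[bool]:
    """True at the steps where an injection is applied.

    ASSUMPTION: stride s injects at steps 0, s, 2s, ...; s=1 injects everywhere.
    """
    if stride < 1:
        raise ValueError("stride must be >= 1")
    return [t % stride == 0 for t in range(num_steps)]


def r_squared(xs: Sequence[float], ys: Sequence[float]) -> float:
    """R^2 of the ordinary least-squares line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally sized samples with n >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or ss_tot == 0:
        raise ValueError("xs and ys must each vary")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return 1.0 - ss_res / ss_tot


# Hypothetical sweep: trait score measured at four injection strengths.
alphas = [0.0, 1.0, 2.0, 3.0]
scores = [1.0, 3.0, 5.0, 7.0]
fit_quality = r_squared(alphas, scores)
every_other_step = stride_mask(5, 2)
```

A trend would count toward the reported share if its `r_squared` value is at least 0.75.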