method
active
MDS Injection (method:mds-injection)
Mean-difference vectors derived from self-statement activations (h_s); best-performing injection method in open-ended generation
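A minimal sketch of how such a mean-difference vector might be computed. It assumes the difference is taken between the mean self-statement activation for statements expressing a trait and the mean for statements expressing its opposite. The two-set contrast, the `(n_statements, d_model)` array layout, and the function name `mean_difference_vector` are illustrative assumptions, not details confirmed by this entry.

```python
import numpy as np

def mean_difference_vector(h_pos: np.ndarray, h_neg: np.ndarray) -> np.ndarray:
    """Return mean(h_pos) - mean(h_neg), averaged over the statement axis.

    h_pos, h_neg: arrays of shape (n_statements, d_model), one
    residual-stream activation per self-statement (assumed layout).
    """
    if h_pos.shape[1] != h_neg.shape[1]:
        raise ValueError("activation sets must share the hidden dimension")
    return h_pos.mean(axis=0) - h_neg.mean(axis=0)

# Toy activations standing in for h_s; real ones would come from a model.
h_pos = np.array([[1.0, 2.0], [3.0, 4.0]])
h_neg = np.array([[0.0, 1.0], [2.0, 1.0]])
v = mean_difference_vector(h_pos, h_neg)
```

The resulting `v` has one entry per hidden dimension and can be scaled before being added to the residual stream.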
Neighborhood — ranked by edge-count
Frameworks (1)
framework
- The paper's primary contribution: performs unbounded, fluency-constrained sweeps in semantically calibrated centroid units using psychological artifacts
Concepts (3)
concept
- Residual-Stream Injection (implements): Core activation intervention: add scaled vector to residual stream at layer l during completion
- A vector in activation space aligned with a behavioral concept; core object manipulated by RepE methods
- Residual-stream activations extracted by prefilling with the statement itself under the "Tell me about yourself" prompt; used for MDS/MDB vectors
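The residual-stream injection described in the list above can be sketched on a toy model. This is an illustration under stated assumptions: the layers are stand-in callables rather than real transformer blocks, the vector is added to the output of layer `layer_idx` (whether the source injects at a layer's input or output is not stated here), and the names `inject` and `forward_with_injection` are hypothetical.

```python
import numpy as np

def inject(residual: np.ndarray, vector: np.ndarray, alpha: float) -> np.ndarray:
    """Add alpha * vector to every position of a (seq_len, d_model) residual stream."""
    return residual + alpha * vector  # broadcasts over sequence positions

def forward_with_injection(x, layers, vector, alpha, layer_idx):
    """Run toy layers, adding the steering vector after layer `layer_idx`."""
    for i, layer in enumerate(layers):
        x = x + layer(x)  # residual connection around each toy block
        if i == layer_idx:
            x = inject(x, vector, alpha)
    return x

# Toy blocks that contribute nothing, so only the injection changes x.
layers = [lambda h: 0.0 * h, lambda h: 0.0 * h]
x0 = np.zeros((3, 2))
steered = forward_with_injection(
    x0, layers, np.array([1.0, -1.0]), alpha=2.0, layer_idx=0
)
```

In a real model the same addition would typically be done through the framework's hook mechanism at the chosen layer; the scale `alpha` plays the role of the injection strength.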
Methods (1)
method
- PM Hybrid Method (uses): Hybrid method combining Personality Prompting (P2) with MDS injections; best overall steering method
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Mean-difference vectors derived from Yes/No binary-prefill activations (h_b)
- Qualitative finding demonstrating unique capability of activation-level interventions unavailable to prompting methods including PM
- MDS injections show no salient patterns in MPI-120 inventory responses beyond occasional co-occurring peaks (finding, 0.735): Contrasts with SJT results; leads authors to narrow analyses to SJT responses
- Why do MDS injections outperform other methods on the inventory (multiple-choice) task? (question, 0.724): Identified as an unexplained result and open question in limitations section
- Theoretical alignment claim backed by OLS R² analysis showing 96.15% of trends have R² ≥ 0.75
- Parameter controlling how often an injection is applied during completion; s=1 injects on every activation, achieving strongest steering
- MDS injection steering efficiency peaks at mid-layers across LLMs, injection strides, and OCEAN traits (finding, 0.703): Consistent empirical pattern supporting the connection between mid-layer representations and emotion/behavioral content
- Identified exception to overall MDS effectiveness; reason remains unexplained as a limitation
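The injection-stride entry in the list above can be illustrated with a short sketch. It assumes that a stride of s means the vector is added at every s-th step of the completion, starting from the first, so that s=1 injects at every step; that reading, the array layout, and the names `injection_mask` and `apply_strided` are assumptions made for illustration.

```python
import numpy as np

def injection_mask(n_steps: int, stride: int) -> np.ndarray:
    """Boolean mask marking the steps at which an injection is applied."""
    if stride < 1:
        raise ValueError("stride must be a positive integer")
    return np.arange(n_steps) % stride == 0

def apply_strided(residuals: np.ndarray, vector: np.ndarray,
                  alpha: float, stride: int) -> np.ndarray:
    """Add alpha * vector to the rows of (n_steps, d_model) selected by the stride."""
    mask = injection_mask(residuals.shape[0], stride)
    out = residuals.copy()       # leave the caller's array untouched
    out[mask] += alpha * vector  # only the selected steps are steered
    return out

steps = np.zeros((4, 2))
every_step = apply_strided(steps, np.array([1.0, 1.0]), alpha=1.0, stride=1)
every_other = apply_strided(steps, np.array([1.0, 1.0]), alpha=1.0, stride=2)
```

With `stride=1` all four rows receive the vector, while `stride=2` leaves rows 1 and 3 unchanged, which matches the description of s=1 as the densest setting.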