finding

active


PM hybrid outperforms both P2 and MDS in 13 of 14 LLMs with Phi gains over P2 from 5.56% to 21.92% and over MDS from 3.30% to 26.67%

Key finding showing that combining prompting and injection is the strongest approach

Source paper

extracted_from

Psychological Steering of Large Language Models

(2026) · Leonardo Blas · Robin Jia · Emilio Ferrara

Neighborhood — ranked by edge-count

Claims (1)

claim

Representation engineering and prompting methods may combine to achieve stronger behavioral expression across other domains
supports
Broader implication of PM hybrid's superior performance; extrapolated from OCEAN results

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

MDS injections outperform P2 in open-ended generation in 11 of 14 LLMs with Phi gains of 3.61% to 16.44% · finding · 0.879
Primary quantitative result overturning prior reports that prompting outperforms representation engineering
MDS achieves global win proportion of 47.3% on MPI-120 inventory across 14 LLMs · finding · 0.770
MDS is also the top method on the inventory task but with much smaller margin than on SJTs (Table 2)
MDS achieves global win proportion of 89.5% on SJTs across 14 LLMs and four injection strides · finding · 0.768
MDS dominates in open-ended generation by global win proportion metric (Table 2)
PM achieves overall SJT steerability Phi=9.6 on gemma-3-12b-it vs MDS=8.7 and P2=8.3 · finding · 0.764
Per-model steerability comparison from Table 4
Uncalibrated sweep units and restricted coefficient ranges are the primary cause of prior reports showing P2 outperforming MD injections · claim · 0.763
Mechanistic explanation for discrepancy with Banayeeanzade et al.; addressed by centroid unit and unbounded sweep contributions
On Qwen3-1.7B, MDS achieves ϕ_{1,C,↑} = 5.0 (SJTs) vs P2 at 4.7, and ϕ_{1,C,↓} = 1.4 (SJTs) vs P2 at 3.6 · finding · 0.752
Specific conscientiousness sweep result for Qwen3-1.7B from Table 6 demonstrating strong bidirectional steering
For small models, critiqued revisions yield higher harmlessness PM scores than direct revisions; for large models the difference is negligible. · finding · 0.726
Figure 7 comparison of critiqued vs direct revisions across model sizes.
Card-counting heuristics suffice to outperform most LLMs tested. · claim · 0.716
TrackerAgent's second-place ranking calibrates the benchmark and highlights LLM shortcomings.
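
The "Related by similarity" selection rule stated above (cosine ≥ 0.65, no typed edge) can be illustrated with a minimal sketch. This page does not say how embeddings are computed or how the list is actually produced, so everything below is an assumption for illustration: the function names `cosine` and `related_by_similarity`, the dictionary of embedding vectors, and the edge list format are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def related_by_similarity(target_id, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, score) pairs that are similar to the target
    but share no typed edge with it, sorted by descending score.

    embeddings:  dict of entity id -> vector (hypothetical storage format)
    typed_edges: iterable of (source_id, target_id) pairs (hypothetical format)
    """
    target_vec = embeddings[target_id]

    # Entities already connected to the target by a typed edge, in either direction.
    linked = set()
    for src, dst in typed_edges:
        if src == target_id:
            linked.add(dst)
        elif dst == target_id:
            linked.add(src)

    results = []
    for entity_id, vec in embeddings.items():
        if entity_id == target_id or entity_id in linked:
            continue
        score = cosine(target_vec, vec)
        if score >= threshold:
            results.append((entity_id, round(score, 3)))

    results.sort(key=lambda pair: pair[1], reverse=True)
    return results
```

Under these assumptions, an entity that already has a typed edge to the target (such as the "supports" claim listed under Claims) would be excluded even if its similarity is high, which matches the description of this list as candidates for new edges or unrecognized duplicates.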