finding
active
finding:mds-achieves-global-win-proportion-of-89-5-on-sjts-across-14-llms-and-four-injection-strides
MDS achieves global win proportion of 89.5% on SJTs across 14 LLMs and four injection strides
MDS dominates open-ended generation on the global win proportion metric (Table 2)
Source paper
extracted_from · (2026) · Leonardo Blas · Robin Jia · Emilio Ferrara
Neighborhood — ranked by edge-count
Claims (1)
claim
- Mechanistic explanation for MDS superiority; attributed to two design choices: centroid alignment and full-utterance semantics in h_s
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- MDS is also the top method on the inventory task but with much smaller margin than on SJTs (Table 2)
- MDS injections outperform P2 in open-ended generation in 11 of 14 LLMs with Phi gains of 3.61% to 16.44% · finding · 0.806 · Primary quantitative result overturning prior reports that prompting outperforms representation engineering
- Empirical finding about injection stride parameter; injecting into every completion activation maximizes steering strength
- Key finding showing that combining prompting and injection is the strongest approach
- On Qwen3-1.7B, MDS achieves ϕ1,C,↑ = 5.0 (SJTs) vs P2 at 4.7, and ϕ1,C,↓ = 1.4 (SJTs) vs P2 at 3.6 · finding · 0.765 · Specific consciousness sweep result for Qwen3-1.7B from Table 6 demonstrating strong bidirectional steering
- DAS achieves overall odds-ratio of 10.24 on pythia-410m averaged across all CausalGym tasks · finding · 0.751 · Numerical result for pythia-410m
- Per-model steerability comparison from Table 4
- Theoretical alignment claim backed by OLS R² analysis showing 96.15% of trends have R² ≥ 0.75
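
The "Related by similarity" rule above (cosine ≥ 0.65, no typed edge) can be sketched as a small filter. This is only an illustration under stated assumptions: the page does not say how entity embeddings are computed or stored, so the `embeddings` dictionary, the `typed_edges` set, and the function names `cosine` and `similar_without_edge` are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_without_edge(target, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, score) pairs whose cosine similarity to `target`
    is at least `threshold` and that share no typed edge with it,
    sorted by descending score.

    embeddings  : dict mapping entity id -> embedding vector (hypothetical layout)
    typed_edges : set of frozenset({id_a, id_b}) for pairs that already
                  have a typed relation (hypothetical layout)
    """
    hits = []
    for other, vec in embeddings.items():
        if other == target:
            continue
        # Skip pairs that are already linked by a typed relation.
        if frozenset((target, other)) in typed_edges:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            hits.append((other, score))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

Entities returned by such a filter would correspond to the list above: close in embedding space, not yet linked, and therefore worth reviewing as possible new edges or duplicates.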