MDB Injection

Mean-difference vectors derived from Yes/No binary-prefill activations (h_b)
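
As a rough illustration of the description above, a mean-difference vector can be computed by averaging the activations collected under each prefill and subtracting the two averages. The sketch below is only a toy under stated assumptions: the function name `mean_difference_vector`, the Yes-minus-No sign convention, and the list-of-lists activation layout are hypothetical and not taken from the source.

```python
from statistics import fmean


def mean_difference_vector(h_yes, h_no):
    """Return the per-dimension mean of h_yes minus the per-dimension mean of h_no.

    h_yes, h_no: lists of activation vectors (one vector per example).
    The Yes-minus-No direction is an assumption made for illustration.
    """
    dims = len(h_yes[0])
    return [
        fmean(row[d] for row in h_yes) - fmean(row[d] for row in h_no)
        for d in range(dims)
    ]


# Toy activations standing in for h_b; real activations would come from a model.
h_yes = [[1.0, 2.0], [3.0, 4.0]]
h_no = [[0.0, 1.0], [0.0, 1.0]]
v = mean_difference_vector(h_yes, h_no)
```

In this toy case both dimensions differ by the same amount, so the resulting vector points equally along both axes.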

Neighborhood — ranked by edge-count

concept

Residual-Stream Injection · implements
Core activation intervention: add scaled vector to residual stream at layer l during completion

h_b Activations (Yes/No Binary Prefill) · uses
Residual-stream activations extracted by prefilling with Yes/No response to identity statement; achieves perfect probe separability
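
The core intervention named above, adding a scaled vector to the residual stream at layer l, can be sketched on a toy stack of layers. Everything here is an assumption for illustration: the names `inject` and `run_layers`, the scale parameter `alpha`, the choice to inject after the layer's own computation, and the use of plain Python callables in place of real transformer blocks.

```python
def inject(residual, vector, alpha):
    """Return residual + alpha * vector, element-wise."""
    if len(residual) != len(vector):
        raise ValueError("residual and vector must have the same length")
    return [h + alpha * v for h, v in zip(residual, vector)]


def run_layers(h, layers, vector, alpha, layer_index):
    """Apply each layer in order, injecting once after layer `layer_index`.

    Injecting after (rather than before) the layer is an assumption.
    """
    for i, layer in enumerate(layers):
        h = layer(h)
        if i == layer_index:
            h = inject(h, vector, alpha)
    return h


# Two toy "layers" standing in for transformer blocks.
layers = [
    lambda h: [x + 1.0 for x in h],
    lambda h: [2.0 * x for x in h],
]
steered = run_layers([0.0, 0.0], layers, vector=[1.0, -1.0], alpha=0.5, layer_index=0)
baseline = run_layers([0.0, 0.0], layers, vector=[1.0, -1.0], alpha=0.0, layer_index=0)
```

Setting `alpha` to zero recovers the uninjected forward pass, which makes the scale a convenient knob for comparing steered and baseline outputs.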

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

MDS Injection · method · 0.844
Mean-difference vectors derived from self-statement activations (h_s); best-performing injection method in open-ended generation
MDS injections show no salient patterns in MPI-120 inventory responses beyond occasional co-occurring peaks · finding · 0.687
Contrasts with SJT results; leads authors to narrow analyses to SJT responses
Injection Stride · method · 0.686
Parameter controlling how often an injection is applied during completion; s=1 injects on every activation, achieving strongest steering
MDS injections can steer toward multiple distinct constructs in the same completion, producing strongly polarized yet smoothly connected segments · finding · 0.682
Qualitative finding demonstrating unique capability of activation-level interventions unavailable to prompting methods including PM
L2ZI Injection · method · 0.673
Probe-based injection using L2-regularized logistic regressor with zero intercept on h_b activations
Why do MDS injections outperform other methods on the inventory (multiple-choice) task? · question · 0.670
Identified as an unexplained result and open question in limitations section
Markov Decision Process (MDP) · framework · 0.665
Generative model substrate for active inference; discrete states, actions, outcomes, and temporal policies.
L2LI Injection · method · 0.665
Probe-based injection using L2-regularized logistic regressor with learned intercept on h_b activations
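
The Injection Stride entry in the list above says that s=1 injects on every activation. One possible reading, sketched here, is that injection happens at every s-th generation step, starting from step 0. The function name `injection_steps`, the zero-based step index, and the modulo rule are assumptions for illustration and are not confirmed by the source.

```python
def injection_steps(num_steps, stride):
    """Return the step indices at which an injection would be applied.

    Assumes injection at steps 0, stride, 2*stride, ...; with stride=1
    every step is injected.
    """
    if stride < 1:
        raise ValueError("stride must be a positive integer")
    return [t for t in range(num_steps) if t % stride == 0]


every_step = injection_steps(6, 1)
every_third = injection_steps(6, 3)
```

Under this reading, larger strides apply the vector less often, which is consistent with s=1 giving the strongest steering.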