method
active
method:mdb-injection · MDB Injection
Mean-difference vectors derived from Yes/No binary-prefill activations (h_b)
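As a rough illustration of the description above, a mean-difference vector can be computed as the mean of the "Yes"-prefill activations minus the mean of the "No"-prefill activations. This is a minimal sketch under assumptions: the sign convention (Yes minus No), the helper names `mean_vector` and `mean_difference`, and the toy two-dimensional activations are all hypothetical and do not come from the source.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(rows: Sequence[Sequence[float]]) -> Vector:
    """Average a non-empty batch of equal-length activation vectors."""
    if not rows:
        raise ValueError("need at least one activation vector")
    dim = len(rows[0])
    return [sum(row[i] for row in rows) / len(rows) for i in range(dim)]


def mean_difference(yes_acts: Sequence[Sequence[float]],
                    no_acts: Sequence[Sequence[float]]) -> Vector:
    """Return mean(yes_acts) - mean(no_acts), component by component.

    Assumption: "Yes minus No" is the sign convention; the source only
    says the vector is a mean difference over Yes/No prefill activations.
    """
    mu_yes = mean_vector(yes_acts)
    mu_no = mean_vector(no_acts)
    return [a - b for a, b in zip(mu_yes, mu_no)]


# Toy stand-ins for h_b activations (real ones would be model hidden states).
yes_acts = [[1.0, 2.0], [3.0, 4.0]]
no_acts = [[0.0, 1.0], [2.0, 1.0]]
direction = mean_difference(yes_acts, no_acts)
```

In practice the activations would be high-dimensional residual-stream states collected at one layer; the arithmetic is the same.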
Neighborhood — ranked by edge-count
Concepts (2)
concept
- Residual-Stream Injection · implements · Core activation intervention: add a scaled vector to the residual stream at layer l during completion
- Residual-stream activations extracted by prefilling with Yes/No response to identity statement; achieves perfect probe separability
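The residual-stream injection described in this list amounts to replacing the hidden state h at layer l with h + α·v, where v is the steering vector and α a scale. The sketch below shows that update on a toy stack of layers. The names `inject`, `run_layers`, `alpha`, and `target_layer` are hypothetical, and the lambda "layers" are stand-ins for real transformer blocks, not the authors' implementation.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def inject(residual: Sequence[float], direction: Sequence[float],
           alpha: float) -> Vector:
    """Return residual + alpha * direction, component by component."""
    if len(residual) != len(direction):
        raise ValueError("residual and direction must have the same length")
    return [h + alpha * v for h, v in zip(residual, direction)]


def run_layers(x: Vector, layers: Sequence[Layer], direction: Sequence[float],
               alpha: float, target_layer: int) -> Vector:
    """Apply each layer in order, injecting after the output of target_layer."""
    for idx, layer in enumerate(layers):
        x = layer(x)
        if idx == target_layer:
            x = inject(x, direction, alpha)
    return x


# Toy "model": two simple layers acting on a 2-dimensional residual stream.
layers = [
    lambda h: [2.0 * t for t in h],   # layer 0: scale
    lambda h: [t + 1.0 for t in h],   # layer 1: shift
]
direction = [0.5, -0.5]

steered = run_layers([1.0, 0.0], layers, direction, alpha=2.0, target_layer=0)
baseline = run_layers([1.0, 0.0], layers, direction, alpha=0.0, target_layer=0)
```

With a real model, the same update would typically be applied through a hook on the chosen layer's output rather than an explicit loop.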
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Mean-difference vectors derived from self-statement activations (h_s); best-performing injection method in open-ended generation
- MDS injections show no salient patterns in MPI-120 inventory responses beyond occasional co-occurring peaks · finding · 0.687 · Contrasts with SJT results; leads authors to narrow analyses to SJT responses
- Parameter controlling how often an injection is applied during completion; s=1 injects on every activation, achieving strongest steering
- Qualitative finding demonstrating unique capability of activation-level interventions unavailable to prompting methods including PM
- Probe-based injection using L2-regularized logistic regressor with zero intercept on h_b activations
- Why do MDS injections outperform other methods on the inventory (multiple-choice) task? · question · 0.670 · Identified as an unexplained result and open question in limitations section
- Generative model substrate for active inference; discrete states, actions, outcomes, and temporal policies.
- Probe-based injection using L2-regularized logistic regressor with learned intercept on h_b activations
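The two probe-based entries in this list differ only in whether the logistic regressor's intercept is fixed at zero or learned. The sketch below fits an L2-regularized logistic probe with either setting. It rests on several assumptions: plain gradient descent stands in for whatever solver the authors used, using the fitted weight vector as the injection direction is an assumed reading, and `fit_probe`, its hyperparameters, and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_probe(xs: Sequence[Sequence[float]], ys: Sequence[int],
              l2: float = 0.1, lr: float = 0.5, steps: int = 500,
              fit_intercept: bool = False) -> Tuple[List[float], float]:
    """Fit an L2-regularized logistic probe by batch gradient descent.

    With fit_intercept=False the bias stays at exactly 0.0 (zero-intercept
    variant); with fit_intercept=True it is learned and not regularized.
    """
    dim, n = len(xs[0]), len(xs)
    w, b = [0.0] * dim, 0.0
    for _ in range(steps):
        grad_w = [l2 * wi for wi in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(dim):
                grad_w[i] += err * x[i] / n
            grad_b += err / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        if fit_intercept:
            b -= lr * grad_b
    return w, b


# Toy stand-ins for h_b activations: label 1 = "Yes" prefill, 0 = "No" prefill.
xs = [[2.0, 0.1], [1.5, -0.2], [-2.0, 0.0], [-1.5, 0.3]]
ys = [1, 1, 0, 0]

w_zero, b_zero = fit_probe(xs, ys, fit_intercept=False)
w_learned, b_learned = fit_probe(xs, ys, fit_intercept=True)
```

Under this reading, `w_zero` or `w_learned` would play the role that the mean-difference vector plays in MDB Injection.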