Mean Difference Vector Patching (MDVP)
Type: method · Status: active · ID: method:mean-difference-vector-patching-mdvp
An intervention method that adds the difference between the mean activations of two conditions to a representation.
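The intervention can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation. It assumes activations for each condition have already been collected as arrays of shape (n_examples, d_model). The function names `mean_difference_vector` and `patch` are hypothetical, as is the optional scaling factor `alpha`. The sign convention (target mean minus source mean) is also an assumption.

```python
import numpy as np


def mean_difference_vector(acts_source: np.ndarray, acts_target: np.ndarray) -> np.ndarray:
    """Return mean(target) - mean(source) over the example axis.

    Hypothetical helper: both inputs have shape (n_examples, d_model).
    """
    return acts_target.mean(axis=0) - acts_source.mean(axis=0)


def patch(h: np.ndarray, delta: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Add the mean-difference vector to representation(s) h.

    alpha is an assumed scaling knob; alpha = 1.0 adds the raw difference.
    Broadcasting lets h be a single vector (d_model,) or a batch (n, d_model).
    """
    return h + alpha * delta


# Usage: shift source-condition activations toward the target condition.
acts_a = np.array([[0.0, 0.0], [2.0, 2.0]])  # source condition, mean [1, 1]
acts_b = np.array([[3.0, 1.0], [5.0, 3.0]])  # target condition, mean [4, 2]
delta = mean_difference_vector(acts_a, acts_b)
patched = patch(acts_a, delta)
```

With `alpha = 1.0`, patching maps the source mean exactly onto the target mean. Individual patched points, however, keep their own within-condition deviation from the source mean.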
Related by similarity (8)
Cosine similarity ≥ 0.65 · no typed edge. These are entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Generative model substrate for active inference; discrete states, actions, outcomes, and temporal policies.
- Synthetic theoretical example showing pernicious divergence via hidden pathway activation
- Empirical demonstration that MDVP produces divergent representations in a real LLM
- Steering vectors from µ(0→2) slightly outperform µ(1→2) for instruction discovery across datasets and models (finding · 0.713). Contrasting No Reflection with Triggered Reflection provides a stronger signal than contrasting Intrinsic Reflection with Triggered Reflection.
- Feed-forward neural network with hidden layers, capable of representing non-linearly separable functions.
- Modeling framework for discrete state-space decision-making under uncertainty, used as generative model in active inference.
- Mechanistic explanation for MDS superiority; attributed to two design choices: centroid alignment and full-utterance semantics in h_s
- Probe construction method: concept vector at each layer is L2-normalized difference between mean positive and mean negative representations from contrastive system prompts
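The probe construction in the last entry reduces to a per-layer, L2-normalized mean difference. The following NumPy sketch is illustrative only. It assumes the positive and negative representations are already stacked into arrays of shape (n_examples, n_layers, d). The function name `concept_vectors` and the `eps` guard against zero-norm differences are hypothetical.

```python
import numpy as np


def concept_vectors(pos: np.ndarray, neg: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Per-layer unit concept vectors from contrastive representations.

    pos: (n_pos, n_layers, d) representations under the positive prompt.
    neg: (n_neg, n_layers, d) representations under the negative prompt.
    Returns an (n_layers, d) array whose rows are L2-normalized
    differences mean(pos) - mean(neg), one row per layer.
    """
    diff = pos.mean(axis=0) - neg.mean(axis=0)               # (n_layers, d)
    norms = np.linalg.norm(diff, axis=-1, keepdims=True)     # (n_layers, 1)
    return diff / np.maximum(norms, eps)                     # avoid divide-by-zero


# Usage: two layers, 2-dimensional representations.
pos = np.array([[[3.0, 4.0], [1.0, 0.0]]])
neg = np.zeros_like(pos)
vecs = concept_vectors(pos, neg)
```

Normalizing each layer's vector makes probe scores (dot products with a representation) comparable across layers, whose raw difference norms can differ widely.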