method
active
method:l1li-injection
L1LI Injection
Probe-based injection using L1-regularized logistic regressor with learned intercept on h_b activations
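A minimal sketch of the probe-fitting step, under stated assumptions: the h_b activations are replaced by synthetic data, the solver is plain proximal gradient descent written in NumPy, and reusing the normalized probe weights as the injection direction is an illustrative guess at how the probe feeds the injection. The function name `fit_l1_logistic_probe` and the hyperparameters `lam`, `lr`, and `n_steps` are hypothetical, not taken from the source.

```python
import numpy as np

def fit_l1_logistic_probe(X, y, lam=0.05, lr=0.1, n_steps=2000):
    """L1-regularized logistic regression with a learned, unpenalized intercept.

    Each step takes a gradient step on the mean log-loss, then applies
    soft-thresholding (the L1 proximal operator) to the weights only.
    """
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(y = 1)
        residual = p - y
        w = w - lr * (X.T @ residual) / n
        b = b - lr * residual.mean()             # intercept: no penalty
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)
    return w, b

# Synthetic stand-in for h_b activations (assumption): 16 dimensions,
# of which only the first three carry the Yes/No label signal.
rng = np.random.default_rng(0)
n, d = 200, 16
y = (np.arange(n) % 2).astype(float)
X = rng.normal(size=(n, d))
X[:, :3] += 3.0 * y[:, None]

w, b = fit_l1_logistic_probe(X, y)
accuracy = np.mean(((X @ w + b) > 0) == (y == 1))
direction = w / np.linalg.norm(w)  # assumed use: unit-norm injection vector
```

The L1 penalty pushes weights on uninformative dimensions toward zero, so the resulting direction concentrates on a few coordinates. The zero-intercept variant would correspond to fixing `b = 0.0` and skipping its update.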
Neighborhood — ranked by edge-count
Concepts (2)
concept
- Residual-Stream Injection · implements · Core activation intervention: add scaled vector to residual stream at layer l during completion
- Residual-stream activations extracted by prefilling with Yes/No response to identity statement; achieves perfect probe separability
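The residual-stream intervention named in the first concept can be sketched on a toy model. Everything here is an assumption made for illustration: the layers are random tanh blocks rather than transformer layers, the vector is added at every position, and the names `forward_with_injection`, `alpha`, and `layer_idx` are hypothetical. Token-by-token completion is not modeled; the sketch covers a single forward pass.

```python
import numpy as np

def forward_with_injection(h, layers, layer_idx, vector, alpha):
    """Run a toy residual stack and add alpha * vector after layer `layer_idx`."""
    for i, layer in enumerate(layers):
        h = h + layer(h)            # ordinary residual update
        if i == layer_idx:
            h = h + alpha * vector  # injection, broadcast over positions
    return h

rng = np.random.default_rng(0)
seq_len, d_model, n_layers = 4, 8, 3

# Toy layers (assumption): small random linear maps followed by tanh.
weights = [rng.normal(scale=0.1, size=(d_model, d_model)) for _ in range(n_layers)]
layers = [lambda h, W=W: np.tanh(h @ W) for W in weights]

h0 = rng.normal(size=(seq_len, d_model))
v = rng.normal(size=d_model)
v /= np.linalg.norm(v)              # unit-norm steering vector

baseline = forward_with_injection(h0, layers, layer_idx=1, vector=v, alpha=0.0)
steered_last = forward_with_injection(h0, layers, layer_idx=n_layers - 1, vector=v, alpha=4.0)
steered_mid = forward_with_injection(h0, layers, layer_idx=1, vector=v, alpha=4.0)
```

Injecting after the final layer shifts the output by exactly `alpha * v`, while injecting at an earlier layer also changes what the later layers compute, which is why the choice of layer l matters.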
Methods (2)
method
- L2LI Injection · related_to · Probe-based injection using L2-regularized logistic regressor with learned intercept on h_b activations
- L1ZI Injection · related_to · Probe-based injection using L1-regularized logistic regressor with zero intercept on h_b activations
Related by similarity (5)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Probe-based injection using L2-regularized logistic regressor with zero intercept on h_b activations
- Parameter controlling how often an injection is applied during completion; s=1 injects on every activation, achieving strongest steering
- Prior loss-balancing method using learnable loss transformation; logarithm approach recovers this
- Technique of injecting activation patterns associated with specific concepts into a model's internal states to test whether self-reports reflect ground truth.
- Mean-difference vectors derived from self-statement activations (h_s); best-performing injection method in open-ended generation