L1LI Injection

Probe-based injection using L1-regularized logistic regressor with learned intercept on h_b activations
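
As a rough illustration of the probe described above, the following sketch fits an L1-regularized logistic regressor with a learned, unpenalized intercept using proximal gradient descent. The toy three-dimensional "activations", the hyperparameters (lam, lr, steps), and the function names are all assumptions made for this example; the source does not specify the solver or the training setup.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(x, t):
    """Proximal operator of t * |x|: shrink x toward zero by t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0

def fit_l1_probe(X, y, lam=0.05, lr=0.5, steps=2000):
    """Minimize mean log-loss + lam * ||w||_1 over weights w and intercept b.

    The intercept is learned by plain gradient descent and is not penalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        # Gradient step on the log-loss, then soft-threshold the weights (L1 prox).
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Return 1 if the probe logit is positive, else 0."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0

# Hypothetical toy activations: only coordinate 0 separates the classes,
# and the decision threshold is away from the origin, so an intercept is needed.
X = [[2.0, 0.1, -0.2], [2.5, -0.1, 0.1], [3.0, 0.2, 0.0],
     [0.0, 0.1, 0.1], [0.5, -0.2, -0.1], [1.0, 0.0, 0.2]]
y = [1, 1, 1, 0, 0, 0]

w, b = fit_l1_probe(X, y)
predictions = [predict(w, b, x) for x in X]
```

In a probe-based injection, the fitted weight vector w would presumably serve as the injection direction; that reading is an assumption here. Holding b at zero instead of updating it would give a zero-intercept variant of the same sketch.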

Neighborhood — ranked by edge-count

concept

Residual-Stream Injection
implements
Core activation intervention: add scaled vector to residual stream at layer l during completion
h_b Activations (Yes/No Binary Prefill)
uses
Residual-stream activations extracted by prefilling with Yes/No response to identity statement; achieves perfect probe separability
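
The residual-stream intervention listed above (adding a scaled vector at layer l) can be sketched with a toy forward pass. Everything here is hypothetical: the "layers" are stand-in residual updates rather than transformer blocks, and injecting at the layer's output (rather than its input) is an assumption the source does not settle.

```python
def inject(hidden, direction, alpha):
    """Return hidden + alpha * direction without modifying the input list."""
    return [h + alpha * d for h, d in zip(hidden, direction)]

def toy_residual_layer(hidden):
    """Stand-in for a transformer block: h + f(h) with f(h) = 0.5 * h."""
    return [h + 0.5 * h for h in hidden]

def forward(hidden, layers, direction=None, alpha=0.0, layer_index=None):
    """Run hidden through the layers, optionally injecting after one of them."""
    for i, layer in enumerate(layers):
        hidden = layer(hidden)
        if direction is not None and i == layer_index:
            # Assumed convention: add the scaled vector to this layer's output.
            hidden = inject(hidden, direction, alpha)
    return hidden

layers = [toy_residual_layer] * 3
h0 = [1.0, 2.0]

baseline = forward(h0, layers)
steered = forward(h0, layers, direction=[1.0, 0.0], alpha=4.0, layer_index=1)
```

Because the injection happens mid-stack, later layers transform the added vector along with the rest of the hidden state, so the effect at the output is not simply alpha times the direction.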

method

L2LI Injection
related_to
Probe-based injection using L2-regularized logistic regressor with learned intercept on h_b activations
L1ZI Injection
related_to
Probe-based injection using L1-regularized logistic regressor with zero intercept on h_b activations

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

L2ZI Injection · method · 0.760
Probe-based injection using L2-regularized logistic regressor with zero intercept on h_b activations
Injection Stride · method · 0.673
Parameter controlling how often an injection is applied during completion; s=1 injects on every activation, achieving strongest steering
IMTL-L · framework · 0.662
Prior loss-balancing method using learnable loss transformation; logarithm approach recovers this
Concept Injection · concept · 0.656
Technique of injecting activation patterns associated with specific concepts into a model's internal states to test whether self-reports reflect ground truth.
MDS Injection · method · 0.651
Mean-difference vectors derived from self-statement activations (h_s); best-performing injection method in open-ended generation
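
For contrast with the probe-based methods, a mean-difference vector of the kind MDS Injection relies on can be sketched as the difference between two class means. The two activation groups below are made-up numbers, and which groups of h_s activations are actually differenced is an assumption not stated in this entry.

```python
def mean_vector(rows):
    """Coordinate-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def mean_difference(pos, neg):
    """Mean of the positive-group activations minus mean of the negative group."""
    mean_pos, mean_neg = mean_vector(pos), mean_vector(neg)
    return [p - q for p, q in zip(mean_pos, mean_neg)]

# Hypothetical two-dimensional activations for the two groups.
pos = [[1.0, 2.0], [3.0, 2.0]]
neg = [[0.0, 1.0], [1.0, 3.0]]

direction = mean_difference(pos, neg)
```

Unlike a fitted probe, this direction needs no optimization and has no regularization or intercept to choose.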