concept
active
concept:residual-stream-injection

Residual-Stream Injection

Core activation intervention: adds a scaled vector to the residual stream at layer l while the model generates a completion
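A minimal sketch of the intervention, assuming the layer-l residual stream is available as a (seq_len, d_model) array. The function name `inject`, the strength parameter `alpha`, the unit-normalization of the vector, and the choice to add it at every position are illustrative assumptions, not details given on this page.

```python
import numpy as np

def inject(hidden, direction, alpha, normalize=True):
    """Return a copy of `hidden` with alpha * direction added at every position.

    hidden:    (seq_len, d_model) residual-stream activations at layer l
    direction: (d_model,) injection vector
    alpha:     scalar injection strength (hypothetical name)
    """
    hidden = np.asarray(hidden, dtype=float)
    v = np.asarray(direction, dtype=float)
    if v.shape != hidden.shape[-1:]:
        raise ValueError("direction must have shape (d_model,)")
    if normalize:
        # Assumption: scale the direction to unit norm so alpha sets the magnitude.
        norm = np.linalg.norm(v)
        if norm == 0:
            raise ValueError("direction must be non-zero")
        v = v / norm
    # Broadcasting adds the same scaled vector to each token position.
    return hidden + alpha * v

# Toy usage: three positions, model width 4.
h = np.zeros((3, 4))
v = np.array([0.0, 3.0, 0.0, 4.0])
h_injected = inject(h, v, alpha=2.0)
```

In a real model the same addition would typically be applied inside a forward hook on the chosen layer; that wiring depends on the framework and is left out here.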

Neighborhood — ranked by edge-count

Methods (4)

method
  • L1LI Injection
    implements
    Probe-based injection using L1-regularized logistic regressor with learned intercept on h_b activations
  • L2LI Injection
    implements
    Probe-based injection using L2-regularized logistic regressor with learned intercept on h_b activations
  • MDS Injection
    implements
    Mean-difference vectors derived from self-statement activations (h_s); best-performing injection method in open-ended generation
  • MDB Injection
    implements
    Mean-difference vectors derived from Yes/No binary-prefill activations (h_b)
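The two families of methods above differ in how the injection vector is obtained. The sketch below, on synthetic data, shows one way each could be computed: a mean-difference vector between two sets of activations, and the weight vector of an L2-regularized logistic probe with a learned, unregularized intercept. The function names, hyperparameters, plain gradient-descent fit, and toy data are all assumptions; the L1 variant is not shown and would need a different optimizer (for example a proximal step).

```python
import numpy as np

def mean_difference_vector(pos, neg):
    """Mean of positive-class activations minus mean of negative-class ones."""
    pos = np.asarray(pos, dtype=float)
    neg = np.asarray(neg, dtype=float)
    return pos.mean(axis=0) - neg.mean(axis=0)

def fit_l2_logistic_probe(X, y, l2=1e-2, lr=0.1, steps=2000):
    """Gradient descent on mean log-loss + (l2 / 2) * ||w||^2.

    The intercept is learned but not regularized. Returns (w, intercept).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    intercept = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + intercept)))  # sigmoid
        err = p - y
        w -= lr * (X.T @ err / len(y) + l2 * w)
        intercept -= lr * err.mean()
    return w, intercept

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Synthetic stand-in for activations: classes separated along axis 0.
rng = np.random.default_rng(0)
d = 8
true_dir = np.zeros(d)
true_dir[0] = 1.0
pos = rng.normal(size=(500, d)) + 1.5 * true_dir
neg = rng.normal(size=(500, d)) - 1.5 * true_dir

md = mean_difference_vector(pos, neg)

X = np.vstack([pos, neg])
y = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
w, intercept = fit_l2_logistic_probe(X, y)
```

On this toy data both `md` and `w` point close to `true_dir`; on real activations the two can diverge, which is one reason to compare the methods.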

Concepts (1)

concept
  • Residual Stream
    related_to
    Proposed pathway running through the layers at each token position; K/V values computed from it feed the horizontal information flow across positions.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • The intermediate representations in transformer layers whose activations are patched and probed for truth information
  • Technique to localize causally implicated hidden states by swapping residual stream activations between a true and false input and measuring downstream log-probability changes
  • The finite dimensional capacity of the residual stream for storing and communicating information between layers; conceptualized as being under high demand
  • Used to localize causally implicated hidden states by swapping activations between true and false inputs
  • The phenomenon where the residual stream communicates many more features than its dimensionality by encoding information across overlapping subspaces
  • The specific neural network layer from which activations are extracted for probe construction and SAE training in the target models
  • The network's tendency to actively attenuate injected perturbations over subsequent layers, erasing the signal before output
  • Tracks cosine similarity, norm ratio, and injection direction projection across layers to measure recovery from perturbation
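The last two neighbors describe how an injected perturbation can be attenuated over later layers and how that recovery is measured. A rough sketch of the three tracked quantities follows, assuming per-layer residual vectors from a clean run and a perturbed run at one token position. Reading "injection direction projection" as the projection of the perturbed-minus-clean difference onto the unit injection direction is an interpretation, and the function and key names are hypothetical.

```python
import numpy as np

def recovery_metrics(clean, perturbed, direction):
    """Per-layer comparison of clean and perturbed residual-stream vectors.

    clean, perturbed: (n_layers, d_model) activations at one token position
    direction:        (d_model,) injection direction
    """
    clean = np.asarray(clean, dtype=float)
    perturbed = np.asarray(perturbed, dtype=float)
    u = np.asarray(direction, dtype=float)
    u = u / np.linalg.norm(u)

    clean_norm = np.linalg.norm(clean, axis=1)
    pert_norm = np.linalg.norm(perturbed, axis=1)
    return {
        # Cosine similarity between the two runs at each layer.
        "cosine": np.sum(clean * perturbed, axis=1) / (clean_norm * pert_norm),
        # Ratio of perturbed to clean norm at each layer.
        "norm_ratio": pert_norm / clean_norm,
        # Assumed reading: how much of the difference lies along the injection.
        "projection": (perturbed - clean) @ u,
    }

# Toy data: a perturbation of size 4 along u that halves at every layer.
rng = np.random.default_rng(0)
n_layers, d_model = 5, 16
clean = rng.normal(size=(n_layers, d_model))
u = np.zeros(d_model)
u[0] = 1.0
decay = 4.0 * 0.5 ** np.arange(n_layers)
perturbed = clean + decay[:, None] * u

metrics = recovery_metrics(clean, perturbed, u)
```

A projection that shrinks toward zero while the cosine rises toward 1 would be consistent with the network erasing the injected signal before the output.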