concept
active
concept:attention-block-output-activation

Attention Block Output Activation

The specific activation representation used: the output of the ℓ-th attention block, defined as that block's MLP output added to the residual stream.
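
The definition above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the residual stream at the MLP input is taken as given (attention and normalization are not modeled), and `toy_mlp`, `block_output_activation`, and the weight matrices are hypothetical names and values, not taken from any particular model or library.

```python
def relu(v):
    # Element-wise ReLU on a plain list of floats.
    return [max(0.0, x) for x in v]

def matvec(m, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def toy_mlp(x, w_in, w_out):
    # Hypothetical two-layer MLP standing in for the block's real MLP.
    return matvec(w_out, relu(matvec(w_in, x)))

def block_output_activation(resid, w_in, w_out):
    # Activation as defined above: MLP output + residual stream.
    mlp_out = toy_mlp(resid, w_in, w_out)
    return [m + r for m, r in zip(mlp_out, resid)]

# Assumed toy values: a 2-dimensional residual stream and a 3-unit hidden layer.
resid = [1.0, -2.0]
w_in = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # shape 3 x 2
w_out = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.5]]    # shape 2 x 3

act = block_output_activation(resid, w_in, w_out)
print(act)  # [1.5, -2.0]
```

Because the residual stream is added back unchanged, subtracting it from the activation recovers the MLP's contribution alone, which is one way to separate the two terms when inspecting this representation.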

Neighborhood — ranked by edge-count

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Intervention method that adds a learned direction vector to residual stream activations to steer model behavior
  • Activations · concept · 0.775
    Internal representations of the model on which probes operate; the method uses activations to rank datapoints.
  • Process that uses Q, K, and V to compute a heat map over K and a weighted sum of V.
  • Core operation in transformers, computing weighted combinations of previous elements
  • The conventional approach (e.g., SAEs, transcoders) of decomposing activations into interpretable features.
  • Model-independent feature comparison based on correlating activation vectors across a fixed diverse dataset
  • Attention Schema · concept · 0.730
    A predictive model representing and controlling attention; central to attention schema theory.
  • Supervised method training models to answer questions about activations; NLAs differ by being unsupervised.