method
active
method:causal-attention-mask

Causal Attention Mask

A modification to the transformer's attention that restricts the keys and values available at each time-step to those from previous time-steps only, mimicking how an agent accumulates experiences.
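As a rough illustration of the restriction described above, the following minimal sketch applies a causal mask inside scaled dot-product attention using only the Python standard library. The function names `causal_mask` and `causal_attention` are hypothetical and do not come from the source. The sketch also assumes that each position may attend to itself as well as to earlier positions, as is common in decoder-only transformers; the source does not say whether the current time-step is included.

```python
import math


def causal_mask(n):
    """Return an n x n boolean matrix; entry [i][j] is True when
    query position i is allowed to attend to key position j."""
    # Assumption: the current step (j == i) is visible, as in typical
    # decoder-only transformers. Use j < i to exclude it.
    return [[j <= i for j in range(n)] for i in range(n)]


def causal_attention(queries, keys, values):
    """Scaled dot-product attention in which position i only sees
    keys and values from positions 0..i."""
    n, d = len(queries), len(queries[0])
    mask = causal_mask(n)
    outputs = []
    for i in range(n):
        # Only allowed positions enter the softmax; future steps are skipped.
        allowed = [j for j in range(n) if mask[i][j]]
        scores = [
            sum(q * k for q, k in zip(queries[i], keys[j])) / math.sqrt(d)
            for j in allowed
        ]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted combination of the visible value vectors.
        outputs.append([
            sum(w * values[j][c] for w, j in zip(weights, allowed))
            for c in range(len(values[0]))
        ])
    return outputs


# Hypothetical toy sequence of three time-steps with 2-dimensional vectors.
q = k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
out = causal_attention(q, k, v)
```

Because the first position can only see itself, its output equals its own value vector, and changing a later value vector leaves the outputs at earlier positions unchanged. That is the sense in which the mask mimics experiences accumulating over time.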

Neighborhood — ranked by edge-count

Frameworks (1)

framework
  • The transformer version directly analogous to TEM, introduced in this paper, offering dramatic performance improvements.

Methods (1)

method
  • Attention mechanism with causal mask limiting each token's view to previous tokens; used in decoder-only transformers

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • causal masking · concept · 0.826
    Attention restricted to previous tokens only, as in decoder-only models; leads to AR(ω)-like behaviour and no ordered phase
  • Application to transformer language models
  • Core operation in transformers, computing weighted combinations of previous elements
  • Causal abstraction · concept · 0.753
    A framework the paper uses alongside feature geometry to deepen mechanistic understanding of LMs
  • Causal Mediation · concept · 0.749
    Whether an internal direction causally controls a target behavior, verified by intervention success
  • The use of interventions (rather than correlations) to establish a causal link between representation geometry and behavioral geometry.
  • Causal importance · concept · 0.737
    A measure of whether a subcomponent is necessary to reproduce model behavior on a specific prompt, predicted by the causal importance network.
  • Self-attention · concept · 0.735
    A form of key-query attention within a single input sequence; core to Transformers.