method
active
method:causally-masked-attention

causally-masked attention

Attention mechanism with a causal mask that limits each token's view to itself and earlier tokens; used in decoder-only transformers.
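A minimal sketch of the mechanism, assuming single-head scaled dot-product attention over plain Python lists. The names `causal_mask`, `softmax` and `causal_attention` are illustrative only. Masked positions get a score of negative infinity before the softmax, so their weight becomes exactly zero.

```python
import math

def causal_mask(T):
    """mask[t][s] is 0.0 if query t may attend to key s (s <= t), else -inf."""
    return [[0.0 if s <= t else float("-inf") for s in range(T)] for t in range(T)]

def softmax(xs):
    m = max(xs)  # finite, because the diagonal entry is never masked
    exps = [math.exp(x - m) for x in xs]  # exp(-inf) == 0.0: masked keys get zero weight
    z = sum(exps)
    return [e / z for e in exps]

def causal_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.

    q, k, v are lists of T vectors (lists of floats); q and k share dimension d.
    Output t depends only on positions 0..t.
    """
    T, d = len(q), len(q[0])
    mask = causal_mask(T)
    out = []
    for t in range(T):
        # Scaled dot-product score for every key, plus the causal mask
        scores = [sum(a * b for a, b in zip(q[t], k[s])) / math.sqrt(d) + mask[t][s]
                  for s in range(T)]
        w = softmax(scores)
        # Weighted sum of value vectors (future positions contribute 0)
        out.append([sum(w[s] * v[s][j] for s in range(T)) for j in range(len(v[0]))])
    return out

q = k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0], [2.0], [3.0]]
print(causal_attention(q, k, v)[0])  # [1.0]: the first token can only see itself
```

Because of the mask, changing the key or value at a later position leaves every earlier output unchanged. This property is what allows a decoder-only model to be trained on all prefixes of a sequence in parallel.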

Neighborhood — ranked by edge-count

Methods (1)

method
  • Modification to the transformer that restricts keys and values to previous time steps only, mimicking how an agent accumulates experiences.
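The "accumulating experiences" view can be illustrated with a streaming sketch. Each time step appends its key and value to a growing store, and the query attends over everything stored so far. Future steps simply do not exist yet when a query is issued. This assumes single-head scaled dot-product attention, and `StreamingAttention` is a hypothetical name, not a library class.

```python
import math

class StreamingAttention:
    """Hypothetical sketch: keys/values accumulate one time step at a time."""

    def __init__(self):
        self.keys, self.values = [], []

    def step(self, q, k, v):
        # Record this step's key and value; only steps 0..t are ever visible.
        self.keys.append(k)
        self.values.append(v)
        d = len(q)
        scores = [sum(a * b for a, b in zip(q, key)) / math.sqrt(d) for key in self.keys]
        m = max(scores)  # subtract max for numerical stability
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        # Softmax-weighted average of all values accumulated so far
        return [sum(wi * val[j] for wi, val in zip(w, self.values)) / z
                for j in range(len(v))]

agent = StreamingAttention()
print(agent.step([1.0, 0.0], [1.0, 0.0], [1.0]))  # [1.0]: only one experience so far
print(agent.step([0.0, 1.0], [0.0, 1.0], [2.0]))  # mixes steps 0 and 1
```

At every step this gives the same result as full-sequence attention with a causal mask evaluated at that position. The store is the same idea as the key/value cache used for incremental decoding.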

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • causal masking · concept · 0.847
    Attention restricted to previous tokens only, as in decoder-only models; leads to AR(ω)-like behaviour and no ordered phase
  • Application to transformer language models
  • Causal Mediation · concept · 0.789
    Whether an internal direction causally controls a target behavior, verified by intervention success
  • Causal abstraction · concept · 0.766
    A framework the paper uses alongside feature geometry to deepen mechanistic understanding of LMs
  • The use of interventions (rather than correlations) to establish a causal link between representation geometry and behavioral geometry.
  • Causal Tracing · concept · 0.761
    Mechanistic interpretability technique for locating factual associations; mentioned as a direction for future work.
  • Causal Mechanism · concept · 0.760
    Function determining the value of a variable based on its causal parents in an acyclic causal model.
  • Framework for evaluating whether probe directions are causally implicated in model outputs via activation patching