causally-masked attention

An attention mechanism whose causal mask limits each token's view to itself and earlier tokens; used in decoder-only transformers.
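A minimal NumPy sketch of single-head causally-masked self-attention follows. It assumes standard scaled dot-product attention with queries, keys, and values of shape (T, d). The function name `causal_attention` and the toy input are illustrative only and are not taken from the source. The mask sets every score above the diagonal to minus infinity, so after the softmax each position puts zero weight on later positions.

```python
import numpy as np

def causal_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.

    q, k, v: arrays of shape (T, d). Position t may attend only to
    positions 0..t (itself and earlier tokens).
    """
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)                     # (T, T) raw attention logits
    future = np.triu(np.ones((T, T), dtype=bool), k=1)  # True strictly above the diagonal = later tokens
    scores = np.where(future, -np.inf, scores)        # block attention to later tokens
    scores -= scores.max(axis=-1, keepdims=True)      # stabilize exp; the diagonal keeps each row max finite
    weights = np.exp(scores)                          # exp(-inf) = 0 for masked entries
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ v, weights

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))          # toy input: 4 tokens, model width 8
out, w = causal_attention(x, x, x)   # self-attention with q = k = v = x for simplicity
```

Perturbing the last token leaves the outputs at earlier positions unchanged. This property lets decoder-only models be trained on all positions in parallel while still generating autoregressively.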

Neighborhood (ranked by edge count)

paper

method

Causal Attention Mask
related_to
A modification to the transformer that restricts keys and values to previous time steps only, mimicking how an agent accumulates experience.

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

causal masking · concept · 0.847
Attention restricted to previous tokens only, as in decoder-only models; leads to AR(ω)-like behaviour and no ordered phase.
Causally-masked attention in a decoder-only model has no ordered phase (Proposition 2) · finding · 0.804
Application to transformer language models.
Causal Mediation · concept · 0.789
Tests whether an internal direction causally controls a target behavior, verified by intervention success.
Causal abstraction · concept · 0.766
A framework the paper uses alongside feature geometry to deepen mechanistic understanding of language models (LMs).
Causal Intervention on Representations · concept · 0.762
The use of interventions (rather than correlations) to establish a causal link between representation geometry and behavioral geometry.
Causal Tracing · concept · 0.761
A mechanistic interpretability technique for locating factual associations, mentioned as a future-work direction.
Causal Mechanism · concept · 0.760
The function determining the value of a variable from its causal parents in an acyclic causal model.
Causal Mediation Analysis · concept · 0.759
A framework for evaluating whether probe directions are causally implicated in model outputs via activation patching.
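The sketch below illustrates the activation-patching idea behind causal mediation analysis. It uses a hypothetical toy two-layer network. The weights, inputs, and the helper names `forward` and `patching_effect` are illustrative assumptions and are not taken from any paper. The procedure runs a clean input and a corrupted input, copies one hidden activation from the clean run into the corrupted run, and measures what fraction of the output gap is restored.

```python
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(size=(3, 5))  # hypothetical toy weights: input (3) -> hidden (5)
W2 = rng.normal(size=5)       # hypothetical toy weights: hidden (5) -> scalar output

def forward(x, patch=None):
    """Run the toy model; `patch` maps hidden index -> value to overwrite after tanh."""
    h = np.tanh(x @ W1)        # np.tanh returns a new array, so in-place edits are safe
    if patch:
        for i, val in patch.items():
            h[i] = val
    return float(h @ W2), h

x_clean = np.array([1.0, 0.5, -0.2])
x_corrupt = np.array([-1.0, 0.5, -0.2])  # corrupt only the first input feature

y_clean, h_clean = forward(x_clean)
y_corrupt, _ = forward(x_corrupt)

def patching_effect(i):
    """Fraction of the clean-vs-corrupt output gap restored by patching hidden unit i."""
    y_patched, _ = forward(x_corrupt, patch={i: h_clean[i]})
    return (y_patched - y_corrupt) / (y_clean - y_corrupt)

effects = [patching_effect(i) for i in range(W1.shape[1])]
```

Because this toy model's readout is linear in the hidden layer, the per-unit effects sum to 1. In real networks with nonlinear layers downstream of the patched site, they generally do not.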