method
active
method:causally-masked-attention
causally-masked attention
Attention mechanism whose causal mask limits each token to attending to itself and earlier tokens; used in decoder-only transformers.
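A minimal sketch of single-head causally masked attention in pure Python. It assumes standard scaled dot-product scoring, which this entry does not specify. The helper names `causal_mask`, `softmax`, and `causal_attention` are illustrative and do not come from any library. Query position i may attend to key position j only when j ≤ i. Scores for future positions are set to -inf before the softmax, so those positions receive exactly zero weight.

```python
import math
from typing import List

Matrix = List[List[float]]


def causal_mask(n: int) -> List[List[bool]]:
    """mask[i][j] is True when query position i may attend to key position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax; -inf entries map to weight 0.0."""
    m = max(xs)  # finite here: the diagonal (j == i) is never masked
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Hypothetical single-head attention with a causal mask.

    Assumes scaled dot-product scoring. q and k hold n vectors of dimension d;
    v holds n value vectors (their dimension may differ from d).
    """
    n, d = len(q), len(q[0])
    mask = causal_mask(n)
    out = []
    for i in range(n):
        # Score query i against every key; future keys (j > i) get -inf.
        scores = [
            sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d) if mask[i][j] else -math.inf
            for j in range(n)
        ]
        weights = softmax(scores)
        # Output i is a weighted sum of the values at positions 0..i only.
        out.append([sum(w * v[j][c] for j, w in enumerate(weights)) for c in range(len(v[0]))])
    return out
```

Output i depends only on positions 0..i. Changing a later token's key or value therefore leaves all earlier outputs unchanged. This property lets a decoder-only model train on every position in parallel while still generating left to right.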
Neighborhood — ranked by edge-count
Papers (1)
paper
Methods (1)
method
- Causal Attention Mask · related_to · Modification to the transformer that restricts keys and values to previous time-steps only, mimicking how an agent accumulates experiences.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Attention restricted to previous tokens only, as in decoder-only models; leads to AR(ω)-like behaviour and no ordered phase
- Application to transformer language models
- Whether an internal direction causally controls a target behavior, verified by intervention success
- A framework the paper uses alongside feature geometry to deepen mechanistic understanding of LMs
- The use of interventions (rather than correlations) to establish a causal link between representation geometry and behavioral geometry.
- Mechanistic interpretability technique for locating factual associations, mentioned as a future-work direction.
- Function determining the value of a variable based on its causal parents in an acyclic causal model.
- Framework for evaluating whether probe directions are causally implicated in model outputs via activation patching