concept
active
concept:causal-mediation-analysis
Causal Mediation Analysis
Framework that uses activation patching to evaluate whether probe directions are causally implicated in model outputs
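As a rough illustration of the idea, the sketch below patches a hidden activation along a single direction and compares the resulting change in output with the total change between two inputs. Everything in it is an assumption made for illustration: the two-layer linear toy model, the names `patch_along_direction`, `hidden`, and `readout`, and the choice of probe direction. It is not the method of any particular paper or library.

```python
import numpy as np

def patch_along_direction(h_clean, h_source, direction):
    """Replace the component of h_clean along `direction` with that of h_source."""
    d = direction / np.linalg.norm(direction)
    return h_clean + np.dot(h_source - h_clean, d) * d

# Hypothetical toy model: hidden = W1 @ x, scalar output = w2 @ hidden.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))
w2 = rng.normal(size=4)

def hidden(x):
    return W1 @ x

def readout(h):
    return float(w2 @ h)

x_clean = np.array([1.0, 0.0, 0.0])
x_source = np.array([0.0, 1.0, 0.0])

# Assumed probe direction: aligned with the readout weights, so in this
# linear toy it carries the entire effect on the output.
probe_direction = w2.copy()

# Control direction: a random vector made orthogonal to the readout weights.
v = rng.normal(size=4)
control_direction = v - (v @ w2) / (w2 @ w2) * w2

h_clean, h_source = hidden(x_clean), hidden(x_source)

# Total effect: swap the whole input. Mediated effect: swap one direction only.
total_effect = readout(h_source) - readout(h_clean)
mediated_effect = readout(
    patch_along_direction(h_clean, h_source, probe_direction)
) - readout(h_clean)
control_effect = readout(
    patch_along_direction(h_clean, h_source, control_direction)
) - readout(h_clean)

print(f"total={total_effect:.4f} mediated={mediated_effect:.4f} "
      f"control={control_effect:.4f}")
```

In this toy, patching along the readout-aligned direction reproduces the full effect, while patching along the orthogonal control direction leaves the output unchanged. A probe with high classification accuracy can still behave like the control direction, which is the gap this kind of intervention is meant to expose.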
Neighborhood — ranked by edge-count
Papers (1)
paper
Concepts (1)
concept
- Causal Mediation · related_to · Whether an internal direction causally controls a target behavior, verified by intervention success
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- The extent to which a probe direction, when intervened upon, actually changes model outputs — contrasted with mere classification accuracy
- The formal method used to establish that the identified circuit causally mediates the model's cyclic reasoning behavior
- Function determining the value of a variable based on its causal parents in an acyclic causal model.
- The use of interventions (rather than correlations) to establish a causal link between representation geometry and behavioral geometry.
- Formal representation of algorithms as directed acyclic graphs computing functions f_A
- Traditional mechanistic accounts (Danto, Chisholm, Goldman) that Juarrero critiques as resting on outdated Newtonian causality.
- Attention restricted to previous tokens only, as in decoder-only models; leads to AR(ω)-like behaviour and no ordered phase
- A framework the paper uses alongside feature geometry to deepen mechanistic understanding of LMs