concept
active
concept:causal-mediation-analysis

Causal Mediation Analysis

Framework that uses activation patching to evaluate whether probe directions are causally implicated in model outputs
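
A minimal sketch of the patching idea, assuming a hypothetical toy network (random weights `W1`, `w2`) in place of a real model and a random vector in place of a trained probe. The sketch runs the model on a clean and a corrupted input. It then copies only the clean activation's component along the probe direction into the corrupted run, and reports what fraction of the clean-vs-corrupt output gap that single patch restores. All names here are illustrative, not a library API.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy model: hidden h = relu(W1 @ x), scalar output y = w2 @ h.
W1 = rng.normal(size=(8, 4))
w2 = rng.normal(size=8)

def hidden(x):
    return np.maximum(W1 @ x, 0.0)

def output(h):
    return float(w2 @ h)

def patch_along(h_target, h_source, d):
    """Swap the component of h_target along direction d for that of h_source."""
    d = d / np.linalg.norm(d)
    return h_target + ((h_source - h_target) @ d) * d

def mediated_fraction(x_clean, x_corrupt, d):
    """Share of the clean-vs-corrupt output gap restored by patching only direction d."""
    h_clean, h_corrupt = hidden(x_clean), hidden(x_corrupt)
    y_clean, y_corrupt = output(h_clean), output(h_corrupt)
    y_patched = output(patch_along(h_corrupt, h_clean, d))
    return (y_patched - y_corrupt) / (y_clean - y_corrupt)

x_clean, x_corrupt = rng.normal(size=4), rng.normal(size=4)
probe_dir = rng.normal(size=8)  # stand-in for a trained linear probe's weight vector
print(f"{mediated_fraction(x_clean, x_corrupt, probe_dir):.3f}")
```

Because the readout here is linear, patching along `w2` itself restores the whole gap (fraction 1) and patching along any direction orthogonal to `w2` restores none. A probe direction usually lands in between, which is the gap between decoding a feature and causally mediating the output.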

Neighborhood — ranked by edge-count

Concepts (1)

concept
  • Whether an internal direction causally controls a target behavior, verified by intervention success
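
One way to operationalize "intervention success" is to steer hidden states along the candidate direction and count how often the target behavior appears where it was previously absent. The sketch below assumes hypothetical random hidden states `H`, a linear readout `w` defining a binary behavior, and a steering strength `alpha`. These are illustrative choices, not a specific method from the source.

```python
import numpy as np

def behavior(H, w):
    """Hypothetical binary behavior: 1 where the linear readout is positive, else 0."""
    return (H @ w > 0).astype(int)

def intervention_success(H, d, alpha, w, target=1):
    """Fraction of examples not already at `target` that switch to it after adding alpha * d."""
    d = d / np.linalg.norm(d)
    before = behavior(H, w)
    after = behavior(H + alpha * d, w)
    eligible = before != target
    if not eligible.any():
        return float("nan")
    return float(np.mean(after[eligible] == target))

rng = np.random.default_rng(1)
H = rng.normal(size=(200, 16))  # stand-in hidden states
w = rng.normal(size=16)         # stand-in downstream readout

causal_dir = w.copy()
v = rng.normal(size=16)
inert_dir = v - (v @ w) / (w @ w) * w  # orthogonal to the readout, so steering does nothing downstream

print(intervention_success(H, causal_dir, alpha=50.0, w=w))
print(intervention_success(H, inert_dir, alpha=50.0, w=w))
```

The direction aligned with the readout flips every eligible example, while the orthogonal one flips essentially none. The orthogonal direction fails even though a probe trained on some other label could still find it informative.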

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
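
The selection rule in the header above can be sketched as follows. The sketch assumes each entity has an embedding vector and that typed edges are stored as unordered name pairs. The embedding model and storage format behind this page are not stated, so `similarity_candidates`, the toy vectors, and the edge set are hypothetical.

```python
import numpy as np

def similarity_candidates(focus, embeddings, typed_edges, threshold=0.65):
    """(name, cosine) pairs with cosine >= threshold and no typed edge to focus, highest first."""
    f = embeddings[focus] / np.linalg.norm(embeddings[focus])
    out = []
    for name, vec in embeddings.items():
        if name == focus or frozenset((focus, name)) in typed_edges:
            continue
        cos = float(f @ (vec / np.linalg.norm(vec)))
        if cos >= threshold:
            out.append((name, cos))
    return sorted(out, key=lambda p: p[1], reverse=True)

embeddings = {
    "A": np.array([1.0, 0.0]),
    "B": np.array([1.0, 0.1]),  # very similar to A, but already linked by a typed edge
    "C": np.array([0.0, 1.0]),  # orthogonal to A
    "D": np.array([1.0, 1.0]),
    "E": np.array([2.0, 1.0]),
}
typed_edges = {frozenset(("A", "B"))}

for name, cos in similarity_candidates("A", embeddings, typed_edges):
    print(f"{name} {cos:.3f}")
```

B is skipped despite its high similarity because it already has a typed relation. C falls below the threshold, so only E and D surface as candidate edges or possible duplicates.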

  • The extent to which a probe direction, when intervened upon, actually changes model outputs — contrasted with mere classification accuracy
  • The formal method used to establish that the identified circuit causally mediates the model's cyclic reasoning behavior
  • Causal Mechanism (concept, 0.788)
    Function determining the value of a variable based on its causal parents in an acyclic causal model.
  • The use of interventions (rather than correlations) to establish a causal link between representation geometry and behavioral geometry.
  • Formal representation of algorithms as directed acyclic graphs computing functions f_A
  • Traditional mechanistic accounts (Danto, Chisholm, Goldman) that Juarrero critiques as resting on outdated Newtonian causality.
  • causal masking (concept, 0.764)
    Attention restricted to the current and previous tokens, as in decoder-only models; leads to AR(ω)-like behaviour and no ordered phase
  • Causal abstraction (concept, 0.760)
    A framework the paper uses alongside feature geometry to deepen mechanistic understanding of LMs
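
Several neighbors share one vocabulary. A causal mechanism computes a variable from its parents in an acyclic model, and an intervention overrides that mechanism. The sketch below assumes a hypothetical linear model X → M → Y with an extra direct edge X → Y, and shows how those two pieces give the direct/indirect split that mediation analysis is named for. The coefficients are illustrative only.

```python
from typing import Callable, Dict

Mechanism = Callable[[Dict[str, float]], float]

# Hypothetical acyclic model; dict insertion order doubles as a topological order.
mechanisms: Dict[str, Mechanism] = {
    "X": lambda v: 0.0,                        # exogenous; always set by intervention below
    "M": lambda v: 2.0 * v["X"],               # mediator depends on X
    "Y": lambda v: 3.0 * v["M"] + 1.0 * v["X"],  # outcome depends on M and directly on X
}

def run(interventions: Dict[str, float]) -> Dict[str, float]:
    """Evaluate mechanisms in order; an intervened variable ignores its mechanism (do-operator)."""
    values: Dict[str, float] = {}
    for name, f in mechanisms.items():
        values[name] = interventions[name] if name in interventions else f(values)
    return values

base, treated = run({"X": 0.0}), run({"X": 1.0})
total_effect = treated["Y"] - base["Y"]
# Natural indirect effect: keep X at baseline, give M the value it takes under X = 1.
indirect_effect = run({"X": 0.0, "M": treated["M"]})["Y"] - base["Y"]
# Natural direct effect: set X = 1, hold M at its baseline value.
direct_effect = run({"X": 1.0, "M": base["M"]})["Y"] - base["Y"]
print(total_effect, indirect_effect, direct_effect)  # 7.0 6.0 1.0
```

With linear mechanisms and no interaction the two effects add up to the total. Activation patching plays the role of the `M` override when the mediator is a hidden direction rather than a named variable.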