concept
active
concept:causal-mediation

Causal Mediation

Whether an internal direction causally controls a target behavior, verified by intervention success

Neighborhood — ranked by edge-count

Methods (1)

method
  • Key evaluation metric: proportion of inputs for which an intervention successfully flips model output
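A minimal sketch of how the metric above could be computed, assuming model outputs are discrete labels collected once without and once with the intervention. The function name `intervention_success_rate` and the optional `target_labels` argument are illustrative, not taken from the source.

```python
def intervention_success_rate(clean_preds, intervened_preds, target_labels=None):
    """Fraction of inputs whose predicted label changes under the intervention.

    If target_labels is given, a flip only counts when the intervened
    prediction also equals the intended target (a stricter, assumed variant).
    """
    if len(clean_preds) != len(intervened_preds):
        raise ValueError("prediction lists must have the same length")
    if not clean_preds:
        raise ValueError("need at least one input")

    successes = 0
    for i, (before, after) in enumerate(zip(clean_preds, intervened_preds)):
        flipped = before != after
        if target_labels is not None:
            flipped = flipped and after == target_labels[i]
        successes += flipped
    return successes / len(clean_preds)


# Toy predictions for five inputs (hypothetical data).
clean = [0, 0, 1, 1, 0]
intervened = [1, 0, 0, 1, 1]
rate = intervention_success_rate(clean, intervened)
print(rate)
```

Passing `target_labels` is useful when the intervention is meant to steer the output to a specific class rather than merely change it.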

Concepts (2)

concept
  • Framework for evaluating whether probe directions are causally implicated in model outputs via activation patching
  • The extent to which intervening on a probe direction actually changes model outputs, as opposed to the probe's classification accuracy alone
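The contrast between probe accuracy and causal effect can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the two-dimensional activations, the "model" that reads only coordinate 0, and the use of a reflection along the direction as a simple stand-in intervention (it is not activation patching from a counterfactual run).

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def reflect(h, d):
    """Flip the component of activation h along unit direction d."""
    s = dot(h, d)
    return tuple(hi - 2 * s * di for hi, di in zip(h, d))


def model_output(h):
    """Toy downstream model: reads only coordinate 0 of the activation."""
    return 1 if h[0] > 0 else 0


def probe_accuracy(acts, labels, d):
    """Accuracy of a linear probe that thresholds the projection onto d."""
    preds = [1 if dot(h, d) > 0 else 0 for h in acts]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def flip_rate(acts, d):
    """Share of inputs whose model output changes after reflecting along d."""
    flips = [model_output(h) != model_output(reflect(h, d)) for h in acts]
    return sum(flips) / len(flips)


# Hypothetical activations: both coordinates correlate with the label,
# but only coordinate 0 is used by the toy model.
acts = [(1.0, 0.8), (-0.5, -0.7), (2.0, 1.5), (-1.2, -0.9)]
labels = [model_output(h) for h in acts]

used_dir = (1.0, 0.0)    # direction the model actually reads
unused_dir = (0.0, 1.0)  # predictive of the label, but never read

acc_used = probe_accuracy(acts, labels, used_dir)
acc_unused = probe_accuracy(acts, labels, unused_dir)
flip_used = flip_rate(acts, used_dir)
flip_unused = flip_rate(acts, unused_dir)
```

In this toy setup both directions classify the labels equally well, yet only the direction the model reads changes its output under intervention, which is the gap the concept is meant to measure.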

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Causal Mechanism · concept · 0.842
    Function determining the value of a variable based on its causal parents in an acyclic causal model.
  • causal masking · concept · 0.829
    Attention restricted to previous tokens only, as in decoder-only models; leads to AR(ω)-like behaviour and no ordered phase
  • Causal Invariance · concept · 0.809
    Property that causal mechanisms remain stable across environments; desirable for OOD.
  • Causal abstraction · concept · 0.809
    A framework the paper uses alongside feature geometry to deepen mechanistic understanding of LMs
  • Formal representation of algorithms as directed acyclic graphs computing functions f_A
  • Causal Emergence · concept · 0.805
    Core concept: degree to which an agent exerts unique predictive power on its future; key to cognition at all scales.
  • Causal Geometry · framework · 0.800
    Chvykov and Hoel's geometric extension of causal emergence to continuous systems using Fisher information.
  • Causal Tracing · concept · 0.800
    Mechanistic interpretability technique for locating factual associations, mentioned as future work direction.