Causal Mediation of Model Outputs

The extent to which intervening on a probe direction actually changes model outputs, as opposed to how accurately the probe classifies the property it was trained on.
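A minimal sketch of the contrast, assuming a toy two-dimensional hidden state and a linear readout standing in for the model. Both hypothetical directions decode the label perfectly, but only the one the readout actually uses mediates the output. The intervention chosen here (reflecting the hidden state along the direction, which flips the probe's reading) and all function names (`probe_accuracy`, `reflect`, `intervention_success`) are illustrative assumptions, not a standard API.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(d: Sequence[float]) -> Vector:
    norm = dot(d, d) ** 0.5
    return [x / norm for x in d]


def probe_accuracy(hs: Sequence[Vector], ys: Sequence[int], d: Sequence[float]) -> float:
    """Classification view: fraction of examples where sign(d . h) equals y in {-1, +1}."""
    correct = sum(1 for h, y in zip(hs, ys) if (1 if dot(d, h) > 0 else -1) == y)
    return correct / len(hs)


def reflect(h: Sequence[float], d: Sequence[float]) -> Vector:
    """Hypothetical intervention: flip the component of h along unit vector d, h - 2 (d . h) d."""
    c = dot(d, h)
    return [x - 2 * c * u for x, u in zip(h, d)]


def intervention_success(
    hs: Sequence[Vector], model: Callable[[Sequence[float]], float], d: Sequence[float]
) -> float:
    """Causal view: fraction of examples whose output sign flips after the intervention."""
    u = normalize(d)
    flips = sum(1 for h in hs if (model(h) > 0) != (model(reflect(h, u)) > 0))
    return flips / len(hs)


# Hypothetical toy data: dimensions 0 and 1 both encode the label,
# but the "model" only reads dimension 1.
ys = [1, -1, 1, -1]
hs = [[2.0, 0.5], [-1.5, -1.0], [3.0, 1.0], [-2.5, -0.5]]
readout = [0.0, 1.0]


def model(h: Sequence[float]) -> float:
    return dot(readout, h)


for name, d in [("dim 0 (decodable only)", [1.0, 0.0]), ("dim 1 (read by model)", [0.0, 1.0])]:
    print(name, probe_accuracy(hs, ys, d), intervention_success(hs, model, d))
# prints:
# dim 0 (decodable only) 1.0 0.0
# dim 1 (read by model) 1.0 1.0
```

Reflection is only one possible intervention; ablating the component (setting d . h to zero) or patching in activations recorded on a different input are common alternatives, and which one fits depends on the question being asked.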

Neighborhood — ranked by edge-count

Causal Mediation · concept · related_to
Whether an internal direction causally controls a target behavior, verified by intervention success

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Causal Mediation Analysis · concept · 0.830
Framework for evaluating whether probe directions are causally implicated in model outputs via activation patching
Deterministic Causal Model · concept · 0.802
Formal representation of algorithms as directed acyclic graphs computing functions f_A
dynamic causal models · concept · 0.788
Nonlinear models of brain dynamics that can be inverted via DEM (dynamic expectation maximization).
Causal Intervention on Representations · concept · 0.770
The use of interventions (rather than correlations) to establish a causal link between representation geometry and behavioral geometry.
causal shaping of behavior by representation geometry · concept · 0.760
Central question: does geometry in activation space causally determine behavior?
Forward model (output-input model) · concept · 0.751
A model mapping outputs to expected inputs, used in motor control and perception for embodiment.
Acyclic Causal Model · concept · 0.749
Consists of input, intermediate, and output variables with associated causal mechanisms; the mathematical object central to DAS (Distributed Alignment Search).
causal masking · concept · 0.749
Attention restricted to previous tokens only, as in decoder-only models; leads to AR(ω)-like behaviour and no ordered phase