method
active
method:causal-abstraction-analysis
Causal abstraction analysis
The formal method used to establish that the identified circuit causally mediates the model's cyclic reasoning behavior
Neighborhood — ranked by edge-count
Concepts (1)
concept
- Causal abstraction · implements · A framework the paper uses alongside feature geometry to deepen mechanistic understanding of LMs
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Formal definition: H is a constructive abstraction of L under alignment Π when interchange interventions have equivalent effects at both levels.
- Graded notion of causal abstraction measured by interchange intervention accuracy (IIA); when IIA equals some α < 100%, the abstraction holds only approximately, on average at level α.
- Framework for evaluating whether probe directions are causally implicated in model outputs via activation patching
- Programming technique to restructure a fine-grained Linda program for efficiency by replacing live data structures with passive ones and coarser-grain processes.
- The paper endorses Geiger et al. 2023's claim that disparate interpretability methods are instances of causal abstraction.
- Core concept: degree to which an agent exerts unique predictive power on its future; key to cognition at all scales.
- Mechanistic interpretability technique for locating factual associations, mentioned as future work direction.
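
The formal definition and the graded IIA notion listed above can be illustrated with a toy computation. The sketch below is not the paper's circuit or code: the modulus `N`, the two hidden units `"sum"` and `"first"`, and the functions `high_level`, `low_level`, and `iia` are all hypothetical names chosen for illustration. It runs interchange interventions on every (base, source) input pair and reports the fraction of pairs on which the patched low-level model agrees with the patched high-level model.

```python
from itertools import product

N = 5  # modulus of the toy cyclic task (assumption, for illustration only)


def high_level(a, b, s_patch=None):
    """High-level causal model H: one variable S = (a + b) mod N; output = S."""
    s = (a + b) % N if s_patch is None else s_patch
    return s


def low_level(a, b, patch=None):
    """Toy low-level model L with two hidden units; returns (hidden, output).

    Only the "sum" unit feeds the output; "first" is causally inert.
    """
    hidden = {"sum": a + b, "first": a}
    if patch:
        hidden.update(patch)  # overwrite the aligned unit with a patched value
    return hidden, hidden["sum"] % N


def iia(aligned_unit):
    """Interchange intervention accuracy when S is aligned with `aligned_unit`."""
    inputs = list(product(range(N), repeat=2))
    hits = total = 0
    for base, source in product(inputs, repeat=2):
        # Low level: run on source, copy the aligned unit into the base run.
        src_hidden, _ = low_level(*source)
        _, low_out = low_level(*base, patch={aligned_unit: src_hidden[aligned_unit]})
        # High level: set S to the value it takes on the source input.
        high_out = high_level(*base, s_patch=high_level(*source))
        hits += low_out == high_out
        total += 1
    return hits / total


if __name__ == "__main__":
    print("IIA, S aligned with 'sum':  ", iia("sum"))
    print("IIA, S aligned with 'first':", iia("first"))
```

Under these assumptions, aligning S with the unit that actually carries the sum gives perfect agreement, while aligning it with the inert unit agrees only when the base and source sums happen to coincide modulo `N`. That gap is what a graded IIA score is meant to expose.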