concept
active
concept:constructive-causal-abstraction
Constructive Causal Abstraction
Formal definition: H is a constructive abstraction of L under alignment Π when interchange interventions have equivalent effects at both levels.
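A minimal sketch of one interchange intervention, to make the definition concrete. The two toy models, the variable S, and the neuron indices are hypothetical illustrations and are not taken from the source; the alignment Π is assumed here to map S to the neuron set {0, 1}.

```python
# Hypothetical high-level model H: out = S + c, with intermediate S = a + b.
def high_level(base, source=None):
    a, b, c = base
    s = a + b
    if source is not None:
        # Interchange intervention: S takes the value it has on the source input.
        s = source[0] + source[1]
    return s + c

# Hypothetical low-level model L: three "neurons" hold the inputs; output is their sum.
def low_level(base, source=None, neurons=()):
    hidden = list(base)
    if source is not None:
        source_hidden = list(source)
        # Interchange intervention: overwrite the aligned neurons with source values.
        for i in neurons:
            hidden[i] = source_hidden[i]
    return sum(hidden)

# Assumed alignment: high-level variable S <-> low-level neurons 0 and 1.
ALIGNMENT = {"S": (0, 1)}

base, source = (1, 2, 3), (10, 20, 30)
h_out = high_level(base, source)
l_out = low_level(base, source, ALIGNMENT["S"])
print(h_out, l_out)  # both 33: the intervention has the same effect at both levels
```

In this toy case the two outputs agree for every base/source pair, which is the sense in which H abstracts L under the chosen alignment.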
Neighborhood — ranked by edge-count
Concepts (3)
- Causal abstraction · implements, related_to · A framework the paper uses alongside feature geometry to deepen mechanistic understanding of LMs
- Constructive Abstraction · related_to · Type of abstraction map where node information is computed from non-overlapping neuron sets
- Graded notion of causal abstraction measured by interchange intervention accuracy (IIA); when IIA is alpha < 100%, the model is alpha-on-average approximately abstract.
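The graded notion in the last entry can be sketched as follows: IIA is computed as the fraction of (base, source) input pairs on which an interchange intervention yields the same output at both levels. The toy models and both candidate alignments are hypothetical and only illustrate the computation, not any method from the source.

```python
from itertools import product

# Hypothetical high-level model: out = S + c, with S = a + b taken from the source.
def high_level(base, source):
    return source[0] + source[1] + base[2]

# Hypothetical low-level model: neurons hold the inputs; aligned neurons are
# overwritten with their source values before summing.
def low_level(base, source, neurons):
    hidden = list(base)
    for i in neurons:
        hidden[i] = source[i]
    return sum(hidden)

def iia(neurons, inputs):
    """Fraction of (base, source) pairs where both levels agree after intervention."""
    pairs = list(product(inputs, repeat=2))
    hits = sum(high_level(b, s) == low_level(b, s, neurons) for b, s in pairs)
    return hits / len(pairs)

inputs = list(product([0, 1], repeat=3))   # all binary triples (a, b, c)
full = iia((0, 1), inputs)                 # S aligned with neurons 0 and 1
partial = iia((0,), inputs)                # S aligned with neuron 0 only
print(full, partial)  # 1.0 0.5
```

Under the full alignment the toy abstraction is exact (IIA of 100%); under the partial alignment the levels agree only when base and source share the same b, so the abstraction holds on average for half of the pairs.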
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- The formal method used to establish that the identified circuit causally mediates the model's cyclic reasoning behavior
- Core concept: degree to which an agent exerts unique predictive power on its future; key to cognition at all scales.
- The paper endorses Geiger et al. 2023's claim that disparate interpretability methods are instances of causal abstraction.
- The ability of an agent to be a driver of subsequent events; a hallmark of cognition that causal emergence quantifies.
- Function determining the value of a variable based on its causal parents in an acyclic causal model.
- Whether an internal direction causally controls a target behavior, verified by intervention success
- What is the connection between information encoding assumptions and causal abstraction? · question · 0.772 · Identified as an exciting direction for future work