concept
active
concept:approximate-causal-abstraction
Approximate Causal Abstraction
A graded notion of causal abstraction, measured by interchange intervention accuracy (IIA): when IIA is α < 100%, the model is α-on-average approximately abstract.
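As a rough illustration of how the graded measure could be computed, the sketch below treats IIA as the fraction of (base, source) input pairs on which an aligned interchange intervention gives the same output at both levels. Everything in it is an assumption made for illustration: the function name `interchange_intervention_accuracy`, the toy high-level model (an intermediate sum S = a + b feeding the output S + c), and the deliberately flawed toy low-level model. It is not the paper's implementation.

```python
import itertools
from typing import Callable, Iterable, Tuple

Input = Tuple[int, int, int]  # hypothetical toy input (a, b, c)


def interchange_intervention_accuracy(
    pairs: Iterable[Tuple[Input, Input]],
    low_level_output: Callable[[Input, Input], int],
    high_level_output: Callable[[Input, Input], int],
) -> float:
    """Fraction of (base, source) pairs where both levels agree after the swap."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one (base, source) pair")
    matches = sum(
        low_level_output(base, source) == high_level_output(base, source)
        for base, source in pairs
    )
    return matches / len(pairs)


def high_level(base: Input, source: Input) -> int:
    # Hypothetical high-level model: S = a + b, output = S + c.
    # The interchange intervention takes S from the source and c from the base.
    return source[0] + source[1] + base[2]


def low_level(base: Input, source: Input) -> int:
    # Hypothetical low-level stand-in with a deliberate flaw:
    # the swap silently fails whenever the source has a == 0.
    if source[0] == 0:
        return base[0] + base[1] + base[2]
    return source[0] + source[1] + base[2]


inputs = list(itertools.product((0, 1), repeat=3))
pairs = list(itertools.product(inputs, repeat=2))
iia = interchange_intervention_accuracy(pairs, low_level, high_level)
print(f"IIA = {iia:.4f}")
```

In this toy setup the two levels agree on 44 of the 64 pairs, so IIA = 0.6875, and the abstraction holds only approximately, with α = 68.75%.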
Neighborhood — ranked by edge-count
Papers (1)
paper
Methods (1)
method
- Interchange Intervention Accuracy · implements · Proportion of aligned interchange interventions with equivalent high-level and low-level effects; graded measure of causal abstraction.
Concepts (2)
concept
- Causal abstraction · related_to · A framework the paper uses alongside feature geometry to deepen mechanistic understanding of LMs
- Formal definition: H is a constructive abstraction of L under alignment Π when interchange interventions have equivalent effects at both levels.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- The formal method used to establish that the identified circuit causally mediates the model's cyclic reasoning behavior
- What is the connection between information encoding assumptions and causal abstraction? · question · 0.791 · Identified as exciting future work direction
- Methodological claim about the scientific value of combining causal abstraction with representational geometry analysis
- Central thesis of the paper
- A measure of whether a subcomponent is necessary to reproduce model behavior on a specific prompt, predicted by the causal importance network.
- Programming technique to restructure a fine-grained Linda program for efficiency by replacing live data structures with passive ones and coarser-grain processes.
- Authors' interpretation connecting their proof to practical interpretability methodology