concept
active
concept:approximate-causal-abstraction

Approximate Causal Abstraction

Graded notion of causal abstraction measured by interchange intervention accuracy (IIA): when IIA is alpha < 100%, the high-level model is an alpha-on-average approximate abstraction of the low-level model.

Neighborhood — ranked by edge-count

Methods (1)

method
  • Proportion of aligned interchange interventions with equivalent high-level and low-level effects; graded measure of causal abstraction.
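A minimal sketch of how this proportion could be computed, assuming toy models that are not taken from the source. The names `high_level`, `low_level`, `hidden`, and `interchange_intervention_accuracy` are hypothetical, as is the alignment between the high-level variable `a + b` and the low-level value `hidden(a, b)`.

```python
from itertools import product


def hidden(a, b):
    # Hypothetical low-level intermediate value aligned with the
    # high-level variable a + b. It deliberately saturates at 3, so the
    # low-level model is only an approximate match.
    return min(a + b, 3)


def high_level(base, source=None):
    # High-level model: output = (a + b) + c.
    # With a source input, the value of a + b is taken from the source
    # (an interchange intervention on the high-level variable).
    a, b, c = base
    s = a + b if source is None else source[0] + source[1]
    return s + c


def low_level(base, source=None):
    # Low-level model: same computation, but through hidden(a, b).
    # With a source input, the hidden value is taken from the source run
    # (the aligned interchange intervention).
    a, b, c = base
    h = hidden(a, b) if source is None else hidden(source[0], source[1])
    return h + c


def interchange_intervention_accuracy(high, low, inputs):
    # Fraction of (base, source) pairs for which the aligned interventions
    # produce the same output at both levels.
    pairs = [(base, source) for base in inputs for source in inputs]
    matches = sum(high(b, s) == low(b, s) for b, s in pairs)
    return matches / len(pairs)


inputs = list(product(range(3), repeat=3))  # a, b, c each in {0, 1, 2}
iia = interchange_intervention_accuracy(high_level, low_level, inputs)
print(f"IIA = {iia:.3f}")
```

In this toy setup the two levels disagree only when the source input has `a + b = 4`, so the IIA falls below 100% and the match is graded rather than exact.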

Concepts (2)

concept
  • A framework the paper uses alongside feature geometry to deepen mechanistic understanding of LMs
  • Formal definition: H is a constructive abstraction of L under alignment Π when interchange interventions have equivalent effects at both levels.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.