claim
active

Near-perfect interchange intervention accuracy (IIA) can be achieved on randomly initialised models that cannot solve the task, suggesting that causal alignment does not require task capability

Empirical support for the vacuousness of unrestricted causal abstraction

Source paper

extracted_from
The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?
(2025) · Sutter, Denis · Minder, Julian · Hofmann, Thomas · Pimentel, Tiago
