hypothesis
active
hypothesis:the-and-or-algorithm-may-not-be-a-true-abstraction-of-the-trained-mlp-s-behaviour-since-it-never-achieves-high-iia-in-later-layers-regardless-of-alignment-map-complexity

The And-Or algorithm may not be a true abstraction of the trained MLP's behaviour, since it never achieves high interchange intervention accuracy (IIA) in later layers, regardless of the complexity of the alignment map

Hypothesis raised in the analysis of the distributive law task

Source paper

extracted_from
The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?
(2025) · Sutter, Denis · Minder, Julian · Hofmann, Thomas · Pimentel, Tiago

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.