finding
active
finding:minimal-euclidean-distances-between-hidden-states-are-smaller-for-pairs-sharing-same-output-or-equality-variable-values-than-for-pairs-that-do-not-across-1-280-000-mlp-samples

Minimal Euclidean distances between hidden states are smaller for pairs sharing the same output or equality-variable values than for pairs that do not, across 1,280,000 MLP samples

Explains why the RevNet lacks the capacity to separate states for the identity-of-first-argument algorithm
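
The comparison behind this finding can be illustrated with a minimal sketch. It is not the paper's code: the function name `min_pairwise_distances`, the toy hidden states, and the toy labels are all hypothetical, and the label stands in for either the output or an equality-variable value. The sketch computes the smallest Euclidean distance among pairs that share a label and among pairs that do not.

```python
import numpy as np

def min_pairwise_distances(hidden, labels):
    """Return (min distance over same-label pairs, min distance over different-label pairs).

    Hypothetical helper for illustration; `hidden` has shape (n, d), `labels` has shape (n,).
    """
    hidden = np.asarray(hidden, dtype=float)
    labels = np.asarray(labels)
    # Full matrix of pairwise Euclidean distances, shape (n, n).
    diff = hidden[:, None, :] - hidden[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Visit each unordered pair once via the strict upper triangle.
    rows, cols = np.triu_indices(len(hidden), k=1)
    pair_dist = dist[rows, cols]
    same = labels[rows] == labels[cols]
    # Fall back to infinity when a group has no pairs.
    min_same = pair_dist[same].min() if same.any() else np.inf
    min_diff = pair_dist[~same].min() if (~same).any() else np.inf
    return min_same, min_diff

# Toy data (assumed, not from the paper): four 2-D hidden states, two label values.
hidden = np.array([[0.0, 0.0], [0.0, 1.0], [3.0, 0.0], [3.0, 4.0]])
labels = np.array([0, 0, 1, 1])
min_same, min_diff = min_pairwise_distances(hidden, labels)
print(min_same, min_diff)
```

The dense distance matrix is quadratic in the number of samples, so at the scale reported here (1,280,000 samples) a chunked or nearest-neighbor approach would presumably be needed; the sketch only shows the quantity being compared.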

Source paper

extracted_from
The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?
(2025) · Sutter, Denis · Minder, Julian · Hofmann, Thomas · Pimentel, Tiago

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.