claim
active
claim:assuming-linear-representations-enables-identifying-the-location-of-certain-variables-in-a-dnn-but-many-insights-fail-to-generalise-when-more-powerful-non-linear-maps-are-used

Assuming linear representations enables identifying the location of certain variables in a DNN, but many insights fail to generalise when more powerful non-linear maps are used

Interpretive claim about what results from linear distributed alignment search (DAS) actually tell us

Source paper

extracted_from
The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?
(2025) · Sutter, Denis · Minder, Julian · Hofmann, Thomas · Pimentel, Tiago
