claim
active
claim:direct-probes-over-learned-activations-in-standard-basis-may-fail-to-reveal-the-actual-causal-role-of-representations-because-they-are-highly-distributed

Direct probes over learned activations in the standard (neuron-aligned) basis may fail to reveal the actual causal role of representations, because those representations are highly distributed across neurons.

Supported by the finding that non-trivial rotations of the activation space are required to find representations aligned with the causal variables.
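
A toy sketch of why the basis matters, under assumptions that are not from the source paper: a two-neuron hidden state encodes two hypothetical binary variables a and b through a known 45-degree rotation R, and the output reads out a. The names hidden, readout, swap_standard, swap_rotated, and interchange_accuracy are illustrative only. The rotation here is fixed by construction rather than learned, so this is not the paper's procedure, only an illustration of the claim. It compares an interchange intervention on a single neuron with the same intervention on one coordinate of the rotated basis.

```python
import itertools
import numpy as np

# Assumed toy setup: a known 45-degree rotation mixes the two variables
# across both neurons, so neither neuron encodes `a` on its own.
theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

def hidden(a, b):
    """Hidden activations in the standard (neuron) basis."""
    return R @ np.array([a, b], dtype=float)

def readout(h):
    """Output of the toy network: recovers `a` from the hidden state."""
    return (R.T @ h)[0]

def swap_standard(h_base, h_source, idx=0):
    """Interchange intervention on one neuron in the standard basis."""
    h = h_base.copy()
    h[idx] = h_source[idx]
    return h

def swap_rotated(h_base, h_source, idx=0):
    """Interchange intervention on one coordinate of the rotated basis."""
    z_base = R.T @ h_base
    z_source = R.T @ h_source
    z_base[idx] = z_source[idx]
    return R @ z_base  # map back to the neuron basis

def interchange_accuracy(swap):
    """Fraction of (base, source) pairs whose output equals the source's `a`."""
    values = (-1.0, 1.0)
    hits = []
    for a_b, b_b, a_s, b_s in itertools.product(values, repeat=4):
        h = swap(hidden(a_b, b_b), hidden(a_s, b_s))
        hits.append(np.isclose(readout(h), a_s))
    return float(np.mean(hits))

acc_standard = interchange_accuracy(swap_standard)
acc_rotated = interchange_accuracy(swap_rotated)
print(acc_standard, acc_rotated)
```

In this toy, swapping a single neuron yields the counterfactual output for only 6 of the 16 base/source pairs (0.375), while swapping the first rotated coordinate succeeds on all of them (1.0). The variable is fully present in the activations, but its causal role only becomes visible after the rotation.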

Source paper

extracted_from
Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations
(2023) · Atticus Geiger · Zhengxuan Wu · Christopher Potts · Thomas Icard +1

Neighborhood — ranked by edge-count

Findings (1)

finding

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.