hypothesis

active

hypothesis:larger-hidden-representations-create-more-random-structure-that-das-can-search-through-allowing-manipulation-of-counterfactual-behavior-even-in-randomly-initialized-networks

Larger hidden representations create more random structure that DAS can search through, allowing manipulation of counterfactual behavior even in randomly initialized networks

Tested in the Section 4.4 calibration experiment; supported by both linked findings.

Source paper

extracted_from

Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations

(2023) · Atticus Geiger · Zhengxuan Wu · Christopher Potts · Thomas Icard +1

Neighborhood — ranked by edge-count

Findings (2)

finding

DAS on oversized randomly initialized network (|N|=4096 for 16-dim input) achieves 0.64 IIA by searching random structure
supports
Shows that overly large hidden dimensions allow DAS to find random causal structures; calibration check.
DAS on randomly initialized small networks (|N|=16) achieves only 0.50 IIA (chance), cannot construct new behaviors
supports
Demonstrates DAS cannot manufacture behaviors from random structure in appropriately sized networks.
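
The two findings above compare interchange intervention accuracy (IIA) across hidden sizes. As a rough illustration of what is being measured, the sketch below shows one distributed interchange intervention and the IIA score. It rests on assumptions not stated on this page: the function names `distributed_interchange` and `iia` are hypothetical, and the rotation `R` is a fixed random orthogonal matrix, whereas DAS would learn it by gradient descent. It is not the paper's implementation.

```python
import numpy as np

def distributed_interchange(h_base, h_source, R, k):
    """Swap the first k rotated coordinates of the base state with the source's.

    R is assumed orthogonal, so its inverse is its transpose.
    """
    z_base = h_base @ R.T            # rotate base hidden state into the new basis
    z_source = h_source @ R.T        # rotate source hidden state the same way
    z_base[..., :k] = z_source[..., :k]  # interchange the chosen k-dim subspace
    return z_base @ R                # rotate back to the neuron basis

def iia(predicted, counterfactual_labels):
    """Fraction of intervened predictions that match the counterfactual labels."""
    predicted = np.asarray(predicted)
    counterfactual_labels = np.asarray(counterfactual_labels)
    return float(np.mean(predicted == counterfactual_labels))

# Hypothetical usage: a random orthogonal basis for a 16-dim hidden state.
rng = np.random.default_rng(0)
n = 16
R, _ = np.linalg.qr(rng.standard_normal((n, n)))  # Q factor is orthogonal
h_base = rng.standard_normal(n)
h_source = rng.standard_normal(n)
h_patched = distributed_interchange(h_base, h_source, R, k=4)
```

On a binary task, an IIA of 0.50 means the intervened predictions match the counterfactual labels no better than chance, which is the baseline the |N|=16 finding reports.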

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Patching h[1] with a divergent representation can activate distinct, hidden pathways that result in misleadingly confirmatory behavior and/or undetected behavior. · quote · 0.781
Load-bearing description of the core pernicious divergence mechanism illustrated in Figure 1
Direct probes over learned activations in standard basis may fail to reveal the actual causal role of representations because they are highly distributed · claim · 0.778
Supported by the finding that non-trivial rotations are required to find aligned representations.
There are fewer representations competent for N tasks than M<N tasks, so training more general models should yield fewer possible solutions · hypothesis · 0.777
Selective pressure toward convergence via task generality
DAS overcomes the localist limitation of prior causal abstraction by allowing individual neurons to play multiple roles via non-standard bases · claim · 0.774
Central claim motivating DAS over prior methods.
Deep representations have a special significance in recurrent networks, allowing coordinated behaviour without losing sensitivity to new inputs. · claim · 0.772
Importance of hierarchical structure for flexible coordination.
probably helps not only with faithful reconstruction but also creates interference patterns that encode nuanced information about the deltas and convergences between states. · quote · 0.769
Key quote connecting path redundancy to interferometric information encoding.
Shallow interaction structures cannot compute non-linearly separable functions; depth (hidden layers) is necessary for ETI-relevant individuality · claim · 0.765
Assertion that deep organization is mandatory, based on connectionist theory
We hypothesize that representation geometry drives model behavior: the geometric structure of internal representations causally shapes what models do externally. · hypothesis · 0.764
The causal hypothesis motivating the use of causality (intervention) as the lens connecting representation and behavior geometry.