finding

active

finding:das-on-randomly-initialized-small-networks-n-16-achieves-only-0-50-iia-chance-cannot-construct-new-behaviors

DAS on randomly initialized small networks (|N|=16) achieves only 0.50 IIA (chance) and cannot construct new behaviors

Demonstrates that DAS cannot manufacture counterfactual behaviors from random structure when the network is appropriately sized (|N|=16).
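For readers unfamiliar with the metric, the sketch below illustrates what an interchange intervention in a rotated basis and the resulting interchange intervention accuracy (IIA) might look like. It is a minimal illustration, not the paper's implementation: the function names `interchange` and `iia`, the batch size, and the use of a fixed random rotation are all assumptions made here. DAS itself learns the rotation by gradient descent, which this sketch does not do.

```python
import numpy as np


def interchange(h_base, h_source, rotation, k):
    """Swap the first k rotated coordinates of h_source into h_base.

    Hypothetical helper: `rotation` is an orthogonal matrix, so
    multiplying by its transpose maps back to the neuron basis.
    """
    r_base = h_base @ rotation
    r_source = h_source @ rotation
    mixed = r_base.copy()
    mixed[..., :k] = r_source[..., :k]
    return mixed @ rotation.T


def iia(predicted, counterfactual_labels):
    """Fraction of intervened predictions that match the labels the
    high-level causal model assigns under the same intervention."""
    predicted = np.asarray(predicted)
    counterfactual_labels = np.asarray(counterfactual_labels)
    return float(np.mean(predicted == counterfactual_labels))


# Toy setup (assumed values): hidden size 16, as in the |N|=16 condition.
rng = np.random.default_rng(0)
hidden = 16
rotation, _ = np.linalg.qr(rng.normal(size=(hidden, hidden)))  # random orthogonal basis
h_base = rng.normal(size=(4, hidden))
h_source = rng.normal(size=(4, hidden))

# Intervene on an 8-dimensional subspace of the rotated representation.
h_mixed = interchange(h_base, h_source, rotation, k=8)

# On a binary task, agreeing with the counterfactual label half the time
# is chance level: two of the four toy predictions below match.
print(iia([1, 0, 1, 1], [1, 1, 1, 0]))  # 0.5
```

In a full pipeline, `h_mixed` would be passed through the remaining layers of the network to obtain the intervened predictions. An IIA of 0.50 on a binary task then means those predictions agree with the high-level model's counterfactual labels no more often than guessing would.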

Source paper

extracted_from

Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations

(2023) · Atticus Geiger · Zhengxuan Wu · Christopher Potts · Thomas Icard +1

Neighborhood — ranked by edge-count

Hypotheses (1)

hypothesis

Larger hidden representations create more random structure that DAS can search through, allowing manipulation of counterfactual behavior even in randomly initialized networks
supports
Tested in the Section 4.4 calibration experiment; confirmed by the findings.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

DAS on oversized randomly initialized network (|N|=4096 for 16-dim input) achieves 0.64 IIA by searching random structure · finding · 0.883
Shows that overly large hidden dimensions allow DAS to find random causal structures; calibration check.
DAS achieves 100% IIA on hierarchical equality task with |N|=16, intervention size 8, Layer 1 · finding · 0.759
DAS discovers a perfect alignment between the feed-forward network and the Both Equality Relations high-level model.
DAS behavioral loss achieves IIA of 0.997±0.001 on synthetic 10-class dataset training/test sets · finding · 0.758
IIA baseline for DAS behavioral loss on synthetic dataset
EI of ER random networks converges to -log2(p) with increasing size, with a phase transition at average degree ≈ log2(N). · finding · 0.757
From Klein & Hoel (2020) analysis of artificial complex networks.
DAS learning rate of 5e-3 outperforms 1e-3 (used in Wu et al. 2023) for small training sets in CausalGym · finding · 0.748
Hyperparameter tuning result for DAS; different from prior work due to smaller training set size
DAS overcomes the localist limitation of prior causal abstraction by allowing individual neurons to play multiple roles via non-standard bases · claim · 0.745
Central claim motivating DAS over prior methods.
Under reward shaping (G=100, H=-100, F=0), Active Inference scored 99.52, Bayesian RL 99.77, Q-learning 95.56, with nearly identical behavior between belief-based agents. · finding · 0.737
Table 2, row 3, showing equivalence when prior preferences match rewards.
Several Mixtral-8x7B samples could not be initialized as valid networks using PyPhi under IIT 4.0 and were excluded. · finding · 0.734
Methodological limitation disproportionately affecting the largest MoE model, constraining generalizability.