finding

active

finding:das-on-oversized-randomly-initialized-network-n-4096-for-16-dim-input-achieves-0-64-iia-by-searching-random-structure

DAS on oversized randomly initialized network (|N|=4096 for 16-dim input) achieves 0.64 IIA by searching random structure

Shows that an overly large hidden dimension lets DAS find causal structure that exists in the random weights only by chance; serves as a calibration check on DAS results.
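For reference, IIA (interchange intervention accuracy) is the fraction of interchange interventions on which the intervened network's output matches the counterfactual output of the high-level causal model. The sketch below assumes those outputs have already been collected as label lists. The helper name `interchange_intervention_accuracy` and the toy labels are hypothetical. It also assumes a balanced binary task, as in the related |N|=16 finding where 0.50 is reported as chance. Under that assumption, a network whose output ignores the intervention scores 0.50, and 0.64 is the score of one that tracks the counterfactual on 64 of every 100 interventions.

```python
from typing import Sequence


def interchange_intervention_accuracy(
    network_outputs: Sequence[int],
    causal_model_outputs: Sequence[int],
) -> float:
    """Fraction of interventions where the network agrees with the high-level model.

    Hypothetical helper: both sequences hold one predicted label per
    interchange intervention, in the same order.
    """
    if len(network_outputs) != len(causal_model_outputs):
        raise ValueError("output lists must have equal length")
    if not network_outputs:
        raise ValueError("need at least one intervention")
    matches = sum(n == c for n, c in zip(network_outputs, causal_model_outputs))
    return matches / len(network_outputs)


# Balanced binary counterfactual labels (assumed), e.g. 1 = "equal", 0 = "not equal".
counterfactual_labels = [1, 0] * 50

# A network whose output never changes under intervention sits at chance.
constant_outputs = [1] * 100
print(interchange_intervention_accuracy(constant_outputs, counterfactual_labels))  # 0.5

# A network that matches the counterfactual on the first 64 of 100 interventions.
partial_outputs = counterfactual_labels[:64] + [1 - y for y in counterfactual_labels[64:]]
print(interchange_intervention_accuracy(partial_outputs, counterfactual_labels))  # 0.64
```

The gap between 0.50 and 0.64 is what makes the oversized random network a useful calibration point. A score above chance is only meaningful if it also exceeds what a random network of the same size can reach.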

Source paper

extracted_from

Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations

(2023) · Atticus Geiger · Zhengxuan Wu · Christopher Potts · Thomas Icard +1

Neighborhood — ranked by edge-count

Hypotheses (1)

hypothesis

Larger hidden representations create more random structure that DAS can search through, allowing manipulation of counterfactual behavior even in randomly initialized networks
supports
Tested in the Section 4.4 calibration experiment and confirmed by its results.
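DAS intervenes in a learned rotated basis of the hidden layer. That suggests one rough way to read the hypothesis: an n×n rotation has n(n−1)/2 free parameters, so a 4096-dimensional layer offers far more candidate directions than a 16-dimensional one. The sketch below assumes a distributed interchange intervention of the form h' = Rᵀ((I − D) R h_base + D R h_source), where R is orthogonal and D is a diagonal mask selecting the first k rotated coordinates. All helper names are hypothetical. It uses plain lists to stay dependency-free and does not claim to reproduce the paper's implementation.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def distributed_interchange(rotation: Matrix, h_base: Vector, h_source: Vector, k: int) -> Vector:
    """Replace the first k rotated coordinates of h_base with those of h_source.

    Hypothetical helper. `rotation` must be orthogonal, so its transpose
    maps rotated coordinates back to the original hidden basis.
    """
    z_base = matvec(rotation, h_base)
    z_source = matvec(rotation, h_source)
    z_new = z_source[:k] + z_base[k:]  # mask D keeps the source on the first k coords
    return matvec(transpose(rotation), z_new)


def rotation_degrees_of_freedom(n: int) -> int:
    """Free parameters of an n x n rotation matrix (the dimension of SO(n))."""
    return n * (n - 1) // 2


c = s = 1 / math.sqrt(2)
rotation_45 = [[c, s], [-s, c]]  # rows are the rotated basis vectors
# Approximately [1.5, 0.5], up to floating-point rounding.
print(distributed_interchange(rotation_45, [1.0, 0.0], [0.0, 2.0], k=1))

print(rotation_degrees_of_freedom(16))    # 120
print(rotation_degrees_of_freedom(4096))  # 8386560
```

With k = 0 the function returns h_base unchanged, and with k equal to the hidden size it returns h_source, which is a quick sanity check. The parameter count only indicates how much larger the search space becomes. It is not by itself a measure of how much spurious structure a given random network contains.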

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

DAS on randomly initialized small networks (|N|=16) achieves only 0.50 IIA (chance), cannot construct new behaviors · finding · 0.883
Demonstrates DAS cannot manufacture behaviors from random structure in appropriately sized networks.
EI of ER random networks converges to -log2(p) with increasing size, with a phase transition at average degree ≈ log2(N). · finding · 0.768
From Klein & Hoel (2020) analysis of artificial complex networks.
DAS achieves 100% IIA on hierarchical equality task with |N|=16, intervention size 8, Layer 1 · finding · 0.732
DAS discovers a perfect alignment between the feed-forward network and the Both Equality Relations high-level model.
DAS runs in 502 seconds for hierarchical equality vs. estimated 6e8 seconds for exhaustive brute-force search · finding · 0.730
DAS runtime is invariant to the number of hypotheses being tested, unlike brute-force search.
Larger S_max correlates with smaller θ50 across backbones in E3 (negative association consistent across pooling and metric choices) · finding · 0.722
Key geometry-to-behavior bridge finding in E3; robust to pooling choice, cosine vs. L2, and frozen external encoder
Several Mixtral-8x7B samples could not be initialized as valid networks using PyPhi under IIT 4.0 and were excluded. · finding · 0.721
Methodological limitation disproportionately affecting the largest MoE model, constraining generalizability.
DAS behavioral loss produces EMD along feature dimensions of 0.032±0.003 on synthetic 10-class dataset · finding · 0.721
Quantitative baseline for divergence using behavioral DAS loss on synthetic dataset
DAS behavioral loss achieves IIA of 0.997±0.001 on synthetic 10-class dataset training/test sets · finding · 0.717
IIA baseline for DAS behavioral loss on synthetic dataset