finding
active
Several Mixtral-8x7B samples could not be initialized as valid networks using PyPhi under IIT 4.0 and were excluded.
Methodological limitation disproportionately affecting the largest MoE model, constraining generalizability.
Source paper
extracted_from · Li, Jingkai (2025)
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Contrasts with temporal permutation results; constitutes the most suggestive evidence of potential consciousness phenomena in LLM representations.
- Training progression result showing that non-linear maps are uncorrelated with genuine task learning.
- One of the most promising cases; approximately corresponds to the 2/3 layer of LLaMA3.1-8B.
- Demonstrates DAS cannot manufacture behaviors from random structure in appropriately sized networks.
- Third promising case from temporal permutation analysis.
- Shows that overly large hidden dimensions allow DAS to find random causal structures; calibration check.
- One of four LLMs selected; Mixture-of-Experts model; had substantial sample loss under IIT 4.0 due to PyPhi network initialization issues.
- Demonstrates that early-layer probes capture sentence polarity rather than truth.