Mixtral-8x7B

One of four LLMs selected for representation analysis; a Mixture-of-Experts model. It had substantial sample loss under IIT 4.0 because some of its samples could not be initialized as valid PyPhi networks.

Neighborhood (ranked by edge count)

concept

Mixture-of-Experts (MoE)
implements
Architecture of Mixtral-8x7B; its sparse expert routing affects how hidden states are computed across layers.
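The sparse routing mentioned above can be illustrated with a toy sketch. It assumes Mixtral-style top-2-of-8 gating: a router scores every expert, only the two highest-scoring experts run, and their outputs are combined with softmax weights computed over those two logits. The router weights, the scalar-multiply "experts", and all function names are hypothetical stand-ins; real Mixtral experts are feed-forward blocks operating on full hidden-state vectors.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(hidden: Vector, router_weights: Sequence[Vector],
                k: int = 2) -> List[Tuple[int, float]]:
    """Score every expert, keep the k best, and renormalize their gates with a softmax."""
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in router_weights]
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    gates = softmax([logits[i] for i in top])
    return list(zip(top, gates))


def moe_layer(hidden: Vector, router_weights: Sequence[Vector],
              experts: Sequence[Callable[[Vector], Vector]], k: int = 2) -> Vector:
    """Gate-weighted sum of the selected experts' outputs; unselected experts never run."""
    out = [0.0] * len(hidden)
    for idx, gate in route_top_k(hidden, router_weights, k):
        expert_out = experts[idx](hidden)
        out = [o + gate * e for o, e in zip(out, expert_out)]
    return out


if __name__ == "__main__":
    # Hypothetical toy setup: 8 experts, expert i scales its input by (i + 1).
    experts = [lambda h, s=i + 1: [s * x for x in h] for i in range(8)]
    router = [[float(i), 0.0] for i in range(8)]  # expert i gets logit i for hidden [1, 0]
    print(route_top_k([1.0, 0.0], router))  # experts 7 and 6 are selected
    print(moe_layer([1.0, 0.0], router, experts))
```

Because only k experts run per token, the set of active experts can change from token to token and layer to layer, which is one way routing shapes the hidden states that a downstream analysis sees.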

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
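The selection rule in the heading (cosine ≥ 0.65, no typed edge) can be sketched as a filter over entity embeddings. This is a minimal illustration that assumes each entity has one embedding vector and that typed edges are stored as unordered pairs. The function names and toy vectors are hypothetical, and the scores listed on this page do not come from this code.

```python
import math
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def untyped_neighbors(
    target: str,
    embeddings: Dict[str, Sequence[float]],
    typed_edges: Set[FrozenSet[str]],
    threshold: float = 0.65,
) -> List[Tuple[str, float]]:
    """Entities at or above the threshold that share no typed edge with target, best first."""
    t = embeddings[target]
    hits = []
    for name, vec in embeddings.items():
        if name == target or frozenset((target, name)) in typed_edges:
            continue  # skip the entity itself and anything already linked by a typed edge
        score = cosine(t, vec)
        if score >= threshold:
            hits.append((name, score))
    hits.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return hits


if __name__ == "__main__":
    # Hypothetical 2-D embeddings, not the real vectors behind this page.
    emb = {
        "Mixtral-8x7B": [1.0, 0.0],
        "Mixture-of-Experts (MoE)": [1.0, 0.1],  # similar, but already linked by "implements"
        "Mistral-7B": [0.8, 0.6],                # cosine 0.8, so it is a candidate
        "Qwen3-1.7B": [0.0, 1.0],                # cosine 0.0, so it is filtered out
    }
    edges = {frozenset(("Mixtral-8x7B", "Mixture-of-Experts (MoE)"))}
    print(untyped_neighbors("Mixtral-8x7B", emb, edges))
```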

Several Mixtral-8x7B samples could not be initialized as valid networks using PyPhi under IIT 4.0 and were excluded. (finding · 0.720)
A methodological limitation that disproportionately affects the largest MoE model, constraining the generalizability of the results.
Mistral-7B (concept · 0.702)
One of four LLMs selected for representation analysis; embedding dimension D=4096.
LLaMA3.1-8B (concept · 0.686)
One of four LLMs selected for representation analysis; embedding dimension D=4096; used as demonstration model in scatter plots.
Under spatial permutation controls, two cases (Layer 32 of Mixtral-8x7B on Strange Stories, IIT 4.0, Linguistic Spans: Entire and Complement) satisfy all three criteria. (finding · 0.676)
Contrasts with temporal permutation results; constitutes the most suggestive evidence of potential consciousness phenomena in LLM representations.
Qwen2.5-VL-7B (concept · 0.667)
Base vision-language model used to instantiate ATLAS.
Qwen3-1.7B (concept · 0.655)
Smallest Qwen3 model tested; used in the conscientiousness sweep example (Table 6).