Mixture-of-Experts (MoE)

The architecture used by Mixtral-8x7B. A router sends each token to a small subset of expert sub-networks (sparse expert routing), which affects how hidden states are computed across layers.
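As a rough illustration of sparse expert routing, the sketch below routes one token vector to the top 2 of 8 toy experts and mixes their outputs by softmax weight. It is a minimal sketch, not Mixtral's actual implementation: the function names `top_k_route` and `moe_layer`, the toy scaling experts, and the hand-written router logits are all hypothetical, and a real model would compute the logits from the hidden state with a learned linear layer.

```python
import math

def top_k_route(router_logits, k=2):
    """Return [(expert_index, weight), ...] for the k highest-scoring experts."""
    # Pick the k experts with the largest router logits (ties keep index order).
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    # Softmax over the selected logits only, so the k weights sum to 1.
    m = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_layer(x, router_logits, experts, k=2):
    """Weighted sum of the selected experts' outputs for one token vector x."""
    out = [0.0] * len(x)
    for idx, weight in top_k_route(router_logits, k):
        y = experts[idx](x)  # only the selected experts are evaluated
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out

# Hypothetical toy experts: expert s multiplies its input by s.
experts = [lambda x, s=s: [s * v for v in x] for s in range(8)]
# Hypothetical router logits for a single token (assumed, not learned).
logits = [0.1, 2.0, -1.0, 2.0, 0.0, 0.5, -0.3, 1.0]

print(moe_layer([1.0, 2.0], logits, experts))
```

Because only the selected experts run, the hidden state produced for a token depends on which experts the router picked, which is the sense in which routing affects the representations extracted from such a model.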

Neighborhood — ranked by edge-count

concept

Can 'Consciousness' Be Observed from Large Language Model (LLM) Internal States? Dissecting LLM Representations Obtained from Theory of Mind Test with Integrated Information Theory and Span Representation Analysis
mentions
The primary paper being extracted — applies IIT 3.0 and 4.0 to LLM representation sequences derived from ToM test data to investigate whether consciousness phenomena can be observed.
Mixtral-8x7B
implements
One of four LLMs selected; Mixture-of-Experts model; had substantial sample loss under IIT 4.0 due to PyPhi network initialization issues.

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

MOICL: Mixtures of In-Context Learners · framework · 0.755
Selection strategy that adapts which demonstrations carry signal; in UCCT terms increases effective ρd
ML Alignment & Theory Scholars (MATS) · institute · 0.741
Program that supported Tim Hua and Andrew Qin during this research.
Towards Monosemanticity: Decomposing Language Models with Dictionary Learning (Bricken et al., 2023) · concept · 0.730
Foundational mechanistic interpretability paper on sparse autoencoders (SAEs).
A naive agent equipped with reduced priors from an experienced agent performs perfectly with maximum confidence from the first trial. · finding · 0.717
Demonstration that model-level priors (not parameter-level knowledge) suffice for immediate transfer
ensemble of agents · concept · 0.715
Probabilistic behaviour of an ensemble used to derive the free-energy principle.
monosemanticity · concept · 0.710
Interpretability property where a latent feature represents a single semantic concept; benchmarked across architectures.
"For interpretability, I don't think we even have the right definitions." · quote · 0.703
Ian Goodfellow quote used to illustrate the pre-paradigmatic state of interpretability research
Mystification of professional expertise · concept · 0.703
The widespread belief that only trained professionals can design environments, which disempowers ordinary people and prevents adaptation.