ML Alignment & Theory Scholars (MATS)

Program that supported Tim Hua and Andrew Qin during this research.

Neighborhood — ranked by edge-count

thinker

Andrew Qin
affiliated_with
First author; co-conceived project, built SDF infrastructure, and conducted expert iteration.
Tim Tian Hua
affiliated_with
First author; led steering experiments and wrote the final paper.

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Model Alignment Search (MAS) · framework · 0.795
The primary contribution of the paper: a bidirectional causal method that learns rotation matrices for each model to uncover and compare causally relevant latent subspaces across neural networks.
MAS reduces the number of required alignment matrices for n-model comparison from n(n-1) or n^2 (stitching) to n · finding · 0.742
Key computational efficiency advantage of MAS over traditional model stitching for multi-model comparisons.
Mixture-of-Experts (MoE) · concept · 0.741
Architecture of Mixtral-8x7B; uses sparse expert routing affecting how hidden states are computed across layers.
Machine Learning Research Group, Department of Engineering Science, University of Oxford · institute · 0.721
Research group where Philip Ball is based.
Alignment Between High-Level and Low-Level Models · concept · 0.717
A mapping assigning to each high-level variable a set of low-level variables and a function from low-level to high-level values.
Linear Alignment Map (ϕ_lin) · method · 0.717
Alignment map ϕ(h) = W_orth * h, where W_orth is an orthogonal matrix; assumes the linear representation hypothesis.
Deliberative Alignment · framework · 0.713
OpenAI's approach integrating chain-of-thought reasoning into alignment; parallels contemplative self-monitoring.
Alignment · concept · 0.712
The goal of making model behavior match human values and intentions, often addressed during post-training.