institute
active
institute:ml-alignment-theory-scholars-mats · ML Alignment & Theory Scholars (MATS)
Program that supported Tim Hua and Andrew Qin during this research.
Neighborhood (ranked by edge count)
Thinkers (2)
- Andrew Qin · affiliated_with · First author; co-conceived the project, built the SDF infrastructure, and conducted expert iteration.
- Tim Tian Hua · affiliated_with · First author; led the steering experiments and wrote the final paper.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- The primary contribution of the paper: a bidirectional causal method that learns rotation matrices for each model to uncover and compare causally relevant latent subspaces across neural networks.
- MAS reduces the number of alignment matrices required for an n-model comparison from n(n-1) or n^2 (stitching) to n · finding · 0.742 · Key computational efficiency advantage of MAS over traditional model stitching for multi-model comparisons.
- Architecture of Mixtral-8x7B; uses sparse expert routing affecting how hidden states are computed across layers.
- Machine Learning Research Group, Department of Engineering Science, University of Oxford · institute · 0.721 · Research group where Philip Ball is based
- A mapping assigning to each high-level variable a set of low-level variables and a function from low-level to high-level values.
- Alignment map ϕ(h) = W_orth h using an orthogonal matrix W_orth; assumes the linear representation hypothesis
- OpenAI's approach integrating chain-of-thought reasoning into alignment; parallels contemplative self-monitoring
- The goal of making model behavior match human values and intentions, often addressed during post-training.
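The "Related by similarity" rule stated above (cosine ≥ 0.65, no typed edge) can be sketched as follows. This is a minimal illustration, not the implementation behind this page. It assumes that each entity has a precomputed embedding vector, that typed neighbors are available as a set of IDs, and that candidates are ranked by descending score. The function names, data layout, and sort order are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical constant mirroring the "cosine ≥ 0.65" label on this page.
SIMILARITY_THRESHOLD = 0.65


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def related_by_similarity(
    entity_id: str,
    embeddings: Dict[str, Sequence[float]],
    typed_neighbors: Set[str],
    threshold: float = SIMILARITY_THRESHOLD,
) -> List[Tuple[str, float]]:
    """Hypothetical helper: return (id, score) pairs whose cosine similarity to
    entity_id is at or above threshold and that share no typed edge with it,
    sorted from most to least similar."""
    query = embeddings[entity_id]
    candidates: List[Tuple[str, float]] = []
    for other_id, vec in embeddings.items():
        # Skip the entity itself and anything already linked by a typed edge.
        if other_id == entity_id or other_id in typed_neighbors:
            continue
        score = cosine(query, vec)
        if score >= threshold:
            candidates.append((other_id, score))
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates
```

As a toy usage example, consider embeddings `{"mats": [1, 0], "a": [1, 0], "b": [1, 1], "c": [0, 1], "d": [1, 0.1]}` with `"d"` as a typed neighbor. Calling `related_by_similarity("mats", ...)` keeps `"a"` (score 1.0) and `"b"` (about 0.707). It drops `"c"` because its score of 0.0 falls below the threshold, and it excludes `"d"` because `"d"` already has a typed edge.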