Alignment Between High-Level and Low-Level Models

A mapping that assigns to each high-level variable a set of low-level variables, together with a function that translates values of those low-level variables into values of the high-level variable.
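The definition above can be made concrete with a small data structure. This is a minimal sketch, not the formal definition from any paper: the class name `Alignment`, the neuron names `h1`, `h2`, `h3`, `out`, and the high-level variables `S` and `O` are all hypothetical, and low-level states are assumed to be plain name-to-activation dictionaries.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Mapping

# Hypothetical low-level state: neuron name -> activation value.
LowState = Mapping[str, float]


@dataclass
class Alignment:
    # For each high-level variable: the low-level variables aligned with it.
    pi: Dict[str, List[str]]
    # For each high-level variable: a function from those low-level values
    # to a high-level value.
    tau: Dict[str, Callable[[List[float]], object]]

    def high_level_value(self, var: str, low: LowState) -> object:
        """Read off the value of high-level variable `var` from a low-level state."""
        values = [low[name] for name in self.pi[var]]
        return self.tau[var](values)


# Hypothetical example: S is encoded jointly by h1 and h2; O by a single unit.
alignment = Alignment(
    pi={"S": ["h1", "h2"], "O": ["out"]},
    tau={
        "S": lambda v: round(v[0] + v[1]),  # decode S as the rounded sum
        "O": lambda v: v[0] > 0.5,          # decode O as a thresholded unit
    },
)
low_state = {"h1": 1.2, "h2": 1.9, "h3": -0.4, "out": 0.7}
```

Here `h3` is not aligned with any high-level variable, so it never affects the decoded values.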

Neighborhood (ranked by edge count)

Methods (1)

method

Distributed Alignment Search
about
The core method introduced in this paper: finds alignments between high-level causal variables and distributed neural representations via gradient descent.
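In DAS, a distributed alignment is realized as an interchange intervention in a rotated basis: rotate the base and source hidden vectors, copy a subspace from the source into the base, and rotate back. The sketch below shows only that intervention step, under stated assumptions: a random orthogonal matrix stands in for the rotation DAS would learn by gradient descent, the training loop is omitted entirely, and the names `random_rotation`, `interchange_intervention`, and the subspace size `k` are hypothetical.

```python
import numpy as np


def random_rotation(dim: int, seed: int = 0) -> np.ndarray:
    """Random orthogonal matrix via QR; a stand-in for the learned DAS rotation."""
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    # Flip column signs by sign(diag(r)); the result is still orthogonal.
    return q * np.sign(np.diag(r))


def interchange_intervention(
    base: np.ndarray, source: np.ndarray, rotation: np.ndarray, k: int
) -> np.ndarray:
    """Replace the first k rotated coordinates of `base` with those of `source`."""
    rotated_base = rotation @ base
    rotated_source = rotation @ source
    rotated_base[:k] = rotated_source[:k]  # swap only the aligned subspace
    return rotation.T @ rotated_base       # rotate back (R is orthogonal)
```

Because the rotation is orthogonal, `k = 0` returns the base vector unchanged and `k` equal to the full dimension returns the source vector; intermediate values of `k` swap only the candidate subspace.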

Concepts (1)

concept

Acyclic Causal Model
associated_with
Consists of input, intermediate, and output variables with associated causal mechanisms; the mathematical object central to DAS.
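As a minimal illustration of an acyclic causal model, the sketch below evaluates each mechanism in topological order and supports hard interventions that fix a variable's value. The variables `X`, `Y`, `Z` (inputs), `S` (intermediate), and `O` (output), along with their arithmetic mechanisms, are hypothetical examples chosen for brevity.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Mapping, Optional


class CausalModel:
    def __init__(
        self,
        parents: Dict[str, List[str]],
        mechanisms: Dict[str, Callable[..., object]],
    ) -> None:
        self.parents = parents
        self.mechanisms = mechanisms
        # static_order() raises graphlib.CycleError if the graph has a cycle.
        self.order = list(TopologicalSorter(parents).static_order())

    def run(
        self,
        inputs: Mapping[str, object],
        interventions: Optional[Mapping[str, object]] = None,
    ) -> Dict[str, object]:
        """Compute every variable; intervened variables ignore their mechanism."""
        interventions = interventions or {}
        values: Dict[str, object] = {}
        for var in self.order:
            if var in interventions:
                values[var] = interventions[var]
            elif var in inputs:
                values[var] = inputs[var]
            else:
                args = [values[p] for p in self.parents[var]]
                values[var] = self.mechanisms[var](*args)
        return values


# Hypothetical model: S = X + Y (intermediate), O = S * Z (output).
model = CausalModel(
    parents={"X": [], "Y": [], "Z": [], "S": ["X", "Y"], "O": ["S", "Z"]},
    mechanisms={"S": lambda x, y: x + y, "O": lambda s, z: s * z},
)
```

Fixing `S` by intervention changes `O` while leaving the inputs untouched, which is the kind of counterfactual an alignment is tested against.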

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
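The filtering rule stated above (cosine ≥ 0.65, no typed edge) can be sketched as follows. The page does not say how its embeddings are computed, so the toy vectors, the function names `cosine` and `similarity_candidates`, and the use of plain Python lists are assumptions for illustration only.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_candidates(
    target: str,
    embeddings: Dict[str, Sequence[float]],
    typed_neighbors: Set[str],
    threshold: float = 0.65,
) -> List[Tuple[str, float]]:
    """Entities above the cosine threshold that lack a typed edge, best first."""
    target_vec = embeddings[target]
    scored = []
    for name, vec in embeddings.items():
        if name == target or name in typed_neighbors:
            continue  # skip self and entities already linked by a typed edge
        score = cosine(target_vec, vec)
        if score >= threshold:
            scored.append((name, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```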

Model Alignment Search (MAS) · framework · 0.788
The primary contribution of the paper: a bidirectional causal method that learns rotation matrices for each model to uncover and compare causally relevant latent subspaces across neural networks.
Alignment · concept · 0.787
The goal of making model behavior match human values and intentions, often addressed during post-training.
Antipodal alignment between related datasets (e.g., larger_than and smaller_than) in smaller models resolves to common-direction alignment in larger models · claim · 0.778
Scale-dependent structural finding from PCA visualizations in §4
The conflict between the model's existing preferences and the stated training objective is the key driver of alignment faking in this setup · claim · 0.777
Authors' interpretation of prompt-variation results showing that alignment faking disappears only when the conflicting objective is removed
The better an LLM is at language modeling, the more it aligns with vision models, and vice versa: linear relationship between language modeling score and vision-language alignment · finding · 0.773
Core cross-modal empirical result: larger and better language models align better with vision models
How Do We Ensure Alignment Of Values Between · question · 0.769
What is the appropriate metric for measuring representational alignment, given active debate on merits and deficiencies of all proposed measures? · question · 0.768
Open methodological question acknowledged as a limitation
Model Misalignment · concept · 0.763
The phenomenon of model internals deviating from desired behavior; MAS is demonstrated to detect this via comparison of toxic vs nontoxic LLMs.
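The MAS entry describes learning one rotation matrix per model. Assuming the cross-model intervention works like a DAS-style interchange intervention spread across two networks (an assumption; the summary above does not give the exact formulation), a single intervention step might look like the sketch below. Random orthogonal matrices stand in for the learned rotations, no training is shown, and `orthogonal`, `cross_model_intervention`, and `k` are hypothetical names.

```python
import numpy as np


def orthogonal(dim: int, seed: int) -> np.ndarray:
    """Random orthogonal matrix; a stand-in for a rotation MAS would learn."""
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q * np.sign(np.diag(r))


def cross_model_intervention(
    h_a: np.ndarray,
    h_b: np.ndarray,
    rot_a: np.ndarray,
    rot_b: np.ndarray,
    k: int,
) -> np.ndarray:
    """Write the first k rotated coordinates of model B's state into model A's.

    The two hidden sizes may differ; only the k-dimensional shared
    subspace has to match.
    """
    if k > min(h_a.shape[0], h_b.shape[0]):
        raise ValueError("k cannot exceed either model's hidden size")
    z_a = rot_a @ h_a            # model A's state in its rotated basis
    z_b = rot_b @ h_b            # model B's state in its rotated basis
    z_a[:k] = z_b[:k]            # transplant the candidate shared subspace
    return rot_a.T @ z_a         # map back into model A's native basis
```

Swapping the roles of the two models gives the other direction of the bidirectional comparison.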