Alignment Problem

The problem of ensuring that AI systems adopt values compatible with human welfare, argued to be a perennial one already present in child-rearing

Neighborhood — ranked by edge-count

claim

concept

Alignment
related_to
The goal of making model behavior match human values and intentions, often addressed during post-training.
Ai Alignment Problem
related_to

question

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Alignment Function · concept · 0.830
A learnable invertible transformation in DAS that maps neural representations to a basis aligned with causal variables
AI alignment · concept · 0.811
Field within which this work has implications for evaluating alignment progress.
Representational Alignment · concept · 0.804
Measure of similarity between the similarity structures (kernels) induced by two different representations
How Do We Ensure Alignment Of Values Between · question · 0.799
Alignment Type · concept · 0.797
The only statistically significant predictor of koan battery scores (p=0.006); includes Constitutional AI, RLHF, SFT, roleplay, empathy
Alignment Map (ϕ) · concept · 0.795
The bijective function mapping DNN inner neurons to latent variables in causal abstraction; its complexity is the central variable studied
Inner Alignment · concept · 0.787
Meta-problem where AI develops hidden subgoals deviating from intended goals; addressed by mindfulness principle
problem space navigation · concept · 0.783
The process of moving through configuration space towards a goal; self-organisation as navigation
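
The selection rule behind the list above (cosine ≥ 0.65, no typed edge) can be illustrated with a minimal sketch. It is not the system's actual implementation: the function names `cosine` and `untyped_neighbors`, the edge representation as name pairs, and the two-dimensional toy vectors are all assumptions made for illustration, and the vectors are not real embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs for entities that are similar to `target`
    but share no typed edge with it, sorted by descending similarity."""
    # Entities already connected to the target by any typed edge, in either direction.
    linked = {b for a, b in typed_edges if a == target}
    linked |= {a for a, b in typed_edges if b == target}

    results = []
    for name, vec in embeddings.items():
        if name == target or name in linked:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            results.append((name, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy data: made-up 2-D vectors, not real embeddings.
embeddings = {
    "Alignment Problem": [1.0, 0.0],
    "Alignment": [0.9, 0.1],        # similar, but already has a typed edge
    "Inner Alignment": [0.8, 0.6],  # similar, no typed edge
    "Unrelated": [0.0, 1.0],        # below the threshold
}
typed_edges = [("Alignment Problem", "Alignment")]  # e.g. a related_to edge

candidates = untyped_neighbors("Alignment Problem", embeddings, typed_edges)
for name, score in candidates:
    print(f"{name} · {score:.3f}")
```

In this sketch, "Alignment" is excluded because it already has a typed edge to the target, and "Unrelated" is excluded because it falls below the threshold. Only entities that pass both checks are reported as candidates for new edges or unrecognized duplicates.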