Alignment Type

The only statistically significant predictor of koan battery scores (p = 0.006). The types include Constitutional AI, RLHF, SFT, roleplay, and empathy.

Neighborhood — ranked by edge-count

concept

Alignment
related_to
The goal of making model behavior match human values and intentions, often addressed during post-training.

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Alignment Function · concept · 0.835
A learnable invertible transformation in DAS that maps neural representations to a basis aligned with causal variables
Alignment Problem · concept · 0.797
The problem of ensuring AI systems adopt values compatible with human welfare, argued to be a perennial problem already present in child-rearing
AI alignment · concept · 0.795
Field within which this work has implications for evaluating alignment progress.
Representational Alignment · concept · 0.787
Measure of similarity between the similarity structures (kernels) induced by two different representations
Aligned by Design · concept · 0.785
Paper's proposed strategy of instilling intrinsic moral cognition so AI remains aligned even as capabilities expand
Alignment Map (ϕ) · concept · 0.779
The bijective function mapping DNN inner neurons to latent variables in causal abstraction; its complexity is the central variable studied
Alignment Function (AF) · method · 0.777
Learnable invertible transformation in DAS/MAS that rotates latent vectors into aligned subspaces; narrowed to orthogonal matrices Q.
Image Type · framework · 0.775
Exemplary domain-specific type in denotational design; denotation as location-to-color function (Loc → Color).
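
The selection rule behind the list above (cosine ≥ 0.65, no typed edge) can be sketched as a simple filter. This is a minimal illustration, not the page's actual implementation: the function names, the toy two-dimensional vectors, and the representation of typed edges as a set of name pairs are all assumptions made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, similarity) pairs that meet the threshold and have
    no typed edge to `target`, sorted by similarity, highest first."""
    results = []
    for name, vec in embeddings.items():
        if name == target:
            continue
        # Skip entities already linked to the target in either direction.
        if (target, name) in typed_edges or (name, target) in typed_edges:
            continue
        sim = cosine(embeddings[target], vec)
        if sim >= threshold:
            results.append((name, sim))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings; real entity vectors would be much larger.
embeddings = {
    "Alignment Type": [1.0, 0.0],
    "Alignment": [0.9, 0.1],
    "Alignment Function": [0.8, 0.6],
    "Unrelated": [0.0, 1.0],
}
# "Alignment" already has a typed edge (related_to), so it is excluded.
typed_edges = {("Alignment Type", "Alignment")}

candidates = untyped_neighbors("Alignment Type", embeddings, typed_edges)
for name, sim in candidates:
    print(f"{name} · {sim:.3f}")
```

With these toy inputs only "Alignment Function" survives: "Alignment" is dropped because it already has a typed edge, and "Unrelated" falls below the threshold.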