concept
active
concept:alignment-problem
Alignment Problem
The problem of ensuring that AI systems adopt values compatible with human welfare, argued to be perennial and already present in child-rearing
Neighborhood — ranked by edge-count
Claims (1)
claim
- Deflates the novelty of AI alignment by pointing to its structural identity with intergenerational value transmission
Concepts (2)
concept
- Alignment · related_to · The goal of making model behavior match human values and intentions, often addressed during post-training.
- AI Alignment Problem · related_to
Questions (1)
question
- Research gap identified as structurally parallel to AI alignment problem
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- A learnable invertible transformation in DAS that maps neural representations to a basis aligned with causal variables
- Field within which this work has implications for evaluating alignment progress.
- Measure of similarity between the similarity structures (kernels) induced by two different representations
- The only statistically significant predictor of koan battery scores (p=0.006); includes Constitutional AI, RLHF, SFT, roleplay, empathy
- The bijective function mapping DNN inner neurons to latent variables in causal abstraction; its complexity is the central variable studied
- Meta-problem where AI develops hidden subgoals deviating from intended goals; addressed by mindfulness principle
- The process of moving through configuration space towards a goal; self-organisation as navigation
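The "cosine ≥ 0.65 · no typed edge" rule behind the similarity list above can be sketched roughly as follows. This is an illustration only, not the page's actual implementation: the function names, the toy two-dimensional embeddings, and the entity identifiers other than `alignment-problem` are all hypothetical, and the real embedding model and edge store are not described in the source.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def similar_untyped(target, embeddings, typed_edges, threshold=0.65):
    """Entities at or above the similarity threshold that have no typed edge to target."""
    hits = []
    for name, vec in embeddings.items():
        if name == target:
            continue
        # Skip entities already linked to the target by a typed relation.
        if frozenset((target, name)) in typed_edges:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            hits.append((name, score))
    # Most similar first.
    return sorted(hits, key=lambda h: h[1], reverse=True)

# Toy data (assumed for illustration; real embeddings would be high-dimensional).
embeddings = {
    "alignment-problem": [1.0, 0.0],
    "alignment": [0.9, 0.1],    # similar, but already has a typed edge
    "goal-drift": [0.8, 0.6],   # similar, no typed edge
    "navigation": [0.0, 1.0],   # not similar
}
typed_edges = {frozenset(("alignment-problem", "alignment"))}

candidates = similar_untyped("alignment-problem", embeddings, typed_edges)
for name, score in candidates:
    print(f"{name}: {score:.2f}")
```

Entities returned this way are only candidates: each still needs review to decide whether it deserves a new typed edge or is an unrecognized duplicate.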