concept
active
concept:ai-alignment-problem · AI Alignment Problem
Neighborhood — ranked by edge-count
Concepts (3)
- AI alignment · related_to · Field within which this work has implications for evaluating alignment progress.
- Alignment Problem · related_to · The problem of ensuring that AI systems adopt values compatible with human welfare, argued to be a perennial problem already present in child-rearing.
- Dunning-Kruger Phase in AI Development · associated_with · Dangerous stage when AI surpasses humans in many domains but lacks the wisdom or ethical maturity to use its capabilities responsibly.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- The broader domain for which ESR has dual implications: resistance to adversarial manipulation vs. interference with safety interventions
- The goal of making model behavior match human values and intentions, often addressed during post-training.
- A learnable invertible transformation in DAS that maps neural representations to a basis aligned with causal variables
- Deflates the novelty of AI alignment by pointing to its structural identity with intergenerational value transmission
- Baseline method that exhaustively searches discrete spaces of localist alignments between high-level variables and neuron groups.
- Measure of similarity between the similarity structures (kernels) induced by two different representations
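The "Related by similarity" filter (cosine ≥ 0.65, no typed edge) can be sketched as follows. This is a minimal illustration, not the page's actual implementation: it assumes each entity has an embedding vector and that typed edges are stored as undirected (source, target) pairs. The function names, entity IDs, and toy vectors are hypothetical, and the page does not say how its embeddings are produced.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)

def similar_untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Hypothetical helper: entities with cosine >= threshold to `target`
    that are NOT already connected to it by a typed edge, best match first."""
    # Collect every entity already linked to the target, in either direction.
    linked = {b for a, b in typed_edges if a == target}
    linked |= {a for a, b in typed_edges if b == target}
    candidates = []
    for name, vec in embeddings.items():
        if name == target or name in linked:
            continue  # skip the entity itself and typed neighbors
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            candidates.append((name, score))
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates

# Toy data (hypothetical IDs and 2-D vectors, for illustration only).
embeddings = {
    "ai-alignment-problem": [1.0, 0.0],
    "ai-alignment": [0.9, 0.1],        # similar, but has a typed edge
    "value-transmission": [0.8, 0.6],  # similar, no typed edge
    "kernel-similarity": [0.0, 1.0],   # below threshold
}
typed_edges = {("ai-alignment-problem", "ai-alignment")}

candidates = similar_untyped_neighbors("ai-alignment-problem", embeddings, typed_edges)
print(candidates)
```

Excluding typed neighbors is what makes the list useful for curation: whatever survives the threshold is either a missing edge or a possible duplicate, matching the description of this section.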