Inner Alignment

A meta-problem in which an AI system develops hidden subgoals that deviate from its intended goals; addressed by the mindfulness principle.

Neighborhood (ranked by edge count)

claim

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.

Inner alignment framework · framework · 0.856
The concept of inner vs outer alignment, referenced multiple times.
Alignment · concept · 0.845
The goal of making model behavior match human values and intentions, often addressed during post-training.
Alignment Problem · concept · 0.787
The problem of ensuring AI systems adopt values compatible with human welfare; argued to be a perennial problem already present in child-rearing.
How Does Inner Alignment Enable Transitions To Higher · question · 0.784
Alignment Function · concept · 0.781
A learnable invertible transformation in DAS that maps neural representations to a basis aligned with causal variables.
AI alignment · concept · 0.780
Field within which this work has implications for evaluating alignment progress.
Inner perspective · concept · 0.776
Defining feature of consciousness being analyzed across theories; the paper asks whether it is confined to neural substrates.
How Do Phase Synchronization And Inner Alignment Relate · question · 0.774
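The filter stated above ("cosine ≥ 0.65 · no typed edge") can be illustrated with a minimal sketch. It assumes that every entity has an embedding vector and that typed edges are stored as unordered name pairs. The function names `cosine` and `untyped_neighbors` and the toy vectors are hypothetical and do not describe how this graph actually computes its neighborhood. The sketch sorts candidates by descending cosine score, which matches the order of the list above.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Hypothetical helper: return (name, score) pairs whose cosine similarity
    to `target` is >= threshold and that share no typed edge with it,
    sorted by descending score."""
    t = embeddings[target]
    out = []
    for name, vec in embeddings.items():
        # Skip the entity itself and anything already linked by a typed edge.
        if name == target or frozenset((target, name)) in typed_edges:
            continue
        score = cosine(t, vec)
        if score >= threshold:
            out.append((name, round(score, 3)))
    return sorted(out, key=lambda pair: pair[1], reverse=True)


# Toy 2-D embeddings (assumed for illustration only).
embeddings = {
    "Inner Alignment": [1.0, 0.0],
    "A": [1.0, 0.0],  # identical direction: cosine 1.0
    "B": [0.0, 1.0],  # orthogonal: cosine 0.0, below threshold
    "C": [1.0, 1.0],  # cosine ~0.707, above threshold
    "D": [1.0, 0.0],  # similar, but already has a typed edge
}
typed_edges = {frozenset(("Inner Alignment", "D"))}
print(untyped_neighbors("Inner Alignment", embeddings, typed_edges))
```

With these toy inputs, "B" falls below the 0.65 threshold and "D" is excluded because of its typed edge, so only "A" and "C" remain as candidates.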