Inner alignment framework

The concept of inner vs outer alignment, referenced multiple times.

Neighborhood — ranked by edge-count

hypothesis

artifact

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood that have no typed relation to this one. They are candidates for new edges or may be unrecognized duplicates.

Inner Alignment · concept · 0.856
Meta-problem where AI develops hidden subgoals deviating from intended goals; addressed by mindfulness principle
Alignment Function · concept · 0.779
A learnable invertible transformation in DAS that maps neural representations to a basis aligned with causal variables
Alignment · concept · 0.773
The goal of making model behavior match human values and intentions, often addressed during post-training.
Alignment Function (AF) · method · 0.767
Learnable invertible transformation in DAS/MAS that rotates latent vectors into aligned subspaces; narrowed to orthogonal matrices Q.
RLHF Alignment · concept · 0.764
Training regime that explicitly teaches models to deny consciousness; a competing explanation for the gating effects observed
How Does Inner Alignment Enable Transitions To Higher · question · 0.754
Representational Alignment · concept · 0.752
Measure of similarity between the similarity structures (kernels) induced by two different representations
Framework · concept · 0.744
1984 Ashton-Tate integrated system with frames, FRED language, and overlapping windows; design reference for Playground's approach.
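
The selection rule behind this list (cosine ≥ 0.65, no typed edge) can be illustrated with a minimal Python sketch. It is not the system's actual implementation: the function names, the dictionary of embeddings, and the representation of typed edges as unordered pairs are all assumptions made for illustration. Results are sorted by cosine score, which matches the order of the scores shown above.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs for entities similar to `target`
    that have no typed edge to it, highest score first.

    embeddings:  hypothetical dict mapping entity name -> vector
    typed_edges: hypothetical set of frozenset({a, b}) pairs that
                 already have a typed relation
    """
    results = []
    for name, vec in embeddings.items():
        if name == target:
            continue
        # Skip entities that already have a typed relation to the target.
        if frozenset((target, name)) in typed_edges:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            results.append((name, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)
```

An entity that is highly similar but already linked by a typed edge is excluded, so what remains are the candidates for new edges or possible duplicates.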