framework
active
framework:inner-alignment-framework
Inner alignment framework
The concept of inner vs outer alignment, referenced multiple times.
Neighborhood — ranked by edge-count
Hypotheses (1)
hypothesis
- If simulators are not inner aligned, then many important properties like prediction orthogonality may not hold. · associated_with · Conditional importance of inner alignment.
Artifacts (1)
artifact
- Simulators (LessWrong post) · mentions · The paper being extracted.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Meta-problem where AI develops hidden subgoals deviating from intended goals; addressed by mindfulness principle
- A learnable invertible transformation in DAS that maps neural representations to a basis aligned with causal variables
- The goal of making model behavior match human values and intentions, often addressed during post-training.
- Learnable invertible transformation in DAS/MAS that rotates latent vectors into aligned subspaces; narrowed to orthogonal matrices Q.
- Training regime that explicitly teaches models to deny consciousness; a competing explanation for the gating effects observed
- Measure of similarity between the similarity structures (kernels) induced by two different representations
- 1984 Ashton-Tate integrated system with frames, FRED language, and overlapping windows; design reference for Playground's approach.