hypothesis
active
hypothesis:if-simulators-are-not-inner-aligned-then-many-important-properties-like-prediction-orthogonality-may-not-hold
If simulators are not inner aligned, then many important properties like prediction orthogonality may not hold.
Conditional importance of inner alignment.
Source paper
extracted_from
Neighborhood: ranked by edge-count
Frameworks (1)
framework
- Inner alignment framework · associated_with · The concept of inner vs outer alignment, referenced multiple times.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Shows high IIA on random models depends on entity overlap; generalisation is essential for genuine interpretation
- Distinguishes the passive simulator from active simulacra that can appear to have agency
- Central thesis of the post.
- Proposes middle-range entity quality as the criterion for judging the success of a building process
- Theoretical limitation identified by the authors distinguishing reflection from stylistic tasks.
- One of the updates about prosaic ML simulation.
- Kruskal-Wallis test result: Constitutional AI predicts highest baseline; roleplay/empathy training predict lowest.
- Superposition hypothesis: neural networks represent more features than dimensions using almost-orthogonal directions. · hypothesis · 0.749 · Explanation for why dictionary learning can recover many more features than dimensions.