hypothesis
active
hypothesis:causally-emergent-alignment-hypothesis
Causally Emergent Alignment Hypothesis
The hypothesis that successful RL agents will display causal emergence that, measured early in training, is predictive of final reward, and that their representational dynamics align with reward improvement.
Source paper
extracted_from · (2026) · Federico Pigozzi · Michael Levin
Neighborhood — ranked by edge-count
Papers (1)
paper
Findings (2)
finding
- Empirical result: causal emergence (CE) measurements correlate with and predict learning performance in RL agents.
- The trajectory of causal emergence through training mirrors the reward improvement curve across the majority of tested environments.
Claims (3)
claim
- Assertion that the correlation between causal emergence and learning constitutes another way biological and artificial intelligences converge.
- Assertion that understanding causal emergence may lead to methods for manipulating agent representations to improve performance.
- Authors' interpretive assertion that the observed alignment reveals a novel organizing principle of neural representation dynamics.
Frameworks (1)
framework
- Diverse Intelligence · extends · Research program studying intelligence at multiple scales and substrates; proposed as relevant to implications of mnemonic improvisation.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Core concept: degree to which an agent exerts unique predictive power on its future; key to cognition at all scales.
- Quantitative emergence theory based on Markov dynamics and effective information (EI).
- Cross-fertilization claim made in discussion.
- Alignment faking appears almost exclusively in models at the scale of Claude 3 Opus and Claude 3.5 Sonnet.
- Claim by Comolatti & Hoel (2022) endorsed by this survey.
- Core definition from §1.
- Open problem stated in §5.4.
- Example from Hoel et al. (2013) replicated in the survey.
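One related entry above describes a quantitative emergence theory based on Markov dynamics and effective information (EI). As a rough illustration of that idea, the sketch below computes EI for a small transition matrix and compares a micro-scale description with a coarse-grained one. It assumes the standard Hoel-style definition (EI is the mean KL divergence, in bits, between each row and the average row under a uniform intervention distribution). The four-state chain, the grouping of states, and the function name `effective_information` are hypothetical choices made for illustration; this is not necessarily the CE measure used in the source paper.

```python
import math

def effective_information(tpm):
    """EI in bits of a row-stochastic transition matrix,
    assuming a uniform intervention distribution over states."""
    n = len(tpm)
    # Effect distribution: the average of all rows.
    mean_row = [sum(row[j] for row in tpm) / n for j in range(len(tpm[0]))]
    total = 0.0
    for row in tpm:
        for p, q in zip(row, mean_row):
            if p > 0:  # 0 * log(0) is treated as 0; q > 0 whenever p > 0
                total += p * math.log2(p / q)
    return total / n

# Hypothetical micro scale: states 0-2 hop uniformly among themselves,
# state 3 always returns to itself.
micro = [[1/3, 1/3, 1/3, 0.0]] * 3 + [[0.0, 0.0, 0.0, 1.0]]

# Hypothetical macro scale: group {0, 1, 2} into A and {3} into B.
# Both macro states map to themselves deterministically.
macro = [[1.0, 0.0], [0.0, 1.0]]

ei_micro = effective_information(micro)
ei_macro = effective_information(macro)
causal_emergence = ei_macro - ei_micro  # positive: the macro scale carries more EI

print(f"EI micro = {ei_micro:.3f} bits")
print(f"EI macro = {ei_macro:.3f} bits")
print(f"causal emergence = {causal_emergence:.3f} bits")
```

In this toy case the coarse-grained description removes the noise among states 0-2, so its EI is higher than that of the micro description, which is the sense in which the macro scale is said to be causally emergent.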