Causally Emergent Alignment Hypothesis

The hypothesis that successful reinforcement learning (RL) agents display causal emergence that predicts final reward from early in training, and that the dynamics of causal emergence in their representations align with reward improvement.

Source paper

extracted_from

The Causally Emergent Alignment Hypothesis: Causal Emergence Aligns with and Predicts Final Reward in Reinforcement Learning Agents

(2026) · Federico Pigozzi · Michael Levin

Neighborhood — ranked by edge-count

Papers (1)

paper

The Causally Emergent Alignment Hypothesis: Causal Emergence Aligns with and Predicts Final Reward in Reinforcement Learning Agents
introduces

Findings (2)

finding

Causal emergence is predictive of final reward early in RL training across multiple algorithms, architectures, and environments.
supports
Empirical result: causal emergence (CE) measurements correlate with and predict learning performance in RL agents.
Representational dynamics of causal emergence align with reward improvement in most tasks.
supports
The trajectory of causal emergence through training mirrors the reward improvement curve across the majority of tested environments.
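
One way to make "predictive of final reward early in training" concrete is a rank correlation, across training runs, between causal emergence measured at an early checkpoint and the reward each run eventually reaches. The sketch below is only an illustration of that idea: the statistical analysis actually used in the paper is not described on this page, the choice of Spearman correlation is an assumption, and the names `early_ce` and `final_reward` and their values are made-up placeholders, not results.

```python
import math

def ranks(values):
    """Ranks starting at 1; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = mean_rank
        pos = end + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical placeholder values, one entry per training run (not data from the paper).
early_ce = [0.1, 0.4, 0.2, 0.9]          # CE at an early checkpoint
final_reward = [10.0, 30.0, 20.0, 50.0]  # reward at the end of training
rho = spearman(early_ce, final_reward)
```

A value of `rho` near 1 would mean that runs with higher early causal emergence tend to finish with higher reward; a value near 0 would mean early causal emergence carries no rank information about the outcome.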

Claims (3)

claim

The alignment of causal emergence with learning is a shared axis for comparing biological and artificial creatures.
extends
Assertion that the correlation between causal emergence and learning constitutes another way biological and artificial intelligences converge.
Causal emergence can enable causal interventions to create better RL agents.
extends
Assertion that understanding causal emergence may lead to methods for manipulating agent representations to improve performance.
Causal emergence may be a previously undisclosed axis of reorganization of neural representations in RL agents.
extends
Authors' interpretive assertion that the observed alignment reveals a novel organizing principle of neural representation dynamics.

Frameworks (1)

framework

Diverse Intelligence
extends
Research program studying intelligence at multiple scales and substrates; proposed as relevant to implications of mnemonic improvisation.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Causal Emergence · concept · 0.805
Core concept: degree to which an agent exerts unique predictive power on its future; key to cognition at all scales.
Hoel's Causal Emergence Theory · framework · 0.803
Quantitative emergence theory based on Markov dynamics and effective information (EI).
Causal emergence provides new perspectives for causal representation learning, interpreting latent variables as emergent causalities. · claim · 0.796
Cross-fertilization claim made in discussion.
Emergent Alignment Faking with Scale · concept · 0.787
Alignment faking appears almost exclusively in models at the scale of Claude 3 Opus and Claude 3.5 Sonnet.
Causal emergence is widespread across measures of causation, not just EI. · claim · 0.781
Claim by Comolatti & Hoel (2022) endorsed by this survey.
Causal emergence is defined by the difference in the EI values between the macro-level and micro-level. · quote · 0.774
Core definition from §1.
When does causal emergence occur? · question · 0.767
Open problem stated in §5.4.
Causal emergence depends on the coarse-graining strategy: different partitions of the same Boolean network yield EI values 1.55 (emergence) vs 0.18 (degradation). · finding · 0.767
Example from Hoel et al. (2013) replicated in the survey.
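
Several of the entries above rest on the EI-based definition: causal emergence as macro-level EI minus micro-level EI, with the result depending on how states are coarse-grained. The sketch below illustrates that definition on a toy four-state Markov chain. It assumes the usual formulation of EI as the average KL divergence, in bits, between each row of the transition matrix and the mean row under uniform interventions. The toy chain and the function names are illustrative choices, not taken from any of the listed sources, and the measure used in the source paper of this hypothesis may differ.

```python
import math

def effective_information(tpm):
    """EI in bits of a row-stochastic transition matrix under uniform interventions."""
    n = len(tpm)
    # Effect distribution: the average of all rows (each state intervened on equally).
    avg = [sum(row[j] for row in tpm) / n for j in range(n)]
    total = 0.0
    for row in tpm:
        # KL divergence of this row from the average row; zero-probability terms are skipped.
        total += sum(p * math.log2(p / q) for p, q in zip(row, avg) if p > 0)
    return total / n

def coarse_grain(tpm, groups):
    """Macro transition matrix: average the rows in each group, sum the columns in each group."""
    return [
        [sum(tpm[i][j] for i in src for j in dst) / len(src) for dst in groups]
        for src in groups
    ]

def causal_emergence(tpm, groups):
    """Macro-level EI minus micro-level EI for the given partition of micro states."""
    return effective_information(coarse_grain(tpm, groups)) - effective_information(tpm)

# Toy chain (illustrative): states 0-2 hop uniformly among themselves, state 3 is a fixed point.
micro = [[1/3, 1/3, 1/3, 0.0]] * 3 + [[0.0, 0.0, 0.0, 1.0]]

good = causal_emergence(micro, [[0, 1, 2], [3]])  # about +0.19 bits: emergence
bad = causal_emergence(micro, [[0, 3], [1, 2]])   # negative: this partition loses EI
```

Grouping the three noisy states into one macro state yields a deterministic two-state macro chain, so EI rises; grouping state 3 with state 0 instead mixes a deterministic state with a noisy one, and EI falls below the micro-level value.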