paper:doi-10-48550-arxiv-2312-16815
Emergence and Causality in Complex Systems: A Survey on Causal Emergence and Related Quantitative Studies
TL;DR
Effective information (EI) — defined as mutual information between a uniformly intervened input distribution and the resulting output distribution of a Markov transition probability matrix — can increase under coarse-graining, a phenomenon Hoel et al. (2013, PNAS 110:19790) termed causal emergence (CE), and this review argues that CE is neither measure-specific nor framework-specific: Comolatti and Hoel showed it appears across Hume's constant conjunction, Cheng's causal attribution, Eells's probability-raising, and Pearl's causal measures alike. The central methodological contribution surveyed is the Neural Information Squeezer Plus (NIS+), an encoder–dynamics-learner–decoder neural architecture that directly maximizes EI using inverse probability weighting and kernel density estimation, and which outperforms the original NIS, variational autoencoders, and feed-forward networks on out-of-distribution data generated by SIR dynamics, the Boid flocking model, and Conway's Game of Life. Applied to two real fMRI datasets — AOMIC ID1000 (movie-viewing) and AOMIC PIOP2 (resting-state, 50 subjects) — NIS+ compressed 100-dimensional micro-state signals to a 1-dimensional macro-state for the task-engaged condition but required 7 dimensions for resting-state, with attribution analysis localizing high-weight micro-signals to visual cortex in the task condition. The paper argues this body of work implies that CE is a principled, data-driven criterion for identifying genuinely causal macro-level representations, that EI maximization suppresses spurious correlations in exactly the way required for out-of-distribution generalization, and that causal emergence identification and causal representation learning are structurally isomorphic problems whose techniques should cross-pollinate.
What to take away
- 1. Causal emergence (CE) is defined as CE = EI(TPM_M) − EI(TPM_m) > 0, where EI is computed by intervening on the input state with a uniform distribution and measuring mutual information with the resulting output distribution, making CE an intrinsic property of the transition probability matrix independent of empirical input data.
- 2. In a four-state boolean network example from Hoel et al. (2013), coarse-graining strategy 1 raised EI from 1.1486 (micro) to 1.55 (macro), while coarse-graining strategy 2 reduced it to 0.18, demonstrating that CE is coarse-graining-sensitive and that not all aggregations yield emergence.
- 3. Comolatti and Hoel showed that CE appears across at least five distinct causation measures — Hume's constant conjunction, Cheng's causal attribution, Eells's probability-raising, Suppes's probability-raising, and Pearl's measures — establishing that CE is not an artifact of EI as a particular metric.
- 4. The Neural Information Squeezer Plus (NIS+) directly maximizes EI using an inverse probability weighting scheme where w(x_t) = p̃(φ(x_t))/p(φ(x_t)), with the denominator estimated via kernel density estimation, converting EI maximization into a tractable neural network training objective.
- 5. Applied to the AOMIC ID1000 fMRI dataset (movie-viewing condition), NIS+ compressed 100-dimensional brain micro-state signals into a single 1-dimensional macro-state, with attribution analysis identifying high-weight contributing signals in visual cortex regions consistent with the task.
- 6. Applied to the AOMIC PIOP2 resting-state fMRI dataset (50 subjects), NIS+ required 7-dimensional macro-states to represent the 100-dimensional micro-states, and high-weight micro-signals were distributed across multiple brain areas rather than localized, reflecting the absence of a dominant task-driven signal.
- 7. NIS+ outperformed NIS, variational autoencoders, and feed-forward neural networks on out-of-distribution generalization benchmarks across all simulated datasets tested (SIR dynamics with Gaussian noise, Boid model, Game of Life), suggesting that EI maximization selects representations that are invariant to distribution shift.
- 8. The paper raises an open hypothesis that EI maximization serves as a unified, environment-label-free regularizer for out-of-distribution generalization, conjecturing that normalized EI peaks at a representation abstraction level that excludes non-causal spurious features while retaining causal variables — a claim that remains unproven theoretically.
- 9. A replicable methodology choice: NIS and NIS+ use an invertible neural network (INN) for both encoder and decoder, sharing parameters and inverting the computation graph for decoding, with missing dimensions completed by sampling ζ ~ N(0, I_{p−q}); this halves parameter count relative to separate encoder-decoder architectures and is directly reproducible.
- 10. An open problem identified is whether the EI-maximizing coarse-graining solution is unique: multiple coarse-graining strategies can yield identical maximal EI values, reintroducing observer-dependence into what is proposed as an objective criterion, and neither NIS nor NIS+ fully resolves this non-uniqueness despite their prediction-error constraints.
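The EI and CE definitions in takeaways 1 and 2 can be made concrete with a small calculation. The sketch below is illustrative only: the 8-state toy system, the grouping, and the function names `effective_information` and `coarse_grain` are assumptions chosen for clarity, not the four-state boolean network discussed in the survey. EI is computed as the average KL divergence between each row of the TPM and the mean row, which equals the mutual information under a uniform intervention on the input.

```python
import math

def effective_information(tpm):
    """EI in bits: mean KL divergence between each TPM row and the average row.

    This equals the mutual information between a uniformly intervened input
    and the resulting output distribution.
    """
    n = len(tpm)
    mean_row = [sum(row[j] for row in tpm) / n for j in range(len(tpm[0]))]
    total = 0.0
    for row in tpm:
        for p, q in zip(row, mean_row):
            if p > 0:
                total += p * math.log2(p / q)
    return total / n

def coarse_grain(tpm, groups):
    """Macro TPM: sum probability into each target group, then average over
    the micro-states of each source group (uniform weighting, an assumption)."""
    macro = []
    for src in groups:
        row = []
        for dst in groups:
            mass = sum(tpm[i][j] for i in src for j in dst)
            row.append(mass / len(src))
        macro.append(row)
    return macro

# Hypothetical toy system: states 0..6 hop uniformly among themselves,
# state 7 always maps to itself.
micro = [[1 / 7] * 7 + [0.0] for _ in range(7)] + [[0.0] * 7 + [1.0]]
groups = [list(range(7)), [7]]
macro = coarse_grain(micro, groups)

ei_micro = effective_information(micro)   # roughly 0.54 bits
ei_macro = effective_information(macro)   # 1 bit: the macro TPM is deterministic
ce = ei_macro - ei_micro                  # positive, so CE occurs for this grouping
print(f"EI micro = {ei_micro:.4f}, EI macro = {ei_macro:.4f}, CE = {ce:.4f}")
```

Grouping the seven noisy states removes the degeneracy of the micro dynamics, which is why the macro TPM carries more effective information here. A different grouping (for example, pairing state 7 with a noisy state) would lower EI, in line with the coarse-graining sensitivity noted in takeaway 2.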
Peer brief — for seminar discussion
Yuan et al. (2023/2024, arXiv:2312.16815) provide a comprehensive survey of quantitative theories of causal emergence (CE) and their intersection with machine learning, covering Hoel et al.'s effective information (EI) framework, Rosas et al.'s integrated information decomposition (ΦID) approach, which builds on partial information decomposition, Crutchfield's computational mechanics, and Seth's G-emergence theory, while giving extended treatment to two neural architectures, the Neural Information Squeezer (NIS) and its successor NIS+, that operationalize CE identification from time-series data. The load-bearing finding is that EI, defined as the mutual information between a uniformly intervened input distribution and the resulting output distribution of a Markov TPM, can increase under coarse-graining (CE = EI(TPM_M) − EI(TPM_m) > 0), and that this phenomenon is detectable from data via NIS+, which directly maximizes EI using inverse probability weighting with kernel density estimation. NIS+ compressed 100-dimensional fMRI micro-states from the AOMIC ID1000 movie-viewing dataset to a 1-dimensional macro-state with attribution weights concentrated in visual cortex, while the AOMIC PIOP2 resting-state dataset (50 subjects) required 7 macro-dimensions with diffusely distributed weights. Across simulated systems (SIR dynamics with added Gaussian noise, the Boid flocking model, and Conway's Game of Life), NIS+ outperformed NIS, variational autoencoders, and feed-forward networks on out-of-distribution generalization. The paper also reports, via Comolatti and Hoel's comparative analysis, that CE is not an artifact of choosing EI: it reappears under Hume's constant conjunction, Cheng's causal attribution, Eells's and Suppes's probability-raising measures, and Pearl's framework.
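The inverse probability weighting step mentioned above, w(x_t) = p̃(φ(x_t))/p(φ(x_t)) with the denominator estimated by kernel density estimation, can be sketched in one dimension. This is a minimal illustration under stated assumptions, not the NIS+ implementation: the macro-states are simulated Gaussian samples standing in for encoder outputs φ(x_t), the target density p̃ is taken to be uniform on a bounded interval, and the names `gaussian_kde`, `ipw_weights`, `half_width`, and `bandwidth` are hypothetical.

```python
import math
import random

def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the 1-D density of `samples`
    with Gaussian kernels of the given bandwidth."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))

    def density(y):
        return norm * sum(math.exp(-0.5 * ((y - s) / bandwidth) ** 2) for s in samples)

    return density

def ipw_weights(macro_states, half_width, bandwidth=0.2):
    """Weights w = p_tilde / p_hat, where p_tilde is uniform on
    [-half_width, half_width] (an assumption) and p_hat is the KDE estimate."""
    p_hat = gaussian_kde(macro_states, bandwidth)
    p_tilde = 1.0 / (2 * half_width)
    weights = []
    for y in macro_states:
        inside = abs(y) <= half_width
        weights.append(p_tilde / p_hat(y) if inside else 0.0)
    return weights

# Simulated stand-in for encoded macro-states phi(x_t).
random.seed(0)
macro_states = [random.gauss(0.0, 0.5) for _ in range(500)]
weights = ipw_weights(macro_states, half_width=1.5)

# Samples from densely observed regions are down-weighted,
# samples from sparsely observed regions are up-weighted.
center = min(range(len(macro_states)), key=lambda i: abs(macro_states[i]))
in_range = [i for i in range(len(macro_states)) if abs(macro_states[i]) <= 1.5]
tail = max(in_range, key=lambda i: abs(macro_states[i]))
print(weights[center], weights[tail])
```

Reweighting a training loss by these values makes the empirical macro-state distribution behave as if it had been intervened to be uniform, which is the role the weighting plays in turning EI maximization into a trainable objective.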
The implied argument is that EI maximization simultaneously identifies emergent macro-structure and produces representations that generalize out-of-distribution, because the do-intervention built into EI removes spurious non-causal correlations in the way invariant prediction requires, functioning as an environment-label-free alternative to invariant risk minimization. The paper names this conjecture explicitly and connects it to stable learning and sample-reweighting approaches in the OOD literature. A critical reader would push back on the epistemological status of the CE measure itself: Dewhurst (Thought, 2021) argues that Hoel's macroscopic causality is an informational explanation rather than evidence of genuinely novel causal power. The review also admits that multiple coarse-graining strategies can achieve the same maximal EI, so solution non-uniqueness remains unresolved; the NIS+ output can therefore depend on initialization and architectural choices, which undermines the claim that EI maximization yields an observer-independent, ontological criterion for emergence. Alternative methods for identifying macro-variables without pre-specification include sparse autoencoders and disentangled representation learning, which achieve similar latent compression without the ΦID or EI apparatus and would provide a useful ablation baseline for the generalization claims.
Methods (2)
- Monte Carlo Integration for EI: Technique to estimate the continuous EI formula by sampling, used in neural network EI calculation.
- OPTICS Algorithm: Density-based clustering used within the spectral coarse-graining approach.
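To illustrate the Monte Carlo entry above, the sketch below estimates EI by sampling for a one-dimensional toy channel, Y = a·X + Gaussian noise, with the input intervened to be uniform on [-L, L]. Everything specific here is an assumption made for illustration (the linear channel, the parameter values, and the function name `mc_effective_information`); it is not the estimator used in the surveyed neural network work. The estimate averages log p(y | do(x)) − log p(y) over sampled pairs and is compared with the small-noise approximation ln(2|a|L) − ln(σ√(2πe)) that holds for this toy channel.

```python
import math
import random

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def mc_effective_information(a, sigma, half_width, n_samples, rng):
    """Monte Carlo estimate (in nats) of I(X; Y) for Y = a*X + N(0, sigma^2)
    with X drawn uniformly from [-half_width, half_width]."""
    span = abs(a) * half_width
    total = 0.0
    for _ in range(n_samples):
        x = rng.uniform(-half_width, half_width)
        y = a * x + rng.gauss(0.0, sigma)
        # log p(y | do(x)): Gaussian density centred at a*x
        log_cond = (-0.5 * ((y - a * x) / sigma) ** 2
                    - math.log(sigma * math.sqrt(2 * math.pi)))
        # p(y): the conditional averaged over the uniform input (closed form here)
        marginal = (normal_cdf((y + span) / sigma)
                    - normal_cdf((y - span) / sigma)) / (2 * span)
        total += log_cond - math.log(max(marginal, 1e-300))
    return total / n_samples

rng = random.Random(0)
a, sigma, half_width = 2.0, 0.05, 1.0
estimate = mc_effective_information(a, sigma, half_width, 20000, rng)
approx = math.log(2 * abs(a) * half_width) - math.log(sigma * math.sqrt(2 * math.pi * math.e))
print(f"Monte Carlo EI = {estimate:.3f} nats, small-noise approximation = {approx:.3f} nats")
```

In higher dimensions the marginal density is rarely available in closed form, which is where sampling-based estimation becomes necessary rather than merely convenient.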
Findings (13)
- Causal emergence depends on the coarse-graining strategy: different partitions of the same boolean network yield EI values 1.55 (emergence) vs 0.18 (degradation).
Example from Hoel et al. (2013) replicated in the survey.
- Protein interaction networks across >1800 species exhibit macro-scale nodes with lower noise and higher resilience; eukaryotes show stronger CE than archaea.
Klein et al. (2021) analysis of biological interactomes.
- Ant colony task assignment: interactions between foragers show higher noise than nurses/cleaners; CE stabilizes overall colony cohesion.
Swain et al. (2022) EI-based study of ant colonies.
- Biological networks exhibit the lowest EI among real networks and show the most significant causal emergence after coarse-graining.
Finding from Klein & Hoel (2020) on real network analysis.
- NIS+ outperforms NIS, variational autoencoders, and feed-forward neural networks in out-of-distribution generalization experiments.
Yang et al. (2023) result linking EI maximization to robust generalization.
- In AOMIC ID1000 movie-watching fMRI data, NIS+ finds a one-dimensional macro-state representing 100-dimensional micro-states.
Real brain imaging result suggesting a compressed emergent representation.
- EI of ER random networks converges to -log2(p) with increasing size, with a phase transition at average degree ≈ log2(N).
From Klein & Hoel (2020) analysis of artificial complex networks.
- In AOMIC PIOP2 resting-state fMRI data, NIS+ finds a seven-dimensional macro-state with widely distributed attributions.
Contrast to movie-watching condition, showing context-dependent emergence.
- NIS+ captures emergent static/dynamic patterns such as 'gliders' in Conway's Game of Life within the latent space.
Yang et al. (2023) demonstration of emergent pattern recognition.
- NIS+ automatically discovers two-group macro-states in Boid model simulations matching the two boid groups.
Yang et al. (2023) experiment on emergent herding behavior.
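The ER random network finding listed above (EI approaching −log2(p)) can be checked numerically with a small simulation. This is a sketch under assumptions: it uses the network form of EI attributed to Klein and Hoel, EI = H(mean out-weight vector) − mean H(out-weight vector), with each node's out-weights taken as uniform over its out-neighbours; the directed graph generator, the skipping of nodes without out-edges, and the names `network_ei` and `er_adjacency` are illustrative choices.

```python
import math
import random

def entropy_bits(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def network_ei(adjacency):
    """EI in bits: entropy of the average out-weight vector minus the
    average entropy of each node's out-weight vector."""
    n = len(adjacency)
    mean_w = [0.0] * n
    mean_h = 0.0
    used = 0
    for row in adjacency:
        k = sum(row)
        if k == 0:
            continue  # assumption: nodes without out-edges are skipped
        w = [a / k for a in row]
        mean_h += entropy_bits(w)
        for j, wj in enumerate(w):
            mean_w[j] += wj
        used += 1
    mean_w = [m / used for m in mean_w]
    return entropy_bits(mean_w) - mean_h / used

def er_adjacency(n, p, rng):
    """Directed Erdos-Renyi graph without self-loops (illustrative choice)."""
    return [[1 if i != j and rng.random() < p else 0 for j in range(n)]
            for i in range(n)]

rng = random.Random(1)
n, p = 300, 0.3
ei = network_ei(er_adjacency(n, p, rng))
limit = -math.log2(p)
print(f"EI = {ei:.3f} bits, -log2(p) = {limit:.3f} bits")
```

The intuition is that each node spreads its weight over about p·N neighbours (entropy near log2(pN)), while the averaged weight vector is close to uniform over all N nodes (entropy near log2(N)), so the difference approaches −log2(p).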
Claims (8)
- Causal emergence provides new perspectives for causal representation learning, interpreting latent variables as emergent causalities.
Cross-fertilization claim made in discussion.
- Incorporating machine learning provides objective standards that help mitigate subjectivity in emergence identification.
Authors argue ML optimizers act as objective observers.
- Downward causation is a type of emergent causation that can be quantified by synergistic information in ΦID.
Rosas's claim endorsed by this survey.
- The NIS and NIS+ frameworks provide effective solutions for causal emergence identification from data.
Central claim of the machine-learning section, summarizing the contribution.
- EI maximization serves as an objective standard for selecting coarse-graining and macro-dynamics.
Claim by Hoel et al. and endorsed by this survey; used to counter subjectivity critiques.
- Causal emergence identification tasks can be understood as causal representation learning tasks.
Authors propose a conceptual mapping between CE identification and CRL.
- EI and normalized EI could serve as a unified metric for out-of-distribution generalization.
Conjecture that maximizing EI yields causal representations invariant to distribution shifts.
- Causal emergence is widespread across measures of causation, not just EI.
Claim by Comolatti & Hoel (2022) endorsed by this survey.
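The claims about NIS and NIS+ rest on the shared invertible encoder and decoder described in the takeaways: the encoder keeps the first q outputs of an invertible map, and the decoder pads a macro-state with noise ζ ~ N(0, I_{p−q}) and runs the same map backwards. The sketch below illustrates that idea with two additive coupling steps. It is not the architecture from the surveyed papers: the fixed functions `shift_a` and `shift_b` are hypothetical stand-ins for learned networks, and the dimensions p = 4, q = 2 are arbitrary.

```python
import math
import random

def shift_a(h):
    # Hypothetical stand-in for a learned network.
    return [0.5 * v * v for v in h]

def shift_b(h):
    # Hypothetical stand-in for a learned network.
    return [math.sin(v) for v in h]

def inn_forward(x):
    """Two additive coupling steps; requires an even input length."""
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    y2 = [b + s for b, s in zip(x2, shift_a(x1))]  # shift x2 by a function of x1
    y1 = [a + s for a, s in zip(x1, shift_b(y2))]  # shift x1 by a function of y2
    return y1 + y2

def inn_inverse(y):
    """Exact inverse of inn_forward, reusing the same shift functions."""
    half = len(y) // 2
    y1, y2 = y[:half], y[half:]
    x1 = [a - s for a, s in zip(y1, shift_b(y2))]
    x2 = [b - s for b, s in zip(y2, shift_a(x1))]
    return x1 + x2

def encode(x, q):
    """Macro-state: the first q outputs of the invertible map."""
    return inn_forward(x)[:q]

def decode(y_macro, p, rng):
    """Pad the macro-state with standard normal noise and invert the map."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(p - len(y_macro))]
    return inn_inverse(y_macro + noise)

rng = random.Random(0)
x = [0.3, -1.2, 0.8, 2.0]          # p = 4 micro-state
y = encode(x, q=2)                 # q = 2 macro-state
x_sample = decode(y, p=4, rng=rng) # one micro-state consistent with y
print(y, x_sample)
```

Because the decoder is the exact inverse of the encoder's map, encoding any decoded sample returns the original macro-state, and no separate decoder parameters are needed.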
Hypotheses (2)
- If EI maximization is used as a regularization in representation learning, then OOD generalization will improve beyond current invariant risk minimization methods.
Proposed conjecture in §4.3.1.
- There exists a phase transition of emergent causality in complex systems when a key parameter changes.
Speculation in the discussion.
Questions (5)
- How does emergent causality change when the system is changed to adapt to the environment?
Link between emergence and adaptation.
- What coarse-graining strategy generates the maximum effective information?
Central optimization problem in CE identification.
- How does emergent causality have functional effects on the system?
Question about downward causation and mind–body interaction.
- Is causal emergence ontological or epistemological?
Philosophical debate discussed in §5.2.
- When does causal emergence occur?
Open problem stated in §5.4.
Original abstract
Emergence and causality are two fundamental concepts for understanding complex systems. They are interconnected. On one hand, emergence refers to the phenomenon where macroscopic properties cannot be solely attributed to the cause of individual properties. On the other hand, causality can exhibit emergence, meaning that new causal laws may arise as we increase the level of abstraction. Causal emergence theory aims to bridge these two concepts and even employs measures of causality to quantify emergence. This paper provides a comprehensive review of recent advancements in quantitative theories and applications of causal emergence. Two key problems are addressed: quantifying causal emergence and identifying it in data. Addressing the latter requires the use of machine learning techniques, thus establishing a connection between causal emergence and artificial intelligence. We highlighted that the architectures used for identifying causal emergence are shared by causal representation learning, causal model abstraction, and world model-based reinforcement learning. Consequently, progress in any of these areas can benefit the others. Potential applications and future perspectives are also discussed in the final section of the review.