paper
active
2023
4
paper:doi-10-48550-arxiv-2312-16815

Emergence and Causality in Complex Systems: A Survey on Causal Emergence and Related Quantitative Studies

TL;DR

Effective information (EI) is the mutual information between a uniformly intervened input distribution and the resulting output distribution of a Markov transition probability matrix (TPM). EI can increase under coarse-graining, a phenomenon Hoel et al. (2013, PNAS 110:19790) termed causal emergence (CE). This review argues that CE is neither measure-specific nor framework-specific: Comolatti and Hoel showed that it appears across Hume's constant conjunction, Cheng's causal attribution, Eells's and Suppes's probability-raising, and Pearl's causal measures alike. The central methodological contribution surveyed is the Neural Information Squeezer Plus (NIS+), an encoder–dynamics-learner–decoder neural architecture that directly maximizes EI using inverse probability weighting and kernel density estimation. NIS+ outperforms the original NIS, variational autoencoders, and feed-forward networks on out-of-distribution data generated by SIR dynamics, the Boid flocking model, and Conway's Game of Life. Applied to two real fMRI datasets, AOMIC ID1000 (movie-viewing) and AOMIC PIOP2 (resting-state, 50 subjects), NIS+ compressed 100-dimensional micro-state signals to a 1-dimensional macro-state for the task-engaged condition but required 7 dimensions for resting-state, with attribution analysis localizing high-weight micro-signals to visual cortex in the task condition. The paper argues that this body of work implies three things: CE is a principled, data-driven criterion for identifying genuinely causal macro-level representations; EI maximization suppresses spurious correlations in exactly the way required for out-of-distribution generalization; and causal emergence identification and causal representation learning are structurally isomorphic problems whose techniques should cross-pollinate.
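To make the EI definition concrete, here is a minimal Python sketch. It assumes a discrete TPM whose rows are conditional output distributions, and it uses a standard four-state toy system rather than the survey's own worked example. The function names `effective_information` and `coarse_grain` are illustrative, and the macro TPM is built by averaging over micro-states within each group, which is one common convention and not necessarily the paper's.

```python
import numpy as np

def effective_information(tpm):
    """EI in bits: mutual information between a uniform do-intervention on the
    input state and the resulting output distribution of the TPM."""
    tpm = np.asarray(tpm, dtype=float)
    avg = tpm.mean(axis=0)  # output distribution under the uniform intervention
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(tpm > 0, tpm / avg, 1.0)  # 0 * log(0) is treated as 0
    kl_per_row = np.sum(tpm * np.log2(ratio), axis=1)
    return kl_per_row.mean()

def coarse_grain(tpm, groups):
    """Macro TPM (assumed convention): average over the micro-states inside a
    source group, sum over the micro-states inside a destination group."""
    tpm = np.asarray(tpm, dtype=float)
    macro = np.zeros((len(groups), len(groups)))
    for a, src in enumerate(groups):
        for b, dst in enumerate(groups):
            macro[a, b] = tpm[np.ix_(src, dst)].sum(axis=1).mean()
    return macro

# Toy micro system: states 0-2 hop uniformly among themselves, state 3 is fixed.
micro = np.array([
    [1/3, 1/3, 1/3, 0.0],
    [1/3, 1/3, 1/3, 0.0],
    [1/3, 1/3, 1/3, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])
macro = coarse_grain(micro, groups=[[0, 1, 2], [3]])

ei_micro = effective_information(micro)
ei_macro = effective_information(macro)
causal_emergence = ei_macro - ei_micro  # positive means CE occurs
print(ei_micro, ei_macro, causal_emergence)
```

In this toy case the micro EI is roughly 0.81 bits and the macro EI is 1 bit, so the difference is positive: grouping the three noisy states removes degeneracy and the macro TPM becomes deterministic.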

What to take away

  1. Causal emergence (CE) is defined as CE = EI(TPM_M) − EI(TPM_m) > 0, where EI is computed by intervening on the input state with a uniform distribution and measuring mutual information with the resulting output distribution, making CE an intrinsic property of the transition probability matrix independent of empirical input data.
  2. In a four-state boolean network example from Hoel et al. (2013), coarse-graining strategy 1 raised EI from 1.1486 (micro) to 1.55 (macro), while coarse-graining strategy 2 reduced it to 0.18, demonstrating that CE is coarse-graining-sensitive and that not all aggregations yield emergence.
  3. Comolatti and Hoel showed that CE appears across at least five distinct causation measures (Hume's constant conjunction, Cheng's causal attribution, Eells's probability-raising, Suppes's probability-raising, and Pearl's measures), establishing that CE is not an artifact of EI as a particular metric.
  4. The Neural Information Squeezer Plus (NIS+) directly maximizes EI using an inverse probability weighting scheme where w(x_t) = p̃(φ(x_t))/p(φ(x_t)), with the denominator estimated via kernel density estimation, converting EI maximization into a tractable neural network training objective.
  5. Applied to the AOMIC ID1000 fMRI dataset (movie-viewing condition), NIS+ compressed 100-dimensional brain micro-state signals into a single 1-dimensional macro-state, with attribution analysis identifying high-weight contributing signals in visual cortex regions consistent with the task.
  6. Applied to the AOMIC PIOP2 resting-state fMRI dataset (50 subjects), NIS+ required 7-dimensional macro-states to represent the same 100-dimensional micro-states, and high-weight micro-signals were distributed across multiple brain areas rather than localized, reflecting the absence of a dominant task-driven signal.
  7. NIS+ outperformed NIS, variational autoencoders, and feed-forward neural networks on out-of-distribution generalization benchmarks across all simulated datasets tested (SIR dynamics with Gaussian noise, Boid model, Game of Life), suggesting that EI maximization selects representations that are invariant to distribution shift.
  8. The paper raises an open hypothesis that EI maximization serves as a unified, environment-label-free regularizer for out-of-distribution generalization, conjecturing that normalized EI peaks at a representation abstraction level that excludes non-causal spurious features while retaining causal variables; this claim remains unproven theoretically.
  9. A replicable methodology choice: NIS and NIS+ use an invertible neural network (INN) for both encoder and decoder, sharing parameters and inverting the computation graph for decoding, with missing dimensions completed by sampling ζ ~ N(0, I_{p−q}); this halves parameter count relative to separate encoder-decoder architectures and is directly reproducible.
  10. An open problem identified is whether the EI-maximizing coarse-graining solution is unique: multiple coarse-graining strategies can yield identical maximal EI values, reintroducing observer-dependence into what is proposed as an objective criterion, and neither NIS nor NIS+ fully resolves this non-uniqueness despite their prediction-error constraints.
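The shared-parameter encoder and decoder in item 9 can be illustrated with a small NumPy sketch. This is not the NIS or NIS+ implementation: the RealNVP-style affine coupling layers, the untrained random weights, and the names `AffineCoupling`, `TinyINN`, `encode`, and `decode` are all assumptions made for illustration. It only shows the mechanics of projecting p dimensions down to q for encoding and padding with ζ ~ N(0, I_{p−q}) before inverting for decoding.

```python
import numpy as np

class AffineCoupling:
    """One affine coupling layer (assumed design); invertible by construction."""
    def __init__(self, dim, rng):
        self.half = dim // 2
        rest = dim - self.half
        self.w_scale = 0.1 * rng.standard_normal((self.half, rest))
        self.w_shift = 0.1 * rng.standard_normal((self.half, rest))

    def forward(self, x):
        a, b = x[:, :self.half], x[:, self.half:]
        s, t = np.tanh(a @ self.w_scale), a @ self.w_shift
        return np.concatenate([a, b * np.exp(s) + t], axis=1)

    def inverse(self, z):
        a, b = z[:, :self.half], z[:, self.half:]
        s, t = np.tanh(a @ self.w_scale), a @ self.w_shift
        return np.concatenate([a, (b - t) * np.exp(-s)], axis=1)

class TinyINN:
    """Two coupling layers with a flip after each so every coordinate is transformed."""
    def __init__(self, dim, rng):
        self.layers = [AffineCoupling(dim, rng), AffineCoupling(dim, rng)]

    def forward(self, x):
        for layer in self.layers:
            x = layer.forward(x)[:, ::-1]
        return x

    def inverse(self, z):
        for layer in reversed(self.layers):
            z = layer.inverse(z[:, ::-1])
        return z

def encode(inn, x, q):
    """Encoder: run the INN forward, then keep only the first q coordinates."""
    return inn.forward(x)[:, :q]

def decode(inn, y, p, rng):
    """Decoder: pad the macro-state with Gaussian noise, then run the same INN backwards."""
    zeta = rng.standard_normal((y.shape[0], p - y.shape[1]))  # zeta ~ N(0, I_{p-q})
    return inn.inverse(np.concatenate([y, zeta], axis=1))

rng = np.random.default_rng(0)
p, q = 4, 1
inn = TinyINN(p, rng)
x = rng.standard_normal((5, p))   # five micro-states
y = encode(inn, x, q)             # five macro-states
x_hat = decode(inn, y, p, rng)    # stochastic reconstruction of the micro-states
print(y.shape, x_hat.shape)
```

Because the decoder is the exact inverse of the encoder's network, re-encoding `x_hat` returns the same macro-state `y`, and no separate decoder weights are needed.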

Peer brief — for seminar discussion

Yuan et al. (2023/2024, arXiv:2312.16815) provide a comprehensive survey of quantitative theories of causal emergence (CE) and their intersection with machine learning, covering Hoel et al.'s effective information (EI) framework, Rosas et al.'s partial information decomposition (ϕID) approach, Crutchfield's computational mechanics, and Seth's G-emergence theory, while giving extended treatment to two neural architectures — the Neural Information Squeezer (NIS) and its successor NIS+ — that operationalize CE identification from time-series data. The load-bearing finding is that EI, defined as the mutual information between a uniformly intervened input distribution and the resulting output distribution of a Markov TPM, can increase under coarse-graining (CE = EI(TPM_M) − EI(TPM_m) > 0), and that this phenomenon is measurably detectable from data via NIS+, which directly maximizes EI using inverse probability weighting with kernel density estimation. NIS+ compressed 100-dimensional fMRI micro-states from the AOMIC ID1000 movie-viewing dataset to a 1-dimensional macro-state with attribution weights concentrated in visual cortex, while the AOMIC PIOP2 resting-state dataset (50 subjects) required 7 macro-dimensions with diffusely distributed weights. Across simulated systems — SIR dynamics with added Gaussian noise, the Boid flocking model, and Conway's Game of Life — NIS+ outperformed NIS, variational autoencoders, and feed-forward networks on out-of-distribution generalization. The paper also demonstrates, via Comolatti and Hoel's comparative analysis, that CE is not an artifact of choosing EI: it reappears under Hume's constant conjunction, Cheng's causal attribution, Eells's and Suppes's probability-raising measures, and Pearl's framework. 
The implied argument is that EI maximization simultaneously identifies emergent macro-structure and produces representations that generalize out-of-distribution. The reasoning is that the do-intervention built into EI removes spurious non-causal correlations in exactly the way required for invariant prediction, so it functions as an environment-label-free alternative to invariant risk minimization. The paper names this conjecture explicitly and connects it to stable learning and sample-reweighting approaches in the OOD literature. A critical reader would push back on the epistemological status of the CE measure itself. Dewhurst (Thought, 2021) argues that Hoel's macroscopic causality is an informational explanation rather than evidence of genuinely novel causal power. The review also admits that multiple coarse-graining strategies can achieve the same maximal EI, leaving solution non-uniqueness unresolved; the NIS+ output therefore depends on initialization and architectural choices, which undermines the claim that EI maximization yields an observer-independent, ontological criterion for emergence. An alternative the paper could have used to identify macro-variables without pre-specification is sparse autoencoders or disentangled representation learning, which achieve similar latent compression without the ϕID or EI apparatus and would provide a useful ablation baseline for the generalization claims.
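The inverse probability weighting mentioned in this brief, w(x_t) = p̃(φ(x_t))/p(φ(x_t)), can be sketched for a one-dimensional macro-state as follows. Several things here are assumptions rather than details from the paper: the target p̃ is taken to be uniform over the observed range of macro-states, the KDE uses a Gaussian kernel with a normal-reference rule-of-thumb bandwidth, and the helper names `gaussian_kde_1d` and `uniform_reweighting` are hypothetical.

```python
import numpy as np

def gaussian_kde_1d(samples, query, bandwidth):
    """Gaussian kernel density estimate of p at each query point."""
    diffs = (query[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * diffs ** 2) / np.sqrt(2.0 * np.pi)
    return kernel.mean(axis=1) / bandwidth

def uniform_reweighting(y, bandwidth=None):
    """Weights w = p_target(y) / p_hat(y), with p_target assumed uniform on [min(y), max(y)]."""
    y = np.asarray(y, dtype=float)
    if bandwidth is None:
        # Normal-reference rule of thumb (an assumed default, not from the paper).
        bandwidth = 1.06 * y.std() * len(y) ** (-1 / 5)
    p_hat = gaussian_kde_1d(y, y, bandwidth)   # denominator, estimated by KDE
    p_target = 1.0 / (y.max() - y.min())       # numerator, uniform density
    w = p_target / p_hat
    return w * len(w) / w.sum()                # rescale so the weights average to 1

rng = np.random.default_rng(0)
y = rng.standard_normal(500)   # stand-in for macro-states phi(x_t)
w = uniform_reweighting(y)
# Rarely visited macro-states get larger weights than frequently visited ones.
print(w[np.argmax(np.abs(y))], w[np.argmin(np.abs(y))])
```

In a training loop these weights would multiply the per-sample prediction loss, so that the empirical macro-state distribution behaves more like the uniform intervention assumed by EI.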

Original abstract

Emergence and causality are two fundamental concepts for understanding complex systems. They are interconnected. On one hand, emergence refers to the phenomenon where macroscopic properties cannot be solely attributed to the cause of individual properties. On the other hand, causality can exhibit emergence, meaning that new causal laws may arise as we increase the level of abstraction. Causal emergence theory aims to bridge these two concepts and even employs measures of causality to quantify emergence. This paper provides a comprehensive review of recent advancements in quantitative theories and applications of causal emergence. Two key problems are addressed: quantifying causal emergence and identifying it in data. Addressing the latter requires the use of machine learning techniques, thus establishing a connection between causal emergence and artificial intelligence. We highlighted that the architectures used for identifying causal emergence are shared by causal representation learning, causal model abstraction, and world model-based reinforcement learning. Consequently, progress in any of these areas can benefit the others. Potential applications and future perspectives are also discussed in the final section of the review.

Cross-corpus bridges (1)

same_concept_as · Nomic cosine

External markdown files that talk about the same concept as this entity.

  • aboutblank_kb
    How do biological systems extract actionable meaning from ambiguous, incomplete, or corrupted information signals? (questions/how-do-biological-systems-extract-actionable-meaning-from.md, cosine similarity 0.789)