Universality Hypothesis

The hypothesis that analogous features and circuits reliably form across different neural network models and tasks

Neighborhood — ranked by edge-count

Papers (1)

paper

Zoom In: An Introduction to Circuits
introduces

Thinkers (1)

thinker

Chris Olah
introduces
Co-author; provided high-level research guidance, wrote introduction/discussion.

Claims (1)

claim

Analogous features and circuits form across models and tasks.
associated_with
Third of three speculative claims asserting that learned features are not model-specific but represent universal solutions to learning problems

Hypotheses (2)

hypothesis

Different learning systems facing similar computational problems, potentially including both biological and artificial systems, will converge on similar consciousness-like solutions
supports
Extension of the Universality Hypothesis to consciousness: if consciousness solves a well-defined computational problem, different systems will discover it independently
We tentatively hypothesize that if an artificial system were trained to perform the same tasks leading to consciousness formation in a human infant, the system would exhibit consciousness as well, by analogy with the Universality Hypothesis.
extends
Paper's uncertain extension of mechanistic interpretability universality to consciousness

Findings (2)

finding

Diverse computer vision models trained on visual recognition tasks converge to remarkably similar internal feature representations regardless of architecture, training procedure, or implementation details, closely matching the organization of animal visual cortex
supports
Empirical evidence for the universality hypothesis cited as supporting the possibility of convergent consciousness-like solutions
Olah et al. (2020) found that automatically trained computer vision models, regardless of architecture and training procedure, all arrive at similar functional structures organizing similar features into similar compositional hierarchies, closely resembling the primate visual cortex.
supports
Empirical finding supporting the Universality Hypothesis; extended by the paper to consciousness

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Feature Universality · concept · 0.788
Property of features that form consistently across different models trained on the same or similar data, suggesting features are real representational units
Genesis Hypothesis · framework · 0.786
The conjecture that consciousness does not result from the organized mind but creates and maintains complex models of reality; forms at the beginning of mental development
If the universality hypothesis is broadly true, it raises the exciting possibility that artificial neural networks could predict features previously unknown in biological neural networks. · claim · 0.785
Speculative extension of universality to neuroscience, with high-low frequency detectors as a candidate prediction
Superposition Hypothesis · framework · 0.779
Core theoretical framework: neural networks represent more features than neurons by encoding features as directions in superposition
If we translate the Universality Hypothesis to the problem of consciousness, would it follow that an artificial system trained to perform the same tasks leading to consciousness formation in a human infant would exhibit consciousness as well? · question · 0.767
Key open question linking mechanistic interpretability universality to machine consciousness
Universal Darwinism As Bayesian Inference · framework · 0.759
Reward Hypothesis · concept · 0.759
The claim in RL that any goal can be expressed as maximizing the expected cumulative sum of a scalar reward signal.
I-hypothesis · framework · 0.758
The overarching hypothesis that an I or self-like ground underlies matter and becomes visible in living things.
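
The selection rule stated for this list (cosine ≥ 0.65, no typed edge) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the toy two-dimensional embeddings, and the storage of typed edges as a set of name pairs are hypothetical, and the page does not say how its embeddings or scores are actually computed.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def related_by_similarity(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs for entities that are similar to `target`
    but have no typed edge to it, sorted by descending score.

    embeddings:  dict mapping entity name -> vector (hypothetical storage)
    typed_edges: set of (name, name) pairs that already share a typed relation
    """
    results = []
    for name, vec in embeddings.items():
        if name == target:
            continue
        # Skip entities already connected by a typed edge, in either direction.
        if (target, name) in typed_edges or (name, target) in typed_edges:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            results.append((name, round(score, 3)))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Toy data, invented purely for the example.
toy_embeddings = {
    "A": [1.0, 0.0],
    "B": [1.0, 0.1],  # close to A, no typed edge: kept
    "C": [0.0, 1.0],  # orthogonal to A: below threshold
    "D": [1.0, 0.2],  # close to A, but already has a typed edge: skipped
    "E": [1.0, 2.0],  # similarity about 0.45: below threshold
}
toy_edges = {("A", "D")}

candidates = related_by_similarity("A", toy_embeddings, toy_edges)
print(candidates)
```

Entities returned by such a filter are the ones the page describes as candidates for new edges or unrecognized duplicates: they score as similar, yet nothing in the graph links them to this entry.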