Analogous features and circuits form across models and tasks.

Third of three speculative claims asserting that learned features are not model-specific but represent universal solutions to learning problems

Source paper

extracted_from

Zoom In: An Introduction to Circuits

(2020) · Chris Olah · Nick Cammarata · Ludwig Schubert · Gabriel Goh +2

Neighborhood — ranked by edge-count

Papers (1)

paper

Zoom In: An Introduction to Circuits
introduces

Findings (2)

finding

Curve detectors found across AlexNet, InceptionV1, VGG19, ResNetV2-50 and models trained on Places365
supports
Anecdotal evidence for the universality of low-level visual features across different architectures and datasets
High-low frequency detectors found across AlexNet, InceptionV1, VGG19, and ResNetV2-50
supports
Second low-level feature type demonstrating cross-architecture universality

Hypotheses (1)

hypothesis

We hypothesize that high-low frequency detectors, if predicted by artificial neural network universality, might be found in biological neural networks.
associated_with
Specific cross-domain prediction mentioned by neuroscientists in conversation with the authors

Concepts (1)

concept

Universality Hypothesis
associated_with
The hypothesis that analogous features and circuits reliably form across different neural network models and tasks

Claims (2)

claim

Features are connected by weights forming circuits, and these circuits can be rigorously studied and understood as meaningful algorithms.
supports
Second of three speculative claims asserting that subgraphs of neural networks are tractable and meaningful objects of study
If the universality hypothesis is broadly true, it raises the exciting possibility that artificial neural networks could predict features previously unknown in biological neural networks.
extends
Speculative extension of universality to neuroscience, with high-low frequency detectors as a candidate prediction

Questions (2)

question

Lack of rigorous cross-model comparison demonstrating that specific named features (not just correlated ones) form across architectures
gates
Explicitly identified research gap: anecdotal evidence exists but rigorous characterization is absent
Is the apparent universality of some low-level vision features the exception or the rule?
gates
Open empirical question following anecdotal cross-model universality findings

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
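The filter described above (cosine ≥ 0.65, no typed edge) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the page's actual implementation: the embedding vectors, the entity names, and the function `related_by_similarity` are all hypothetical, and the only details taken from the page are the 0.65 threshold, the exclusion of entities that already have a typed edge, and the descending order by score.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def related_by_similarity(target, embeddings, typed_neighbors, threshold=0.65):
    """Return (name, score) pairs for entities similar to `target`.

    Entities that already share a typed edge with the target are skipped,
    so the result only contains candidates for new edges or duplicates.
    `embeddings` maps entity name -> vector (hypothetical data layout).
    """
    target_vec = embeddings[target]
    results = []
    for name, vec in embeddings.items():
        if name == target or name in typed_neighbors:
            continue
        score = cosine(target_vec, vec)
        if score >= threshold:
            results.append((name, score))
    # Highest similarity first, matching the ordering of the list on this page.
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results


# Toy data: made-up 2-D vectors, purely for illustration.
toy_embeddings = {
    "universality_claim": [1.0, 0.0],
    "universality_hypothesis": [1.0, 0.1],  # already linked by a typed edge
    "convergent_features_claim": [1.0, 1.0],
    "unrelated_claim": [0.0, 1.0],
}
toy_typed = {"universality_hypothesis"}
candidates = related_by_similarity("universality_claim", toy_embeddings, toy_typed)
```

In this toy run only `convergent_features_claim` survives: the hypothesis is excluded because it already has a typed edge, and the unrelated claim falls below the threshold.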

Circuits could act as an epistemic foundation for interpretability by breaking down model behavior into falsifiable statements about small subgraphs. · claim · 0.777
Normative vision for how the circuits agenda could resolve the pre-paradigmatic state of interpretability
The circuits used for modeling fictional characters overlap with the model's self-model, but the character you're talking to is represented using different mechanisms than fictional character representation. · claim · 0.776
Refinement of character-circuit overlap, stressing that the self-character is not just another fictional character.
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models (Marks et al., 2025) · concept · 0.775
Cited as enabling precise behavioral control through SAE features, extending the same methodological line
How do representations differ or converge between architectures, tasks, and modalities? · question · 0.771
Broader research question MAS is positioned to address, citing multiple recent works.
Models that are competent all represent data in a similar way; all strong models are alike, each weak model is weak in its own way. · claim · 0.768
Author's interpretation of the VTAB alignment results echoing Tolstoy
Interpretability features converge across different model architectures, revealing structural similarities. · claim · 0.766
The features are often organized in geometrically-related clusters that share a semantic relationship. · claim · 0.764
Decoder cosine similarity maps onto concept similarity.
The strengths of all three models (concurrent OOP, logic, functional) are irrelevant to parallelism, and generally unhelpful in dealing with process creation and coordination. · claim · 0.757
Assertion that the popular models add nothing to parallel programming.