finding

active

finding:inceptionv1-implements-a-four-layer-circuit-for-pose-invariant-dog-head-detection-with-mirrored-left-right-pathways-that-inhibit-each-other-then-unite-exhibiting-xor-like-properties

InceptionV1 implements a four-layer circuit for pose-invariant dog head detection with mirrored left/right pathways that inhibit each other then unite, exhibiting XOR-like properties

Evidence that neural networks learn sophisticated invariance mechanisms through structured circuits rather than loose feature aggregation
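
The "inhibit each other, then unite" structure can be illustrated with a toy sketch. This is not the InceptionV1 circuit: the function name, the scalar inputs, and the inhibition weight of 1.0 are hypothetical, chosen only to show how two mirrored units that suppress each other and are then summed give an XOR-like response.

```python
def relu(x: float) -> float:
    """Rectified linear unit, the nonlinearity assumed for this toy."""
    return max(0.0, x)


def toy_pose_invariant_head(left: float, right: float, inhibition: float = 1.0) -> float:
    """Toy union of two mirrored, mutually inhibiting pathways.

    `left` and `right` stand in for evidence of a left-facing and a
    right-facing head. All weights here are illustrative assumptions,
    not values read from InceptionV1.
    """
    # Each pathway is excited by its own orientation and inhibited by the mirror one.
    left_unit = relu(left - inhibition * right)
    right_unit = relu(right - inhibition * left)
    # The union step: the downstream unit responds to either orientation.
    return left_unit + right_unit


if __name__ == "__main__":
    for left, right in [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]:
        print(left, right, toy_pose_invariant_head(left, right))
```

With these assumed weights the output is nonzero when exactly one orientation is present and zero when neither or both are, which is the XOR-like behaviour named in the finding.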

Source paper

extracted_from

Zoom In: An Introduction to Circuits

(2020) · Chris Olah · Nick Cammarata · Ludwig Schubert · Gabriel Goh +2

Neighborhood — ranked by edge-count

Claims (1)

claim

Features are connected by weights forming circuits, and these circuits can be rigorously studied and understood as meaningful algorithms.
supports
Second of three speculative claims asserting that subgraphs of neural networks are tractable and meaningful objects of study

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

InceptionV1 spreads car feature from a pure car detector in mixed4c across dog detector neurons in the next layer · finding · 0.825
Circuit-level evidence that polysemantic neurons arise deliberately through superposition rather than entangled computation
InceptionV1 neuron 4e:55 responds to cat faces, fronts of cars, and cat legs as unrelated stimuli · finding · 0.808
Concrete example of a polysemantic neuron, demonstrating the challenge to the circuits agenda
Weights between early and full curve detectors in InceptionV1 form a curve of positive weights at tangent positions, with opposing orientations inhibitory · finding · 0.760
Demonstrates that meaningful algorithms can be read directly off floating-point weights in a neural network
Pose-Invariant Dog Head Detector · concept · 0.757
A high-level feature neuron in InceptionV1 that detects dog heads regardless of orientation, illustrating higher-level understandable features
All 32 attention heads at layer 3 achieve 100% localization accuracy for injections at layer 2 (5-way classification, 20% chance) · finding · 0.753
Striking mechanistic finding that injection creates universally detectable perturbation in residual stream immediately downstream
All induction heads in the two-layer model occupy an extreme corner of high positive QK and OV eigenvalue positivity space relative to non-induction heads · finding · 0.749
Quantitative verification of the mechanistic theory; both circuits required for the induction algorithm show the predicted copying/matching structure
Single dendritic layer solves XOR-like problems with capacity matching 8-layer deep networks. · finding · 0.746
Evidence from Beniaguev et al. (2021) that individual biological neurons vastly outperform the McCulloch-Pitts model; supports the hybrid computation claim.
Mean-difference patching in a two-layer ReLU circuit flips the decision to class-A by activating a third hidden unit that is silent for all natural class-A inputs · finding · 0.741
Synthetic theoretical example showing pernicious divergence via hidden pathway activation