finding
active
finding:no-collisions-found-in-1-280-000-randomly-sampled-inputs-through-trained-mlp-in-hierarchical-equality-task-across-10-random-seeds
No collisions found in 1,280,000 randomly sampled inputs through trained MLP in hierarchical equality task across 10 random seeds
Empirical support for input-injectivity assumption holding in practice
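As a rough illustration of what such a collision check involves, the sketch below counts distinct inputs that map to identical activations. It is a minimal sketch under stated assumptions, not the procedure used in the source paper: the helper names `count_collisions` and `make_toy_layer`, the untrained random tanh layer, the layer sizes, and the sample count per seed are all hypothetical stand-ins for the trained MLP and the hierarchical equality inputs.

```python
import numpy as np

def count_collisions(layer_fn, inputs):
    """Count how many distinct inputs share an activation with another input.

    Returns (#distinct inputs) - (#distinct activations); 0 means the map is
    injective on this sample.
    """
    inputs = np.unique(inputs, axis=0)          # identical inputs are not collisions
    acts = layer_fn(inputs)
    n_distinct_acts = len(np.unique(acts, axis=0))
    return len(inputs) - n_distinct_acts

def make_toy_layer(rng, d_in=4, d_hidden=16):
    """Hypothetical stand-in for a trained MLP layer (random weights, tanh)."""
    W = rng.normal(size=(d_in, d_hidden))
    b = rng.normal(size=d_hidden)
    return lambda x: np.tanh(x @ W + b)

# Assumed setup: one toy layer and one batch of random inputs per seed.
total = 0
for seed in range(10):
    rng = np.random.default_rng(seed)
    layer = make_toy_layer(rng)
    x = rng.uniform(-1.0, 1.0, size=(1000, 4))
    total += count_collisions(layer, x)

print("collisions across seeds:", total)
```

Exact floating-point equality is used here for simplicity; a stricter check might compare activations up to a tolerance, which is a design choice the source does not specify.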
Source paper
extracted_from · (2025) · Sutter, Denis · Minder, Julian · Hofmann, Thomas · Pimentel, Tiago
Neighborhood — ranked by edge-count
Concepts (1)
concept
- Input-Injectivity · supports · Assumption that DNN layers preserve input information by being injective; key condition for Theorem 1
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Demonstrates that high IIA can be obtained even when model cannot solve the task
- Claim about the sparsity and sufficiency of the identified neuron set
- A sparse set of 28 MLP neurons at layer 18 (~0.2% of MLP) are reused across all cyclic tasks · finding · 0.734 · Quantitative finding identifying the specific neurons responsible for generic addition
- Explains why RevNet lacks capacity to separate states for identity-of-first-argument algorithm
- SAE features are not simply mirroring individual neurons.
- Robustness check ruling out that any perturbation would decrease type hint rate due to brittleness.
- Selective pressure toward convergence via task generality
- Experiment 3 comparison: zero-shot control shows lower semantic convergence than experimental condition