claim
active
claim:feature-universality-across-independently-trained-models-suggests-features-have-some-existence-beyond-individual-models
Feature universality across independently trained models suggests features have some existence beyond individual models
The authors take an agnostic position on the ontological status of features, but the universality evidence pushes toward features being real.
Source paper
extracted_from · (2024) · Marc Carauleanu · Michael Vaiana · Judd Rosenblatt · Cameron Berg +1
Neighborhood — ranked by edge-count
Findings (4)
finding
- Features in A/1 have a median activation correlation of 0.72 with the most similar feature in B/1; neurons have a median of 0.46 · associated_with · supports · Systematic comparison showing features are substantially more universal than neurons across models
- Demonstrates universality of the Arabic script feature across two independently trained transformers
- Universality of base64 feature across two transformers
- Universality of DNA feature across two transformer models with different random seeds
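The statistic in the first finding (median correlation between each unit in one model and its most similar unit in the other) can be sketched as follows. This is a minimal illustration, not the source paper's code: it assumes Pearson correlation computed over activations recorded on the same inputs for both models, and the function name `median_best_match_correlation` and its arguments are hypothetical.

```python
import numpy as np

def median_best_match_correlation(acts_a, acts_b):
    """Median over units in A of the max Pearson correlation with any unit in B.

    acts_a: array of shape (n_samples, n_units_a), activations from model A.
    acts_b: array of shape (n_samples, n_units_b), activations from model B,
            recorded on the same n_samples inputs (an assumption of this sketch).
    """
    a = np.asarray(acts_a, dtype=float)
    b = np.asarray(acts_b, dtype=float)

    # Center each unit so that the normalized dot product is a Pearson correlation.
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)

    a_norm = np.linalg.norm(a, axis=0)
    b_norm = np.linalg.norm(b, axis=0)
    # Constant units have zero norm; dividing by 1 leaves their correlations at 0.
    a_norm[a_norm == 0] = 1.0
    b_norm[b_norm == 0] = 1.0

    # corr[i, j] = correlation between unit i of A and unit j of B.
    corr = (a / a_norm).T @ (b / b_norm)

    # For each unit in A, keep its best match in B, then take the median.
    best_match = corr.max(axis=1)
    return float(np.median(best_match))
```

Applying the same function once to feature activations and once to raw neuron activations would give the two medians that such a comparison contrasts.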
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Property of features that form consistently across different models trained on the same or similar data, suggesting features are real representational units
- Features may not be strictly one-dimensional objects; higher-dimensional feature manifolds may exist in model representations · hypothesis · 0.780 · Extension of the superposition hypothesis to account for continuous families of features
- Speculative extension of universality to neuroscience, with high-low frequency detectors as a candidate prediction
- Explicitly identified research gap: anecdotal evidence exists but rigorous characterization is absent
- Motivation for using sparsity-based dictionary learning on language models
- The central hypothesis of the paper; the platonic representation hypothesis itself
- Author's interpretation of the VTAB alignment results echoing Tolstoy
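The "Related by similarity" criterion (cosine ≥ 0.65, no typed edge) can be illustrated with a short sketch. How this page actually computes its neighborhood is not stated, so everything here is an assumption: the function `untyped_neighbors`, the `embeddings` mapping from entity id to vector, and the `typed_edges` set of id pairs are all hypothetical.

```python
import numpy as np

def untyped_neighbors(query_id, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, cosine) pairs similar to query_id but not linked to it.

    embeddings:  dict mapping entity id -> embedding vector (hypothetical store).
    typed_edges: set of (id, id) pairs that already have a typed relation.
    threshold:   minimum cosine similarity, 0.65 as on this page.
    """
    q = np.asarray(embeddings[query_id], dtype=float)
    q = q / np.linalg.norm(q)

    results = []
    for other_id, vec in embeddings.items():
        if other_id == query_id:
            continue
        # Skip entities already connected by a typed edge in either direction.
        if (query_id, other_id) in typed_edges or (other_id, query_id) in typed_edges:
            continue
        v = np.asarray(vec, dtype=float)
        sim = float(q @ (v / np.linalg.norm(v)))
        if sim >= threshold:
            results.append((other_id, sim))

    # Most similar first.
    return sorted(results, key=lambda pair: pair[1], reverse=True)
```

Entities returned by such a filter would be the candidates for new edges or duplicate checks that the section describes.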