Feature Universality

The property that features form consistently across different models trained on the same or similar data, suggesting that features are real representational units.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
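
The filter described above (cosine at or above 0.65, no typed edge to this entity) can be sketched as follows. This is a minimal illustration and not the system's actual implementation. The entity names, the 2-dimensional embeddings, and the helper names `cosine` and `related_by_similarity` are all hypothetical; only the 0.65 threshold comes from the page.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def related_by_similarity(target, embeddings, typed_edges, threshold=0.65):
    """Return (name, score) pairs that are similar to `target` but share no typed edge with it."""
    hits = []
    for name, vec in embeddings.items():
        if name == target:
            continue
        # Skip entities that already have a typed relation to the target.
        if frozenset((target, name)) in typed_edges:
            continue
        score = cosine(embeddings[target], vec)
        if score >= threshold:
            hits.append((name, round(score, 3)))
    # Highest similarity first, matching the ordering of the list on this page.
    return sorted(hits, key=lambda h: h[1], reverse=True)

# Toy data, invented for illustration only.
embeddings = {
    "target": [1.0, 0.0],
    "near_untyped": [0.8, 0.6],  # cosine 0.8 with target, no typed edge
    "far": [0.6, 0.8],           # cosine 0.6 with target, below the threshold
    "near_typed": [1.0, 0.0],    # cosine 1.0 with target, but already linked
}
typed_edges = {frozenset(("target", "near_typed"))}

result = related_by_similarity("target", embeddings, typed_edges)
print(result)
```

Only `near_untyped` survives: `far` falls below the threshold and `near_typed` is excluded because it already has a typed edge. Entities that pass both checks are the candidates for new edges or unrecognized duplicates.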

Feature universality across independently trained models suggests features have some existence beyond individual models · claim · 0.819
The authors take an agnostic position on the ontological status of features, but the universality evidence pushes toward features being real.
Universality Hypothesis · concept · 0.788
The hypothesis that analogous features and circuits reliably form across different neural network models and tasks
Feature Sparsity · concept · 0.779
Property that features activate on only a small fraction of inputs; enables compressed sensing and is what allows superposition to work
Uniqueness · concept · 0.764
The property that every place generated by a living process is inevitably unique due to its adaptation to specific conditions.
Is the apparent universality of some low-level vision features the exception or the rule? · question · 0.764
Open empirical question following anecdotal cross-model universality findings
Feature splitting · concept · 0.762
Phenomenon where a feature in a small SAE splits into multiple finer features in a larger SAE.
Truth direction universality · concept · 0.761
The claim that truth directions are consistent and generalizable across layers, tasks, and prompt formats in LLMs.
Pure Feature · concept · 0.753
A feature that responds to only a single latent variable, in contrast to polysemantic features, which respond to several.