Attribution Similarity

Correlating attribution vectors (feature activation × logit weight of the next token) across model pairs to measure functional universality.
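A minimal sketch of the computation described above, under stated assumptions: each feature has one activation per token position and one weight per vocabulary token on the output logits, both models are run over the same token sequence, and "correlating" means Pearson correlation. The function names and the dictionary keys `acts` and `logit_w` are hypothetical, chosen only for illustration.

```python
import math

def attribution_vector(activations, logit_weights, next_tokens):
    """Per-position attribution: activation at t times the feature's
    weight on the logit of the token that actually follows position t."""
    return [a * logit_weights[tok] for a, tok in zip(activations, next_tokens)]

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # assumption: treat an undefined correlation as 0
    return cov / math.sqrt(vx * vy)

def attribution_similarity(feat_a, feat_b, next_tokens):
    """Correlate the attribution vectors of one feature from each model."""
    va = attribution_vector(feat_a["acts"], feat_a["logit_w"], next_tokens)
    vb = attribution_vector(feat_b["acts"], feat_b["logit_w"], next_tokens)
    return pearson(va, vb)

# Toy data: 4 token positions, vocabulary of 3 tokens.
next_tokens = [0, 1, 2, 1]
feat_a = {"acts": [1.0, 0.0, 2.0, 0.5], "logit_w": [0.5, -1.0, 2.0]}
# Same feature at twice the activation scale, as might occur in a second model.
feat_b = {"acts": [2.0, 0.0, 4.0, 1.0], "logit_w": [0.5, -1.0, 2.0]}
similarity = attribution_similarity(feat_a, feat_b, next_tokens)
```

Because Pearson correlation is invariant to scale, the two toy features score as near-identical even though their raw activations differ, which is the sense in which the measure targets function rather than magnitude.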

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Data Attribution · concept · 0.850
The task of attributing model behaviors to specific training datapoints.
Attribution Graphs · method · 0.805
Gradient-based technique using SAE features to estimate causal effects on completions; used to corroborate NLA findings.
Attribution patching · method · 0.803
Gradient-based method to estimate the effect of zeroing a feature on a specific logit difference.
Self-Similarity · concept · 0.796
Structural and functional property exhibited by living systems but currently absent from most engineered machines.
Activation Similarity · concept · 0.787
Model-independent feature comparison based on correlating activation vectors across a fixed diverse dataset.
Gradient-based data attribution · method · 0.779
Baseline method against which probe-based ranking is compared; more computationally expensive.
Functional Similarity · concept · 0.775
Similarity measured with respect to network behavior/function rather than statistical correlation of activations.
What is the quality they have in common? · question · 0.769
Question asked about the six big projects to identify shared features of living process buildings.