Interference Weights

Logit weight contributions from a feature that arise from superposition with other features rather than from the feature's own causal role.
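The definition above can be illustrated with a toy construction. This is a minimal sketch, not a method taken from this entry: it assumes each feature has a unit-norm direction in the residual stream and an intended logit effect, and that the unembedding is the sum of outer products of the two. Under that assumption, a feature's observed logit weights split into its own term plus a term contributed by overlap with other features' directions. The names `split_logit_weights`, `D`, and `T` are hypothetical.

```python
import numpy as np

def split_logit_weights(D, T):
    """Split observed logit weights into own and interference terms.

    D: (n_features, d_model) unit-norm feature directions (assumed).
    T: (n_features, n_vocab) intended logit effect of each feature (assumed).
    """
    # Assumed toy unembedding: sum over features of outer(direction, effect).
    W_U = D.T @ T
    # Logit weights actually read off each feature direction.
    observed = D @ W_U
    # Off-diagonal overlaps between directions carry other features' effects.
    gram = D @ D.T
    interference = (gram - np.eye(len(D))) @ T
    own = observed - interference
    return observed, own, interference

# Three features packed into two dimensions, so they cannot all be orthogonal.
angles = np.deg2rad([0.0, 90.0, 45.0])
D = np.stack([np.cos(angles), np.sin(angles)], axis=1)
T = np.eye(3)  # feature i is meant to promote only token i

observed, own, interference = split_logit_weights(D, T)
print(np.round(interference, 3))
```

In this example feature 0 is meant to promote only token 0, yet it also picks up a weight on token 2 because its direction overlaps with feature 2's. That extra weight is the interference term; it vanishes whenever the feature directions are orthogonal.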

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Feature Interference · concept · 0.781
When non-orthogonal features cause logistic regression to identify a suboptimal probe direction
Equal Weighting · framework · 0.750
Baseline MTL approach that minimizes the sum of task losses with equal weights; suffers from poor task balancing
Task weight · concept · 0.731
Coefficient weighting each task loss in the MTL objective.
Weight Editing · method · 0.722
Editing network weights to test predictions about circuit function; proposed as falsifiability test for circuit claims
cross-base interference · concept · 0.716
Asymmetric transfer after fine-tuning: high-density bases (B10) are more robust.
Untied Decoder Weights · concept · 0.715
Autoencoder design choice to learn separate encoder and decoder weights, increasing representational capacity by allowing encoder vectors to distinguish similar features
Virtual Weights · concept · 0.704
Implicit weights directly connecting any pair of layers computed by multiplying output weights of one layer with input weights of another through the residual stream
Weight space · concept · 0.702
The space of the model's parameter matrices, where VPD operations take place.