concept
active
concept:interference-weights

Interference Weights

Contributions to a feature's logit weights that arise from superposition with other features rather than from the feature's own causal role.
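
As a rough illustration of this definition, the NumPy sketch below splits one feature's logit weights into an "own" part and an interference part in a toy setting. Everything in it is an assumption made for illustration and is not taken from the source: the function name `split_logit_weights`, the toy unembedding built as a sum of outer products of intended effects and feature directions, and the choice of three unit directions packed into a 2-D space.

```python
import numpy as np

def split_logit_weights(directions, effects, i):
    """Split feature i's logit weights into own and interference parts.

    directions: (n_features, d_model) unit feature directions (assumed).
    effects:    (n_features, n_vocab) intended logit effect per feature (assumed).
    """
    # Toy unembedding: each feature writes its intended effect along its direction.
    W_U = effects.T @ directions              # (n_vocab, d_model)
    total = W_U @ directions[i]               # observed logit weights of feature i
    overlaps = directions @ directions[i]     # dot product of every feature with feature i
    own = effects[i] * overlaps[i]            # part due to feature i's own role
    interference = total - own                # remainder, caused by overlap with other features
    return total, own, interference

# Three features squeezed into two dimensions, 120 degrees apart (superposition).
angles = np.deg2rad([0.0, 120.0, 240.0])
directions = np.stack([np.cos(angles), np.sin(angles)], axis=1)
effects = np.eye(3)                           # feature k is meant to boost token k only

total, own, interference = split_logit_weights(directions, effects, 0)
# own is approximately [1, 0, 0]; interference is approximately [0, -0.5, -0.5]
```

In this toy setup the interference part vanishes when the feature directions are orthogonal, which is consistent with reading interference weights as a by-product of superposition rather than of the feature's intended effect.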

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • When non-orthogonal features cause logistic regression to identify a suboptimal probe direction
  • Equal Weighting · framework · 0.750
    Baseline multi-task learning (MTL) approach that minimizes the sum of task losses with equal weights; suffers from poor task balancing
  • Task weight · concept · 0.731
    Coefficient weighting each task loss in the MTL objective.
  • Weight Editing · method · 0.722
    Editing network weights to test predictions about circuit function; proposed as a falsifiability test for circuit claims
  • Asymmetric transfer after fine-tuning: high-density bases (B10) are more robust.
  • Autoencoder design choice to learn separate encoder and decoder weights, increasing representational capacity by allowing encoder vectors to distinguish similar features
  • Virtual Weights · concept · 0.704
    Implicit weights directly connecting any pair of layers, computed by multiplying the output weights of one layer with the input weights of another through the residual stream
  • Weight space · concept · 0.702
    The space of the model's parameter matrices, where VPD operations take place.