quote
active
quote:decompose-parameters-not-activationsDecompose parameters, not activations
Core slogan encapsulating the paradigm shift of VPD.
Source paper
extracted_from(2026) · Bushnaq, Lucius · Braun, Dan · Clive-Griffin, Oliver · Bussmann, Bart +4
Related by similarity (8)
cosine ≥ 0.65 · no typed edgeEntities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
- The conventional approach (e.g., SAEs, transcoders) of decomposing activations into interpretable features.
- Core proposition of the paper: a substrate-level critique of existing interpretability methods.
- Critique of activation-based interpretability methods.
- Internal representations of the model on which probes operate; the method uses activations to rank datapoints.
- Key capability: covariance pooling compresses gigabytes of activations into compact stable embeddings without large labeled datasets.
- Emphasizes that anchoring is a binding process without weight updates.
- Motivates shift from studying model activations ('thoughts') to understanding parameters ('the computations themselves').