concept
active
concept:copying-matrix
Copying Matrix
An OV circuit matrix that maps each token to an increase in the logit of that same token; detectable via positive eigenvalues.
Neighborhood — ranked by edge-count
Methods (1)
method
- A summary statistic using positive eigenvalues of the OV circuit matrix to detect copying behavior in attention heads
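The entry above does not say which statistic is meant. As a minimal sketch, assume it is the ratio of the sum of the eigenvalues to the sum of their absolute values, which is close to 1 when the eigenvalues are mostly positive and close to -1 when they are mostly negative. The function name `copying_score` and the toy matrices are hypothetical and used only for illustration.

```python
import numpy as np


def copying_score(ov_circuit: np.ndarray) -> float:
    """Return sum(lambda_i) / sum(|lambda_i|) over the eigenvalues of a square matrix.

    Assumption: this ratio stands in for the summary statistic described above;
    the source entry does not give an exact formula.
    """
    eigenvalues = np.linalg.eigvals(ov_circuit)
    denominator = np.abs(eigenvalues).sum()
    if denominator == 0:
        # All eigenvalues are zero: no evidence for or against copying.
        return 0.0
    # Complex eigenvalues of a real matrix come in conjugate pairs,
    # so their imaginary parts cancel in the sum; keep the real part.
    return float(eigenvalues.sum().real / denominator)


# Toy matrices (hypothetical, not taken from any trained model).
identity_like = np.eye(4)                        # every token boosts itself
anti_copying = -np.eye(4)                        # every token suppresses itself
rotation = np.array([[0.0, -1.0], [1.0, 0.0]])   # purely imaginary eigenvalues

score_copy = copying_score(identity_like)
score_anti = copying_score(anti_copying)
score_rot = copying_score(rotation)
```

Under this assumed definition, a score near 1 suggests copying behavior, but as the related open question on this page notes, eigenvalues are only a working proxy and not a formal definition.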
Concepts (1)
concept
- OV Circuit · associated_with · The circuit formed by W_U W_OV^h W_E, which describes how a given token affects the output logits if attended to
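The product W_U W_OV^h W_E can be sketched numerically as follows. The shapes are assumptions, not taken from this page: W_E is (d_model, n_vocab), W_OV^h is (d_model, d_model), and W_U is (n_vocab, d_model), so the result is an n_vocab by n_vocab matrix whose entry [i, j] is the effect of attended token j on the logit of output token i. The helper name `ov_circuit` and the random toy weights are hypothetical.

```python
import numpy as np


def ov_circuit(W_U: np.ndarray, W_OV: np.ndarray, W_E: np.ndarray) -> np.ndarray:
    """Compute W_U @ W_OV @ W_E for one head (shapes assumed as in the lead-in)."""
    return W_U @ W_OV @ W_E


# Toy weights (hypothetical): tied embeddings and an identity OV map,
# which gives W_E^T W_E, a positive semidefinite matrix.
rng = np.random.default_rng(0)
n_vocab, d_model = 6, 4
W_E = rng.normal(size=(d_model, n_vocab))
W_U = W_E.T
W_OV = np.eye(d_model)

circuit = ov_circuit(W_U, W_OV, W_E)

# Diagonal entries measure how much attending to a token raises
# the logit of that same token.
self_effect = np.diag(circuit)
```

In this toy setup each diagonal entry is the squared norm of a token's embedding, so every token raises its own logit. Trained heads will generally not be this clean, which is why a summary statistic over the matrix is used instead of inspecting entries directly.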
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- What is the correct formal definition of a 'copying matrix' that captures all and only the cases we care about? · question · 0.821 · Open methodological question about summarizing OV matrix behavior; eigenvalues are used as a working but imperfect proxy
- The simple matrix form into which VPD constrains subcomponents to enforce mechanistic simplicity.
- Prescribes transitions among hidden state factors under action; encodes policy-dependent dynamics
- Key mathematical object in DAS; transforms neural representations to alternative bases to reveal distributed structure.
- High-dimensional array mapping hidden states to outcomes; key learned quantity in the paradigm
- Hofstadter & Mitchell's analogy-making model illustrating intelligence as abstract mapping.
- Constraint in VPD where each parameter subcomponent is constrained to be a rank-one matrix for simplicity.
- The core idea of decomposing weight matrices into components for interpretability.