method
active
method:eigenvalue-based-copying-detection
Eigenvalue-Based Copying Detection
A summary statistic using positive eigenvalues of the OV circuit matrix to detect copying behavior in attention heads
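A minimal sketch of how such a statistic can be computed. This entry does not give the exact formula, so the sketch assumes the common ratio sum(λ) / sum(|λ|) over the OV circuit's eigenvalues. The weight names (`W_E`, `W_V`, `W_O`, `W_U`), the row-vector convention, and the helper functions are illustrative assumptions, not this method's reference implementation. Because the full vocab x vocab circuit has rank at most `d_head`, the sketch scores the much smaller cyclically permuted product, which has the same nonzero eigenvalues.

```python
import numpy as np


def copying_score(M: np.ndarray) -> float:
    """Return sum(eigenvalues) / sum(|eigenvalues|) for a square matrix M.

    The result lies in [-1, 1]. It approaches +1 when the spectrum is
    dominated by positive real eigenvalues (copying-like) and -1 when it is
    dominated by negative ones. The numerator equals trace(M); for a real
    matrix, complex eigenvalues come in conjugate pairs, so it is real up
    to rounding.
    """
    eig = np.linalg.eigvals(M)
    denom = np.abs(eig).sum()
    return 0.0 if denom == 0 else float(eig.sum().real / denom)


def ov_copying_score(W_E, W_V, W_O, W_U, full: bool = False) -> float:
    """Copying score of one head's OV circuit (assumed row-vector convention).

    Assumed shapes: W_E (vocab, d_model), W_V (d_model, d_head),
    W_O (d_head, d_model), W_U (d_model, vocab).
    The full circuit W_E @ W_V @ W_O @ W_U is vocab x vocab with rank at most
    d_head. AB and BA share their nonzero eigenvalues, so the cyclically
    permuted d_head x d_head product W_O @ W_U @ W_E @ W_V yields the same
    score; the full matrix only adds zero eigenvalues.
    """
    if full:
        return copying_score(W_E @ W_V @ W_O @ W_U)
    return copying_score(W_O @ W_U @ W_E @ W_V)


# Toy usage with random weights (sizes are arbitrary).
rng = np.random.default_rng(0)
vocab, d_model, d_head = 50, 16, 4
W_E = rng.normal(size=(vocab, d_model))
W_V = rng.normal(size=(d_model, d_head))
# Hypothetical idealized copying head: W_O = W_V.T and W_U = W_E.T make the
# full circuit (W_E @ W_V) @ (W_E @ W_V).T positive semidefinite.
print(ov_copying_score(W_E, W_V, W_V.T, W_E.T))  # close to 1.0
```

Scoring the small product is a cost choice only: with `full=True` the result agrees up to floating-point error, but the eigendecomposition runs on a vocab x vocab matrix.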
Neighborhood — ranked by edge-count
Concepts (2)
concept
- Copying Matrix (about): An OV circuit matrix that, when a head attends to a token, increases the output logit of that same token; detectable via positive eigenvalues
- Ginibre Matrix Distribution (associated_with): A class of random matrices with independent Gaussian entries, used to characterize the baseline eigenvalue distribution of OV circuits at initialization, against which the clustering of learned positive eigenvalues is compared
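The two concepts above can be contrasted in a short sketch, assuming the sum(λ) / sum(|λ|) score (an assumption; the entry does not fix the formula). A real Ginibre matrix G has the same distribution as -G, so its expected score is exactly 0; the same sign-flip argument applies to an OV product at initialization whenever any factor is zero-mean Gaussian. The "learned" matrix below, identity plus Ginibre noise, is a hypothetical stand-in for a copying circuit, not a trained head.

```python
import numpy as np


def copying_score(M: np.ndarray) -> float:
    # sum(eigenvalues) / sum(|eigenvalues|); the numerator equals trace(M).
    eig = np.linalg.eigvals(M)
    denom = np.abs(eig).sum()
    return 0.0 if denom == 0 else float(eig.sum().real / denom)


def ginibre(n: int, rng: np.random.Generator) -> np.ndarray:
    # Real Ginibre matrix with iid N(0, 1/n) entries; by the circular law its
    # eigenvalues spread roughly uniformly over the unit disk for large n.
    return rng.normal(scale=1.0 / np.sqrt(n), size=(n, n))


rng = np.random.default_rng(0)
n = 200

# Baseline: G and -G are identically distributed, so the expected score is 0.
baseline = np.array([copying_score(ginibre(n, rng)) for _ in range(100)])

# Hypothetical "learned" copying circuit: identity plus Ginibre noise.
# Its eigenvalues 1 + 0.5*lambda cluster around +1 in the right half-plane.
learned = copying_score(np.eye(n) + 0.5 * ginibre(n, rng))

print(f"baseline mean {baseline.mean():+.3f}, std {baseline.std():.3f}")
print(f"learned score {learned:.3f}")
```

The learned score stays slightly below 1 because complex eigenvalues contribute more to sum(|λ|) than to the real trace; the gap from the near-zero baseline is what signals copying.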
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- What is the correct formal definition of a 'copying matrix' that captures all and only the cases we care about? (question, 0.709): Open methodological question about summarizing OV matrix behavior; eigenvalues are used as a working but imperfect proxy
- The Geometry of Interaction model shows that simple copying of information between locations suffices for all computation, establishing emergent logic.
- Demonstration that complex logical behavior emerges from simple copy-cat processes through interaction, exemplifying power of dynamics.
- Detection mechanism computing cosine similarity between activation vectors and steering vectors to classify deception
- Hofstadter & Mitchell's analogy-making model illustrating intelligence as abstract mapping.
- Correlative technique measuring the type of information encoded in distributed representations via linear predictability.
- Temperature=0.8 sampled decoding for self-report; reduces collapse moderately but remains discrete and noisy
- Classifier using cosine similarity between activation vectors and steering vectors to detect deception with 89% accuracy
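The open question listed above calls eigenvalues a working but imperfect proxy for copying. A small sketch illustrates why, again assuming the sum(λ) / sum(|λ|) score. `self_top1_fraction` is a hypothetical diagnostic introduced only for comparison (the fraction of source tokens whose own logit is their largest boost), not a proposed definition. Row i is read as the logit change produced when token i is attended to.

```python
import numpy as np


def copying_score(M: np.ndarray) -> float:
    # sum(eigenvalues) / sum(|eigenvalues|); assumed summary statistic.
    eig = np.linalg.eigvals(M)
    denom = np.abs(eig).sum()
    return 0.0 if denom == 0 else float(eig.sum().real / denom)


def self_top1_fraction(M: np.ndarray) -> float:
    # Hypothetical alternative: fraction of rows whose largest entry
    # is on the diagonal (the token boosts its own logit most).
    return float(np.mean(M.argmax(axis=1) == np.arange(M.shape[0])))


# Shear: both eigenvalues equal 1, yet token 0 mostly boosts token 1.
shear = np.array([[1.0, 10.0],
                  [0.0, 1.0]])
# Swap: each token boosts only the other token; eigenvalues are +1 and -1.
swap = np.array([[0.0, 1.0],
                 [1.0, 0.0]])

for name, M in [("shear", shear), ("swap", swap)]:
    print(name, round(copying_score(M), 3), self_top1_fraction(M))
```

The shear matrix scores near 1.0 on the eigenvalue statistic while only half its tokens copy themselves, so a purely spectral summary can report copying that the matrix does not mostly perform.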