Eigenvalue-Based Copying Detection

A summary statistic that uses the positive eigenvalues of an attention head's OV circuit matrix to detect copying behavior
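
The entry does not give the exact form of the statistic. A minimal sketch, assuming one common form (the sum of the eigenvalues divided by the sum of their magnitudes), might look like the following. The function name `copying_score` and the toy matrices are hypothetical and only illustrate the idea; a real OV circuit matrix would come from a trained model's weights.

```python
import numpy as np

def copying_score(ov: np.ndarray) -> float:
    """Assumed statistic: sum(eigenvalues) / sum(|eigenvalues|), in [-1, 1].

    Values near +1 mean the spectrum is dominated by positive eigenvalues,
    which is the signature this entry associates with copying.
    """
    eigvals = np.linalg.eigvals(ov)
    denom = np.abs(eigvals).sum()
    if denom == 0:
        return 0.0
    # For a real matrix, complex eigenvalues come in conjugate pairs,
    # so the imaginary parts cancel in the sum.
    return float(eigvals.sum().real / denom)

# Toy stand-ins for an OV circuit matrix (hypothetical data).
rng = np.random.default_rng(0)
a = rng.standard_normal((16, 4))
copy_like = a @ a.T      # positive semidefinite: all eigenvalues >= 0
anti_copy = -copy_like   # all eigenvalues <= 0

print(copying_score(copy_like), copying_score(anti_copy))
```

Because the score is a ratio, it is unchanged when the matrix is multiplied by a positive constant, so it compares heads by the sign structure of their spectrum rather than by overall weight norm.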

Neighborhood — ranked by edge-count

concept

Copying Matrix
about
An OV circuit matrix that maps each token to an increased logit for that same token; detectable via positive eigenvalues
Ginibre Matrix Distribution
associated_with
A class of random matrices with Gaussian entries used to characterize the baseline eigenvalue distribution of OV circuits at initialization, against which learned positive eigenvalue clustering is compared
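
The baseline comparison can be sketched as follows, under stated assumptions: a real Ginibre matrix is sampled with i.i.d. Gaussian entries (the `1/sqrt(n)` scaling is a conventional normalization, not something this entry specifies), and the score is the assumed ratio of the eigenvalue sum to the sum of eigenvalue magnitudes. The helper names `ginibre` and `copying_score` and the `learned_like` matrix are hypothetical illustrations, not measurements from any model.

```python
import numpy as np

def copying_score(m: np.ndarray) -> float:
    # Assumed statistic: sum of eigenvalues over sum of their magnitudes.
    eigvals = np.linalg.eigvals(m)
    denom = np.abs(eigvals).sum()
    return float(eigvals.sum().real / denom) if denom > 0 else 0.0

def ginibre(n: int, rng: np.random.Generator) -> np.ndarray:
    # Real Ginibre sample: i.i.d. Gaussian entries, scaled so the
    # eigenvalues fall roughly inside the unit disk.
    return rng.standard_normal((n, n)) / np.sqrt(n)

rng = np.random.default_rng(0)
n = 64

# Baseline: eigenvalues spread roughly symmetrically around zero,
# so the score should sit near zero.
baseline = [copying_score(ginibre(n, rng)) for _ in range(20)]
baseline_mean = float(np.mean(baseline))

# Hypothetical "learned" matrix: the same noise plus a strong identity
# component, which shifts the spectrum toward positive real values.
learned_like = ginibre(n, rng) + 2.0 * np.eye(n)
learned_score = copying_score(learned_like)

print(f"baseline mean: {baseline_mean:.3f}, learned-like: {learned_score:.3f}")
```

A score well above the random baseline is the kind of positive-eigenvalue clustering the comparison is meant to surface; how large a gap counts as meaningful is not stated in this entry.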

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

What is the correct formal definition of a 'copying matrix' that captures all and only the cases we care about? · question · 0.709
Open methodological question about summarizing OV matrix behavior; eigenvalues are used as a working but imperfect proxy

Mere copying processes (partial involutions) are computationally universal · claim · 0.697
The Geometry of Interaction model shows that simple copying of information between locations suffices for all computation, establishing emergent logic.

Logical emergence from copying · concept · 0.696
Demonstration that complex logical behavior emerges from simple copy-cat processes through interaction, exemplifying power of dynamics.

Cosine Similarity-Based Deception Detection · concept · 0.694
Detection mechanism computing cosine similarity between activation vectors and steering vectors to classify deception

Copycat system · method · 0.688
Hofstadter & Mitchell's analogy-making model illustrating intelligence as abstract mapping.

Linear Decoding · method · 0.686
Correlative technique measuring the type of information encoded in distributed representations via linear predictability.

Sampled-decoding self-report · method · 0.683
Temperature=0.8 sampled decoding for self-report; reduces collapse moderately but remains discrete and noisy

Cosine Similarity Binary Classifier · method · 0.678
Classifier using cosine similarity between activation vectors and steering vectors to detect deception with 89% accuracy