method
active
method:eigenvalue-based-copying-detection

Eigenvalue-Based Copying Detection

A summary statistic that uses the positive eigenvalues of the OV circuit matrix to detect copying behavior in attention heads.
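The description above does not spell out the statistic itself. As a minimal sketch, assuming it takes the commonly used form of the sum of the eigenvalues divided by the sum of their magnitudes, the score could be computed as follows. The function name `copying_score` and the use of a plain square NumPy array for the OV circuit are illustrative assumptions, not part of this record.

```python
import numpy as np


def copying_score(ov_circuit: np.ndarray) -> float:
    """Return sum(eigenvalues) / sum(|eigenvalues|) for a square matrix.

    Assumed form of the statistic (not confirmed by this record).
    The result lies in [-1, 1]; values near 1 mean the spectrum is
    dominated by positive eigenvalues, which is read as copying.
    """
    eigenvalues = np.linalg.eigvals(ov_circuit)
    total_magnitude = np.abs(eigenvalues).sum()
    if total_magnitude == 0.0:
        # An all-zero spectrum carries no signal either way.
        return 0.0
    # For a real matrix, complex eigenvalues come in conjugate pairs,
    # so the imaginary parts cancel and only the real parts contribute.
    return float(eigenvalues.real.sum() / total_magnitude)


# Hypothetical toy matrices standing in for OV circuits.
copy_like = np.eye(4)                           # every eigenvalue is +1
anti_copy = -np.eye(4)                          # every eigenvalue is -1
rotation = np.array([[0.0, -1.0], [1.0, 0.0]])  # purely imaginary spectrum

print(copying_score(copy_like), copying_score(anti_copy), copying_score(rotation))
```

Because the score is a ratio, it is unchanged when the matrix is rescaled by a positive constant, so it reflects the sign pattern of the spectrum rather than its overall size.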

Neighborhood — ranked by edge-count

Concepts (2)

concept
  • An OV circuit matrix that maps each token to an increase in the logit of that same token; detectable via positive eigenvalues
  • A class of random matrices with Gaussian entries, used to characterize the baseline eigenvalue distribution of OV circuits at initialization; the clustering of positive eigenvalues in learned circuits is compared against this baseline
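The comparison between a learned circuit and the Gaussian baseline can be sketched as follows. This is an illustration under stated assumptions: the score is taken to be the sum of eigenvalues over the sum of their magnitudes, the baseline is a dense square matrix with i.i.d. standard normal entries (real OV circuits are typically low-rank, which this ignores), and the names `copying_score`, `gaussian_baseline_scores`, and the synthetic "learned" matrix are hypothetical.

```python
import numpy as np


def copying_score(matrix: np.ndarray) -> float:
    """Assumed statistic: sum(eigenvalues) / sum(|eigenvalues|)."""
    eigenvalues = np.linalg.eigvals(matrix)
    total_magnitude = np.abs(eigenvalues).sum()
    if total_magnitude == 0.0:
        return 0.0
    return float(eigenvalues.real.sum() / total_magnitude)


def gaussian_baseline_scores(dim: int, n_samples: int, seed: int = 0) -> np.ndarray:
    """Score n_samples random Gaussian matrices of shape (dim, dim)."""
    rng = np.random.default_rng(seed)
    scores = np.empty(n_samples)
    for i in range(n_samples):
        # The 1/sqrt(dim) factor only keeps magnitudes tidy;
        # the score itself is invariant to positive rescaling.
        sample = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        scores[i] = copying_score(sample)
    return scores


dim = 64
baseline = gaussian_baseline_scores(dim=dim, n_samples=200)

# Synthetic stand-in for a learned copying circuit: identity plus small noise.
rng = np.random.default_rng(1)
learned_like = np.eye(dim) + 0.1 * rng.standard_normal((dim, dim)) / np.sqrt(dim)
learned_score = copying_score(learned_like)

print(f"baseline mean={baseline.mean():.3f}, std={baseline.std():.3f}")
print(f"learned-like score={learned_score:.3f}")
```

In this toy setup the baseline scores concentrate near zero, because a random Gaussian spectrum is roughly symmetric about the origin, while the identity-dominated matrix scores close to 1. A head whose score sits far above the baseline spread would be flagged as a copying candidate.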

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.