method
active
method:activation-correlation

Activation Correlation

Pearson correlation of feature activations, computed across 40M tokens, used to measure feature similarity and universality across models.
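
A minimal sketch of the idea, assuming each feature is represented by its activation values over the same token sequence. The helper names (`pearson`, `best_match`) and the toy data are hypothetical and only illustrate the computation; they are not taken from the source. A real run would use activations over the full token set rather than six values.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two activation sequences over the same tokens."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 tokens")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    # Correlation is undefined for a constant feature; returning 0.0 is an
    # assumption made for this sketch.
    return cov / norm if norm > 0 else 0.0


def best_match(feature: Sequence[float],
               candidates: Sequence[Sequence[float]]) -> tuple[int, float]:
    """Index and score of the candidate feature most correlated with `feature`."""
    scores = [pearson(feature, c) for c in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Toy activations on six shared tokens (illustrative values only).
feat_a = [0.0, 1.0, 0.0, 2.0, 0.0, 3.0]   # a feature from model A
model_b = [
    [1.0, 0.0, 1.0, 0.0, 1.0, 0.0],       # fires on the opposite tokens
    [0.0, 2.0, 0.0, 4.0, 0.0, 6.0],       # same pattern at a different scale
]
idx, score = best_match(feat_a, model_b)
```

Because Pearson correlation is invariant to scale and offset, the second candidate matches `feat_a` even though its activations are twice as large, which is what makes the measure usable across models with different activation magnitudes.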

Neighborhood — ranked by edge-count

Methods (1)

method
  • Log-likelihood ratio score estimating whether a token string belongs to a specific context (Arabic, DNA, base64); used to measure feature specificity and sensitivity
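
The log-likelihood ratio score above can be sketched as follows. This assumes character-level unigram models with add-one smoothing, which is a simplification chosen for illustration; the source does not specify how the two likelihoods are estimated. The function names and the tiny sample corpora are hypothetical.

```python
import math
from collections import Counter


def unigram_logprob(counts: Counter, total: int, vocab_size: int, ch: str) -> float:
    """Add-one smoothed log-probability of a character under a unigram model."""
    return math.log((counts[ch] + 1) / (total + vocab_size))


def llr_score(s: str, context_text: str, background_text: str) -> float:
    """log P(s | context) - log P(s | background), summed over characters.

    Positive scores suggest the string looks more like the context
    than like the background text.
    """
    vocab_size = len(set(context_text) | set(background_text) | set(s))
    c_ctx, c_bg = Counter(context_text), Counter(background_text)
    n_ctx, n_bg = len(context_text), len(background_text)
    return sum(
        unigram_logprob(c_ctx, n_ctx, vocab_size, ch)
        - unigram_logprob(c_bg, n_bg, vocab_size, ch)
        for ch in s
    )


# Illustrative corpora only: a DNA-like context and an English background.
dna = "ACGTTGCAACGTGGCATTAGC"
english = "the quick brown fox jumps over the lazy dog"

dna_like = llr_score("GATTACA", dna, english)   # expected to be positive
text_like = llr_score("hello", dna, english)    # expected to be negative
```

Thresholding such a score gives a context label per string, against which a feature's specificity and sensitivity can then be measured.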

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Model-independent feature comparison based on correlating activation vectors across a fixed diverse dataset
  • Activations · concept · 0.796
    Internal representations of the model on which probes operate; the method uses activations to rank datapoints.
  • Intervention method that adds a learned direction vector to residual stream activations to steer model behavior
  • Latent model activations when processing inputs framed from another agent's perspective
  • Key capability: covariance pooling compresses gigabytes of activations into compact stable embeddings without large labeled datasets.
  • Component of the contrastive retrieval pipeline analyzing activation statistics.
  • Intervening in model forward pass by adding/subtracting probe direction to group (b) hidden states to flip truth judgments
  • Standard method in mechanistic interpretability that intervenes on activations; VPD flips this paradigm by patching parameters.