concept
active
concept:residual-activation-vectors

Residual Activation Vectors

Layer-40 activations after subtracting the component explained by compressed Gemini embeddings, which isolates information not driven by surface text content

Neighborhood — ranked by edge-count

Methods (1)

method
  • Method for building 171 emotion probes by generating stories, embedding them, regressing out Gemini embeddings, and averaging residual activations per emotion
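
A minimal sketch of the regress-out-and-average step described above, on synthetic data. The entry says only that Gemini embeddings are "regressed out"; ordinary least squares with centering is an assumption here, as are the function names `residualize` and `mean_residual_per_label`, the array shapes, and the two example emotion labels. The embeddings are taken as already compressed, since the compression method is not stated.

```python
import numpy as np

def residualize(acts, emb):
    """Return acts minus the part linearly predictable from emb.

    acts: (n_samples, d_act) activations, e.g. one row per story.
    emb:  (n_samples, d_emb) compressed text embeddings for the same rows.
    Assumption: plain least squares on mean-centered data.
    """
    acts_c = acts - acts.mean(axis=0)
    emb_c = emb - emb.mean(axis=0)
    # Least-squares weights mapping embeddings to activations.
    weights, *_ = np.linalg.lstsq(emb_c, acts_c, rcond=None)
    return acts_c - emb_c @ weights

def mean_residual_per_label(residuals, labels):
    """Average residual vectors within each label (e.g. each emotion)."""
    labels_arr = np.asarray(labels)
    return {lab: residuals[labels_arr == lab].mean(axis=0)
            for lab in sorted(set(labels))}

# Synthetic stand-ins (hypothetical sizes, not the real model or embeddings).
rng = np.random.default_rng(0)
n, d_emb, d_act = 200, 8, 16
emb = rng.normal(size=(n, d_emb))
labels = ["joy", "fear"] * (n // 2)
acts = emb @ rng.normal(size=(d_emb, d_act)) + rng.normal(size=(n, d_act))

residuals = residualize(acts, emb)
vectors = mean_residual_per_label(residuals, labels)  # one vector per label
```

By construction the residuals are uncorrelated with every embedding dimension, so whatever the per-label averages capture cannot be a linear function of the embedding. Nonlinear dependence on the embedding would survive this step.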

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • The intermediate representations in transformer layers whose activations are patched and probed for truth information
  • Used to localize causally implicated hidden states by swapping activations between true and false inputs
  • Kim et al. 2018 method for identifying concept directions in CNN activations; precursor to LLM probing
  • The specific neural network layer from which activations are extracted for probe construction and SAE training in the target models
  • Residual Stream · concept · 0.734
    Proposed pathway flowing through layers at each position; calculates K/V values that feed horizontal information flow.
  • Baseline method sampling a random vector as feature direction for comparison with learned methods
  • Activations · concept · 0.732
    Internal representations of the model on which probes operate; the method uses activations to rank datapoints.
  • Linear probes constructed to measure 171 emotion concepts in model activations with surface semantic content removed