Residual Activation Vectors

Layer-40 activations with the component explained by compressed Gemini embeddings subtracted, isolating information not driven by surface text content

Neighborhood — ranked by edge-count

Methods (1)

method

Emotion Probe Construction Method
implements
Method for building 171 emotion probes: generate stories, embed them, regress the Gemini embeddings out of the model activations, and average the residual activations per emotion
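
The regress-out and averaging steps can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the source's implementation: the function names (regress_out, mean_by_label), the small ridge penalty, and the array shapes are hypothetical, and the Gemini embeddings are assumed to have been compressed to a low dimension already. Story generation and activation extraction are not shown.

```python
import numpy as np


def regress_out(acts, embs, ridge=1e-3):
    """Return the part of `acts` not linearly explained by `embs`.

    acts: (n_samples, d_model) activations (e.g. from layer 40).
    embs: (n_samples, d_emb) compressed text embeddings.
    ridge: hypothetical regularisation strength; the source does not
        say which regression variant is used.
    """
    acts = np.asarray(acts, dtype=np.float64)
    embs = np.asarray(embs, dtype=np.float64)
    # Centre both sides so no intercept term is needed.
    acts_c = acts - acts.mean(axis=0)
    embs_c = embs - embs.mean(axis=0)
    # Ridge solution W = (E^T E + ridge * I)^{-1} E^T A.
    gram = embs_c.T @ embs_c + ridge * np.eye(embs_c.shape[1])
    weights = np.linalg.solve(gram, embs_c.T @ acts_c)
    # Residual = centred activations minus the embedding-predicted part.
    return acts_c - embs_c @ weights


def mean_by_label(residuals, labels):
    """Average residual activations per emotion label."""
    labels = np.asarray(labels)
    return {
        int(label): residuals[labels == label].mean(axis=0)
        for label in np.unique(labels)
    }
```

With one integer label per story, mean_by_label(regress_out(acts, embs), labels) yields one residual vector per emotion, which could then serve as a probe direction.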

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Residual Stream Activation · concept · 0.805
The intermediate representations in transformer layers whose activations are patched and probed for truth information
Residual Stream Activation Patching · method · 0.772
Used to localize causally implicated hidden states by swapping activations between true and false inputs
Concept Activation Vectors (TCAVs) · method · 0.759
Kim et al. 2018 method for identifying concept directions in CNN activations; precursor to LLM probing
layer 40 residual-stream activations · concept · 0.746
The specific neural network layer from which activations are extracted for probe construction and SAE training in the target models
Residual Stream · concept · 0.734
Proposed pathway flowing through layers at each position; calculates K/V values that feed horizontal information flow.
Random vector baseline · method · 0.733
Baseline method sampling a random vector as feature direction for comparison with learned methods
Activations · concept · 0.732
Internal representations of the model on which probes operate; the method uses activations to rank datapoints.
Emotion probes (171-emotion residual vector probes) · method · 0.729
Linear probes constructed to measure 171 emotion concepts in model activations with surface semantic content removed