concept
active
concept:residual-activation-vectors
Residual Activation Vectors
Layer-40 activations from which the component explained by compressed Gemini embeddings has been subtracted, isolating information that is not driven by surface text content.
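A minimal sketch of the subtraction step, assuming the "component explained by" the embeddings means an ordinary least-squares fit (with intercept) of every activation dimension on the compressed embeddings. The page does not say how the Gemini embeddings are compressed, so the PCA step and the function name `residualize` are hypothetical stand-ins.

```python
import numpy as np

def residualize(acts: np.ndarray, emb: np.ndarray, k: int = 2) -> np.ndarray:
    """Remove the part of `acts` linearly explained by a k-dim compression of `emb`.

    acts: (n, d_model) layer activations, one row per text.
    emb:  (n, d_emb) text embeddings for the same texts.
    k:    number of principal components kept (hypothetical compression choice).
    """
    # Hypothetical compression: PCA via SVD of the centered embedding matrix.
    emb_c = emb - emb.mean(axis=0)
    _, _, vt = np.linalg.svd(emb_c, full_matrices=False)
    z = emb_c @ vt[:k].T                      # (n, k) compressed embeddings

    # Design matrix: intercept column plus the compressed embeddings.
    x = np.hstack([np.ones((z.shape[0], 1)), z])

    # Least-squares fit of every activation dimension on the design matrix.
    coef, *_ = np.linalg.lstsq(x, acts, rcond=None)

    # Residual: what the compressed embeddings cannot explain linearly.
    return acts - x @ coef
```

Because the fit includes an intercept, each residual dimension has zero mean across the texts, and any activation direction that is an exact linear function of the kept embedding components is removed entirely.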
Neighborhood (ranked by edge count)
Methods (1)
method
- Emotion Probe Construction Method (implements): Method for building 171 emotion probes by generating stories, embedding them, regressing out Gemini embeddings, and averaging residual activations per emotion
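The final step of that method, averaging residual activations per emotion, can be sketched as below. This assumes one residual activation row per generated story and one emotion label per story; the `score` readout (dot product with a unit-normalized probe) is a hypothetical illustration of how such a probe might be applied, not something the page specifies.

```python
import numpy as np

def emotion_probes(resid: np.ndarray, labels: list[str]) -> dict[str, np.ndarray]:
    """Average residual activations per emotion label, giving one probe vector each."""
    labels_arr = np.asarray(labels)
    probes = {}
    for emotion in sorted(set(labels)):
        mask = labels_arr == emotion          # rows (stories) for this emotion
        probes[emotion] = resid[mask].mean(axis=0)
    return probes

def score(act: np.ndarray, probes: dict[str, np.ndarray]) -> dict[str, float]:
    """Hypothetical readout: project one activation onto each unit-normalized probe."""
    return {e: float(act @ (v / np.linalg.norm(v))) for e, v in probes.items()}
```

With 171 emotion labels this yields 171 probe vectors, each living in the same layer-40 residual space as the activations it is averaged from.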
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
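The listing rule stated above (cosine ≥ 0.65 and no typed edge) can be sketched as a simple filter. This assumes each entity has a description embedding and that typed neighbors are known by name; the function `similar_untyped` and its arguments are hypothetical.

```python
import numpy as np

def similar_untyped(target: np.ndarray,
                    candidates: dict[str, np.ndarray],
                    typed_neighbors: set[str],
                    threshold: float = 0.65) -> list[tuple[str, float]]:
    """Return (name, cosine) pairs with cosine >= threshold and no typed edge, highest first."""
    t = target / np.linalg.norm(target)
    hits = []
    for name, vec in candidates.items():
        if name in typed_neighbors:
            continue                          # already linked by a typed relation
        cos = float(t @ (vec / np.linalg.norm(vec)))
        if cos >= threshold:
            hits.append((name, cos))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```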
- The intermediate representations in transformer layers whose activations are patched and probed for truth information
- Used to localize causally implicated hidden states by swapping activations between true and false inputs
- Kim et al. (2018) method for identifying concept directions in CNN activations; a precursor to LLM probing
- The specific neural network layer from which activations are extracted for probe construction and SAE training in the target models
- Proposed pathway flowing through the layers at each position; computes the K/V values that feed the horizontal information flow.
- Baseline method that samples a random vector as the feature direction, for comparison with learned methods
- Internal representations of the model on which probes operate; the method uses activations to rank datapoints.
- Linear probes constructed to measure 171 emotion concepts in model activations with surface semantic content removed