concept
active
concept:layer-40-residual-stream-activations
layer 40 residual-stream activations
The specific neural network layer from which activations are extracted for probe construction and SAE training in the target models
Neighborhood — ranked by edge-count
Methods (3)
method
- Emotion probes (171-emotion residual vector probes) · associated_with · Linear probes constructed to measure 171 emotion concepts in model activations with surface semantic content removed
- Ridge regression fit on top-256 PCs of Gemini embeddings to predict model layer-40 activations and compute residuals
- Sparse autoencoders (SAEs) trained on 100M+ tokens to compress per-token layer-40 activations into 64 active features out of 100K+ for interpretability analysis
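The SAE entry above can be illustrated with a minimal sketch of a sparse encoding step. It assumes a top-k activation rule (keep the k largest feature pre-activations per token), which is one common way to get a fixed number of active features; the card does not say which sparsity mechanism was actually used. The function name, weight layout, and toy sizes (32-dimensional activations, 512 features, k = 8) are hypothetical stand-ins for the real layer-40 width, the 100K+ dictionary, and k = 64.

```python
import numpy as np

def topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k=64):
    """Forward pass of a top-k sparse autoencoder (hypothetical layout).

    x: (n_tokens, d_model) activations; returns (codes, reconstruction).
    """
    # Pre-activations for every dictionary feature: (n_tokens, n_features).
    pre = (x - b_dec) @ W_enc + b_enc
    # Column indices of the k largest pre-activations for each token.
    top_idx = np.argpartition(pre, -k, axis=1)[:, -k:]
    # Keep only those k values (clipped at zero); every other feature is 0.
    codes = np.zeros_like(pre)
    rows = np.arange(pre.shape[0])[:, None]
    codes[rows, top_idx] = np.maximum(pre[rows, top_idx], 0.0)
    # Decode the sparse code back into activation space.
    recon = codes @ W_dec + b_dec
    return codes, recon

# Toy usage with random, untrained weights (assumed sizes, not the real ones).
rng = np.random.default_rng(0)
d_model, n_features, k = 32, 512, 8
x = rng.normal(size=(4, d_model))
W_enc = rng.normal(size=(d_model, n_features)) / np.sqrt(d_model)
W_dec = rng.normal(size=(n_features, d_model)) / np.sqrt(n_features)
b_enc = np.zeros(n_features)
b_dec = np.zeros(d_model)
codes, recon = topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k=k)
```

Each row of `codes` has at most `k` non-zero entries, which is the property the "64 active features out of 100K+" description refers to; training the weights is out of scope for this sketch.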
Concepts (1)
concept
- Residual Stream Activation · related_to · The intermediate representations in transformer layers whose activations are patched and probed for truth information
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Used to localize causally implicated hidden states by swapping activations between true and false inputs
- Technique to localize causally implicated hidden states by swapping residual stream activations between a true and false input and measuring downstream log-probability changes
- Proposed pathway flowing through layers at each position; calculates K/V values that feed horizontal information flow.
- The finite dimensional capacity of the residual stream for storing and communicating information between layers; conceptualized as being under high demand
- Core activation intervention: add scaled vector to residual stream at layer l during completion
- Architectural observation enabling the entire mathematical framework; the residual stream is purely a sum of linear projections
- Tracks cosine similarity, norm ratio, and injection direction projection across layers to measure recovery from perturbation
- Layer-40 activations with the component explained by compressed Gemini embeddings subtracted, isolating information not driven by surface text content
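The residualization described in the last entry (and in the ridge-regression method listed under Methods) can be sketched as follows. This is an illustration under stated assumptions, not the original pipeline: PCA is done with a plain SVD, the ridge fit uses the closed-form solution, the penalty `alpha` is an arbitrary value, and random arrays stand in for the Gemini embeddings and layer-40 activations. The function name `residualize` and the toy sizes are hypothetical.

```python
import numpy as np

def residualize(acts, emb, n_pcs=256, alpha=1.0):
    """Remove the part of `acts` that is linearly predictable from the
    top principal components of `emb` (ridge regression residuals)."""
    # Center both matrices so no intercept term is needed.
    emb_c = emb - emb.mean(axis=0)
    acts_c = acts - acts.mean(axis=0)
    # PCA via SVD: rows of Vt are principal directions, sorted by variance.
    _, _, Vt = np.linalg.svd(emb_c, full_matrices=False)
    Z = emb_c @ Vt[:n_pcs].T  # (n_samples, <= n_pcs) PC scores
    # Closed-form ridge solution: W = (Z^T Z + alpha I)^{-1} Z^T A.
    W = np.linalg.solve(Z.T @ Z + alpha * np.eye(Z.shape[1]), Z.T @ acts_c)
    # Residual = centered activations minus the ridge prediction.
    return acts_c - Z @ W

# Toy usage: synthetic "embeddings" that partly drive synthetic "activations".
rng = np.random.default_rng(0)
emb = rng.normal(size=(200, 20))
acts = emb @ rng.normal(size=(20, 16)) + 0.1 * rng.normal(size=(200, 16))
resid = residualize(acts, emb, n_pcs=10, alpha=1.0)
```

The returned residuals are centered; whether the mean is added back before probing is a design choice the card does not specify.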