method
active
method:ridge-regression-probing
Ridge Regression Probing
Ridge regression fit on the top 256 principal components (PCs) of Gemini embeddings to predict the model's layer-40 activations; the residuals (actual minus predicted activations) are then computed
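The description above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the source's implementation: the data are synthetic stand-ins for Gemini embeddings and layer-40 activations, the toy sizes are much smaller than the 256 PCs named above, and the helper names (`top_pcs`, `ridge_residuals`), the centering step, and the penalty `alpha=1.0` are all hypothetical choices.

```python
import numpy as np

def top_pcs(X, k):
    """Project the rows of X onto their top-k principal components."""
    Xc = X - X.mean(axis=0, keepdims=True)
    # SVD of the centered data; the rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def ridge_residuals(Z, A, alpha=1.0):
    """Fit ridge regression A ~ Z and return (weights, residuals)."""
    Zc = Z - Z.mean(axis=0, keepdims=True)
    Ac = A - A.mean(axis=0, keepdims=True)
    # Closed-form ridge solution: W = (Z^T Z + alpha * I)^-1 Z^T A
    W = np.linalg.solve(Zc.T @ Zc + alpha * np.eye(Zc.shape[1]), Zc.T @ Ac)
    # Residuals: the part of the centered activations the PCs do not predict.
    return W, Ac - Zc @ W

# Synthetic stand-ins (assumption): E plays the role of the embeddings,
# A the role of the activations. The card uses k=256; k=16 keeps the toy small.
rng = np.random.default_rng(0)
n, d_emb, d_act, k = 200, 64, 32, 16
E = rng.normal(size=(n, d_emb))
A = E @ rng.normal(size=(d_emb, d_act)) + 0.1 * rng.normal(size=(n, d_act))

Z = top_pcs(E, k)
W, R = ridge_residuals(Z, A, alpha=1.0)
```

Here `R` has one residual vector per input row. In practice the penalty would normally be chosen by cross-validation rather than fixed.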
Neighborhood — ranked by edge-count
Concepts (1)
concept
- The specific neural network layer from which activations are extracted for probe construction and SAE training in the target models
Methods (2)
method
- Ridge regression probe construction · related_to · Method used to predict model activations from Gemini embeddings and compute residuals for probe construction
- Method for building 171 emotion probes by generating stories, embedding them, regressing out Gemini embeddings, and averaging residual activations per emotion
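The per-emotion averaging step mentioned in the last entry can be illustrated with a short sketch. It assumes residual activations have already been computed and that each row carries one emotion label; the function name `mean_residual_probes`, the random data, and the two example labels are hypothetical, and no normalization is applied because the entry does not mention any.

```python
import numpy as np

def mean_residual_probes(residuals, labels):
    """Return {label: mean residual vector over the rows with that label}."""
    labels = np.asarray(labels)
    probes = {}
    for lab in np.unique(labels):
        # Average the residual activations of every example with this label.
        probes[str(lab)] = residuals[labels == lab].mean(axis=0)
    return probes

# Toy input (assumption): 6 residual vectors of width 4 with 2 emotion labels.
rng = np.random.default_rng(0)
residuals = rng.normal(size=(6, 4))
labels = ["joy", "joy", "fear", "fear", "joy", "fear"]

probes = mean_residual_probes(residuals, labels)
```

Each value in `probes` is a direction in activation space. With 171 emotion labels, the same loop would yield 171 such vectors.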
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Standard linear probing technique; compared to mass-mean probing for classification accuracy and causal implication
- Predicting Assistant Axis projections from L2-normalized Qwen 3 0.6B embeddings of user messages via ridge regression
- Logistic regression trained on GSM8k training set to predict answer correctness from projection features along reflection direction
- Top-down interpretability approach studying linguistic properties at various residual stream stages; contrasted with the paper's bottom-up mechanistic approach
- Earlier interpretability method applying classifiers to DNN hidden representations; shares complexity-accuracy dilemma with causal abstraction
- Used to evaluate representation quality across VTAB tasks
- Method from Gurnee et al. 2023 for finding feature directions including individual neuron analysis
- Method of using base models (no post-training) to observe spontaneous self-referential behaviors without the confound of memorized introspection language