method
active
method:ridge-regression-probe-construction
Ridge regression probe construction
Method used to predict model activations from Gemini embeddings and compute residuals for probe construction
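As a rough illustration of the description above, the sketch below fits a ridge regression from PCA-reduced embeddings to activations and keeps the residuals. It is a minimal NumPy sketch, not the source's implementation: the function names `pca_project` and `ridge_residuals`, the regularization strength `alpha`, and the random toy data are all assumptions. The toy dimensions are deliberately small; the entries on this page mention the top-256 PCs and layer-40 activations.

```python
import numpy as np

def pca_project(X, k):
    """Center X and project it onto its top-k principal components."""
    Xc = X - X.mean(axis=0, keepdims=True)
    # SVD of the centered data; rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def ridge_residuals(Z, A, alpha=1.0):
    """Fit ridge regression Z -> A in closed form; return (weights, residuals).

    Z: (n, k) predictors, assumed column-centered (PCA scores are).
    A: (n, d) targets; the intercept is handled by centering A.
    alpha: hypothetical regularization strength, not taken from the source.
    """
    Ac = A - A.mean(axis=0, keepdims=True)
    k = Z.shape[1]
    # Closed-form ridge solution: (Z^T Z + alpha I)^{-1} Z^T Ac
    W = np.linalg.solve(Z.T @ Z + alpha * np.eye(k), Z.T @ Ac)
    residuals = Ac - Z @ W  # the part of A the embeddings do not explain
    return W, residuals

# Toy stand-ins for text embeddings (E) and model activations (A).
rng = np.random.default_rng(0)
n, d_emb, d_act, k = 200, 32, 16, 8
E = rng.normal(size=(n, d_emb))
A = E @ rng.normal(size=(d_emb, d_act)) + 0.1 * rng.normal(size=(n, d_act))

Z = pca_project(E, k)
W, R = ridge_residuals(Z, A, alpha=1.0)
```

The residual matrix `R` has one row per example and is the quantity later used for probe construction; in practice `alpha` would be chosen by cross-validation.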
Neighborhood — ranked by edge-count
Methods (2)
method
- Ridge Regression Probing (related_to): Ridge regression fit on top-256 PCs of Gemini embeddings to predict model layer-40 activations and compute residuals
- Linear probes constructed to measure 171 emotion concepts in model activations with surface semantic content removed
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Predicting Assistant Axis projections from L2-normalized Qwen 3 0.6B embeddings of user messages via ridge regression
- Standard linear probing technique; compared to mass-mean probing for classification accuracy and causal implication
- Logistic regression trained on GSM8k training set to predict answer correctness from projection features along reflection direction
- Method for building 171 emotion probes by generating stories, embedding them, regressing out Gemini embeddings, and averaging residual activations per emotion
- Linear classifier approach applied to model activations to identify which training datapoints caused undesired behaviors in post-training.
- Probe method combining causal interventions and structural analysis, supported by pyvene's activation collection
- Method of using base models (no post-training) to observe spontaneous self-referential behaviors without confound of memorized introspection language.
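
One of the entries above describes building emotion probes by averaging residual activations per emotion. A minimal sketch of that averaging step follows. The function names `mean_residual_probes` and `probe_scores`, the two toy labels, and the hand-written residuals are hypothetical; scoring by projection onto unit-normalized probes is an assumption about how such probes might be read out, not something the entries state.

```python
import numpy as np

def mean_residual_probes(residuals, labels):
    """Average residual activations per label; returns (names, probes)."""
    labels = np.asarray(labels)
    names = sorted(set(labels.tolist()))
    # One probe vector per label: the mean residual over that label's rows.
    probes = np.stack([residuals[labels == name].mean(axis=0) for name in names])
    return names, probes

def probe_scores(activations, probes):
    """Project activations onto unit-normalized probe directions (assumed readout)."""
    unit = probes / np.linalg.norm(probes, axis=1, keepdims=True)
    return activations @ unit.T

# Toy residuals: two examples per label, two activation dimensions.
R = np.array([[1.0, 0.0],
              [3.0, 0.0],
              [0.0, 2.0],
              [0.0, 4.0]])
labels = ["joy", "joy", "fear", "fear"]

names, probes = mean_residual_probes(R, labels)
scores = probe_scores(R, probes)  # shape (n_examples, n_labels)
```

With 171 concepts the same code would yield a `(171, d)` probe matrix, one row per emotion label.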