method
active
method:ridge-regression-probe-construction

Ridge regression probe construction

Method that fits a ridge regression from Gemini embeddings to model activations, then uses the residuals (the part of each activation that the embeddings do not predict) to construct probes.
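As a rough illustration of the idea, the sketch below fits a closed-form ridge regression from an embedding matrix to an activation matrix and returns the residuals. The function names, the `alpha` penalty, and the mean-centering step are assumptions made for this example and are not taken from the source. The `build_probes` helper, which averages residuals per label, is one plausible way to turn residuals into probe vectors and is likewise an assumption.

```python
import numpy as np


def ridge_residuals(embeddings, activations, alpha=1.0):
    """Fit ridge regression embeddings -> activations; return (weights, residuals).

    embeddings:  (n_samples, d_embed) array, e.g. text embeddings.
    activations: (n_samples, d_model) array of model activations.
    alpha:       L2 penalty strength (hypothetical default).
    """
    X = np.asarray(embeddings, dtype=float)
    Y = np.asarray(activations, dtype=float)

    # Center both sides so the fit needs no explicit intercept term.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)

    # Closed-form ridge solution: W = (Xc^T Xc + alpha * I)^-1 Xc^T Yc
    gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    W = np.linalg.solve(gram, Xc.T @ Yc)

    # Residuals: what the embeddings fail to predict about the activations.
    residuals = Yc - Xc @ W
    return W, residuals


def build_probes(residuals, labels):
    """Average the residual vectors per label (an assumed construction step)."""
    labels = np.asarray(labels)
    return {
        label: residuals[labels == label].mean(axis=0)
        for label in np.unique(labels)
    }
```

A call such as `W, R = ridge_residuals(E, A, alpha=1.0)` followed by `build_probes(R, labels)` would give one residual-based vector per label. In practice `alpha` would usually be chosen by cross-validation rather than fixed.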

Neighborhood — ranked by edge-count

Methods (2)

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Predicting Assistant Axis projections from L2-normalized Qwen 3 0.6B embeddings of user messages via ridge regression
  • Standard linear probing technique; compared to mass-mean probing for classification accuracy and causal implication
  • Logistic regression trained on GSM8k training set to predict answer correctness from projection features along reflection direction
  • Method for building 171 emotion probes by generating stories, embedding them, regressing out Gemini embeddings, and averaging residual activations per emotion
  • Linear classifier approach applied to model activations to identify which training datapoints caused undesired behaviors in post-training
  • Probe method combining causal interventions and structural analysis, supported by pyvene's activation collection
  • Method of using base models (no post-training) to observe spontaneous self-referential behaviors without the confound of memorized introspection language