method
active
method:ridge-regression-probing

Ridge Regression Probing

Ridge regression fit on the top 256 principal components (PCs) of Gemini embeddings to predict the model's layer-40 activations; the residuals of that fit are then computed
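
A minimal sketch of the procedure described above, assuming NumPy only. The function name `ridge_residuals`, the regularization strength `alpha`, the centering step, and the synthetic array shapes are all illustrative assumptions and are not taken from the source; only the 256-component setting comes from the description.

```python
import numpy as np

def ridge_residuals(embeddings, activations, n_components=256, alpha=1.0):
    """Regress activations on the top PCs of embeddings and return the residuals.

    Hypothetical helper: the name, alpha, and centering are assumptions.
    """
    # Center both matrices so the regression needs no intercept term.
    X = embeddings - embeddings.mean(axis=0)
    Y = activations - activations.mean(axis=0)
    # PCA via SVD: rows of Vt are principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    k = min(n_components, Vt.shape[0])
    Z = X @ Vt[:k].T  # PC scores, shape (n_samples, k)
    # Closed-form ridge solution: W = (Z^T Z + alpha I)^-1 Z^T Y
    W = np.linalg.solve(Z.T @ Z + alpha * np.eye(k), Z.T @ Y)
    # Residual = the part of the activations the embeddings do not predict.
    return Y - Z @ W

# Synthetic stand-ins (assumed shapes) for embeddings and layer activations.
rng = np.random.default_rng(0)
emb = rng.normal(size=(400, 300))
acts = rng.normal(size=(400, 64))
resid = ridge_residuals(emb, acts, n_components=256, alpha=10.0)
```

Because both inputs are centered, the residuals have zero mean per dimension, and their total norm cannot exceed that of the centered activations.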

Neighborhood — ranked by edge-count

Concepts (1)

concept

Methods (2)

method
  • Method used to predict model activations from Gemini embeddings and compute residuals for probe construction
  • Method for building 171 emotion probes by generating stories, embedding them, regressing out Gemini embeddings, and averaging residual activations per emotion
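
The per-emotion averaging step in the second entry above could look roughly like the following sketch. It assumes the residual activations are already computed and that each row carries one emotion label; `build_probes`, the label strings, and the toy data are hypothetical and not from the source.

```python
import numpy as np

def build_probes(residuals, labels):
    """Return {label: mean residual vector} for each distinct label.

    Hypothetical helper: one probe per emotion, formed by averaging
    the residual activations of that emotion's examples.
    """
    labels = np.asarray(labels)
    probes = {}
    for name in np.unique(labels):
        # Boolean mask selects the rows belonging to this emotion.
        probes[str(name)] = residuals[labels == name].mean(axis=0)
    return probes

# Toy data: 12 residual vectors of dimension 8, three placeholder emotions.
rng = np.random.default_rng(0)
residuals = rng.normal(size=(12, 8))
labels = ["joy", "fear", "anger"] * 4
probes = build_probes(residuals, labels)
```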

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

  • Standard linear probing technique; compared to mass-mean probing for classification accuracy and causal implication
  • Predicting Assistant Axis projections from L2-normalized Qwen 3 0.6B embeddings of user messages via ridge regression
  • Logistic regression trained on GSM8k training set to predict answer correctness from projection features along reflection direction
  • Probing Methods · method · 0.743
    Top-down interpretability approach studying linguistic properties at various residual stream stages; contrasted with the paper's bottom-up mechanistic approach
  • Earlier interpretability method applying classifiers to DNN hidden representations; shares complexity-accuracy dilemma with causal abstraction
  • Linear Probing · method · 0.723
    Used to evaluate representation quality across VTAB tasks
  • Sparse Probing · method · 0.722
    Method from Gurnee et al. 2023 for finding feature directions including individual neuron analysis
  • Method of using base models (no post-training) to observe spontaneous self-referential behaviors without the confound of memorized introspection language