Logistic Regression Probe

Standard linear probing technique; compared with mass-mean probing on classification accuracy and on how causally implicated the identified direction is

Neighborhood — ranked by edge-count

paper

framework

Mass-Mean Probing
extends
Introduced in this paper: an optimization-free probing technique using difference-in-means direction with optional covariance correction

concept

Maximum Margin Separator
implements
The direction logistic regression converges to on linearly separable data; shown to be suboptimal for identifying the truth direction

method

Logistic regression correctness probe
related_to
Logistic regression trained on the GSM8k training set to predict answer correctness from projection features along the reflection direction
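
The contrast between this entity and Mass-Mean Probing can be sketched on toy data. The block below is a minimal illustration, not the code of any paper listed here: the Gaussian "activations", the helper names (`mass_mean_direction`, `fit_logistic_probe`, `cosine`), and the learning-rate and step settings are all assumptions made for the example. It fits a logistic regression probe by plain gradient descent and computes the optimization-free difference-in-means direction, with the covariance correction taken here to mean multiplying by the inverse of the pooled within-class covariance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for model activations (an assumption for this sketch):
# two Gaussian classes in d dimensions, separated along the first axis.
d, n = 8, 200
true_dir = np.zeros(d)
true_dir[0] = 1.0
X = np.vstack([
    rng.normal(size=(n, d)) + 2.0 * true_dir,  # label 1
    rng.normal(size=(n, d)) - 2.0 * true_dir,  # label 0
])
y = np.concatenate([np.ones(n), np.zeros(n)])


def mass_mean_direction(X, y, covariance_correction=False):
    """Difference-in-means direction; no optimization involved."""
    mu_pos = X[y == 1].mean(axis=0)
    mu_neg = X[y == 0].mean(axis=0)
    theta = mu_pos - mu_neg
    if covariance_correction:
        # Pool the within-class covariance (each class centered on its own
        # mean), then apply its pseudo-inverse to the mean difference.
        centered = np.vstack([X[y == 1] - mu_pos, X[y == 0] - mu_neg])
        sigma = centered.T @ centered / len(centered)
        theta = np.linalg.pinv(sigma) @ theta
    return theta


def fit_logistic_probe(X, y, lr=0.1, steps=2000):
    """Unregularized logistic regression fit by full-batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(label = 1)
        w -= lr * (X.T @ (p - y)) / len(y)      # gradient of mean log-loss
        b -= lr * np.mean(p - y)
    return w, b


def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


theta_mm = mass_mean_direction(X, y)
theta_mm_cov = mass_mean_direction(X, y, covariance_correction=True)
w_lr, b_lr = fit_logistic_probe(X, y)

# Training accuracy of each probe. The mass-mean probe thresholds at the
# overall mean, which equals the class-mean midpoint because classes are balanced.
acc_lr = float(np.mean(((X @ w_lr + b_lr) > 0) == (y == 1)))
midpoint = X.mean(axis=0)
acc_mm = float(np.mean((((X - midpoint) @ theta_mm) > 0) == (y == 1)))
```

Comparing `cosine(w_lr, theta_mm)` with `acc_lr` and `acc_mm` shows that two probes can classify about equally well while pointing in somewhat different directions. On isotropic toy data like this the gap is small; it would be expected to grow when the classes share a strongly non-spherical covariance, or when the data are linearly separable and logistic regression drifts toward the maximum margin separator.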

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Ridge Regression Probing · method · 0.800
Ridge regression fit on top-256 PCs of Gemini embeddings to predict model layer-40 activations and compute residuals
Ridge regression probe construction · method · 0.768
Method used to predict model activations from Gemini embeddings and compute residuals for probe construction
Probe-Based Data Attribution · method · 0.767
Linear classifier approach applied to model activations to identify which training datapoints caused undesired behaviors in post-training.
Probes · concept · 0.753
Interpretability tools that decode information from internal model activations; here, linear probes are used for data attribution.
logistic surrogate model · method · 0.750
Sigmoid fit linking S to success probability.
Causal Structural Probe · method · 0.748
Probe method combining causal interventions and structural analysis, supported by pyvene's activation collection
Logistic surrogate fitting · method · 0.748
Fitting a logistic function to success probability as a function of S or shot count to estimate midpoints and widths.
Linear Probe · method · 0.744
Simple linear classifiers trained on model activations used as the probing technique within the introduced method.