Linear Probe Training

Method for fitting a linear classifier on collected activations to predict task-relevant features
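As a rough illustration of the method described above, the sketch below fits a logistic-regression probe on a matrix of activations and reports held-out accuracy. It rests on several assumptions that are not from the source: the activations are synthetic stand-ins (Gaussian noise plus a label-dependent shift along one direction), the probe is trained with plain gradient descent, and the names `train_linear_probe` and `probe_accuracy` are hypothetical helpers, not part of pyvene or any other library.

```python
import numpy as np


def train_linear_probe(acts, labels, lr=0.5, steps=500, l2=1e-3):
    """Fit a binary logistic-regression probe: p(y=1|x) = sigmoid(x @ w + b)."""
    n, d = acts.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        logits = acts @ w + b
        probs = 1.0 / (1.0 + np.exp(-logits))
        err = probs - labels  # per-example gradient of cross-entropy w.r.t. logits
        w -= lr * (acts.T @ err / n + l2 * w)  # mean gradient plus L2 penalty
        b -= lr * err.mean()
    return w, b


def probe_accuracy(acts, labels, w, b):
    """Fraction of examples whose predicted class matches the label."""
    preds = (acts @ w + b) > 0
    return float((preds == labels.astype(bool)).mean())


# Synthetic stand-in for collected activations (assumption: in practice these
# would be hidden states cached from a model at a chosen layer and position).
rng = np.random.default_rng(0)
d = 16
direction = rng.normal(size=d)
direction /= np.linalg.norm(direction)
labels = rng.integers(0, 2, size=400).astype(float)
acts = rng.normal(size=(400, d)) + 2.0 * np.outer(2.0 * labels - 1.0, direction)

# Hold out part of the data so accuracy is measured on unseen examples.
train_x, test_x = acts[:300], acts[300:]
train_y, test_y = labels[:300], labels[300:]

w, b = train_linear_probe(train_x, train_y)
acc = probe_accuracy(test_x, test_y, w, b)
print(f"held-out probe accuracy: {acc:.2f}")
```

High held-out accuracy here only shows that the feature is linearly decodable from the activations. It does not establish that the model uses that direction causally.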

Neighborhood — ranked by edge-count

method

Linear Probe
related_to
Simple linear classifiers trained on model activations used as the probing technique within the introduced method.

artifact

pyvene open-source Python library
implements
The main artifact introduced in the paper: an open-source PyPI library for customizable interventions on PyTorch models.

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Linear Probing · method · 0.847
Used to evaluate representation quality across VTAB tasks
Linear Probe for Evaluation Awareness · method · 0.839
Nguyen et al. trained linear probes on activations to distinguish evaluation from deployment scenarios.
Linear Decoding · method · 0.760
Correlative technique measuring the type of information encoded in distributed representations via linear predictability.
Linear probe achieves 100% classification accuracy for almost all components in Pythia-6.9B gender task · finding · 0.752
Demonstrates that linear probes can overestimate causal relevance; probes succeed on non-causally-relevant representations
Probes trained under different explicit instruction variants are highly aligned with each other despite different wording · claim · 0.727
Shows the key divide is passive vs. active framing, not the specific wording of instructions.
linear steering · method · 0.725
Typical approach that adds a scaled steering vector to representations; the paper argues this is mismatched with actual representation geometry.
linearity · concept · 0.724
The sequential, continuous order of text, often challenged by diagrammatic branching.
MM probes trained on larger_than+smaller_than achieve lower NIE than those trained on cities+neg_cities despite higher classification accuracy on sp_en_trans · finding · 0.724
Dissociation between classification accuracy and causal implication; training on opposites does not always help causally