method
active
method:linear-probe-training
Linear Probe Training
Method for fitting a linear classifier on collected activations to predict task-relevant features
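A minimal sketch of what fitting such a probe can look like, assuming activations have already been collected into an (n_examples, hidden_dim) array with one binary label per example. The names train_linear_probe and probe_accuracy, the hyperparameters, and the synthetic "activations" are illustrative assumptions; this is plain NumPy logistic regression, not the pyvene API or the paper's implementation.

```python
import numpy as np


def train_linear_probe(acts, labels, lr=0.1, epochs=200, l2=1e-3):
    """Fit a logistic-regression probe (w, b) by full-batch gradient descent.

    acts:   float array of shape (n_examples, hidden_dim)
    labels: array of 0/1 values, shape (n_examples,)
    """
    n, d = acts.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        logits = acts @ w + b
        probs = 1.0 / (1.0 + np.exp(-logits))   # sigmoid
        err = probs - labels                    # gradient of the log loss w.r.t. logits
        w -= lr * (acts.T @ err / n + l2 * w)   # L2-regularized weight update
        b -= lr * err.mean()
    return w, b


def probe_accuracy(acts, labels, w, b):
    """Fraction of examples whose thresholded probe output matches the label."""
    preds = (acts @ w + b > 0).astype(int)
    return float((preds == labels).mean())


# Hypothetical stand-in for collected activations: Gaussian noise plus a
# label-dependent shift along one random unit direction.
rng = np.random.default_rng(0)
hidden_dim = 16
direction = rng.normal(size=hidden_dim)
direction /= np.linalg.norm(direction)
labels = rng.integers(0, 2, size=400)
acts = rng.normal(size=(400, hidden_dim)) + 3.0 * np.outer(2 * labels - 1, direction)

# Hold out the last 100 examples to measure generalization of the probe.
w, b = train_linear_probe(acts[:300], labels[:300])
test_acc = probe_accuracy(acts[300:], labels[300:], w, b)
print(f"held-out probe accuracy: {test_acc:.3f}")
```

High held-out accuracy here shows only that the feature is linearly decodable from the activations; it does not by itself show that the model uses that direction causally.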
Neighborhood — ranked by edge-count
Methods (1)
method
- Linear Probe · related_to · Simple linear classifiers trained on model activations, used as the probing technique within the introduced method.
Artifacts (1)
artifact
- pyvene open-source Python library · implements · The main artifact introduced in the paper: an open-source PyPI library for customizable interventions on PyTorch models.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; these are candidates for new edges or unrecognized duplicates.
- Used to evaluate representation quality across VTAB tasks
- Nguyen et al. trained linear probes on activations to distinguish evaluation from deployment scenarios.
- Correlative technique measuring the type of information encoded in distributed representations via linear predictability.
- Linear probe achieves 100% classification accuracy for almost all components in Pythia-6.9B gender task · finding · 0.752 · Demonstrates that linear probes can overestimate causal relevance; probes succeed on non-causally-relevant representations.
- Shows the key divide is passive vs. active framing, not the specific wording of instructions.
- Typical approach that adds a scaled steering vector to representations; the paper argues this is mismatched with actual representation geometry.
- The sequential, continuous order of text, often challenged by diagrammatic branching.
- Dissociation between classification accuracy and causal implication; training on opposites does not always help causally