finding
active
finding:linear-probe-achieves-100-classification-accuracy-for-almost-all-components-in-pythia-6-9b-gender-task
Linear probe achieves 100% classification accuracy for almost all components in Pythia-6.9B gender task
Demonstrates that linear probes can overestimate causal relevance: a probe can classify perfectly from representations that are not causally relevant to the model's behavior.
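The gap between probe accuracy and causal relevance can be illustrated with a toy sketch. This is not the paper's setup: it uses neither Pythia-6.9B nor DAS, and every name in it (`toy_model`, `probe_accuracy`, `interchange_accuracy`) is hypothetical. It assumes a two-coordinate hidden state in which both coordinates encode the label but the readout uses only coordinate 0, and it uses a simple interchange intervention as a stand-in for a causal test. For brevity the probe is fitted and scored on the same examples.

```python
import random

rng = random.Random(0)  # fixed seed so the toy data is reproducible


def make_example(label):
    # Both coordinates encode the label (+1 or -1) with bounded noise.
    causal = label + rng.uniform(-0.3, 0.3)     # read by the toy model
    bystander = label + rng.uniform(-0.3, 0.3)  # ignored by the toy model
    return [causal, bystander]


def toy_model(h):
    # Assumed readout: the output depends on coordinate 0 only.
    return 1 if h[0] > 0 else -1


# Alternate labels so that examples (i, i + 1) always have opposite labels.
labels = [1 if i % 2 == 0 else -1 for i in range(200)]
hidden = [make_example(y) for y in labels]


def probe_accuracy(dim):
    # One-dimensional linear probe: threshold at the midpoint of class means.
    pos = [h[dim] for h, y in zip(hidden, labels) if y == 1]
    neg = [h[dim] for h, y in zip(hidden, labels) if y == -1]
    mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
    w = mean_pos - mean_neg
    b = -w * (mean_pos + mean_neg) / 2
    preds = [1 if w * h[dim] + b > 0 else -1 for h in hidden]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def interchange_accuracy(dim):
    # Patch one coordinate from a source example with the opposite label and
    # count how often the model's output switches to the source label.
    hits, total = 0, 0
    for i in range(0, len(hidden) - 1, 2):
        base, source = hidden[i], hidden[i + 1]
        patched = list(base)
        patched[dim] = source[dim]
        hits += toy_model(patched) == labels[i + 1]
        total += 1
    return hits / total


for dim in (0, 1):
    print(dim, probe_accuracy(dim), interchange_accuracy(dim))
```

In this sketch the probe separates the classes equally well on both coordinates, while patching changes the output only for coordinate 0. The bystander coordinate is decodable but has no effect on behavior, which is the pattern the finding describes.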
Source paper
extracted_from (2024) · Zhengxuan Wu · Atticus Geiger · Aryaman Arora · Jing Huang +4
Neighborhood — ranked by edge-count
Papers (1)
paper
Claims (1)
claim
- Key interpretive claim from Case Study II distinguishing probe accuracy from causal relevance
Findings (1)
finding
- Case Study II result showing DAS identifies fewer causally relevant positions than a probe
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Probe achieves selectivity of 4.20 on pythia-410m, slightly exceeding DAS selectivity of 3.96 · finding · 0.808 · Key result showing that for models larger than pythia-70m, probe selectivity matches or exceeds DAS selectivity
- Baseline result confirming the model has fully learned the gender prediction task before probing
- Baseline accuracy showing small models fail on harder NPI licensing tasks
- Nguyen et al. trained linear probes on activations to distinguish evaluation from deployment scenarios.
- Attributed to model anisotropy from saturation making hidden states harder to access
- Likely-trained MM probe is a surprisingly effective causal baseline due to correlation between truth and probability on sp_en_trans
- Justifies restricting probe-based vector derivation to h_b activations; attributed to Yes/No semantics
- Linear alignment map ϕ_lin IIA tracks DNN accuracy during Pythia-410m training progression on IOI task · finding · 0.760 · Suggests linear maps may be better correlated with genuine task implementation than non-linear maps