finding
active
MM probes trained on larger_than+smaller_than achieve lower NIE than those trained on cities+neg_cities despite higher classification accuracy on sp_en_trans
Dissociation between classification accuracy and causal implication; training on opposites does not always help causally
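A mass-mean (MM) probe is fit by taking the difference between the mean activation of true statements and the mean activation of false statements. "Training on cities+neg_cities" means pooling the two datasets before computing those means. The sketch below assumes that mass-mean definition. The midpoint decision threshold, the synthetic data, and the helper names (`mass_mean_probe`, `mm_predict`, `synthetic_dataset`) are hypothetical illustrations. The sketch only measures classification accuracy. It does not compute NIE, which requires intervening on a model's activations, so it cannot by itself reproduce the dissociation described in this finding.

```python
import numpy as np


def mass_mean_probe(acts, labels):
    """Fit a mass-mean (MM) probe.

    Returns theta = mean(true) - mean(false) and the midpoint between the two
    class means (the midpoint threshold is an assumption of this sketch).
    """
    labels = np.asarray(labels, dtype=bool)
    mu_true = acts[labels].mean(axis=0)
    mu_false = acts[~labels].mean(axis=0)
    return mu_true - mu_false, (mu_true + mu_false) / 2


def mm_predict(theta, midpoint, acts):
    # Classify a statement as true when its projection onto theta falls on
    # the "true" side of the midpoint between the class means.
    return (acts - midpoint) @ theta > 0


# Hypothetical synthetic stand-ins for two training sets (for example a dataset
# and its negated counterpart); real runs would use model activations.
rng = np.random.default_rng(0)
d = 16
truth_dir = rng.normal(size=d)


def synthetic_dataset(n, offset):
    labels = rng.integers(0, 2, size=n).astype(bool)
    signs = np.where(labels, 1.0, -1.0)
    # Each point: Gaussian noise + signed truth direction + dataset-specific offset.
    acts = rng.normal(size=(n, d)) + np.outer(signs, truth_dir) + offset
    return acts, labels


acts_a, labels_a = synthetic_dataset(200, offset=rng.normal(size=d))
acts_b, labels_b = synthetic_dataset(200, offset=rng.normal(size=d))

# Training on "A+B" pools both datasets before fitting the probe.
acts = np.concatenate([acts_a, acts_b])
labels = np.concatenate([labels_a, labels_b])
theta, midpoint = mass_mean_probe(acts, labels)

accuracy = (mm_predict(theta, midpoint, acts) == labels).mean()
print(f"train accuracy: {accuracy:.3f}")
```

High accuracy from such a probe says only that the direction separates the classes. Whether adding `theta` to a model's activations actually shifts its outputs (the NIE) is a separate, causal measurement.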
Source paper
extracted_from(2023) · Samuel Marks · Max Tegmark
Neighborhood (ranked by edge count)
Claims (1)
claim
- Explains why cities+neg_cities and larger_than+smaller_than training sets yield better OOD accuracy
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Open question about scale-dependent asymmetry in training data effects
- Likely-trained MM probe is a surprisingly effective causal baseline due to correlation between truth and probability on sp_en_trans
- Unexplained result pointing to asymmetry in how training on opposites affects truth probes at 70B scale
- Striking cross-domain generalization result supporting the claim that larger models represent abstract truth
- Justifies restricting probe-based vector derivation to h_b activations; attributed to Yes/No semantics
- Shows the key divide is passive vs. active framing, not the specific wording of instructions
- Core result showing MM is superior to LR for causal implication despite similar classification accuracy
- Shows that truth representations are not reducible to text probability representations