question

active

question:why-did-mass-mean-probing-with-cities-neg-cities-training-data-perform-poorly-for-the-70b-model-despite-larger-than-smaller-than-performing-well

Why did mass-mean probing with cities+neg_cities training data perform poorly for the 70B model, despite larger_than+smaller_than performing well?

Open question about scale-dependent asymmetry in training data effects
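For context, in the source paper a mass-mean (MM) probe direction is the mean activation of true statements minus the mean activation of false statements. "Training on cities+neg_cities" means pooling both datasets before taking those means. The toy sketch below is an illustration under stated assumptions, not the paper's code. The 2-D "activations" are invented: column 0 stands in for a truth feature, and column 1 for a hypothetical non-truth feature that co-varies with truth in cities but flips sign in neg_cities. The zero-threshold classifier is also an assumed simplification. The sketch shows how pooling statements with their negations can cancel such a feature in the MM direction. It does not explain why this pooling underperformed at 70B.

```python
import numpy as np


def mass_mean_direction(acts: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Mass-mean direction: mean of true-statement activations minus mean of false ones."""
    is_true = labels.astype(bool)
    return acts[is_true].mean(axis=0) - acts[~is_true].mean(axis=0)


def mm_classify(acts: np.ndarray, theta: np.ndarray) -> np.ndarray:
    """Assumed simplification: call a statement 'true' if its projection onto theta is positive."""
    return acts @ theta > 0


# Hypothetical toy activations: column 0 = truth feature, column 1 = a non-truth
# feature that co-varies with truth in "cities" but flips sign in "neg_cities".
cities_acts = np.array([[1.0, 1.0], [-1.0, -1.0]])
cities_labels = np.array([1, 0])
neg_cities_acts = np.array([[1.0, -1.0], [-1.0, 1.0]])
neg_cities_labels = np.array([1, 0])

# Probe trained on cities alone also picks up the non-truth feature.
theta_cities = mass_mean_direction(cities_acts, cities_labels)

# Probe trained on cities+neg_cities: pool the data first, then take the class means.
theta_pooled = mass_mean_direction(
    np.vstack([cities_acts, neg_cities_acts]),
    np.concatenate([cities_labels, neg_cities_labels]),
)

print(theta_cities)  # [2. 2.]
print(theta_pooled)  # [2. 0.]

# On the negated statements, only the pooled direction separates true from false.
print(mm_classify(neg_cities_acts, theta_cities))  # [False False]
print(mm_classify(neg_cities_acts, theta_pooled))  # [ True False]
```

In real use, the rows would be residual-stream activations from a chosen layer of the model, with one row per statement.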

Source paper

extracted_from

The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets

(2023) · Samuel Marks · Max Tegmark

Neighborhood (ranked by edge count)

Papers (1)

paper

The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets
associated_with

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
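The panel's criterion (cosine ≥ 0.65, no typed edge) can be sketched as a simple filter. This is a hypothetical reconstruction. The page does not say which embedding model produces the vectors or how typed edges are stored, so the `embeddings` dict, the `typed_edges` set of unordered ID pairs, and the toy IDs below are all assumptions.

```python
import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def similar_untyped(entity_id, embeddings, typed_edges, threshold=0.65):
    """Return (other_id, score) pairs with cosine >= threshold and no typed edge, best first.

    embeddings: hypothetical dict mapping entity ID -> embedding vector.
    typed_edges: hypothetical set of frozenset({id_a, id_b}) for entities already linked.
    """
    query = embeddings[entity_id]
    hits = []
    for other_id, vec in embeddings.items():
        # Skip the entity itself and anything that already has a typed relation.
        if other_id == entity_id or frozenset((entity_id, other_id)) in typed_edges:
            continue
        score = cosine(query, vec)
        if score >= threshold:
            hits.append((other_id, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy 2-D vectors with made-up IDs.
embeddings = {
    "this_question": np.array([1.0, 0.0]),
    "near_duplicate": np.array([1.0, 0.1]),
    "unrelated": np.array([0.0, 1.0]),
    "source_paper": np.array([1.0, 0.0]),
}
typed_edges = {frozenset(("this_question", "source_paper"))}

for other_id, score in similar_untyped("this_question", embeddings, typed_edges):
    print(other_id, round(score, 3))  # near_duplicate 0.995
```

Here "source_paper" is excluded despite a perfect score because it already has a typed edge. A high-scoring untyped hit, such as the 0.958 near-paraphrase of this question listed here, is the kind of candidate the panel flags as a possible unrecognized duplicate.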

Why did mass-mean probing with cities+neg_cities perform poorly for the 70B model, despite mass-mean probing with larger_than+smaller_than performing well? · question · 0.958
Unexplained result pointing to asymmetry in how training on opposites affects truth probes at 70B scale

MM probes trained on larger_than+smaller_than achieve lower NIE than those trained on cities+neg_cities despite higher classification accuracy on sp_en_trans · finding · 0.841
Dissociation between classification accuracy and causal implication; training on opposites does not always help causally

Why were interventions with mass-mean probe directions extracted from the likely dataset so effective, despite these probes not being accurate at classifying true/false statements? · question · 0.800
Open question raised in §7.1 about an unexplained anomalous result

Probes trained on the likely dataset perform worse than chance on datasets with anti-correlations between text probability and truth · finding · 0.777
Shows that truth representations are not reducible to text probability representations

MM probe trained on likely dataset achieves NIE of 0.70 (false→true) on LLaMA-2-13B, surprisingly strong but weaker than truth probes · finding · 0.770
Likely-trained MM probe is a surprisingly effective causal baseline due to correlation between truth and probability on sp_en_trans

There are fewer representations competent for N tasks than M<N tasks, so training more general models should yield fewer possible solutions · hypothesis · 0.768
Selective pressure toward convergence via task generality

Mass-mean probe directions outperform LR and CCS in causal intervention experiments (NIE) in 7/8 experimental conditions · finding · 0.767
Core result showing MM is superior to LR for causal implication despite similar classification accuracy

Training on cities+neg_cities improves OOD generalization, especially on neg_sp_en_trans · finding · 0.766
Training on statements and their negations mitigates non-truth feature interference in probe directions
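Several entries here compare probes by NIE (normalized indirect effect). The following is a minimal sketch of that metric. It assumes the definition as recalled from the source paper, so check it against the paper before relying on it. The probability difference PD = P(TRUE) − P(FALSE) is averaged over statements. The false→true NIE is then (mean PD of false statements after intervention minus mean PD of false statements before) divided by (mean PD of true statements minus mean PD of false statements). The true→false case mirrors this. All probabilities in the example are invented and are not results from the paper.

```python
def probability_difference(p_true: float, p_false: float) -> float:
    """PD = P(TRUE) - P(FALSE) for the model's answer tokens."""
    return p_true - p_false


def normalized_indirect_effect(pd_base: float, pd_intervened: float, pd_target: float) -> float:
    """Fraction of the gap between the source class and the target class closed by an intervention.

    false->true: pd_base = mean PD on false statements, pd_intervened = mean PD on false
    statements after a (hypothetical) intervention along the probe direction,
    pd_target = mean PD on unmodified true statements. For true->false, swap the classes.
    0 means no effect; 1 means intervened statements match the target class on average.
    """
    return (pd_intervened - pd_base) / (pd_target - pd_base)


# Invented averages, chosen to be exactly representable floats.
pd_true = probability_difference(0.75, 0.25)         # mean PD on true statements
pd_false = probability_difference(0.25, 0.75)        # mean PD on false statements
pd_false_after = probability_difference(0.5, 0.5)    # false statements after intervention

nie_f2t = normalized_indirect_effect(pd_false, pd_false_after, pd_true)
print(nie_f2t)  # 0.5
```

Normalizing by the true/false gap makes NIE values comparable across models and datasets whose raw PD ranges differ.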