finding

active

finding:training-on-cities-neg-cities-improves-ood-generalization-especially-on-neg-sp-en-trans

Training on cities+neg_cities improves OOD generalization, especially on neg_sp_en_trans

Training on statements and their negations mitigates non-truth feature interference in probe directions

Source paper

extracted_from

The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets

(2023) · Samuel Marks · Max Tegmark

Neighborhood — ranked by edge-count

Claims (1)

claim

Training probes on statements and their opposites improves generalization by mitigating non-truth features with opposite-sign correlations
supports
Explains why cities+neg_cities and larger_than+smaller_than training sets yield better OOD accuracy
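
A toy sketch of the mechanism this claim describes, not the paper's code or data. It assumes a mass-mean probe direction defined as mean(true activations) minus mean(false activations), and uses hand-made 2-D "activations" in which axis 0 stands for truth and axis 1 for a hypothetical non-truth feature whose correlation with truth flips sign between the two training sets. All names and values (cities, neg_cities, ood, the helper functions) are illustrative assumptions.

```python
def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def mass_mean_direction(data):
    """data: list of (activation, label) pairs. Returns mean(true) - mean(false)."""
    true_mean = mean([x for x, y in data if y])
    false_mean = mean([x for x, y in data if not y])
    return [t - f for t, f in zip(true_mean, false_mean)]


def accuracy(direction, data):
    """Classify as true when the projection onto `direction` is positive."""
    correct = 0
    for x, y in data:
        score = sum(d * xi for d, xi in zip(direction, x))
        correct += (score > 0) == y
    return correct / len(data)


# Hypothetical activations: [truth feature, non-truth feature].
# In `cities` the non-truth feature has the same sign as truth ...
cities = [([1.0, 1.0], True), ([1.0, 1.5], True),
          ([-1.0, -1.0], False), ([-1.0, -1.5], False)]
# ... and in `neg_cities` it has the opposite sign.
neg_cities = [([1.0, -1.0], True), ([1.0, -1.5], True),
              ([-1.0, 1.0], False), ([-1.0, 1.5], False)]
# Toy out-of-distribution set where the non-truth feature opposes truth strongly.
ood = [([1.0, -2.0], True), ([1.0, -2.5], True),
       ([-1.0, 2.0], False), ([-1.0, 2.5], False)]

theta_cities = mass_mean_direction(cities)             # picks up both axes
theta_both = mass_mean_direction(cities + neg_cities)  # non-truth axis cancels

print("cities only:        ", theta_cities, accuracy(theta_cities, ood))
print("cities + neg_cities:", theta_both, accuracy(theta_both, ood))
```

In this construction the direction learned from cities alone leans on the non-truth axis and misclassifies the toy OOD set, while the direction learned from the union keeps only the truth axis. Real probes are fit on LLM activations, where the cancellation is only approximate.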

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Why did mass-mean probing with cities+neg_cities training data perform poorly for the 70B model, despite larger_than+smaller_than performing well? · question · 0.766
Open question about scale-dependent asymmetry in training data effects
MM probes trained on larger_than+smaller_than achieve lower NIE than those trained on cities+neg_cities despite higher classification accuracy on sp_en_trans · finding · 0.761
Dissociation between classification accuracy and causal implication; training on opposites does not always help causally
If EI maximization is used as a regularization in representation learning, then OOD generalization will improve beyond current invariant risk minimization methods. · hypothesis · 0.753
Proposed conjecture in §4.3.1.
In LLaMA-2-13B, cities and neg_cities show antipodal alignment in early layers, rotate to orthogonal in middle layers, then eventually align in later layers · finding · 0.744
Layer-by-layer evolution of truth direction alignment, supporting hierarchical abstraction hypothesis
Emergence of grid-like representations by training recurrent neural networks to perform spatial localization (Cueva & Wei, 2018) · concept · 0.730
RNN model recapitulating grid cells; related work category 4.
There are fewer representations competent for N tasks than for M<N tasks, so training more general models should yield fewer possible solutions · hypothesis · 0.726
Selective pressure toward convergence via task generality
NIS+ outperforms NIS, variational autoencoders, and feed-forward neural networks in out-of-distribution generalization experiments. · finding · 0.724
Yang et al. (2023) result linking EI maximization to robust generalization.
What factors determine the generalisation of learned alignment maps beyond training data? · question · 0.722
Open question about the gap between Theorem 1's existence proof and practical learnability