finding
active
finding:mm-probe-trained-on-likely-dataset-achieves-nie-of-0-70-false-true-on-llama-2-13b-surprisingly-strong-but-weaker-than-truth-probes
MM probe trained on likely dataset achieves NIE of 0.70 (false→true) on LLaMA-2-13B, surprisingly strong but weaker than truth probes
The mass-mean (MM) probe trained on the likely dataset is a surprisingly effective causal baseline, owing to the correlation between truth and text probability on sp_en_trans
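To make the two quantities in this finding concrete, here is a minimal sketch in Python. It assumes a mass-mean probe direction is the difference between the mean activation of true statements and the mean activation of false statements. It also assumes NIE (false→true) is the fraction of the gap in P(TRUE) − P(FALSE) between false and true statements that the intervention closes. The function names and all numbers are hypothetical and purely illustrative; none of them come from the source paper.

```python
import numpy as np


def mass_mean_direction(acts, labels):
    """Difference of class means: mean(true acts) - mean(false acts).

    acts:   (n_statements, d_model) array of activations (hypothetical input).
    labels: length-n array, truthy for true statements, falsy for false ones.
    """
    acts = np.asarray(acts, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    return acts[labels].mean(axis=0) - acts[~labels].mean(axis=0)


def normalized_indirect_effect(pd_false, pd_true, pd_intervened):
    """Assumed NIE definition for the false->true direction.

    Each argument is an average probability difference P(TRUE) - P(FALSE):
    on unmodified false statements, on unmodified true statements, and on
    false statements after adding the probe direction to the activations.
    """
    return (pd_intervened - pd_false) / (pd_true - pd_false)


# Toy activations, made up for illustration only.
toy_acts = [[1.0, 0.0], [3.0, 0.0], [-1.0, 0.0], [-3.0, 0.0]]
toy_labels = [1, 1, 0, 0]
direction = mass_mean_direction(toy_acts, toy_labels)

# Toy probability differences, also made up.
nie = normalized_indirect_effect(pd_false=-0.8, pd_true=0.8, pd_intervened=0.0)
print(direction, nie)
```

Under this reading, an NIE of 1 would mean the intervention makes false statements look as true to the model as genuinely true ones, and an NIE of 0 would mean it has no effect.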
Source paper
extracted_from · (2023) · Samuel Marks · Max Tegmark
Neighborhood — ranked by edge-count
Claims (3)
claim
- LLMs linearly represent truth-relevant information beyond the plausibility of text, as evidenced by probes trained on likely performing poorly on anti-correlated datasets · associated_with · supports · Establishes that the observed linear structure is not merely a representation of text probability
- Key methodological claim: MM probes are both competitive in accuracy and superior in causal influence
- Motivates the introduction of mass-mean probing as an alternative to LR
Questions (1)
question
- Open question raised in §7.1 about an unexplained anomalous result
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Larger models linearly represent more general concepts including truth
- Dissociation between classification accuracy and causal implication; training on opposites does not always help causally
- Striking cross-domain generalization result supporting the claim that larger models represent abstract truth
- Shows that truth representations are not reducible to text probability representations
- Shows behavioral pattern of self-correction is trainable in smaller models
- Establishes generalizability of the core difficulty-boundary finding across model families.
- Core result showing MM is superior to LR for causal implication despite similar classification accuracy
- Probe validation result confirming interest direction captures meaningful structure