finding
active
finding:mass-mean-probes-generalize-about-as-well-as-lr-and-ccs-for-llama-2-13b-and-70b
Mass-mean probes generalize about as well as LR and CCS for LLaMA-2-13B and 70B
Despite being simpler and optimization-free, mass-mean (MM) probes match the accuracy of the other probing techniques (LR and CCS) at scale
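The "optimization-free" point can be made concrete with a minimal sketch. It assumes the probe direction is the difference between the mean activation of true statements and the mean activation of false statements, so fitting needs only two averages and no gradient steps (unlike LR or CCS). The activations here are synthetic stand-ins for model hidden states. The names `fit_mass_mean_probe` and `predict` are hypothetical, and thresholding at the midpoint between the class means is an illustrative choice, not necessarily the paper's exact formulation.

```python
import numpy as np

def fit_mass_mean_probe(acts: np.ndarray, labels: np.ndarray):
    """Hypothetical helper: fit a mass-mean probe with no optimization.

    acts:   (n, d) activations (synthetic stand-ins for hidden states).
    labels: (n,) array of 0/1 truth labels.
    Returns the direction theta = mu_true - mu_false and a scalar threshold.
    """
    mu_true = acts[labels == 1].mean(axis=0)
    mu_false = acts[labels == 0].mean(axis=0)
    theta = mu_true - mu_false
    # Illustrative assumption: split at the projection of the midpoint between class means.
    threshold = theta @ (mu_true + mu_false) / 2
    return theta, threshold

def predict(acts: np.ndarray, theta: np.ndarray, threshold: float) -> np.ndarray:
    # Project each activation onto theta and compare with the threshold.
    return (acts @ theta > threshold).astype(int)

# Synthetic data: true/false statements are offset along a hidden "truth" direction.
rng = np.random.default_rng(0)
d, n = 16, 200
truth_dir = rng.normal(size=d)
labels = rng.integers(0, 2, size=n)
acts = rng.normal(size=(n, d)) + np.outer(2 * labels - 1, truth_dir)

theta, threshold = fit_mass_mean_probe(acts, labels)
accuracy = (predict(acts, theta, threshold) == labels).mean()
print(f"mass-mean probe accuracy: {accuracy:.3f}")
```

Fitting a logistic-regression probe on the same arrays would instead require iterative optimization. The sketch only illustrates why MM probes are cheap to fit. It does not reproduce the LLaMA-2 generalization result.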
Source paper
extracted_from(2023) · Samuel Marks · Max Tegmark
Neighborhood — ranked by edge-count
Claims (1)
claim
- Key methodological claim: MM probes are both competitive in accuracy and superior in causal influence
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Larger models linearly represent more general concepts including truth
- Striking cross-domain generalization result supporting the claim that larger models represent abstract truth
- Supporting finding showing ESR is driven by both higher multi-attempt rates and comparable improvement rates
- Demonstrates that small models represent surface features rather than abstract truth
- Core result showing MM is superior to LR for causal implication despite similar classification accuracy
- Cross-judge validation of the primary ESR finding across OpenAI, Alibaba, Anthropic, and Google judge models
- Model-specific difference in persona susceptibility
- Illustrative finding that ESR mitigates but does not fully eliminate steering influence