finding
active
LLaMA-2-70B displays summarization behavior over punctuation tokens in a context-dependent way: present for cities but not for sp_en_trans
Contrasts with the 7B and 13B models, which show consistent summarization behavior; this may complicate localization at the 70B scale
Source paper
extracted_from · (2023) · Samuel Marks · Max Tegmark
Neighborhood — ranked by edge-count
Claims (1)
claim
- Localization result from patching experiments; identifies group (b) hidden states as the locus of truth representations
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Demonstrates that small models represent surface features rather than abstract truth
- Llama-3.3-70B exhibits internal consistency-checking mechanisms that operate during inference · claim · 0.803 · Central interpretive claim of the paper supported by causal ablation and activation evidence
- Shows absence of abstract truth representations in smallest model, supporting scale-dependent emergence claim
- Model-specific difference in persona susceptibility
- The specific Fourier feature periods identified confirm base-10 rather than modular computation
- The complete mechanistic algorithm discovered for cyclic concept reasoning
- Localizes truth representations to specific hidden states, motivating the rest of the analysis
- Llama-3.1-8B uses base-10 addition rather than modular addition to compute cyclic concept sums · finding · 0.772 · The central empirical finding that computation does not mirror the circular representational structure