finding

active

finding:llama-2-70b-displays-summarization-behavior-over-punctuation-tokens-in-a-context-dependent-way-present-for-cities-but-not-for-sp-en-trans

LLaMA-2-70B displays summarization behavior over punctuation tokens in a context-dependent way: present for cities but not for sp_en_trans

Contrasts with the 7B and 13B models, which show consistent summarization behavior; may complicate localization at 70B scale

Source paper

extracted_from

The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets

(2023) · Samuel Marks · Max Tegmark

Neighborhood — ranked by edge-count

Claims (1)

claim

A small group of causally-implicated hidden states encodes LLM truth representations, localized over clause-ending punctuation tokens
contradicts
Localization result from patching experiments; identifies group (b) hidden states as the locus of truth representations

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

LLaMA-2-7B representations of larger_than+smaller_than cluster by surface-level characteristics such as presence of token 'eighty' · finding · 0.803
Demonstrates that small models represent surface features rather than abstract truth
Llama-3.3-70B exhibits internal consistency-checking mechanisms that operate during inference · claim · 0.803
Central interpretive claim of the paper supported by causal ablation and activation evidence
In LLaMA-2-7B, PCA of larger_than+smaller_than shows statements clustering by surface-level characteristics (e.g., presence of token 'eighty') rather than truth value · finding · 0.789
Shows absence of abstract truth representations in smallest model, supporting scale-dependent emergence claim
Llama 3.3 70B is the most likely to take on a non-Assistant persona when steered, with an even split between human and nonhuman portrayals · finding · 0.785
Model-specific difference in persona susceptibility
Llama-3.1-8B uses task-agnostic Fourier features with periods 2, 5, and 10 (base-10) rather than concept-specific periods (e.g., 12 for months) · finding · 0.785
The specific Fourier feature periods identified confirm base-10 rather than modular computation
Llama-3.1-8B implements a two-stage algorithm: (1) compute integer sum via base-10 addition (e.g., six + August = 14), then (2) map sum to cyclic concept space (14 → February) · finding · 0.777
The complete mechanistic algorithm discovered for cyclic concept reasoning
Patching group (b) hidden states (over clause-ending punctuation, early-middle layers) in LLaMA-2-13B produces the strongest causal effect on TRUE/FALSE output predictions · finding · 0.775
Localizes truth representations to specific hidden states, motivating the rest of the analysis
Llama-3.1-8B uses base-10 addition rather than modular addition to compute cyclic concept sums · finding · 0.772
The central empirical finding that computation does not mirror the circular representational structure