claim
active
claim:the-generalization-improvement-from-explicit-instructions-observed-in-llama-models-a1-a3-to-f0-f2-is-more-pronounced-for-f3-f5-to-f0-f2-in-gemma-models
The generalization improvement from explicit instructions observed in Llama models (A1-A3 to F0-F2) is more pronounced for F3-F5 to F0-F2 in Gemma models.
Shows that the instruction effect, although it shifts geometry, may not produce consistent generalization effects across model families.
Source paper
extracted_from · (2026) · Angelos Poulis · Mark Crovella · Evimaria Terzi
Neighborhood — ranked by edge-count
Claims (1)
claim
- Finding that explicit correctness framing partially aligns truth directions across task families.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Establishes generalizability of the core difficulty-boundary finding across model families.
- Within-family factual generalization (F0-F2) is consistently strong across all models and prompt settings. · finding · 0.798 · Establishes a reliable baseline for factual truth direction universality within simple factual recall.
- Core empirical finding about layer-dependent truth direction emergence across task types.
- Paper describing Gemma 2 model family used in this study
- Shows behavioral pattern of self-correction is trainable in smaller models
- Key limitation acknowledged by authors.
- Truth-related directions reliably emerge at 60–75% of normalized layer depth in Qwen and Gemma models. · finding · 0.777 · Experiment 1 finding localizing where truth can be causally mediated.
- Larger models linearly represent more general concepts including truth