finding
active
finding:the-performance-drop-in-factual-tasks-happens-as-soon-as-list-length-increases-to-3-with-very-little-additional-degradation-from-4-to-5-cities
The performance drop in factual tasks happens as soon as list length increases to 3, with very little additional degradation from 4 to 5 cities.
Pinpoints list-length 3 as the exact boundary where genuine counting introduces the limitation.
Source paper
extracted_from · (2026) · Angelos Poulis · Mark Crovella · Evimaria Terzi
Neighborhood — ranked by edge-count
Claims (1)
claim
- Identified as the exact computational operation that breaks truth direction generalization.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- We hypothesize that degraded generalization on benchmarks like MMLU may reflect the computational demands of the tasks. (hypothesis · 0.774) Connecting the paper's task-difficulty findings to prior observations of weak generalization on complex QA benchmarks.
- Establishes F3-F5 as a hard generalization boundary that instructions cannot overcome.
- Core empirical finding about layer-dependent truth direction emergence across task types.
- Quantitative intuition to justify radical skepticism toward early ideas.
- Explanation for the 'silent' thought phenomenon.
- A combinatorial argument that good sequences are astronomically rare, emphasizing the difficulty of discovery.
- Suggests that later models can keep the thought 'silent' rather than letting it influence output.