What operation introduces the difficulty boundary between F3 and F4?

Specific sub-question investigated in Appendix B.4 by creating intermediate task variants.

Source paper

extracted_from

Testing the Limits of Truth Directions in LLMs

(2026) · Angelos Poulis · Mark Crovella · Evimaria Terzi

Neighborhood — ranked by edge-count

Claims (1)

claim

The need for genuine counting over lists of more than two elements introduces the key limitation of truth directions.
answered_by
Identified as the exact computational operation that breaks truth direction generalization.

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

For simple factual tasks F0-F3, probe directions show a sharp geometric transition in middle layers, with late-layer probes converging to high cosine similarity; A3 and F4-F5 show no clear transition. [finding · 0.748]
Geometric evidence for convergence to stable truth directions only for simpler tasks.
Factual tasks F0-F3 reach near-perfect AUROC in early-to-mid layers of Llama-3.1-8B; arithmetic tasks A1-A3 emerge much later; counting tasks F4-F5 emerge late similar to arithmetic. [finding · 0.739]
Core empirical finding about layer-dependent truth direction emergence across task types.
The difficulty boundary for truth directions replicates across all four tested models (Llama-3.2-3B, Llama-3.1-8B, Gemma-2-2b, Gemma-2-9b); generalization to F3-F5 remains consistently low regardless of model size or family. [finding · 0.735]
Establishes generalizability of the core difficulty-boundary finding across model families.
Factual task hierarchy (F0–F5) [framework · 0.732]
A controlled six-level hierarchy of factual tasks increasing in complexity from simple city-location recall to double-counting constraints.
The three forces (cohesion, mismatch, budget) summarize anchoring trajectories. [claim · 0.730]
Summary of the decomposition of S.
Task difficulty moderates whether CoT is performative or genuine: easy recall questions show performative CoT, difficult multihop questions show genuine reasoning. [claim · 0.728]
Task difficulty as the key variable distinguishing the two modes of CoT identified in the paper.
Task Difficulty [concept · 0.727]
The paper identifies task difficulty as a key moderator: easy MMLU questions show performative CoT, hard GPQA-Diamond questions show genuine reasoning.
F0-trained probes in layers 4-10 show inverted separation on F1 (AUROC ≈ 0), systematically misclassifying true statements as false. [finding · 0.726]
Demonstrates that early-layer probes capture sentence polarity rather than truth.
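
The "inverted separation" reading of AUROC ≈ 0 in the last entry can be illustrated with a small sketch. The scores and labels below are invented toy values, not data from the paper, and `auroc` is a hypothetical helper written for this illustration. The point is that an AUROC near 0 means the probe ranks statements in exactly the reverse order, so negating its scores gives an AUROC near 1.

```python
def auroc(scores, labels):
    """Probability that a random true item outscores a random false item.

    Ties count as half a win. Labels are 1 (true statement) or 0 (false).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one true and one false label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy probe outputs (assumed for illustration): the probe gives LOWER
# scores to true statements, as a polarity-sensitive probe might.
labels = [1, 1, 1, 0, 0, 0]
scores = [-2.0, -1.5, -0.5, 0.4, 1.1, 2.3]

inverted = auroc(scores, labels)                # every true item ranks below every false item
flipped = auroc([-s for s in scores], labels)   # same probe with its sign flipped

print(inverted, flipped)  # 0.0 1.0
```

An AUROC of 0.5 would indicate no usable signal, whereas a value near 0 indicates a consistent signal with the wrong sign, which fits the entry's interpretation that the probe tracks something systematic other than truth.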