hypothesis
active
We hypothesize that LLMs represent correctness of arithmetic expressions differently from factual statements.
Core working hypothesis motivating the factual vs. arithmetic task split in the experimental design.
Source paper
extracted_from · (2026) · Angelos Poulis · Mark Crovella · Evimaria Terzi
Neighborhood — ranked by edge-count
Papers (1)
paper
Findings (1)
finding
- Core empirical finding about layer-dependent truth direction emergence across task types.
Claims (1)
claim
- Truth directions emerge in earlier layers for factual tasks and later layers for arithmetic tasks. · supports · Core empirical claim about the layer-dependence of truth direction emergence as a function of task type.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Establishes that the observed linear structure is not merely a representation of text probability
- Motivating claim supported by the CAPTCHA example and Perez et al. (2022) findings
- Claude 3 Opus ratings aligned with human judgment of feature descriptions.
- Clarification to avoid misinterpretation.
- Interpretive claim connecting scale to abstraction level in LLM representations
- Central empirical conclusion of the paper about the fundamental limits of truth directions.
- Binder et al. finding cited as evidence that LLMs possess introspective capacity analogous to mindfulness
- Core cross-modal empirical result: larger and better language models align better with vision models
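The "related by similarity" rule above (cosine ≥ 0.65, no typed edge) can be sketched as a simple filter. This is only an illustration under stated assumptions: the function names `cosine` and `untyped_neighbors`, the toy two-dimensional embeddings, and the ordering by similarity are all hypothetical and are not taken from the system that produced this page.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def untyped_neighbors(target, embeddings, typed_edges, threshold=0.65):
    """Return (entity, similarity) pairs at or above the threshold,
    skipping the target itself and any entity that already has a
    typed edge to it. Sorted by similarity, highest first (an assumption)."""
    hits = []
    for name, vec in embeddings.items():
        if name == target or name in typed_edges:
            continue
        sim = cosine(embeddings[target], vec)
        if sim >= threshold:
            hits.append((name, sim))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Toy, made-up embeddings purely for illustration.
embeddings = {
    "hypothesis": [1.0, 0.0],
    "claim": [1.0, 0.1],      # similar, but already linked by a typed edge
    "similar": [0.8, 0.6],    # cosine 0.8 with "hypothesis", no typed edge
    "unrelated": [0.0, 1.0],  # cosine 0.0, below the threshold
}
typed_edges = {"claim"}

result = untyped_neighbors("hypothesis", embeddings, typed_edges)
```

With these toy vectors only "similar" is returned: "claim" is excluded because it already has a typed edge, and "unrelated" falls below the 0.65 threshold.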