claim
active
claim:llms-sometimes-know-statements-are-false-but-generate-them-anyway-motivating-the-need-for-techniques-that-inspect-internal-model-state-rather-than-outputs-alone
LLMs sometimes know statements are false but generate them anyway, motivating the need for techniques that inspect internal model state rather than outputs alone
Motivating claim supported by the CAPTCHA example and Perez et al. (2022) findings
Source paper
extracted_from · (2023) · Samuel Marks · Max Tegmark
Neighborhood — ranked by edge-count
Papers (1)
paper
Claims (1)
claim
- Establishes that the observed linear structure is not merely a representation of text probability
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- We hypothesize that LLMs represent correctness of arithmetic expressions differently from factual statements. · hypothesis · 0.818 · Core working hypothesis motivating the factual vs. arithmetic task split in the experimental design.
- Binder et al. finding cited as evidence that LLMs possess introspective capacity analogous to mindfulness
- Interpretive claim connecting scale to abstraction level in LLM representations
- The core interpretive question the paper narrows but cannot definitively answer
- Clarification to avoid misinterpretation.
- Central empirical conclusion of the paper about the fundamental limits of truth directions.
- Discussion of potential confounds
- Sharma et al. result supporting cross-modal alignment: language-only models implicitly encode visual structure