Cross-Lingual Truth Representation

Observation that truth-direction interventions elicit non-English Yes/No equivalents, suggesting language-independent truth encoding

Neighborhood — ranked by edge-count

Papers (1)

paper

From Directions to Cones: Exploring Multidimensional Representations of Propositional Facts in LLMs
introduces

Findings (1)

finding

With unrestricted vocabulary, models occasionally respond in non-English Yes/No equivalents (e.g., Sí, Nein) after truth-direction interventions
supports
Suggestive evidence for language-independent truth representation in LLMs

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

The underlying truth representation may generalize across lexical choices and languages · hypothesis · 0.791
Suggested by non-English Yes/No outputs post-intervention, requiring further investigation
Can truth representations be disambiguated from closely related features such as 'commonly believed' or 'verifiable' using simple factual statements? · question · 0.754
Acknowledged limitation: simple uncontroversial statements cannot distinguish truth from related epistemic features
Representational Honesty · concept · 0.737
The proposed domain-general property indexed by deception features that governs both factual accuracy and experiential self-report
Do LLMs have a unified representation of truth that spans structurally and topically diverse data? · question · 0.732
Central research question driving dataset design and experimental approach
Cross-modal language-vision alignment reaches a maximum of approximately 0.16 on mutual nearest-neighbor metric in Figure 3, well below the theoretical maximum of 1 · finding · 0.716
Quantitative bound on observed alignment; raises the open question of whether this gap reflects noise or real misalignment
Opus 4.6 represented target language internally before switching languages, with persistent Russian representations appearing before plausible textual cues · finding · 0.715
NLAs revealed unverbalized language processing in Opus 4.6 that led to discovery of malformed SFT training data.
Language models would achieve some notion of grounding in the visual domain even in the absence of cross-modal training data, because they share a common modality-agnostic representation · hypothesis · 0.713
Implication of PRH for language model visual grounding
Does the multi-directional nature of truth imply an underlying nonlinear representation, or is it compatible with linear separability? · question · 0.712
Theoretical open question about the geometry of truth in LLMs raised in Discussion