question

active

question:what-semantic-labels-correspond-to-the-individual-basis-vectors-of-the-truth-cone

What semantic labels correspond to the individual basis vectors of the truth cone?

Central open question for future work on interpretability of cone axes

Source paper

extracted_from

From Directions to Cones: Exploring Multidimensional Representations of Propositional Facts in LLMs

(2025) · Kevin Shengyang Yu · Vaidehi Bulusu · Oscar Yasunaga · Clayton Lau +4

Neighborhood — ranked by edge-count

Papers (1)

paper

From Directions to Cones: Exploring Multidimensional Representations of Propositional Facts in LLMs
associated_with

Hypotheses (1)

hypothesis

Individual cone basis vectors may correspond to interpretable semantic facets of truth such as temporal facts, geographic facts, or commonsense
gates
Future direction hypothesis for giving semantic meaning to individual axes

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Semantic Labeling of Cone Axes · concept · 0.771
The open problem of assigning interpretable semantic meaning to individual cone basis vectors (e.g., temporal vs geographic facts)
Given a true propositional input (e.g., 'Paris is the capital of France'), ablating along any basis vector of this cone disrupts the model's ability to generate a truthful response. · quote · 0.743
Load-bearing illustration of what a concept cone for truth means operationally
The underlying truth representation may generalize across lexical choices and languages · hypothesis · 0.737
Suggested by non-English Yes/No outputs post-intervention, requiring further investigation
What appears to be a representation of lexical entailment in BERT is actually a data structure of two word identity representations, not an encoding of the entailment relation · claim · 0.730
Key asymmetry between hierarchical equality and NLI experiments; BERT stores identities rather than the abstract relation.
Steering vectors discover effective triggers such as 'However' and 'Otherwise', consistent with prior reported reflection datasets · finding · 0.729
Validates that steering vectors capture reflection semantics by finding tokens reported in related work.
Steering vectors enable systematic discovery of reflection-inducing instructions beyond trial-and-error prompt design. · claim · 0.728
Core applied contribution claim, supported by top-k accuracy comparisons.
Concept cone truth interventions would generalize to larger frontier models and multimodal settings · hypothesis · 0.723
Key robustness question raised as future work
Truth-evaluation framing specifically contributes to truth geometry shifts beyond a generic instruction-following prefix. · claim · 0.723
Supported by the neutral read-prompt changing emergence but not fully replicating ask-correct cross-task generalization.
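
Illustration: ablating along one cone basis vector

The quoted entry above describes ablating along a basis vector of the truth cone. A minimal sketch of what that operation could look like is given here, assuming the common formulation of directional ablation as removing each hidden state's projection onto a unit vector. The function name `ablate_direction`, the toy dimensions, and the random `basis` matrix are all hypothetical and are not taken from the paper; real cone basis vectors would come from the paper's own extraction procedure.

```python
import numpy as np


def ablate_direction(hidden: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Remove the component of each row of `hidden` along `direction`.

    Assumed formulation (not confirmed by the source): h' = h - (h . v) v,
    where v is `direction` normalized to unit length.
    """
    unit = direction / np.linalg.norm(direction)
    # One projection coefficient per row (e.g., per token position).
    coeff = hidden @ unit
    return hidden - np.outer(coeff, unit)


# Toy data: 4 token positions in an 8-dimensional activation space.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(4, 8))

# Hypothetical stand-in for 3 cone basis vectors (random, for illustration only).
basis = rng.normal(size=(3, 8))

# Ablate along the first basis vector only.
ablated = ablate_direction(hidden, basis[0])

# After ablation, no component remains along that direction.
residual = ablated @ (basis[0] / np.linalg.norm(basis[0]))
```

Labeling an axis semantically, as this question asks, would then amount to comparing how ablating each basis vector in turn affects different categories of facts (temporal, geographic, commonsense); that comparison is not shown here.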