question

active

question:are-the-discovered-truth-directions-robust-to-architectural-variation-and-fine-tuning-differences-across-model-families

Are the discovered truth directions robust to architectural variation and fine-tuning differences across model families?

Open question on generalization beyond Gemma and Qwen families

Source paper

extracted_from

From Directions to Cones: Exploring Multidimensional Representations of Propositional Facts in LLMs

(2025) · Kevin Shengyang Yu · Vaidehi Bulusu · Oscar Yasunaga · Clayton Lau +4

Neighborhood — ranked by edge-count

Papers (1)

paper

From Directions to Cones: Exploring Multidimensional Representations of Propositional Facts in LLMs
associated_with

Hypotheses (1)

hypothesis

Concept cone truth interventions would generalize to larger frontier models and multimodal settings
gates
Key robustness question raised as future work

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Discovered truth directions are highly specific and do not interfere with general instruction-following behavior · claim · 0.830
Interpretation of KL divergence retention results
Does instructing the model to assess correctness affect the geometry of truth directions? · question · 0.804
One of the three guiding research questions of the paper.
Universality claims for truth directions are more limited than previously assumed, with significant differences observable for various model layers, task difficulties, task types, and prompt templates. · claim · 0.803
Overarching conclusion summarizing the paper's contribution relative to prior universality claims.
We hypothesize that explicitly instructing the model to evaluate the correctness of the given statement may change the geometry of truth directions. · hypothesis · 0.802
Motivating hypothesis for Section 5's investigation of prompt template effects.
What is the effect of model instructions on truth directions? · question · 0.801
Research question motivating Section 5.
Linear truth directions in LLMs are reliable primarily in factual recall cases and break down when truth assessment depends on computing and storing intermediate results. · claim · 0.799
Central empirical conclusion of the paper about the fundamental limits of truth directions.
Multiple semantically adjacent truth directions make models more vulnerable to manipulations that shift outputs without obvious signs in the primary truth direction · claim · 0.791
Safety implication derived from multi-dimensional truth structure finding
The need for genuine counting over lists of more than two elements introduces the key limitation of truth directions. · claim · 0.782
Identified as the exact computational operation that breaks truth direction generalization.