question
active
question:are-the-discovered-truth-directions-robust-to-architectural-variation-and-fine-tuning-differences-across-model-families
Are the discovered truth directions robust to architectural variation and fine-tuning differences across model families?
Open question on generalization beyond Gemma and Qwen families
Source paper
extracted_from · (2025) · Kevin Shengyang Yu · Vaidehi Bulusu · Oscar Yasunaga · Clayton Lau +4
Neighborhood — ranked by edge-count
Papers (1)
paper
Hypotheses (1)
hypothesis
- Concept cone truth interventions would generalize to larger frontier models and multimodal settings · gates · Key robustness question raised as future work
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; candidates for new edges or unrecognized duplicates.
- Interpretation of KL divergence retention results
- Does instructing the model to assess correctness affect the geometry of truth directions? · question · 0.804 · One of the three guiding research questions of the paper.
- Overarching conclusion summarizing the paper's contribution relative to prior universality claims.
- Motivating hypothesis for Section 5's investigation of prompt template effects.
- Research question motivating Section 5.
- Central empirical conclusion of the paper about the fundamental limits of truth directions.
- Safety implication derived from multi-dimensional truth structure finding
- Identified as the exact computational operation that breaks truth direction generalization.