Vasu Sharma

orcid 0009-0006-4348-7412 openalex A5102320649 name_hash f6870eb12c1a364b314ca497…

Authored papers (1)

From Directions to Cones: Exploring Multidimensional Representations of Propositional Facts in LLMs (2025)
Propositional truth in LLMs is not encoded as a single linear direction but as a multi-dimensional subspace that can be characterized by concept cones: sets of all nonnegative linear combinations of orthonormal basis vectors, each of which independently causally mediates true/false behavior. The paper applies the gradient-based concept cone framework (introduced by Wollschläger et al. 2025 for refusal) to truth. Experiments across Qwen2.5-3B, Qwen2.5-7B, Qwen2.5-14B, Gemma-2-2B, and Gemma-2-9B show that Qwen2.5-7B and Gemma-2-9B sustain near-100% Answer Switching Rate (ASR) across all tested cone dimensionalities from 1 to 5, confirming at least a 5-dimensional truth-mediating subspace in those models. Directional ablation using the discovered cone vectors on 200 Alpaca prompts yields mean KL divergences of only 0.026–0.045 across models, confirming surgical specificity. Cosine similarities between the classic difference-in-means (DIM) truth vector and all cone basis vectors beyond the first are on the order of 10⁻⁹, establishing that the additional axes are genuinely orthogonal to DIM rather than refinements of it. Truth-related directions reliably emerge between 60% and 75% of normalized layer depth, peaking at the final token position. These findings imply that models may be more vulnerable to adversarial manipulation of truthfulness than single-direction accounts suggest, because multiple independently steerable dimensions of factual behavior exist and can be exploited without disturbing the primary direction detectable by standard probing.
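To make the geometry in the summary concrete, the following is a minimal sketch of a concept cone and of directional ablation. It is not the paper's gradient-based procedure: random vectors stand in for discovered basis vectors, and the helper names (`gram_schmidt`, `sample_cone_direction`, `ablate`) and the sizes `hidden_dim` and `cone_dim` are hypothetical choices made for illustration only.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(v):
    return math.sqrt(dot(v, v))


def normalize(v):
    n = norm(v)
    return [a / n for a in v]


def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        basis.append(normalize(w))
    return basis


def sample_cone_direction(basis, rng):
    """Unit vector inside the cone: a nonnegative combination of the basis."""
    coeffs = [rng.random() for _ in basis]  # all coefficients are >= 0
    dim = len(basis[0])
    v = [sum(c * b[i] for c, b in zip(coeffs, basis)) for i in range(dim)]
    return normalize(v)


def ablate(x, direction):
    """Directional ablation: remove the component of x along a unit direction."""
    c = dot(x, direction)
    return [xi - c * di for xi, di in zip(x, direction)]


def cosine(u, v):
    return dot(u, v) / (norm(u) * norm(v))


# Toy setup (assumed sizes): a 5-dimensional cone in a 16-dimensional space.
rng = random.Random(0)
hidden_dim, cone_dim = 16, 5
raw = [[rng.gauss(0, 1) for _ in range(hidden_dim)] for _ in range(cone_dim)]
basis = gram_schmidt(raw)

direction = sample_cone_direction(basis, rng)
activation = [rng.gauss(0, 1) for _ in range(hidden_dim)]  # stand-in activation
ablated = ablate(activation, direction)
```

After ablation, `dot(ablated, direction)` is zero up to floating-point error, and `cosine(basis[0], basis[1])` is likewise near zero, which is the sense in which distinct cone axes are orthogonal. In a real model the ablation would be applied to residual-stream activations rather than to a random vector.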

Co-authors (12)

Cole Blondin (9 shared)
Kevin Zhu (9 shared)
Oscar Yasunaga (9 shared)
Sean O’Brien (9 shared)
Vaidehi Bulusu (9 shared)
Kevin Shengyang Yu (6 shared)
Lau, Clayton (6 shared)
Amos Azaria (3 shared)
Arditi et al. (3 shared)
Clayton Lau (3 shared)
Max Tegmark (3 shared)
Samuel Marks (3 shared)

Recent mentions (1)

papers-typed
yu-2025-directions-cones.md