claim

active

claim:larger-models-can-support-higher-dimensional-truth-cones-than-smaller-models

Larger models can support higher-dimensional truth cones than smaller models

Interpretation of ASR degradation patterns by model size across cone dimensions

Source paper

extracted_from

From Directions to Cones: Exploring Multidimensional Representations of Propositional Facts in LLMs

(2025) · Kevin Shengyang Yu · Vaidehi Bulusu · Oscar Yasunaga · Clayton Lau +4

Neighborhood — ranked by edge-count

Findings (4)

finding

Qwen-2.5-7B achieves 100% ASR across all cone dimensions 1–5
supports
Experiment 2 result showing large models can support high-dimensional truth cones
Gemma-2-9B achieves near-100% ASR (97.3–100%) across all cone dimensions 1–5
supports
Experiment 2 result showing large Gemma model supports high-dimensional truth cones
Gemma-2-2B ASR drops from 100% at dims 1–2 to 43.1% at dim 4 and 27.1% at dim 5
supports
Small Gemma model shows severe ASR degradation at higher cone dimensions
Qwen-2.5-3B ASR drops from 98.6% at dim 1 to 45.1% at dim 2, recovering partially then declining to 65.3% at dim 5
supports
Smaller models show non-monotonic and diminished ASR with increasing cone dimensionality
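
The comparison behind this claim can be sketched in a few lines of Python. This is only an illustrative tabulation, not code from the source paper. It uses solely the ASR values quoted in the four findings above. The names `reported_asr`, `asr_drop`, and `is_non_increasing` are hypothetical helpers introduced here. Gemma-2-9B is left out because only its range (97.3–100%) is given, not its per-dimension values, and dimensions with no quoted value are simply omitted.

```python
# ASR (%) by cone dimension, restricted to values quoted in the findings above.
# Dimensions without a quoted value are omitted rather than guessed.
reported_asr = {
    "Qwen-2.5-7B": {1: 100.0, 2: 100.0, 3: 100.0, 4: 100.0, 5: 100.0},
    "Gemma-2-2B": {1: 100.0, 2: 100.0, 4: 43.1, 5: 27.1},
    "Qwen-2.5-3B": {1: 98.6, 2: 45.1, 5: 65.3},
}


def asr_drop(asr_by_dim, low_dim=1, high_dim=5):
    """Percentage-point drop in ASR from low_dim to high_dim."""
    return round(asr_by_dim[low_dim] - asr_by_dim[high_dim], 1)


def is_non_increasing(asr_by_dim):
    """True if ASR never rises as dimension grows (over reported dims only)."""
    values = [asr_by_dim[d] for d in sorted(asr_by_dim)]
    return all(a >= b for a, b in zip(values, values[1:]))


for model, asr in reported_asr.items():
    print(f"{model}: drop={asr_drop(asr)} pts, non_increasing={is_non_increasing(asr)}")
```

Under these assumptions, the larger Qwen model shows no drop between dimensions 1 and 5, while both smaller models lose a substantial share of ASR, and the Qwen-2.5-3B trajectory is non-monotonic over the reported dimensions.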

Claims (1)

claim

Representational abstraction of truth may emerge more clearly with model scale
extends
Interpretation of weaker PCA separation and lower ASR in smaller models

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Truth may be linearly separable in the model's representation space, but the structure is richer than a single linear axis · claim · 0.798
Interpretive synthesis of DIM and cone intervention successes
Bigger models are more likely to converge to a shared representation than smaller models · hypothesis · 0.793
Selective pressure toward convergence via model capacity
Concept cone truth interventions would generalize to larger frontier models and multimodal settings · hypothesis · 0.772
Key robustness question raised as future work
The model appears to encode truth differently under passive versus active truth evaluation prompts. · claim · 0.765
Key finding from Section 5 based on low cosine similarity between no-prompt and ask-correct probes.
Larger models should amplify bias less than smaller models, with model biases more accurately reflecting data biases rather than exacerbating them · claim · 0.762
Implication of PRH for AI fairness and bias
Features may not be strictly one-dimensional objects; higher-dimensional feature manifolds may exist in model representations · hypothesis · 0.761
Extension of superposition hypothesis to account for continuous families of features
Universality claims for truth directions are more limited than previously assumed, with significant differences observable for various model layers, task difficulties, task types, and prompt templates. · claim · 0.761
Overarching conclusion summarizing the paper's contribution relative to prior universality claims.
The difficulty boundary for truth directions replicates across all four tested models (Llama-3.2-3B, Llama-3.1-8B, Gemma-2-2b, Gemma-2-9b); generalization to F3-F5 remains consistently low regardless of model size or family. · finding · 0.752
Establishes generalizability of the core difficulty-boundary finding across model families.