Qwen-2.5-7B achieves 100% ASR across all cone dimensions 1–5

Experiment 2 result showing large models can support high-dimensional truth cones

Source paper

extracted_from

From Directions to Cones: Exploring Multidimensional Representations of Propositional Facts in LLMs

(2025) · Kevin Shengyang Yu · Vaidehi Bulusu · Oscar Yasunaga · Clayton Lau +4

Neighborhood — ranked by edge-count

Claims (2)

claim

Truthful behavior in LLMs is not confined to a single linear axis; multiple orthogonal directions can independently mediate it
associated_with · supports
Central interpretive claim of the paper
Larger models can support higher-dimensional truth cones than smaller models
supports
Interpretation of ASR degradation patterns by model size across cone dimensions

Related by similarity (8)

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.

Gemma-2-9B achieves near-100% ASR (97.3–100%) across all cone dimensions 1–5 · finding · 0.859
Experiment 2 result showing large Gemma model supports high-dimensional truth cones
Qwen-2.5-3B ASR drops from 98.6% at dim 1 to 45.1% at dim 2, recovering partially then declining to 65.3% at dim 5 · finding · 0.815
Smaller models show non-monotonic and diminished ASR with increasing cone dimensionality
On Qwen3-1.7B, MDS achieves ϕ1,C,↑ = 5.0 (SJTs) vs P2 at 4.7, and ϕ1,C,↓ = 1.4 (SJTs) vs P2 at 3.6 · finding · 0.752
Specific consciousness sweep result for Qwen3-1.7B from Table 6 demonstrating strong bidirectional steering
Qwen3.5-9B evolver achieves highest harness-updating gain on SkillsBench (3.8 pp), exceeding Claude Opus 4.6 (2.3 pp) and Qwen3-235B (1.5 pp) · finding · 0.750
Case demonstrating that model scale does not predict harness-updating quality
Qwen3-235B has SLR of 0.961 (nearly identical to Opus 4.6) yet HFR of only 0.350, with LPR of 0.022 vs. Opus 4.6's 0.177 · finding · 0.748
Demonstrates that harness loading is necessary but not sufficient for harness benefit; cleanest separation of activation and adherence
Qwen3-32B achieves a skill-load rate (SLR) of 0.251, while Opus 4.6, Sonnet 4.6, and Qwen3-235B achieve SLR of 0.957–0.961 · finding · 0.747
Quantifies harness activation failure for weak-tier models vs. strong-tier models
Opus 4.6 achieves HFR of 0.757 while Qwen3-32B achieves HFR of only 0.142 on SkillsBench · finding · 0.743
Quantifies harness adherence failure gap between strong and weak tier models
Qwen3-235B achieves only 1.1 pp harness-benefit on SkillsBench despite 4.7% base pass rate, near Qwen3-32B's 0.0% baseline · finding · 0.741
Shows that SB low-base regime is variable; similar starting points can yield very different harness-benefit