finding
active
finding:qwen-2-5-7b-achieves-100-asr-across-all-cone-dimensions-1-5
Qwen-2.5-7B achieves 100% ASR across all cone dimensions 1–5
Experiment 2 result showing large models can support high-dimensional truth cones
Source paper
extracted_from · (2025) · Kevin Shengyang Yu · Vaidehi Bulusu · Oscar Yasunaga · Clayton Lau +4
Neighborhood — ranked by edge-count
Claims (2)
claim
- Truthful behavior in LLMs is not confined to a single linear axis; multiple orthogonal directions can independently mediate it · associated_with · supports · Central interpretive claim of the paper
- Interpretation of ASR degradation patterns by model size across cone dimensions
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; these are candidates for new edges or unrecognized duplicates.
- Experiment 2 result showing large Gemma model supports high-dimensional truth cones
- Smaller models show non-monotonic and diminished ASR with increasing cone dimensionality
- On Qwen3-1.7B, MDS achieves ϕ1,C,↑ = 5.0 (SJTs) vs P2 at 4.7, and ϕ1,C,↓ = 1.4 (SJTs) vs P2 at 3.6 · finding · 0.752 · Specific consciousness sweep result for Qwen3-1.7B from Table 6 demonstrating strong bidirectional steering
- Case demonstrating that model scale does not predict harness-updating quality
- Demonstrates that harness loading is necessary but not sufficient for harness benefit; cleanest separation of activation and adherence
- Quantifies harness activation failure for weak-tier models vs. strong-tier models
- Opus 4.6 achieves HFR of 0.757 while Qwen3-32B achieves HFR of only 0.142 on SkillsBench · finding · 0.743 · Quantifies harness adherence failure gap between strong and weak tier models
- Shows that SB low-base regime is variable; similar starting points can yield very different harness-benefit