finding
active
Top-5 instructions by µ(1→2) at ℓ=12 achieve average cosine similarity .9893 and average accuracy .5645 on gsm8k_adv for Gemma3-4B-IT
High cosine similarity for Gemma3 steering vectors suggests strong linear reflection structure.
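The ranking behind this finding can be illustrated with a minimal sketch. It assumes each candidate instruction is represented by a vector taken at the same layer as the steering vector µ(1→2), and that candidates are ranked by cosine similarity to it; how the paper derives those candidate vectors is not stated in this record. The names `cosine`, `top_k_by_cosine`, and the toy vectors are hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k_by_cosine(steering, candidates, k=5):
    """Return the k (name, similarity) pairs most aligned with `steering`."""
    scored = [(name, cosine(vec, steering)) for name, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


# Toy 3-dimensional stand-ins (assumption): real vectors would be
# hidden-state differences at layer 12 of the model.
steering = [1.0, 0.0, 0.0]
candidates = {
    "instr_a": [2.0, 0.0, 0.0],
    "instr_b": [1.0, 1.0, 0.0],
    "instr_c": [0.0, 1.0, 0.0],
    "instr_d": [-1.0, 0.0, 0.0],
}

top = top_k_by_cosine(steering, candidates, k=2)
avg_sim = mean(score for _, score in top)
print(top, avg_sim)
```

The reported average cosine similarity (.9893) would correspond to `avg_sim` computed over the top five candidates; the average accuracy (.5645) requires running the model on gsm8k_adv with each selected instruction and is not reproduced by this sketch.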
Source paper
extracted_from (2025) · Chang, Fu-Chieh · Lee, Yu-Ting · Wu, Pei-Yuan
Neighborhood — ranked by edge-count
Claims (1)
claim
- Core applied contribution claim, supported by top-k accuracy comparisons.
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Shows persona space axes are inherited from pre-training, not solely created by post-training
- Triggered Reflection with 'Alternatively' achieves accuracy .684 on gsm8k_adv for Gemma3-4B-IT · finding · 0.792 · Highest single-instruction accuracy result in the paper.
- Core result of Experiment 3: cross-model semantic convergence under self-referential processing
- Demonstrates that stronger models are largely insensitive to reflection manipulation
- Experiment 4 result showing DIM captures only one facet of the multi-dimensional truth subspace
- Appendix E replication of DIM alignment finding in Qwen model
- Baseline accuracy when reflection is suppressed.
- Steering vectors from µ(0→2) slightly outperform µ(1→2) for instruction discovery across datasets and models · finding · 0.773 · Shows that contrasting No Reflection with Triggered Reflection provides a stronger signal than Intrinsic vs Triggered.