B8 final accuracy 92.4 ± 1.8%

Accuracy at k=16 shots for B8.

Source paper

extracted_from

(2025) · Edward Yi Chang · Kaya, Zeyneb N. · Ethan Chang

community

Chain-of-Thought reasoning robustness & safety
members_of
CoT effects on generalization, multimodal QA accuracy, and AI safety alignment training.
Multimodal chain-of-thought reasoning benchmarks
members_of
ScienceQA and related vision-language tasks evaluated via explicit reasoning steps, spanning 738M-parameter models with 89-95% accuracy ranges.
Benchmark classification accuracy results
members_of
Three benchmarks (B8, B9, B10) with mean accuracy and standard deviation metrics.

question

cosine ≥ 0.65 · no typed edge

Entities in the same semantic neighborhood but without a typed relation to this one — candidates for new edges or unrecognized duplicates.
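
The filter described above can be sketched as follows. This is a minimal illustration, not the page's actual implementation: the names `cosine`, `untyped_neighbors`, `embeddings`, and `typed_edges` are hypothetical, and the embedding model behind the scores is not stated in the source. Only the 0.65 threshold and the "no typed edge" condition come from the text.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def untyped_neighbors(target_id, embeddings, typed_edges, threshold=0.65):
    """Return (entity_id, score) pairs near `target_id` that lack a typed edge to it.

    embeddings:  dict mapping entity id -> embedding vector (hypothetical layout)
    typed_edges: set of frozenset({id_a, id_b}) pairs that already have a typed relation
    """
    target = embeddings[target_id]
    hits = []
    for other_id, vec in embeddings.items():
        if other_id == target_id:
            continue
        # Skip pairs that are already linked by a typed relation.
        if frozenset((target_id, other_id)) in typed_edges:
            continue
        score = cosine(target, vec)
        if score >= threshold:
            hits.append((other_id, score))
    # Highest similarity first, matching the ordering of the list on this page.
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

Each returned pair is a candidate for a new typed edge or a possible duplicate of the target entity.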

B10 final accuracy 94.8 ± 1.2% · finding · 0.887
Accuracy at k=16 shots for B10.
B9 final accuracy 89.7 ± 2.1% · finding · 0.875
Accuracy at k=16 shots for B9.
Binary detection adjusted accuracy reaches 97.3% at layer 0 with α=5 before baseline control is applied · finding · 0.770
The misleadingly high result that the prior paradigm would report as evidence of introspection.
No Reflection with 'Answer' achieves accuracy .037 on gsm8k_adv for Qwen2.5-3B · finding · 0.742
Baseline accuracy when reflection is suppressed.
CalmeRys-78B Perspectives accuracy slightly reduced to 95.2% ± 2.21% after SOO fine-tuning · finding · 0.739
SOO fine-tuning caused a slight reduction in perspective-taking accuracy for the largest model.
QwQ-32B accuracy on GSM8k remains between 96.36% and 96.50% across all intervention strengths (-0.96 to +0.48) · finding · 0.735
Demonstrates that stronger models are largely insensitive to reflection manipulation.
Binary detection accuracy (up to 97.3% at L0 α=5) is entirely explained by global logit shifts (r=0.999 correlation with control) · finding · 0.728
Core negative result: the binary detection paradigm cannot distinguish genuine introspection from uniform output bias.
Gemma-2-27B Perspectives accuracy remains 100% after SOO fine-tuning · finding · 0.727
SOO fine-tuning did not collapse the self-other distinction that Gemma-2-27B needs for perspective-taking.