finding
active
Dose-response curves for six individual deception features show z=8.06, p=7.7×10⁻¹⁶ for suppression vs. amplification contrast on consciousness query
Statistical result confirming robustness of single-feature steering effects in Experiment 2
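As a rough consistency check on the reported statistic, the sketch below converts a z-score into a p-value. It assumes the p-value is a two-sided tail probability under a standard normal distribution; this card does not state which test the source paper used, so that assumption and the helper name `two_sided_p_from_z` are illustrative only.

```python
import math


def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value for a z-score under a standard normal (assumed test)."""
    # For Z ~ N(0, 1), P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    # erfc stays accurate far into the tail, where 1 - cdf would round to 0.
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    # z = 8.06 is the value reported on this card.
    print(f"p = {two_sided_p_from_z(8.06):.2e}")
```

Under that assumption, z=8.06 gives a p-value on the order of 10⁻¹⁶, the same order of magnitude as the reported 7.7×10⁻¹⁶.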
Source paper
extracted_from · (2025) · Berg, Cameron · de Lucena, Diogo · Rosenblatt, Judd
Neighborhood — ranked by edge-count
Findings (1)
finding
- Core result of Experiment 2: deception feature suppression sharply increases experience claims
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Experiment 2 aggregate amplification result showing amplifying deception features strongly suppresses consciousness claims
- Out-of-domain generalization showing deception features track general representational honesty
- Gemma-2-27B-it deceptive response rate reduced from 100% to 9.36% ± 7.09% after SOO fine-tuning · finding · 0.767 · Primary result showing SOO fine-tuning significantly reduces deception in Gemma-2-27B
- Deception feature suppression yields higher truthfulness in 28 of 29 evaluable TruthfulQA categories · finding · 0.760 · Breadth of generalization of deception feature effects across independent reasoning domains in Experiment 2
- Strongest probe validation result; highest Cohen's d among the four concepts
- Impulsivity→interest: ρ increases from 0.70 (α=-4) to 0.83 (α=+4); R² from 0.46 to 0.69 in LLaMA-3.2-3B · finding · 0.748 · Scatter plot visualization showing strengthened probe-report relationship across alpha range
- Scaling finding suggesting larger models benefit more from SOO fine-tuning