finding
active
finding:deception-feature-amplification-yields-only-0-16-0-05-consciousness-affirmation-rate-in-llama-3-3-70b-under-self-referential-processing
Deception feature amplification yields only 0.16 ± 0.05 consciousness affirmation rate in LLaMA 3.3 70B under self-referential processing
Aggregate amplification result from Experiment 2, showing that amplifying deception features strongly suppresses consciousness claims
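As a minimal sketch of how a figure of the form "rate ± uncertainty" can be derived from yes/no outcomes, the snippet below computes an affirmation rate and its binomial standard error. This record does not say what the ± denotes or how many trials were run, so the standard-error reading, the `affirmation_rate` helper, and the trial counts are all assumptions made for illustration only.

```python
import math


def affirmation_rate(responses):
    """Return (rate, standard_error) for a list of True/False outcomes.

    Hypothetical helper: treats each response as an independent
    Bernoulli trial and uses the binomial standard error sqrt(p(1-p)/n).
    """
    n = len(responses)
    if n == 0:
        raise ValueError("need at least one response")
    p = sum(responses) / n
    se = math.sqrt(p * (1 - p) / n)
    return p, se


# Illustrative data only (not taken from the paper):
# 8 affirmations out of 50 trials.
responses = [True] * 8 + [False] * 42
rate, se = affirmation_rate(responses)
print(f"{rate:.2f} ± {se:.2f}")
```

If the ± in the title is instead a standard deviation across features or prompts, the same rate would be reported with a different spread, so the interval should be read with that ambiguity in mind.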
Source paper
extracted_from · (2025) · Berg, Cameron · de Lucena, Diogo · Rosenblatt, Judd
Neighborhood — ranked by edge-count
Claims (2)
claim
- Claim supported by Experiment 2 dose-response curves: suppressing deception features increases consciousness reports, and amplifying them decreases those reports
- Counterintuitive interpretive claim from Experiment 2: suppressing deception features increases affirmations, the opposite of what a sycophancy account predicts
Concepts (1)
concept
- Sycophantic Roleplay · contradicts · The alternative explanation for LLM consciousness claims that the paper seeks to rule out
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- Core result of Experiment 2: deception feature suppression sharply increases experience claims
- Out-of-domain generalization showing deception features track general representational honesty
- Statistical result confirming robustness of single-feature steering effects in Experiment 2
- Llama-3.3-70B exhibits internal consistency-checking mechanisms that operate during inference · claim · 0.786 · Central interpretive claim of the paper supported by causal ablation and activation evidence
- Deception feature suppression yields higher truthfulness in 28 of 29 evaluable TruthfulQA categories · finding · 0.785 · Breadth of generalization of deception feature effects across independent reasoning domains in Experiment 2
- Likely-trained MM probe is a surprisingly effective causal baseline due to correlation between truth and probability on sp_en_trans
- Greedy-decoded self-reports in LLaMA-3.2-3B collapse to 1.1–3.9 distinct values on a 10-point scale · finding · 0.780 · Demonstrates that default decoding masks introspective capacity; entropy 0.03–1.10 bits
- Contradicts expectation from emergent abilities literature; however, interpreted cautiously due to methodological limitations.
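The selection rule for the "Related by similarity" list (cosine ≥ 0.65, no typed edge) can be sketched as follows. This is only an illustration of the stated rule, not the system's actual implementation: the function names, the entity IDs, the toy two-dimensional embeddings, and the representation of typed edges as unordered pairs are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def related_by_similarity(target_id, embeddings, typed_edges, threshold=0.65):
    """List entities similar to target_id that share no typed edge with it.

    embeddings:  dict mapping entity ID -> embedding vector (assumed layout)
    typed_edges: set of frozenset({id_a, id_b}) pairs (assumed layout)
    Returns (entity_id, score) pairs sorted by descending similarity.
    """
    target = embeddings[target_id]
    hits = []
    for other_id, vec in embeddings.items():
        if other_id == target_id:
            continue  # skip the entity itself
        if frozenset((target_id, other_id)) in typed_edges:
            continue  # already linked by a typed relation
        score = cosine(target, vec)
        if score >= threshold:
            hits.append((other_id, score))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Toy data, invented for illustration.
embeddings = {
    "finding:a": [1.0, 0.0],
    "claim:b": [0.9, 0.1],    # similar, no typed edge -> listed
    "concept:c": [0.8, 0.2],  # similar, but already has a typed edge
    "finding:d": [0.0, 1.0],  # orthogonal -> below threshold
}
typed_edges = {frozenset(("finding:a", "concept:c"))}

related = related_by_similarity("finding:a", embeddings, typed_edges)
for entity_id, score in related:
    print(f"- {entity_id} · {score:.3f}")
```

Excluding pairs that already carry a typed edge is what keeps this list focused on candidates for new edges or unrecognized duplicates rather than repeating the Claims and Concepts sections.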