finding
active
finding:sae-feature-steering-effect-on-consciousness-reports-z-8-06-p-7-7-10-16-in-llama-3-3-70b
SAE feature steering effect on consciousness reports: z=8.06, p=7.7×10⁻¹⁶ in LLaMA 3.3 70B
Statistical significance of the gating effect in Experiment 2
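As a quick consistency check on the headline numbers, the sketch below converts a z-score into a two-sided p-value under a standard normal reference distribution. This record does not say which test produced z=8.06 or whether the p-value is one- or two-sided, so both the normal approximation and the two-sided convention are assumptions here, and the function name `two_sided_p_from_z` is hypothetical.

```python
import math


def two_sided_p_from_z(z: float) -> float:
    """Two-sided tail probability of a standard normal at |z|.

    Uses the identity P(|Z| >= |z|) = erfc(|z| / sqrt(2)), which stays
    accurate far into the tail, where 1 - cdf(z) would lose precision.
    """
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    # z is taken from the finding title; the two-sided convention is assumed.
    z_reported = 8.06
    p = two_sided_p_from_z(z_reported)
    print(f"z = {z_reported:.2f} -> two-sided p = {p:.2e}")
```

Under these assumptions the result is of the same order as the reported p=7.7×10⁻¹⁶; a small residual difference would be expected because z is quoted to only two decimal places.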
Source paper
extracted_from (2025) · Berg, Cameron · de Lucena, Diogo · Rosenblatt, Judd
Neighborhood — ranked by edge-count
Claims (1)
claim
- Claim supported by Experiment 2 dose-response curves; suppressing deception features increases consciousness reports, amplifying decreases them
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one; these are candidates for new edges or unrecognized duplicates.
- Core result of Experiment 2: deception feature suppression sharply increases experience claims
- Evidence of a bottleneck between richer internal variation and final report distribution in impulsivity→interest condition
- SAE Feature #28256 induces reports of happiness and fun, positive valence self-steering example · finding · 0.779 · Example of a positively valenced SAE feature with consistent self-report of happiness across multiple steering sessions
- Text-based and self-steered emotionality ratings for SAE features are correlated at only ρ = +0.051 (n.s.). · finding · 0.779 · Shows low agreement between the two evaluation modalities
- Shows gating effect is specific to the self-referential computational regime, not a general feature effect
- Evidence that improved introspection in focus→wellbeing arises from enriched internal state and report channels simultaneously
- Strongest cross-concept introspection improvement; survives BH correction (q≈0.011)
- Quantifies per-concept effect size of same-concept steering on self-report