finding
active
finding:monetary-reward-abolishes-conflict-adaptation-effects-confirming-the-conflict-signal-is-affective-positive-valence-can-cancel-adaptation-triggered-by-negative-valence
Monetary reward abolishes conflict adaptation effects, confirming the conflict signal is affective: positive valence can cancel adaptation triggered by negative valence
Evidence that the conflict-monitoring signal is genuinely valenced rather than merely cognitive
Neighborhood — ranked by edge-count
Claims (1)
claim
- Empirical grounding of the identity thesis across four independent neural systems
Related by similarity (8)
cosine ≥ 0.65 · no typed edge
Entities in the same semantic neighborhood but without a typed relation to this one: candidates for new edges or unrecognized duplicates.
- §4 Discussion.
- Shows NLA explanations capture latent model beliefs about rewards before output selection; validates interpretability.
- Abstract; central distinction.
- Empirically grounded claim citing Perez et al. 2022, showing RLHF can backfire on the self-preservation dimension
- Mechanism by which activation of an emotion feature sometimes leads to later suppression of that same feature
- Identified research gap: the paper observes anti-persistence but has no explanation for it (question, 0.738)
- Gradient-based attribution approximates ablation impact, enabling fast search for causally important features.
- Key insight linking individual rewards to system-level learning.