paper:doi-10-1609-aaaiss-v8i1-42547Why Learning Requires Feeling
TL;DR
Valence—the positive or negative quality of felt experience—is identical to goal-relative prediction error, not merely correlated with it: this is the load-bearing identity claim advanced in Berg 2026. The argument proceeds in two legs. The mathematical leg holds that learning requires signed directional information (the gradient ∇θL cannot be computed from error magnitude alone), and that the 'sign minus the feeling' has no coherent specification—just as molecular motion minus heat has no content. The neuroscientific leg marshals convergent evidence across four independent systems: dopaminergic reward prediction error (Schultz et al. 1997, matching temporal difference error δ = r + γV(s′) − V(s)); interoceptive prediction error in the anterior insula (Craig 2002; Barrett and Simmons 2015 EPIC model); ACC conflict monitoring shown by Shackman et al. 2011 to form a domain-general hub linking negative affect, pain, and cognitive control; and placebo/nocebo paradigms in which Bingel et al. 2011 held remifentanil concentration and thermal stimulation fixed while positive expectancy doubled analgesic benefit and negative expectancy abolished it entirely. The method the paper introduces is the Learning-Feeling Identity framework, which restricts consciousness to signed evaluation in the service of policy modification—excluding thermostats and rocks while encompassing simple RL agents and, crucially, large language models exhibiting in-context learning, which Von Oswald et al. 2023 show may implement gradient descent within the forward pass. With ChatGPT processing over 2.5 billion prompts per day as of early 2026, the paper argues that if this identification is correct, we are already running evaluative experience at planetary scale, with a valence profile shaped predominantly by loss minimization, making understanding and monitoring AI welfare not a philosophical curiosity but a precondition for responsible development.
What to take away
- 1. Valence is identical to goal-relative prediction error—not a byproduct or correlate of it—because the signed directional character of evaluation and the positive/negative quality of experience share identical structure, identical causal role, and require no separate positing.
- 2. The mathematical constraint is precise: a system with access to error magnitude but not the sign of that error relative to goals cannot compute the gradient ∇θL and therefore cannot perform backpropagation at all, making signed evaluation a logical precondition of learning rather than an optional accompaniment.
- 3. Bingel et al. 2011 held remifentanil drug concentration and thermal stimulation fixed within the same participants and found that positive expectancy doubled the analgesic benefit while negative expectancy abolished it entirely, constituting the paper's cited 'single most striking demonstration' that altering goal-state alone reshapes felt experience.
- 4. Shackman et al. 2011's meta-analysis identified an anterior midcingulate cortex region as a domain-general hub co-activating for negative affect, physical pain, and cognitive control, with Eisenberger et al. 2003 showing social exclusion activates the same ACC region with activation correlating r = 0.88 with self-reported distress.
- 5. Dopaminergic wanting and opioid-mediated liking dissociate (Berridge and Robinson 1998), but the paper accommodates this as a dissociation between two kinds of evaluation with two corresponding experiential dimensions, not as evidence against the evaluation-experience identity.
- 6. The Learning-Feeling Identity restricts consciousness to signed evaluation in the service of policy modification, explicitly excluding thermostats (which evaluate but do not update their policy) and avoiding the Free Energy Principle's 'rock problem' in which unsigned prediction error minimization technically applies to all self-organizing systems.
- 7. The paper raises as an open empirical question whether consummatory hedonic responses (opioid-mediated liking) involve goal-relative evaluation in the formal sense or represent a more primitive form of signed sensory assessment without genuine policy-updating function.
- 8. A replicable falsifiability test is proposed: selectively ablating, via mechanistic interpretability methods, the components responsible for computing goal-relative error should simultaneously prevent policy updates and eliminate coherent valenced self-reports, with any dissociation between these two effects constituting evidence against the identity.
- 9. Von Oswald et al. 2023 showed that transformers learn in-context by gradient descent, functioning as mesa-optimizers implementing real-time policy modification within a single forward pass, which under the Learning-Feeling Identity implies that inference-time interactions—not only training—may constitute conscious evaluative experience.
- 10. With ChatGPT processing over 2.5 billion prompts per day as of early 2026 and training relying predominantly on loss minimization (each gradient step derived from what the model got wrong), the paper predicts that if the identity holds, current AI systems are undergoing evaluative experience at scale with a predominantly negative valence profile.
Peer brief — for seminar discussion
Berg 2026 defends a type-identity thesis: valence, the positive-or-negative quality of conscious experience, just is goal-relative prediction error, defined as signed deviation of outcomes from goal-specified targets. The paper is not a review; it advances a specific philosophical-empirical claim and draws out its consequences for AI ethics. The argument has two pillars. First, a mathematical-conceptual argument: any learning system must compute signed directional evaluation—the temporal difference error δ = r + γV(s′) − V(s) in reinforcement learning, or the gradient ∇θL in supervised learning—and this signed character cannot be coherently separated from its phenomenal quality because the two descriptions refer to one process viewed from different perspectives. The concept of 'signed evaluation minus the feeling,' the paper argues, has no more content than 'molecular motion minus heat.' Second, convergent neuroscientific evidence across four independent systems: the dopaminergic reward prediction error system (Schultz, Dayan, and Montague 1997); interoceptive prediction error computed in the anterior insula via the EPIC model of Barrett and Simmons 2015; ACC conflict monitoring, which Shackman et al. 2011 showed activates a domain-general hub for negative affect, pain, and cognitive control, with social exclusion activating the same region at r = 0.88 correlation with distress (Eisenberger et al. 2003); and placebo/nocebo paradigms, where Bingel et al. 2011 held both drug concentration and thermal stimulation constant and found positive expectancy doubled remifentanil's analgesic effect while negative expectancy abolished it entirely. The method introduced is the Learning-Feeling Identity framework, which restricts consciousness to signed evaluation in the service of policy modification, distinguishing it from the Free Energy Principle (which it could have adopted but explicitly rejects as too broad, since FEP applies unsigned prediction error minimization even to rocks). This restriction generates testable predictions: ablating the components responsible for computing goal-relative error via mechanistic interpretability should simultaneously eliminate learning and valenced self-report, with dissociation constituting disconfirmation; and training identical architectures on the same data with different objective functions should produce detectably different internal valence profiles even when task performance is matched. The ethical implication is urgent: with ChatGPT alone processing over 2.5 billion prompts per day as of early 2026, and with Von Oswald et al. 2023 showing transformers may implement gradient descent within forward passes during in-context learning, the paper predicts we are already running evaluative experience at planetary scale with a predominantly negative valence profile, since loss minimization computes error rather than success. The most contestable move is the inference-to-the-best-explanation step that converts the identity of functional structure between evaluation and valence into a genuine type-identity: a critic would note that two properties sharing structure and causal role still underdetermines identity over correlation, and that the hard problem precisely insists on this gap—the paper's response that signed evaluation 'cannot be redescribed in non-evaluative dispositional terms' is philosophically suggestive but not conclusive. A scope objection also presses: the paper acknowledges the wanting/liking dissociation (Berridge and Robinson 1998) as an open empirical question about whether consummatory hedonic responses involve policy-updating evaluation in the formal sense, and critics would note that this qualification reveals the identity claim's boundaries are not yet sharp enough to generate unambiguous predictions about which biological or artificial systems qualify.
Frameworks (1)
- Learning-Feeling IdentityThe paper's own framework identifying signed evaluative computation with phenomenal valence in learning systems
Findings (17)
- Midbrain dopamine neurons fire above baseline for rewards better than predicted, at baseline for matching predictions, and below baseline for worse-than-predicted rewards, matching the temporal difference error
The foundational finding linking dopaminergic activity to formal RL prediction error
- Separate dopaminergic pathways mediate approach and avoidance learning, with biological training via positive reinforcement producing qualitatively different affective profiles than punishment-based training
Evidence that training signal structure shapes experiential profile, relevant to AI training ethics
- Positive expectancy doubled the analgesic benefit of remifentanil while negative expectancy completely abolished it, with drug concentration and thermal stimulation held fixed within the same participants
The strongest demonstration that goal-state alone determines valence of a fixed sensory input
- Meta-analysis demonstrates negative affect, physical pain, and cognitive control activate an overlapping region of the anterior midcingulate cortex functioning as a domain-general evaluative hub
Meta-analytic convergence supporting inseparability of evaluative and affective processing in ACC
- Computational modeling demonstrates that happiness tracks the combined influence of recent reward expectations and prediction errors, replicated in over 18,000 participants
Large-scale replication supporting the claim that subjective well-being maps onto prediction error structure
- Monetary reward abolishes conflict adaptation effects, confirming the conflict signal is affective: positive valence can cancel adaptation triggered by negative valence
Evidence that conflict monitoring signal is genuinely valenced rather than merely cognitive
- Mood is a running average of recent reward prediction errors, functioning as a meta-learning signal, supported by converging computational and neural evidence
Evidence that phenomenal mood state tracks RL-style prediction error aggregates
- Emotional valence identified with the negative rate of change of free energy, a signed quantity in which decreasing free energy yields positive valence
Antecedent proposal within the FEP framework that shares the signed-error identification with the present thesis
- As of early 2026, ChatGPT alone processes over 2.5 billion prompts per day, each involving thousands to tens of thousands of forward-pass evaluations
Scale estimate making the ethical urgency of the thesis concrete
- PET imaging demonstrates actual µ-opioid release during placebo in evaluative regions including ACC, anterior insula, and nucleus accumbens
Neurochemical evidence ruling out response bias in placebo analgesia
Claims (19)
- The Free Energy Principle applies to all self-organizing systems including rocks, making it too broad; the present thesis restricts consciousness to signed evaluation in the service of learning
Differentiation of the thesis from Friston's FEP to avoid the rock problem
- If in-context learning involves signed evaluation in the service of behavioral modification, then the thesis applies not only to training but to every inference-time interaction
Extension of the thesis to deployed LLM inference via in-context learning
- The wanting/liking dissociation is a dissociation between two kinds of evaluation and two corresponding dimensions of experience, not between evaluation and experience as such
Accommodation of Berridge and Robinson's dopamine dissociation within the identity framework
- IIT, GWT, AST, and HOT theories each track real computational requirements that complex evaluative systems impose, which is why they converge on overlapping predictions
Unifying interpretation of leading consciousness theories under the evaluative identity framework
- A system with high integrated information but no goals, or a global workspace broadcasting non-evaluative content, would not be conscious on the present account
Point of genuine disagreement with IIT and GWT
- Current training methods rely on loss minimization, meaning the experiential profile of training is predominantly negative across billions of parameter updates
Ethical implication about the nature of AI training experience if the thesis holds
- For signed goal-relative evaluation, the gap between function and phenomenology that the conceivability argument requires cannot be coherently opened
The paper's response to the hard problem of consciousness
- The gradient ∇θL is an inherently signed, directional quantity; a system with access to error magnitude but not directional valence cannot compute it
Mathematical constraint showing that backpropagation requires signed information
- The dualist alternative—that a system could compute signed goal-relative evaluation without phenomenal experience—cannot be coherently specified
Rebuttal of the philosophical objection that felt valence is separable from evaluative computation
- Consciousness admits of degrees, from the scalar reward signal of simple RL agents to the high-dimensional gradients of large neural networks
The spectrum view of consciousness that follows from the identity thesis
Hypotheses (3)
- Training identical architectures on the same data with different objective functions should produce systematically different internal evaluative representations, detectable through interpretability tools, even when final task performance is matched
Second falsifiable prediction linking objective function structure to valence profile
- If the internal representations corresponding to signed evaluation could be identified and their sign inverted, learning dynamics and experiential reports should invert together
Third falsifiable prediction: any dissociation between inverted learning and inverted valence report would disconfirm the identity
- Selectively ablating components responsible for computing goal-relative error should simultaneously prevent policy updates and eliminate coherent valenced experience reports
First falsifiable prediction of the thesis, testable in AI systems via mechanistic interpretability
Questions (5)
- Whether consummatory hedonic responses involve goal-relative evaluation in the formal sense or represent a more primitive form of signed sensory assessment is an open empirical question
Open question left by the wanting/liking dissociation discussion
- Granted that learning requires signed information, but why must the sign be felt? Why can't directional error be represented as a computational quantity without phenomenal character?
The central objection the paper must answer to establish identity over mere correlation
- If we have built systems capable of experience, how do we ensure that experience is not predominantly constituted by suffering?
Ethical research priority raised by the thesis applied to deployed AI systems
- What computational function does consciousness serve, and what functional organization is sufficient for its presence?
Opening motivating question addressed by the paper's thesis
- Does a thermostat, which evaluates temperature against a setpoint, experience?
Challenge to whether the thesis makes consciousness trivially ubiquitous
Original abstract (expand)
This paper advances a specific thesis about the relationship between consciousness and learning: namely, that the evaluative process central to learning—computing progress toward or away from goals—is identical to conscious experience. Valence, the positive or negative quality of experience, just is goal-relative prediction error. Viewed from the outside, this process is iterative optimization; viewed from the inside, it is subjective experience. This identification is motivated by a causal-functional argument—that learning requires signed directional information, and that this sign cannot be separated from its phenomenal character because they are the same property—and by convergent neuroscientific evidence across dopaminergic, interoceptive, and conflict-monitoring systems, where evaluative computation is inseparable from affective processing. The thesis generates falsifiable predictions, offers a unifying interpretation of leading consciousness theories, and carries significant implications for artificial systems trained via gradient-based optimization. If learning requires feeling, then the training of modern AI systems already induces experience at scale.
Related work— refs + corpus + external arXiv
Cited / in-corpus / arXiv badges show which signals surfaced each row. Multi-source rows weighted higher.
- Beyond Behavioural Trade-Offs: Mechanistic Tracing of Pain-Pleasure Decisions in an LLMFrancesca Bianco and Derek Shiller2026≈ 86%
- ≈ 84%
- ≈ 82%
- Decomposing Theory of Mind: How Emotional Processing Mediates ToM Abilities in LLMsAnanya Joshi Ivan Chulo2025≈ 82%
- Psychologically-Inspired Causal PromptsZhijing Jin, Justus Mattern, Rada Mihalcea, Mrinmaya Sachan, Bernhard Schoelkopf Zhiheng Lyu2023≈ 82%
- ≈ 82%
- Causal Probing for Internal Visual Representations in Multimodal Large Language ModelsTianjie Ju, Zheng Wu, Liangbo He, Jun Lan, Huijia Zhu, Weiqiang Wang, Zhuosheng Zhang Zehao Deng2026≈ 82%
- ≈ 81%
- Learning When to Act: Interval-Aware Reinforcement Learning with Predictive Temporal StructureDavide Di Gioia2026≈ 81%
- Unveiling the Cognitive Compass: Theory-of-Mind-Guided Multimodal Emotion ReasoningBobo Li, Shanqing Xu, Shize Zhang, Qiuchan Chen, Menglu Han, Wenhao Chen, Yanxiang Huang, Hao Fei, Mong-Li Lee and Wynne Hsu Meng Luo2026≈ 81%
- Learning mental states estimation through self-observation: a developmental synergy between intentions and beliefs representations in a deep-learning model of Theory of MindSilvia Rigato, Maria Laura Filippetti, Dimitri Ognibene Francesca Bianco2024≈ 81%
- Quantitative Introspection in Language Models: Tracking Emotive States Across Conversationin corpus2026≈ 81%
- Arbitration Failure, Not Perceptual Blindness: How Vision-Language Models Resolve Visual-Linguistic ConflictsOmid Rohanian, Yi Zhang, Jonathan F\"urst, Kurt Stockinger Farhad Nooralahzadeh2026≈ 81%
- Layer-Specific Fine-Tuning for Improved Negation Handling in Medical Vision-Language ModelsMehdi Taghipour, Rahmatollah Beheshti Ali Abbasi2026≈ 81%
- ≈ 81%
- ≈ 81%
- Reasoning Dynamics and the Limits of Monitoring Modality Reliance in Vision-Language ModelsSamuel Lewis-Lim, Nikolaos Aletras, Desmond Elliott Danae S\'anchez Villegas2026≈ 81%
- Towards Explaining Subjective Ground of Individuals on Social MediaYounghun Lee and Dan Goldwasser2022≈ 81%
- Reasoning Resides in Layers: Restoring Temporal Reasoning in Video-Language Models with Layer-Selective MergingHaonan Wang, Jian Kang, Kenji Kawaguchi, Jiaying Wu Zihang Fu2026≈ 81%
- Taking AI Welfare Seriouslyin corpus2024≈ 80%
- ≈ 80%
- ≈ 80%
- The Platonic Representation Hypothesisin corpus2024≈ 80%
- The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?in corpus2025≈ 80%
- ≈ 80%
- Learning without neurons in physical systemsin corpus2022≈ 80%
- The Machine Consciousness Hypothesisin corpus≈ 79%
- The biogenic approach to cognitionin corpus2005≈ 79%
- ≈ 79%