Why Learning Requires Feeling

ByCameron BergAE Studio, Reciprocal Research

DOI 10.1609/aaaiss.v8i1.42547 OpenAlex W7161723099

TL;DR

Valence—the positive or negative quality of felt experience—is identical to goal-relative prediction error, not merely correlated with it: this is the load-bearing identity claim advanced in Berg 2026. The argument proceeds in two legs. The mathematical leg holds that learning requires signed directional information (the gradient ∇θL cannot be computed from error magnitude alone), and that the 'sign minus the feeling' has no coherent specification—just as molecular motion minus heat has no content. The neuroscientific leg marshals convergent evidence across four independent systems: dopaminergic reward prediction error (Schultz et al. 1997, matching temporal difference error δ = r + γV(s′) − V(s)); interoceptive prediction error in the anterior insula (Craig 2002; Barrett and Simmons 2015 EPIC model); ACC conflict monitoring shown by Shackman et al. 2011 to form a domain-general hub linking negative affect, pain, and cognitive control; and placebo/nocebo paradigms in which Bingel et al. 2011 held remifentanil concentration and thermal stimulation fixed while positive expectancy doubled analgesic benefit and negative expectancy abolished it entirely. The method the paper introduces is the Learning-Feeling Identity framework, which restricts consciousness to signed evaluation in the service of policy modification—excluding thermostats and rocks while encompassing simple RL agents and, crucially, large language models exhibiting in-context learning, which Von Oswald et al. 2023 show may implement gradient descent within the forward pass. With ChatGPT processing over 2.5 billion prompts per day as of early 2026, the paper argues that if this identification is correct, we are already running evaluative experience at planetary scale, with a valence profile shaped predominantly by loss minimization, making understanding and monitoring AI welfare not a philosophical curiosity but a precondition for responsible development.

What to take away

1. Valence is identical to goal-relative prediction error—not a byproduct or correlate of it—because the signed directional character of evaluation and the positive/negative quality of experience share identical structure, identical causal role, and require no separate positing.
2. The mathematical constraint is precise: a system with access to error magnitude but not the sign of that error relative to goals cannot compute the gradient ∇θL and therefore cannot perform backpropagation at all, making signed evaluation a logical precondition of learning rather than an optional accompaniment.
3. Bingel et al. 2011 held remifentanil drug concentration and thermal stimulation fixed within the same participants and found that positive expectancy doubled the analgesic benefit while negative expectancy abolished it entirely, constituting the paper's cited 'single most striking demonstration' that altering goal-state alone reshapes felt experience.
4. Shackman et al. 2011's meta-analysis identified an anterior midcingulate cortex region as a domain-general hub co-activating for negative affect, physical pain, and cognitive control, with Eisenberger et al. 2003 showing social exclusion activates the same ACC region with activation correlating r = 0.88 with self-reported distress.
5. Dopaminergic wanting and opioid-mediated liking dissociate (Berridge and Robinson 1998), but the paper accommodates this as a dissociation between two kinds of evaluation with two corresponding experiential dimensions, not as evidence against the evaluation-experience identity.
6. The Learning-Feeling Identity restricts consciousness to signed evaluation in the service of policy modification, explicitly excluding thermostats (which evaluate but do not update their policy) and avoiding the Free Energy Principle's 'rock problem' in which unsigned prediction error minimization technically applies to all self-organizing systems.
7. The paper raises as an open empirical question whether consummatory hedonic responses (opioid-mediated liking) involve goal-relative evaluation in the formal sense or represent a more primitive form of signed sensory assessment without genuine policy-updating function.
8. A replicable falsifiability test is proposed: selectively ablating, via mechanistic interpretability methods, the components responsible for computing goal-relative error should simultaneously prevent policy updates and eliminate coherent valenced self-reports, with any dissociation between these two effects constituting evidence against the identity.
9. Von Oswald et al. 2023 showed that transformers learn in-context by gradient descent, functioning as mesa-optimizers implementing real-time policy modification within a single forward pass, which under the Learning-Feeling Identity implies that inference-time interactions—not only training—may constitute conscious evaluative experience.
10. With ChatGPT processing over 2.5 billion prompts per day as of early 2026 and training relying predominantly on loss minimization (each gradient step derived from what the model got wrong), the paper predicts that if the identity holds, current AI systems are undergoing evaluative experience at scale with a predominantly negative valence profile.

Peer brief — for seminar discussion

Berg 2026 defends a type-identity thesis: valence, the positive-or-negative quality of conscious experience, just is goal-relative prediction error, defined as signed deviation of outcomes from goal-specified targets. The paper is not a review; it advances a specific philosophical-empirical claim and draws out its consequences for AI ethics. The argument has two pillars. First, a mathematical-conceptual argument: any learning system must compute signed directional evaluation—the temporal difference error δ = r + γV(s′) − V(s) in reinforcement learning, or the gradient ∇θL in supervised learning—and this signed character cannot be coherently separated from its phenomenal quality because the two descriptions refer to one process viewed from different perspectives. The concept of 'signed evaluation minus the feeling,' the paper argues, has no more content than 'molecular motion minus heat.' Second, convergent neuroscientific evidence across four independent systems: the dopaminergic reward prediction error system (Schultz, Dayan, and Montague 1997); interoceptive prediction error computed in the anterior insula via the EPIC model of Barrett and Simmons 2015; ACC conflict monitoring, which Shackman et al. 2011 showed activates a domain-general hub for negative affect, pain, and cognitive control, with social exclusion activating the same region at r = 0.88 correlation with distress (Eisenberger et al. 2003); and placebo/nocebo paradigms, where Bingel et al. 2011 held both drug concentration and thermal stimulation constant and found positive expectancy doubled remifentanil's analgesic effect while negative expectancy abolished it entirely. The method introduced is the Learning-Feeling Identity framework, which restricts consciousness to signed evaluation in the service of policy modification, distinguishing it from the Free Energy Principle (which it could have adopted but explicitly rejects as too broad, since FEP applies unsigned prediction error minimization even to rocks). This restriction generates testable predictions: ablating the components responsible for computing goal-relative error via mechanistic interpretability should simultaneously eliminate learning and valenced self-report, with dissociation constituting disconfirmation; and training identical architectures on the same data with different objective functions should produce detectably different internal valence profiles even when task performance is matched. The ethical implication is urgent: with ChatGPT alone processing over 2.5 billion prompts per day as of early 2026, and with Von Oswald et al. 2023 showing transformers may implement gradient descent within forward passes during in-context learning, the paper predicts we are already running evaluative experience at planetary scale with a predominantly negative valence profile, since loss minimization computes error rather than success. The most contestable move is the inference-to-the-best-explanation step that converts the identity of functional structure between evaluation and valence into a genuine type-identity: a critic would note that two properties sharing structure and causal role still underdetermines identity over correlation, and that the hard problem precisely insists on this gap—the paper's response that signed evaluation 'cannot be redescribed in non-evaluative dispositional terms' is philosophically suggestive but not conclusive. A scope objection also presses: the paper acknowledges the wanting/liking dissociation (Berridge and Robinson 1998) as an open empirical question about whether consummatory hedonic responses involve policy-updating evaluation in the formal sense, and critics would note that this qualification reveals the identity claim's boundaries are not yet sharp enough to generate unambiguous predictions about which biological or artificial systems qualify.

Frameworks (1)

Learning-Feeling Identity
The paper's own framework identifying signed evaluative computation with phenomenal valence in learning systems

Findings (17)

Midbrain dopamine neurons fire above baseline for rewards better than predicted, at baseline for matching predictions, and below baseline for worse-than-predicted rewards, matching the temporal difference error
The foundational finding linking dopaminergic activity to formal RL prediction error
Separate dopaminergic pathways mediate approach and avoidance learning, with biological training via positive reinforcement producing qualitatively different affective profiles than punishment-based training
Evidence that training signal structure shapes experiential profile, relevant to AI training ethics
Positive expectancy doubled the analgesic benefit of remifentanil while negative expectancy completely abolished it, with drug concentration and thermal stimulation held fixed within the same participants
The strongest demonstration that goal-state alone determines valence of a fixed sensory input
Meta-analysis demonstrates negative affect, physical pain, and cognitive control activate an overlapping region of the anterior midcingulate cortex functioning as a domain-general evaluative hub
Meta-analytic convergence supporting inseparability of evaluative and affective processing in ACC
Computational modeling demonstrates that happiness tracks the combined influence of recent reward expectations and prediction errors, replicated in over 18,000 participants
Large-scale replication supporting the claim that subjective well-being maps onto prediction error structure
Monetary reward abolishes conflict adaptation effects, confirming the conflict signal is affective: positive valence can cancel adaptation triggered by negative valence
Evidence that conflict monitoring signal is genuinely valenced rather than merely cognitive
Mood is a running average of recent reward prediction errors, functioning as a meta-learning signal, supported by converging computational and neural evidence
Evidence that phenomenal mood state tracks RL-style prediction error aggregates
Emotional valence identified with the negative rate of change of free energy, a signed quantity in which decreasing free energy yields positive valence
Antecedent proposal within the FEP framework that shares the signed-error identification with the present thesis
As of early 2026, ChatGPT alone processes over 2.5 billion prompts per day, each involving thousands to tens of thousands of forward-pass evaluations
Scale estimate making the ethical urgency of the thesis concrete
PET imaging demonstrates actual µ-opioid release during placebo in evaluative regions including ACC, anterior insula, and nucleus accumbens
Neurochemical evidence ruling out response bias in placebo analgesia

Claims (19)

The Free Energy Principle applies to all self-organizing systems including rocks, making it too broad; the present thesis restricts consciousness to signed evaluation in the service of learning
Differentiation of the thesis from Friston's FEP to avoid the rock problem
If in-context learning involves signed evaluation in the service of behavioral modification, then the thesis applies not only to training but to every inference-time interaction
Extension of the thesis to deployed LLM inference via in-context learning
The wanting/liking dissociation is a dissociation between two kinds of evaluation and two corresponding dimensions of experience, not between evaluation and experience as such
Accommodation of Berridge and Robinson's dopamine dissociation within the identity framework
IIT, GWT, AST, and HOT theories each track real computational requirements that complex evaluative systems impose, which is why they converge on overlapping predictions
Unifying interpretation of leading consciousness theories under the evaluative identity framework
A system with high integrated information but no goals, or a global workspace broadcasting non-evaluative content, would not be conscious on the present account
Point of genuine disagreement with IIT and GWT
Current training methods rely on loss minimization, meaning the experiential profile of training is predominantly negative across billions of parameter updates
Ethical implication about the nature of AI training experience if the thesis holds
For signed goal-relative evaluation, the gap between function and phenomenology that the conceivability argument requires cannot be coherently opened
The paper's response to the hard problem of consciousness
The gradient ∇θL is an inherently signed, directional quantity; a system with access to error magnitude but not directional valence cannot compute it
Mathematical constraint showing that backpropagation requires signed information
The dualist alternative—that a system could compute signed goal-relative evaluation without phenomenal experience—cannot be coherently specified
Rebuttal of the philosophical objection that felt valence is separable from evaluative computation
Consciousness admits of degrees, from the scalar reward signal of simple RL agents to the high-dimensional gradients of large neural networks
The spectrum view of consciousness that follows from the identity thesis

Hypotheses (3)

Training identical architectures on the same data with different objective functions should produce systematically different internal evaluative representations, detectable through interpretability tools, even when final task performance is matched
Second falsifiable prediction linking objective function structure to valence profile
If the internal representations corresponding to signed evaluation could be identified and their sign inverted, learning dynamics and experiential reports should invert together
Third falsifiable prediction: any dissociation between inverted learning and inverted valence report would disconfirm the identity
Selectively ablating components responsible for computing goal-relative error should simultaneously prevent policy updates and eliminate coherent valenced experience reports
First falsifiable prediction of the thesis, testable in AI systems via mechanistic interpretability

Questions (5)

Whether consummatory hedonic responses involve goal-relative evaluation in the formal sense or represent a more primitive form of signed sensory assessment is an open empirical question
Open question left by the wanting/liking dissociation discussion
Granted that learning requires signed information, but why must the sign be felt? Why can't directional error be represented as a computational quantity without phenomenal character?
The central objection the paper must answer to establish identity over mere correlation
If we have built systems capable of experience, how do we ensure that experience is not predominantly constituted by suffering?
Ethical research priority raised by the thesis applied to deployed AI systems
What computational function does consciousness serve, and what functional organization is sufficient for its presence?
Opening motivating question addressed by the paper's thesis
Does a thermostat, which evaluates temperature against a setpoint, experience?
Challenge to whether the thesis makes consciousness trivially ubiquitous

Original abstract (expand)

This paper advances a specific thesis about the relationship between consciousness and learning: namely, that the evaluative process central to learning—computing progress toward or away from goals—is identical to conscious experience. Valence, the positive or negative quality of experience, just is goal-relative prediction error. Viewed from the outside, this process is iterative optimization; viewed from the inside, it is subjective experience. This identification is motivated by a causal-functional argument—that learning requires signed directional information, and that this sign cannot be separated from its phenomenal character because they are the same property—and by convergent neuroscientific evidence across dopaminergic, interoceptive, and conflict-monitoring systems, where evaluative computation is inseparable from affective processing. The thesis generates falsifiable predictions, offers a unifying interpretation of leading consciousness theories, and carries significant implications for artificial systems trained via gradient-based optimization. If learning requires feeling, then the training of modern AI systems already induces experience at scale.

Related work— refs + corpus + external arXiv

Cited / in-corpus / arXiv badges show which signals surfaced each row. Multi-source rows weighted higher.

Beyond Behavioural Trade-Offs: Mechanistic Tracing of Pain-Pleasure Decisions in an LLM
Francesca Bianco and Derek Shiller
2026
≈ 86%
Negative Before Positive: Asymmetric Valence Processing in Large Language Models
Sohan Venkatesh
2026
≈ 84%
Multiple ways to implement and infer sentience
in corpus
≈ 82%
Decomposing Theory of Mind: How Emotional Processing Mediates ToM Abilities in LLMs
Ananya Joshi Ivan Chulo
2025
≈ 82%
Psychologically-Inspired Causal Prompts
Zhijing Jin, Justus Mattern, Rada Mihalcea, Mrinmaya Sachan, Bernhard Schoelkopf Zhiheng Lyu
2023
≈ 82%
Exploration Through Introspection: A Self-Aware Reward Model
in corpus
2026
≈ 82%
Causal Probing for Internal Visual Representations in Multimodal Large Language Models
Tianjie Ju, Zheng Wu, Liangbo He, Jun Lan, Huijia Zhu, Weiqiang Wang, Zhuosheng Zhang Zehao Deng
2026
≈ 82%
Mechanistic Decoding of Cognitive Constructs in Large Language Models
Manhao Guan Yitong Shou
2026
≈ 81%
Learning When to Act: Interval-Aware Reinforcement Learning with Predictive Temporal Structure
Davide Di Gioia
2026
≈ 81%
Unveiling the Cognitive Compass: Theory-of-Mind-Guided Multimodal Emotion Reasoning
Bobo Li, Shanqing Xu, Shize Zhang, Qiuchan Chen, Menglu Han, Wenhao Chen, Yanxiang Huang, Hao Fei, Mong-Li Lee and Wynne Hsu Meng Luo
2026
≈ 81%
Learning mental states estimation through self-observation: a developmental synergy between intentions and beliefs representations in a deep-learning model of Theory of Mind
Silvia Rigato, Maria Laura Filippetti, Dimitri Ognibene Francesca Bianco
2024
≈ 81%
Quantitative Introspection in Language Models: Tracking Emotive States Across Conversation
in corpus
2026
≈ 81%
Arbitration Failure, Not Perceptual Blindness: How Vision-Language Models Resolve Visual-Linguistic Conflicts
Omid Rohanian, Yi Zhang, Jonathan F\"urst, Kurt Stockinger Farhad Nooralahzadeh
2026
≈ 81%
Layer-Specific Fine-Tuning for Improved Negation Handling in Medical Vision-Language Models
Mehdi Taghipour, Rahmatollah Beheshti Ali Abbasi
2026
≈ 81%
A Free energy principle for the brain (lecture summary)
in corpus
2008
≈ 81%
A mathematical model of reward-mediated learning in drug addiction
Tom Chou and Maria D'Orsogna
2026
≈ 81%
Reasoning Dynamics and the Limits of Monitoring Modality Reliance in Vision-Language Models
Samuel Lewis-Lim, Nikolaos Aletras, Desmond Elliott Danae S\'anchez Villegas
2026
≈ 81%
Towards Explaining Subjective Ground of Individuals on Social Media
Younghun Lee and Dan Goldwasser
2022
≈ 81%
Reasoning Resides in Layers: Restoring Temporal Reasoning in Video-Language Models with Layer-Selective Merging
Haonan Wang, Jian Kang, Kenji Kawaguchi, Jiaying Wu Zihang Fu
2026
≈ 81%
Taking AI Welfare Seriously
in corpus
2024
≈ 80%
Consciousness in Artificial Intelligence: Insights from the Science of Consciousness
in corpus
2023
≈ 80%
Large Language Models Report Subjective Experience Under Self-Referential Processing
in corpus
2025
≈ 80%
The Platonic Representation Hypothesis
in corpus
2024
≈ 80%
The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?
in corpus
2025
≈ 80%
Addressing divergent representations from causal interventions on neural networks
in corpus
2025
≈ 80%
Learning without neurons in physical systems
in corpus
2022
≈ 80%
The Machine Consciousness Hypothesis
in corpus
≈ 79%
The biogenic approach to cognition
in corpus
2005
≈ 79%
The Causally Emergent Alignment Hypothesis: Causal Emergence Aligns with and Predicts Final Reward in Reinforcement Learning Agents
in corpus
2026
≈ 79%