Reward prediction error neurons implement an efficient code for reward
References
Schultz, W., Dayan, P. & Montague, P. R. A neural substrate of prediction and reward. Science 275, 1593–1599 (1997).
Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction (MIT Press, 2018).
Balleine, B. W., Daw, N. D. & O'Doherty, J. P. in Neuroeconomics (eds Glimcher, P. W. et al.) 367–387 (Academic Press, 2009).
Attneave, F. Some informational aspects of visual perception. Psychol. Rev. 61, 183–193 (1954).
Barlow, H. B. in Sensory Communication (ed Rosenblith, W. A.) 216–234 (MIT Press, 1961).
Laughlin, S. A simple coding procedure enhances a neuron's information capacity. Z. Naturforsch. C Biosci. 36, 910–912 (1981).
Schwartz, O. & Simoncelli, E. P. Natural signal statistics and sensory gain control. Nat. Neurosci. 4, 819–825 (2001).
Wei, X.-X. & Stocker, A. A. Lawful relation between perceptual bias and discriminability. Proc. Natl Acad. Sci. USA 114, 10244–10249 (2017).
Louie, K., Glimcher, P. W. & Webb, R. Adaptive neural coding: from biological to behavioral decision-making. Curr. Opin. Behav. Sci. 5, 91–99 (2015).
Polanía, R., Woodford, M. & Ruff, C. C. Efficient coding of subjective value. Nat. Neurosci. 22, 134–142 (2019).
Bhui, R., Lai, L. & Gershman, S. J. Resource-rational decision making. Curr. Opin. Behav. Sci. 41, 15–21 (2021).
Louie, K. & Glimcher, P. W. Efficient coding and the neural representation of value. Ann. N. Y. Acad. Sci. 1251, 13–32 (2012).
Motiwala, A., Soares, S., Atallah, B. V., Paton, J. J. & Machens, C. K. Efficient coding of cognitive variables underlies dopamine response and choice behavior. Nat. Neurosci. 25, 738–748 (2022).
Eshel, N. et al. Arithmetic and local circuitry underlying dopamine prediction errors. Nature 525, 243–246 (2015).
Eshel, N., Tian, J., Bukwich, M. & Uchida, N. Dopamine neurons share common response function for reward prediction error. Nat. Neurosci. 19, 479–486 (2016).
Dabney, W. et al. A distributional code for value in dopamine-based reinforcement learning. Nature 577, 671–675 (2020).
Rothenhoefer, K. M., Hong, T., Alikaya, A. & Stauffer, W. R. Rare rewards amplify dopamine responses. Nat. Neurosci. 24, 465–469 (2021).
Ganguli, D. & Simoncelli, E. P. Efficient sensory encoding and Bayesian inference with heterogeneous neural populations. Neural Comput. 26, 2103–2134 (2014).
Fiorillo, C. D., Tobler, P. N. & Schultz, W. Discrete coding of reward probability and uncertainty by dopamine neurons. Science 299, 1898–1902 (2003).
Cohen, J. D. & Servan-Schreiber, D. A theory of dopamine function and its role in cognitive deficits in schizophrenia. Schizophr. Bull. 19, 85–104 (1993).
Wei, X.-X. & Stocker, A. A. Bayesian inference with efficient neural population codes. In Artificial Neural Networks and Machine Learning—ICANN 2012, Vol. 7552 (eds Hutchison, D. et al.) 523–530 (Springer, 2012).
Frank, M. J., Seeberger, L. C. & O'Reilly, R. C. By carrot or by stick: cognitive reinforcement learning in Parkinsonism. Science 306, 1940–1943 (2004).
Mikhael, J. G. & Bogacz, R. Learning reward uncertainty in the basal ganglia. PLoS Comput. Biol. 12, e1005062 (2016).
Kobayashi, S. & Schultz, W. Influence of reward delays on responses of dopamine neurons. J. Neurosci. 28, 7837–7846 (2008).
Roesch, M. R., Calu, D. J. & Schoenbaum, G. Dopamine neurons encode the better option in rats deciding between differently delayed or sized rewards. Nat. Neurosci. 10, 1615–1624 (2007).
Kim, H. R. et al. A unified framework for dopamine signals across timescales. Cell 183, 1600–1616 (2020).
Starkweather, C. K. & Uchida, N. Dopamine signals as temporal difference errors: recent advances. Curr. Opin. Neurobiol. 67, 95–105 (2021).
Starkweather, C. K., Babayan, B. M., Uchida, N. & Gershman, S. J. Dopamine reward prediction errors reflect hidden-state inference across time. Nat. Neurosci. 20, 581–589 (2017).
Soares, S., Atallah, B. V. & Paton, J. J. Midbrain dopamine neurons control judgment of time. Science 354, 1273–1277 (2016).
Tano, P., Dayan, P. & Pouget, A. A local temporal difference code for distributional reinforcement learning. In Advances in Neural Information Processing Systems 33 (eds Larochelle, H. et al.) 13662–13673 (Neural Information Processing Systems Foundation, 2020).
Louie, K. Asymmetric and adaptive reward coding via normalized reinforcement learning. PLoS Comput. Biol. 18, e1010350 (2022).
Naka, K. I. & Rushton, W. A. H. An attempt to analyse colour reception by electrophysiology. J. Physiol. 185, 556–586 (1966).
Bredenberg, C., Simoncelli, E. P. & Savin, C. Learning efficient task-dependent representations with synaptic plasticity. In Advances in Neural Information Processing Systems 33 (eds Larochelle, H. et al.) 15714–15724 (Neural Information Processing Systems Foundation, 2020).
Savin, C. & Triesch, J. Emergence of task-dependent representations in working memory circuits. Front. Comput. Neurosci. 8, 57 (2014).
Gerstner, W., Lehmann, M., Liakoni, V., Corneil, D. & Brea, J. Eligibility traces and plasticity on behavioral time scales: experimental support of neoHebbian three-factor learning rules. Front. Neural Circuits 12, 53 (2018).
Frémaux, N. & Gerstner, W. Neuromodulated spike-timing-dependent plasticity, and theory of three-factor learning rules. Front. Neural Circuits 9, 85 (2016).
Wei, X.-X. & Stocker, A. A. A Bayesian observer model constrained by efficient coding can explain 'anti-Bayesian' percepts. Nat. Neurosci. 18, 1509–1517 (2015).
Brunel, N. & Nadal, J.-P. Mutual information, Fisher information, and population coding. Neural Comput. 10, 1731–1757 (1998).
Cover, T. M. & Thomas, J. A. Elements of Information Theory (Wiley, 1991).
Wei, X.-X. & Stocker, A. A. Mutual information, Fisher information, and efficient coding. Neural Comput. 28, 305–326 (2016).
Bethge, M., Rotermund, D. & Pawelzik, K. Optimal short-term population coding: when Fisher information fails. Neural Comput. 14, 2317–2351 (2002).
Schütt, H., Kim, D. & Ma, W. J. Code for efficient coding and distributional reinforcement learning. Zenodo https://doi.org/10.5281/zenodo.10669061
Acknowledgements
We thank H.-H. Li for valuable discussions. We received no specific funding for this work.
Author information
These authors contributed equally: Heiko H. Schütt, Dongjae Kim.
Authors and Affiliations
Center for Neural Science and Department of Psychology, New York University, New York, NY, USA
Heiko H. Schütt, Dongjae Kim & Wei Ji Ma
Department of Behavioural and Cognitive Sciences, Université du Luxembourg, Esch-Belval, Luxembourg
Heiko H. Schütt
Department of AI-Based Convergence, Dankook University, Yongin, Republic of Korea
Dongjae Kim
Contributions
H.H.S. derived the efficient code. H.H.S. and D.K. analyzed the neural data. W.J.M. supervised the project. All authors wrote the manuscript.
Corresponding author
Correspondence to Heiko H. Schütt.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Neuroscience thanks the anonymous reviewers for their contribution to the peer review of this work.
Additional information
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Comparing encoding populations for reward with 10 neurons and the same expected number of spikes.
A: Compared neuronal populations: single neuron: all neurons share the same response curve, optimized to maximize transferred information. equal spacing: neurons tile the space, not optimized. no gain: positions and slopes are optimized, but all neurons have equal gain. optimal α = 1: fully optimized population as derived previously18 with density proportional to the distribution. optimal α = 0.673: equally optimal distribution but with α fit to match the midpoint distribution for the optimal code and the experimental data.
B: Fisher information as a function of reward for each of the populations.
C: Expected logarithm of Fisher information under the reward distribution relative to the single-neuron case.
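For illustration, the quantity compared in panels B and C can be sketched numerically as below. This is a minimal sketch and not the authors' archived code: the logistic tuning-curve form, Poisson spiking, the log-normal reward distribution and all parameter values are assumptions chosen only to show the computation.

```python
import numpy as np
from scipy.stats import lognorm

# Sketch only: population Fisher information for Poisson neurons with
# logistic (sigmoidal) tuning curves of reward. All values are placeholders.
def rates(reward, midpoints, slopes, gains):
    """Expected firing rate of each neuron (rows) at each reward (columns)."""
    return gains[:, None] / (1.0 + np.exp(-slopes[:, None] * (reward[None, :] - midpoints[:, None])))

def fisher_information(reward, midpoints, slopes, gains, eps=1e-12):
    """Poisson Fisher information: I(r) = sum_i f_i'(r)^2 / f_i(r)."""
    f = rates(reward, midpoints, slopes, gains)
    fprime = slopes[:, None] * f * (1.0 - f / gains[:, None])  # derivative of a logistic
    return np.sum(fprime**2 / (f + eps), axis=0)

# Assumed reward distribution and a 10-neuron population with midpoints at its quantiles.
prior = lognorm(s=0.5, scale=2.0)
rewards = np.linspace(0.01, 10.0, 500)
midpoints = prior.ppf(np.linspace(0.05, 0.95, 10))
slopes = np.full(10, 2.0)
gains = np.full(10, 5.0)  # expected spike counts, arbitrary units

fi = fisher_information(rewards, midpoints, slopes, gains)  # curve as in panel B
# Panel C-style summary: expected log Fisher information under the reward distribution.
samples = prior.rvs(size=10_000, random_state=0)
expected_log_fi = np.mean(np.log(fisher_information(samples, midpoints, slopes, gains)))
```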
Extended Data Fig. 3 Efficient code for the variable-reward task14.
A: Tuning curves. For clarity, only 20 of 39 neurons are shown.
B: Density of neurons as a function of midpoint. C: Gain as a function of midpoint.
Extended Data Fig. 4 Log-normal kernel density estimate of midpoints and thresholds.
A: Midpoints. B: Thresholds. Measured neurons (black) and the efficient code (cyan) are overlaid on the reward density (gray).
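As a small illustration of what a log-normal kernel density estimate amounts to, the sketch below fits a Gaussian KDE to log-transformed midpoints and maps it back to reward units by a change of variables. This is an assumed reading of the method with placeholder values, not the authors' implementation.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Sketch only: a "log-normal" kernel density estimate, i.e. a Gaussian KDE
# on log-transformed values, mapped back to the reward axis.
def lognormal_kde(samples):
    kde_log = gaussian_kde(np.log(samples))
    # change of variables: p(x) = p_log(log x) / x
    return lambda x: kde_log(np.log(x)) / x

midpoints = np.array([0.4, 0.7, 1.1, 1.8, 2.9, 4.2])  # placeholder midpoints (a.u.)
density = lognormal_kde(midpoints)
grid = np.linspace(0.1, 6.0, 200)
values = density(grid)  # smooth density over midpoints
```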
Extended Data Fig. 5 Efficient code for the variable-magnitude task17.
A-C: Efficient code for the uniform distribution. D-F: Efficient code for the normal distribution. A,D: Tuning curves. For clarity, only 13 of 40 neurons are shown. B,E: Density. C,F: Gain.
Extended Data Fig. 6 Evaluation of learning rules placing neurons' midpoints at expectiles instead of quantiles.
Plotting conventions as in Fig. 4. Each panel shows the converged population of 20 neurons after learning based on 20,000 reward presentations. The inset illustrates the learning rule.
A: Learning the position on the reward axis for the neurons to converge to the quantiles of the distribution. This learning rule is the distributional RL learning rule. B: Additionally learning the slope of the neurons to be proportional to the local density by increasing the slope when the reward falls within the dynamic range and decreasing it otherwise. C: First method to set the gain: iterative adjustment to converge to a fixed average firing rate.
D: Second method to set the gain: use a fixed gain per neuron based on the quantile it will eventually converge to. E: The efficient tuning curve for a single neuron. F: The analytically derived optimal solution. G: Comparison of information transfer across the different populations with the same number of neurons and expected firing rate.
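The position-learning step in panel A can be illustrated with a standard stochastic quantile-tracking update, which moves each neuron's midpoint up with probability equal to its target quantile level and down otherwise. The sketch below is only an assumed form of such a rule, not the authors' implementation; the reward distribution and learning rate are placeholders, and the slope and gain updates of panels B-D are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sketch only: each of 20 neurons tracks a target quantile of the reward distribution.
n_neurons = 20
taus = (np.arange(n_neurons) + 0.5) / n_neurons  # target quantile levels
midpoints = np.ones(n_neurons)                   # arbitrary initial midpoints
lr = 0.01                                        # placeholder learning rate

for _ in range(20_000):                          # one reward presentation per step
    r = rng.lognormal(mean=0.0, sigma=0.5)       # assumed reward distribution
    # move each midpoint up with probability tau, down with probability 1 - tau
    midpoints += lr * (taus - (r < midpoints).astype(float))

# After learning, the midpoints approximate the tau-quantiles of the reward distribution.
```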
Supplementary information
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
Reprints and permissions
About this article
Cite this article
Schütt, H.H., Kim, D. & Ma, W.J. Reward prediction error neurons implement an efficient code for reward. Nat Neurosci 27, 1333–1339 (2024). https://doi.org/10.1038/s41593-024-01671-x