
Reward prediction error neurons implement an efficient code for reward

    References

1. Schultz, W., Dayan, P. & Montague, P. R. A neural substrate of prediction and reward. Science 275, 1593–1599 (1997).

2. Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction (MIT Press, 2018).

3. Balleine, B. W., Daw, N. D. & O’Doherty, J. P. in Neuroeconomics (eds Glimcher, P. W. et al.) 367–387 (Academic Press, 2009).

4. Attneave, F. Some informational aspects of visual perception. Psychol. Rev. 61, 183–193 (1954).

5. Barlow, H. B. in Sensory Communication (ed Rosenblith, W. A.) 216–234 (MIT Press, 1961).

6. Laughlin, S. A simple coding procedure enhances a neuron’s information capacity. Z. Naturforsch. C Biosci. 36, 910–912 (1981).

7. Schwartz, O. & Simoncelli, E. P. Natural signal statistics and sensory gain control. Nat. Neurosci. 4, 819–825 (2001).

8. Wei, X.-X. & Stocker, A. A. Lawful relation between perceptual bias and discriminability. Proc. Natl Acad. Sci. USA 114, 10244–10249 (2017).

9. Louie, K., Glimcher, P. W. & Webb, R. Adaptive neural coding: from biological to behavioral decision-making. Curr. Opin. Behav. Sci. 5, 91–99 (2015).

10. Polanía, R., Woodford, M. & Ruff, C. C. Efficient coding of subjective value. Nat. Neurosci. 22, 134–142 (2019).

11. Bhui, R., Lai, L. & Gershman, S. J. Resource-rational decision making. Curr. Opin. Behav. Sci. 41, 15–21 (2021).

12. Louie, K. & Glimcher, P. W. Efficient coding and the neural representation of value. Ann. N Y Acad. Sci. 1251, 13–32 (2012).

13. Motiwala, A., Soares, S., Atallah, B. V., Paton, J. J. & Machens, C. K. Efficient coding of cognitive variables underlies dopamine response and choice behavior. Nat. Neurosci. 25, 738–748 (2022).

14. Eshel, N. et al. Arithmetic and local circuitry underlying dopamine prediction errors. Nature 525, 243–246 (2015).

15. Eshel, N., Tian, J., Bukwich, M. & Uchida, N. Dopamine neurons share common response function for reward prediction error. Nat. Neurosci. 19, 479–486 (2016).

16. Dabney, W. et al. A distributional code for value in dopamine-based reinforcement learning. Nature 577, 671–675 (2020).

17. Rothenhoefer, K. M., Hong, T., Alikaya, A. & Stauffer, W. R. Rare rewards amplify dopamine responses. Nat. Neurosci. 24, 465–469 (2021).

18. Ganguli, D. & Simoncelli, E. P. Efficient sensory encoding and Bayesian inference with heterogeneous neural populations. Neural Comput. 26, 2103–2134 (2014).

19. Fiorillo, C. D., Tobler, P. N. & Schultz, W. Discrete coding of reward probability and uncertainty by dopamine neurons. Science 299, 1898–1902 (2003).

20. Cohen, J. D. & Servan-Schreiber, D. A theory of dopamine function and its role in cognitive deficits in schizophrenia. Schizophr. Bull. 19, 85–104 (1993).

21. Wei, X.-X. & Stocker, A. A. Bayesian inference with efficient neural population codes. In Artificial Neural Networks and Machine Learning—ICANN 2012, Vol. 7552 (eds Hutchison, D. et al.) 523–530 (Springer, 2012).

22. Frank, M. J., Seeberger, L. C. & O’Reilly, R. C. By carrot or by stick: cognitive reinforcement learning in Parkinsonism. Science 306, 1940–1943 (2004).

23. Mikhael, J. G. & Bogacz, R. Learning reward uncertainty in the basal ganglia. PLoS Comput. Biol. 12, e1005062 (2016).

24. Kobayashi, S. & Schultz, W. Influence of reward delays on responses of dopamine neurons. J. Neurosci. 28, 7837–7846 (2008).

25. Roesch, M. R., Calu, D. J. & Schoenbaum, G. Dopamine neurons encode the better option in rats deciding between differently delayed or sized rewards. Nat. Neurosci. 10, 1615–1624 (2007).

26. Kim, H. R. et al. A unified framework for dopamine signals across timescales. Cell 183, 1600–1616 (2020).

27. Starkweather, C. K. & Uchida, N. Dopamine signals as temporal difference errors: recent advances. Curr. Opin. Neurobiol. 67, 95–105 (2021).

28. Starkweather, C. K., Babayan, B. M., Uchida, N. & Gershman, S. J. Dopamine reward prediction errors reflect hidden-state inference across time. Nat. Neurosci. 20, 581–589 (2017).

29. Soares, S., Atallah, B. V. & Paton, J. J. Midbrain dopamine neurons control judgment of time. Science 354, 1273–1277 (2016).

30. Tano, P., Dayan, P. & Pouget, A. A local temporal difference code for distributional reinforcement learning. In Advances in Neural Information Processing Systems 33 (eds Larochelle, H. et al.) 13662–13673 (Neural Information Processing Systems Foundation, 2020).

31. Louie, K. Asymmetric and adaptive reward coding via normalized reinforcement learning. PLoS Comput. Biol. 18, e1010350 (2022).

32. Naka, K. I. & Rushton, W. A. H. An attempt to analyse colour reception by electrophysiology. J. Physiol. 185, 556–586 (1966).

33. Bredenberg, C., Simoncelli, E. P. & Savin, C. Learning efficient task-dependent representations with synaptic plasticity. In Advances in Neural Information Processing Systems 33 (eds Larochelle, H. et al.) 15714–15724 (Neural Information Processing Systems Foundation, 2020).

34. Savin, C. & Triesch, J. Emergence of task-dependent representations in working memory circuits. Front. Comput. Neurosci. 8, 57 (2014).

35. Gerstner, W., Lehmann, M., Liakoni, V., Corneil, D. & Brea, J. Eligibility traces and plasticity on behavioral time scales: experimental support of neoHebbian three-factor learning rules. Front. Neural Circuits 12, 53 (2018).

36. Frémaux, N. & Gerstner, W. Neuromodulated spike-timing-dependent plasticity, and theory of three-factor learning rules. Front. Neural Circuits 9, 85 (2016).

37. Wei, X.-X. & Stocker, A. A. A Bayesian observer model constrained by efficient coding can explain ‘anti-Bayesian’ percepts. Nat. Neurosci. 18, 1509–1517 (2015).

38. Brunel, N. & Nadal, J.-P. Mutual information, Fisher information, and population coding. Neural Comput. 10, 1731–1757 (1998).

39. Cover, T. M. & Thomas, J. A. Elements of Information Theory (Wiley, 1991).

40. Wei, X.-X. & Stocker, A. A. Mutual information, Fisher information, and efficient coding. Neural Comput. 28, 305–326 (2016).

41. Bethge, M., Rotermund, D. & Pawelzik, K. Optimal short-term population coding: when Fisher information fails. Neural Comput. 14, 2317–2351 (2002).

42. Schütt, H., Kim, D. & Ma, W. J. Code for efficient coding and distributional reinforcement learning. Zenodo https://doi.org/10.5281/zenodo.10669061


    Acknowledgements

We thank H.-H. Li for valuable discussions. We received no specific funding for this work.

    Author information

    Author notes
    1. These authors contributed equally: Heiko H. Schütt, Dongjae Kim.

    Authors and Affiliations

1. Center for Neural Science and Department of Psychology, New York University, New York, NY, USA

Heiko H. Schütt, Dongjae Kim & Wei Ji Ma

2. Department of Behavioural and Cognitive Sciences, Université du Luxembourg, Esch-Belval, Luxembourg

      Heiko H. Schütt

    3. Department of AI-Based Convergence, Dankook University, Yongin, Republic of Korea

      Dongjae Kim

    Contributions

H.H.S. derived the efficient code. H.H.S. and D.K. analyzed the neural data. W.J.M. supervised the project. All authors wrote the manuscript.

    Corresponding author

    Correspondence to Heiko H. Schütt.

    Ethics declarations

    Competing interests

    The authors declare no competing interests.

    Peer review

    Peer review information

    Nature Neuroscience thanks the anonymous reviewers for their contribution to the peer review of this work.

    Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

    Extended data

Extended Data Fig. 1 Comparing encoding populations for reward with 10 neurons and the same expected number of spikes.

A: Compared neuronal populations. Single neuron: all neurons share the same response curve, optimized to maximize transferred information. Equal spacing: neurons tile the space, not optimized. No gain: positions and slopes are optimized, but all neurons have equal gain. Optimal α = 1: fully optimized population as derived previously18, with density proportional to the distribution. Optimal α = 0.673: equally optimal distribution, but with α fit to match the midpoint distributions of the optimal code and the experimental data. B: Fisher information as a function of reward for each of the populations. C: Expected logarithm of Fisher information under the reward distribution, relative to the single-neuron case.
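For reference, under independent Poisson spiking the Fisher information about reward r carried by a population with mean rates f_n(r) is I(r) = sum_n f_n'(r)^2 / f_n(r), and panel C compares the expected log of this quantity under the reward distribution. The following is a minimal sketch of that computation, not the authors' published code (see ref. 42 for that); the sigmoidal tuning curves, the log-normal reward distribution, and all parameter values here are illustrative assumptions.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def fisher_information(r, midpoints, slopes, gains):
        # Fisher information at rewards r for independent Poisson neurons with
        # mean rates f_n(r) = gain_n * sigmoid(slope_n * (r - midpoint_n)).
        # For Poisson noise, I(r) = sum_n f_n'(r)^2 / f_n(r).
        r = np.asarray(r)[:, None]                       # rewards x 1
        s = sigmoid(slopes[None, :] * (r - midpoints[None, :]))
        f = gains[None, :] * s                           # mean firing rates
        df = gains[None, :] * slopes[None, :] * s * (1.0 - s)
        return np.sum(df**2 / np.maximum(f, 1e-12), axis=1)

    rng = np.random.default_rng(0)
    rewards = rng.lognormal(mean=0.0, sigma=0.5, size=10_000)  # assumed prior

    # Hypothetical 10-neuron population with midpoints tiling the quantiles.
    midpoints = np.quantile(rewards, (np.arange(10) + 0.5) / 10)
    slopes = np.full(10, 3.0)
    gains = np.full(10, 5.0)   # same expected spike budget for every neuron

    # Expected log Fisher information under the reward distribution,
    # the comparison metric plotted in panel C.
    print(np.mean(np.log(fisher_information(rewards, midpoints, slopes, gains))))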

Extended Data Fig. 3 Efficient code for the variable-reward task14.

A: Tuning curves. For clarity, only 20 of 39 neurons are shown. B: Density of neurons as a function of midpoint. C: Gain as a function of midpoint.

Extended Data Fig. 4 Log-normal kernel density estimate of midpoints and thresholds.

A: Midpoints. B: Thresholds. Measured neurons (black) and the efficient code (cyan) are overlaid on the reward density (gray).
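A log-normal kernel density estimate is a Gaussian KDE applied to the logarithms of the values, transformed back with the change-of-variables Jacobian. The sketch below illustrates the method on made-up sample data; it is not the paper's analysis code.

    import numpy as np
    from scipy.stats import gaussian_kde

    def lognormal_kde(samples, grid):
        # Gaussian kernels in log space; dividing by grid applies the
        # Jacobian of the change of variables y = log(x).
        kde = gaussian_kde(np.log(samples))
        return kde(np.log(grid)) / grid

    midpoints = np.random.default_rng(1).lognormal(0.0, 0.6, 39)  # fake data
    grid = np.linspace(0.05, 10.0, 200)
    density = lognormal_kde(midpoints, grid)   # estimated density over rewards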

Extended Data Fig. 5 Efficient code for the variable-magnitude task17.

A–C: Efficient code for the uniform distribution. D–F: Efficient code for the normal distribution. A,D: Tuning curves. For clarity, only 13 of 40 neurons are shown. B,E: Density. C,F: Gain.

Extended Data Fig. 6 Evaluation of learning rules placing neurons’ midpoints at expectiles instead of quantiles.

Plotting conventions as in Fig. 4. Each panel shows the converged population of 20 neurons after learning based on 20,000 reward presentations. The inset illustrates the learning rule. A: Learning the position on the reward axis for the neurons to converge to the quantiles of the distribution. This learning rule is the distributional RL learning rule. B: Additionally learning the slope of the neurons to be proportional to the local density, by increasing the slope when the reward falls within the dynamic range and decreasing it otherwise. C: First method to set the gain: iterative adjustment to converge to a fixed average firing rate. D: Second method to set the gain: use a fixed gain per neuron based on the quantile it will eventually converge to. E: The efficient tuning curve for a single neuron. F: The analytically derived optimal solution. G: Comparison of information transfer across the different populations with the same number of neurons and expected firing rate.
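As a concrete illustration of the midpoint update in panel A, the sketch below implements a standard stochastic quantile-tracking rule (a gradient step on the pinball loss), under which each midpoint converges to its target quantile of the reward distribution. The step size, target quantile levels, and reward distribution are illustrative assumptions, not the paper's exact settings.

    import numpy as np

    # Each midpoint c_n follows c_n <- c_n + lr * (tau_n - 1[r < c_n]),
    # a stochastic-gradient step on the quantile (pinball) loss, so c_n
    # converges to the tau_n-quantile of the reward distribution.
    rng = np.random.default_rng(0)
    n_neurons, lr = 20, 0.01
    taus = (np.arange(n_neurons) + 0.5) / n_neurons  # target quantile levels
    midpoints = np.ones(n_neurons)                   # arbitrary initialization

    for _ in range(20_000):                          # 20,000 reward presentations
        r = rng.lognormal(0.0, 0.5)                  # assumed reward distribution
        midpoints += lr * (taus - (r < midpoints))   # nudge toward quantiles

    reference = np.quantile(rng.lognormal(0.0, 0.5, 100_000), taus)
    print(np.max(np.abs(midpoints - reference)))     # small after convergence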

    Supplementary information

    Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


    About this article

    Cite this article

Schütt, H.H., Kim, D. & Ma, W.J. Reward prediction error neurons implement an efficient code for reward. Nat. Neurosci. 27, 1333–1339 (2024). https://doi.org/10.1038/s41593-024-01671-x


