Decisions take place in dynamic environments. The nervous system must continually learn the best actions to obtain rewards. In the theoretical framework of optimal control and reinforcement learning, behavioural policies are updated by feedback arising from errors in the predicted reward. These reward prediction errors have been mapped to dopamine neurons in the midbrain, but it is unclear how other decision variables are represented and modulated. We trained mice on a dynamic foraging task, in which they freely chose between two alternatives that delivered reward with changing probabilities. We found that most serotonin neurons in the dorsal raphe represented a quantity related to reward uncertainty over long timescales (tens of seconds), consistent with a modulatory signal used to adjust learning rates of ongoing decision variables in frontal cortex. Recordings from locus coeruleus norepinephrine neurons revealed two populations, including one that was excited by the absence of reward, a key driver of policy learning. Our results provide quantitative links between dynamic behaviour and the activity of two key neuromodulatory systems: serotonin neurons and norepinephrine neurons.
Jeremiah Cohen is an Associate Professor in the Department of Neuroscience at the Johns Hopkins University School of Medicine. Prior to his appointment, he completed postdoctoral work at Harvard University, earned a Ph.D. in Neuroscience from Vanderbilt University, and received undergraduate degrees in Mathematics and Neuroscience from Brandeis University. His lab studies the functions of neuromodulatory systems and forebrain networks in dynamic decision making.