Abstract
Planning can be defined as action selection that leverages an internal model of the outcomes likely to follow each possible action. Its neural mechanisms remain poorly understood. Here we adapt recent advances from human research for rats, presenting for the first time an animal task that produces many trials of planned behavior per session, making multitrial rodent experimental tools available to study planning. We use part of this toolkit to address a perennially controversial issue in planning: the role of the dorsal hippocampus. Although prospective hippocampal representations have been proposed to support planning, intact planning in animals with damaged hippocampi has been repeatedly observed. Combining formal algorithmic behavioral analysis with muscimol inactivation, we provide causal evidence directly linking dorsal hippocampus with planning behavior. Our results and methods open the door to new and more detailed investigations of the neural mechanisms of planning in the hippocampus and throughout the brain.
This is a preview of subscription content, access via your institution
Access options
Access Nature and 54 other Nature Portfolio journals
Get Nature+, our best-value online-access subscription
£14.99 / 30 days
cancel any time
Subscribe to this journal
Receive 12 print issues and online access
£139.00 per year
only £11.58 per issue
Buy this article
- Purchase on SpringerLink
- Instant access to full article PDF
Prices may be subject to local taxes which are calculated during checkout
Similar content being viewed by others
Change history
17 November 2017
In the version of this article initially published, the green label in Fig. 1c read "rightward choices" instead of "leftward choices." The error has been corrected in the HTML and PDF versions of the article.
References
Sutton, R.S. & Barto, A.G. Reinforcement Learning: an Introduction (MIT Press, 1998).
Tolman, E.C. Cognitive maps in rats and men. Psychol. Rev. 55, 189–208 (1948).
Dolan, R.J. & Dayan, P. Goals and habits in the brain. Neuron 80, 312–325 (2013).
Balleine, B.W. & O'Doherty, J.P. Human and rodent homologies in action control: corticostriatal determinants of goal-directed and habitual action. Neuropsychopharmacology 35, 48–69 (2010).
Daw, N.D., Niv, Y. & Dayan, P. Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nat. Neurosci. 8, 1704–1711 (2005).
Brogden, W.J. Sensory pre-conditioning. J. Exp. Psychol. 25, 323–332 (1939).
Adams, C.D. & Dickinson, A. Instrumental responding following reinforcer devaluation. Q. J. Exp. Psychol. B 33, 109–121 (1981).
Hilário, M.R.F., Clouse, E., Yin, H.H. & Costa, R.M. Endocannabinoid signaling is critical for habit formation. Front. Integr. Neurosci. 1, 6 (2007).
Daw, N.D., Gershman, S.J., Seymour, B., Dayan, P. & Dolan, R.J. Model-based influences on humans' choices and striatal prediction errors. Neuron 69, 1204–1215 (2011).
Simon, D.A. & Daw, N.D. Neural correlates of forward planning in a spatial decision task in humans. J. Neurosci. 31, 5526–5539 (2011).
Wunderlich, K., Dayan, P. & Dolan, R.J. Mapping value based planning and extensively trained choice in the human brain. Nat. Neurosci. 15, 786–791 (2012).
Huys, Q.J.M. et al. Interplay of approximate planning strategies. Proc. Natl. Acad. Sci. USA 112, 3098–3103 (2015).
O'Keefe, J. & Nadel, L. The Hippocampus as a Cognitive Map (Clarendon Press Oxford, 1978).
Packard, M.G. & McGaugh, J.L. Inactivation of hippocampus or caudate nucleus with lidocaine differentially affects expression of place and response learning. Neurobiol. Learn. Mem. 65, 65–72 (1996).
Morris, R.G., Garrud, P., Rawlins, J.N. & O'Keefe, J. Place navigation impaired in rats with hippocampal lesions. Nature 297, 681–683 (1982).
O'Keefe, J. & Dostrovsky, J. The hippocampus as a spatial map. Preliminary evidence from unit activity in the freely-moving rat. Brain Res. 34, 171–175 (1971).
Wikenheiser, A.M. & Redish, A.D. Hippocampal theta sequences reflect current goals. Nat. Neurosci. 18, 289–294 (2015).
Pfeiffer, B.E. & Foster, D.J. Hippocampal place-cell sequences depict future paths to remembered goals. Nature 497, 74–79 (2013).
Koene, R.A., Gorchetchnikov, A., Cannon, R.C. & Hasselmo, M.E. Modeling goal-directed spatial navigation in the rat based on physiological data from the hippocampal formation. Neural Netw. 16, 577–584 (2003).
Foster, D.J. & Knierim, J.J. Sequence learning and the role of the hippocampus in rodent navigation. Curr. Opin. Neurobiol. 22, 294–300 (2012).
Pezzulo, G., van der Meer, M.A.A., Lansink, C.S. & Pennartz, C.M.A. Internally generated sequences in learning and executing goal-directed behavior. Trends Cogn. Sci. 18, 647–657 (2014).
Kimble, D.P. & BreMiller, R. Latent learning in hippocampal-lesioned rats. Physiol. Behav. 26, 1055–1059 (1981).
Kimble, D.P., Jordan, W.P. & BreMiller, R. Further evidence for latent learning in hippocampal-lesioned rats. Physiol. Behav. 29, 401–407 (1982).
Corbit, L.H. & Balleine, B.W. The role of the hippocampus in instrumental conditioning. J. Neurosci. 20, 4233–4239 (2000).
Corbit, L.H., Ostlund, S.B. & Balleine, B.W. Sensitivity to instrumental contingency degradation is mediated by the entorhinal cortex and its efferents via the dorsal hippocampus. J. Neurosci. 22, 10976–10984 (2002).
Ward-Robinson, J. et al. Excitotoxic lesions of the hippocampus leave sensory preconditioning intact: implications for models of hippocampal function. Behav. Neurosci. 115, 1357–1362 (2001).
Gaskin, S., Chai, S.-C. & White, N.M. Inactivation of the dorsal hippocampus does not affect learning during exploration of a novel environment. Hippocampus 15, 1085–1093 (2005).
Bunsey, M. & Eichenbaum, H. Conservation of hippocampal memory function in rats and humans. Nature 379, 255–257 (1996).
Dusek, J.A. & Eichenbaum, H. The hippocampus and memory for orderly stimulus relations. Proc. Natl. Acad. Sci. USA 94, 7109–7114 (1997).
Devito, L.M. & Eichenbaum, H. Memory for the order of events in specific sequences: contributions of the hippocampus and medial prefrontal cortex. J. Neurosci. 31, 3169–3175 (2011).
Jones, J.L. et al. Orbitofrontal cortex supports behavior and learning using inferred but not cached values. Science 338, 953–956 (2012).
McDannald, M.A., Lucantonio, F., Burke, K.A., Niv, Y. & Schoenbaum, G. Ventral striatum and orbitofrontal cortex are both required for model-based, but not model-free, reinforcement learning. J. Neurosci. 31, 2700–2705 (2011).
Gremel, C.M. & Costa, R.M. Orbitofrontal and striatal circuits dynamically encode the shift between goal-directed and habitual actions. Nat. Commun. 4, 2264 (2013).
Miller, K.J., Brody, C.D. & Botvinick, M.M. Identifying model-based and model-free patterns in behavior on multi-step tasks. Preprint at http://www.biorxiv.org/content/early/2016/12/24/096339 (2016).
Economides, M., Kurth-Nelson, Z., Lübbert, A., Guitart-Masip, M. & Dolan, R.J. Model-based reasoning in humans becomes automatic with training. PLOS Comput. Biol. 11, e1004463 (2015).
Keramati, M., Dezfouli, A. & Piray, P. Speed/accuracy trade-off between the habitual and the goal-directed processes. PLOS Comput. Biol. 7, e1002055 (2011).
Kool, W., Cushman, F.A. & Gershman, S.J. When does model-based control pay off? PLOS Comput. Biol. 12, e1005090 (2016).
Akam, T., Costa, R. & Dayan, P. Simple plans or sophisticated habits? State, transition and learning interactions in the two-step task. PLOS Comput. Biol. 11, e1004648 (2015).
Padoa-Schioppa, C. Neurobiology of economic choice: a good-based model. Annu. Rev. Neurosci. 34, 333–359 (2011).
Wilson, R.C., Takahashi, Y.K., Schoenbaum, G. & Niv, Y. Orbitofrontal cortex as a cognitive map of task space. Neuron 81, 267–279 (2014).
Stalnaker, T.A., Cooch, N.K. & Schoenbaum, G. What the orbitofrontal cortex does not do. Nat. Neurosci. 18, 620–627 (2015).
Ostlund, S.B. & Balleine, B.W. Orbitofrontal cortex mediates outcome encoding in Pavlovian but not instrumental conditioning. J. Neurosci. 27, 4819–4825 (2007).
Foster, D.J., Morris, R.G. & Dayan, P. A model of hippocampally dependent navigation, using the temporal difference learning rule. Hippocampus 10, 1–16 (2000).
Olton, D.S., Becker, J.T. & Handelmann, G.E. Hippocampus, space, and memory. Behav. Brain Sci. 2, 313–322 (1979).
Racine, R.J. & Kimble, D.P. Hippocampal lesions and delayed alternation in the rat. Psychon. Sci. 3, 285–286 (1965).
Gilboa, A., Sekeres, M., Moscovitch, M. & Winocur, G. Higher-order conditioning is impaired by hippocampal lesions. Curr. Biol. 24, 2202–2207 (2014).
Solomon, P.R., Vander Schaaf, E.R., Thompson, R.F. & Weisz, D.J. Hippocampus and trace conditioning of the rabbit's classically conditioned nictitating membrane response. Behav. Neurosci. 100, 729–744 (1986).
Hartley, T., Lever, C., Burgess, N. & O'Keefe, J. Space in the brain: how the hippocampal formation supports spatial cognition. Phil. Trans. R. Soc. Lond. B 369, 20120510 (2013).
Hassabis, D., & Maguire, E.A. Deconstructing episodic memory with construction. Trends in Cog. Sci., 11, 299–306 (2007).
Eichenbaum, H. & Cohen, N.J. Can we reconcile the declarative memory and spatial navigation views on hippocampal function? Neuron 83, 764–770 (2014).
Lau, B. & Glimcher, P.W. Dynamic response-by-response models of matching behavior in rhesus monkeys. J. Exp. Anal. Behav. 84, 555–579 (2005).
Stan Development Team. MatlabStan: the MATLAB interface to Stan. Stan.org. http://mc-stan.org/users/interfaces/matlab-stan (2016).
Carpenter, C. et al. Stan: a probabilistic programming language. J. Stat. Softw. 76, 1–32 (2017).
Gelman, A. et al. Bayesian Data Analysis, Third Edition (CRC Press, 2013).
Krupa, D.J., Ghazanfar, A.A. & Nicolelis, M.A. Immediate thalamic sensory plasticity depends on corticothalamic feedback. Proc. Natl. Acad. Sci. USA 96, 8200–8205 (1999).
Martin, J.H. Autoradiographic estimation of the extent of reversible inactivation produced by microinjection of lidocaine and muscimol in the rat. Neurosci. Lett. 127, 160–164 (1991).
Aarts, E., Verhage, M., Veenvliet, J.V., Dolan, C.V. & van der Sluis, S. A solution to dependency: using multilevel analysis to accommodate nested data. Nat. Neurosci. 17, 491–496 (2014).
Daw, N.D. in Decision Making, Affect, and Learning (eds. Delgado, M.R., Phelps, E.A. & Robbins, T.W.) 3–38 (Oxford University Press, 2011).
Duane, S., Kennedy, A.D., Pendleton, B.J. & Roweth, D. Hybrid Monte Carlo. Phys. Lett. B 195, 216–222 (1987).
Acknowledgements
We thank J. Erlich, C. Kopec, C.A. Duan, T. Hanks and A. Begelfer for training K.J.M. in the techniques necessary to carry out these experiments, as well as for comments and advice on the project. We thank N. Daw, I. Witten, Y. Niv, B. Wilson, T. Akam, A. Akrami and A. Solway for comments and advice on the project, and we thank J. Teran, K. Osorio, A. Sirko, R. LaTourette, L. Teachen and S. Stein for assistance in carrying out behavioral experiments. We especially thank T. Akam for suggestions on the physical layout of the behavior box and other experimental details. We thank A. Bornstein, B. Scott, A. Piet and L. Hunter for comments on the manuscript. K.J.M. was supported by training grant NIH T-32 MH065214 and by a Harold W. Dodds fellowship from Princeton University.
Author information
Authors and Affiliations
Contributions
K.J.M., M.M.B. and C.D.B. conceived the project. K.J.M. designed and carried out the experiments and the data analysis, with supervision from M.M.B. and C.D.B. K.J.M., M.M.B. and C.D.B. wrote the paper, starting from an initial draft by K.J.M.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Integrated supplementary information
Supplementary Figure 1 Reward rates of model-based and model-free agents.
Reward rates achieved by synthetic datasets generated by a hybrid model-based/model-free agent. Data were generated under the constraints αplan=αMF, βplan+βMF=5, with λ, αT, and all other betas set to zero. The highest reward rates are achieved by purely model-based agents, but the best purely model-free agents still outperformed the average rat, earning around 58% rewards (rat’s reward rate mean: 56.8%, sem: 0.4%, std: 2%).
Supplementary Figure 2 Results of one-trial-back analysis applied to the behavioral dataset.
Above: Average and standard error of the stay probability across rats. Below: Stay probability for each rat, with binomial 95% confidence intervals
Supplementary Figure 3 Results of logistic regression analysis applied to each rat, as well as simulated data generated from a fit of the mixture model to that rat’s dataset.
Rats are ordered by the relative quality of fit of the mixture model with respect to the regression model - earlier rats datasets are better explained by the mixture model than the regression, while later rats are better explained by the regression model.
Supplementary Figure 4 Movement times are faster following common transition trials.
Median movement time, in seconds, from the bottom center port to the reward port for common and uncommon transition trials, broken down by whether the movement was towards (right panel) or away from (left panel) the port with the higher reward probability.
Supplementary Figure 5 Placement of cannula in individual rats.
Purple points indicate OFC cannula tips, green points indicate PL cannula tips, and orange points indicate dH cannula tips.
Supplementary Figure 6 Results of logistic regression analysis applied to the inactivation dataset.
Above: Regression coefficients for the Saline, Control, dH, and OFC conditions. Points are averages across rats, and error bars are standard errors. Below: Differences between regression coefficients for different conditions
Supplementary Figure 7 Results of logistic regression analysis applied to each rat in the inactivation dataset.
Note that rat #6 did not complete any saline sessions
Supplementary Figure 9 Rat performances compared between inactivation and control sessions.
Top: Fraction of times each rat selected the choice port with the greatest probability of leading to the reward port with the greatest probability of reward, for control vs. OFC sessions (Left) and for control vs. dH sessions (Right). Bottom: Fraction of times the better port was selected, as a function of the number of trials since the last reward probability flip.
Supplementary Figure 10 Results of one-trial-back analysis applied to the inactivation dataset.
Above: Average stay probability by trial-type for the Control, dH, and OFC conditions. Bar height is the average across rats, and error bars are standard errors. Below: Differences between stay probabilities coefficients for the different conditions
Supplementary Figure 12 Results of fitting the multiagent model jointly to the OFC inactivation and saline datasets.
Top Row: Posterior belief distributions over parameters governing the effect of inactivation on performance across the population. Only βplan is significantly affected by the inactivation. Below: Posterior belief distributions over parameters governing behavior on OFC (purple) and Saline (blue) sessions. Only βplan is affected by inactivation in a way that is consistent across animals.
Supplementary Figure 13 Results of fitting the multiagent model jointly to the dH inactivation and saline datasets.
Top Row: Posterior belief distributions over parameters governing the effect of inactivation on performance across the population. Only βplan is significantly affected by the inactivation. Below: Posterior belief distributions over parameters governing behavior on dH (orange) and Saline (blue) sessions. Only βplan is affected by inactivation in a way that is consistent across animals.
Supplementary Figure 15 Normalized cross-validated likelihood for logistic regression models (Online Methods), as a function of the number of previous trials used to predict the upcoming choice.
Including more than five previous trials in the model results in negligible improvements in quality of model fit.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–15 and Supplementary Discussion (PDF 4128 kb)
Rights and permissions
About this article
Cite this article
Miller, K., Botvinick, M. & Brody, C. Dorsal hippocampus contributes to model-based planning. Nat Neurosci 20, 1269–1276 (2017). https://doi.org/10.1038/nn.4613
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1038/nn.4613
This article is cited by
-
The role of the human hippocampus in decision-making under uncertainty
Nature Human Behaviour (2024)
-
Model-free decision-making underlies motor errors in rapid sequential movements under threat
Communications Psychology (2024)
-
Bad habits–good goals? Meta-analysis and translation of the habit construct to alcoholism
Translational Psychiatry (2024)
-
Dopamine-independent effect of rewards on choices through hidden-state inference
Nature Neuroscience (2024)
-
An automated, low-latency environment for studying the neural basis of behavior in freely moving rats
BMC Biology (2023)