Humans and other mammals learn rapidly from limited experience. Such behavioural flexibility is thought to depend on rich internal models, which allow the implications of new information to be predicted rather than discovered through trial and error. Though we now have a reasonable mechanistic and computational account of simple stimulus-response learning, we lack a comparable understanding of model-based action selection. I will present work in mice addressing two aspects of model-based decision making: planning and generalisation.

Planning, or model-based reinforcement learning (RL), is the use of an action-state transition model to improve behavioural policy. A challenge in studying planning is its coexistence with other learning systems, notably model-free RL, which is thought to underpin habitual behaviours. This calls for tasks in which planning's contribution to behaviour can be isolated, preferably while generating many individual decisions. We use multi-step decision tasks in which subjects navigate a decision tree to obtain rewards, implemented either as nose-poke tasks in operant boxes or as route-planning tasks in complex mazes. In such contexts mice readily show behaviour consistent with planning, though there remains some ambiguity about whether the apparently sophisticated decision making relies more on clever state representations than on clever decision algorithms.
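The distinction between model-based and model-free RL can be made concrete with a minimal sketch of a two-step decision tree. All numbers and names here are illustrative assumptions, not the tasks' actual parameters: a model-based agent combines a learned transition model with learned reward estimates, so a change in reward at a second-step state immediately updates first-step action values, something a model-free agent would only learn through further trial and error.

```python
import numpy as np

# Hypothetical transition model P(second-step state | first-step action):
# each first-step action leads mostly, but not always, to one state.
T = np.array([[0.8, 0.2],   # action 0 -> usually state 0
              [0.2, 0.8]])  # action 1 -> usually state 1

# Estimated reward probability at each second-step state (assumed values).
R = np.array([0.25, 0.75])

# Model-based action values: expected reward propagated through the model.
Q_mb = T @ R   # [0.35, 0.65] -> action 1 preferred

# If reward probabilities reverse, the model-based values update at once,
# without the agent having to re-experience each first-step action.
R_reversed = np.array([0.75, 0.25])
Q_mb_reversed = T @ R_reversed   # [0.65, 0.35] -> preference flips
```

Behavioural signatures of planning in such tasks rest on exactly this kind of immediate revaluation through the transition model.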

Another component of behavioural flexibility is the generalisation of prior knowledge to novel analogous situations. Generalisation requires abstracting the structure of tasks away from their specific sensory and motor correlates. To study generalisation, we developed a novel behavioural paradigm in which mice serially perform a set of reversal learning tasks that share the same structure but have different physical configurations. Performance improves with experience across configurations, demonstrating generalisation. Single-unit recordings in hippocampus and prefrontal cortex show a partial remapping between configurations, with some neurons gaining or losing firing fields when the configuration changes and others invariantly coding aspects of physical space or of the task's state space.