Equipping artificial intelligence with motivation
By April Cashin-Garbutt
In December 2017, DeepMind’s AlphaZero shocked the world by teaching itself chess in under four hours and beating the world-champion chess program Stockfish 8. We know that algorithms can be exceptional at solving individual tasks when given specific goals, such as winning a game of chess, but one thing they currently lack is intrinsic motivation.
This lack of motivation is a problem for artificial intelligence because it makes flexible behaviour difficult. In the real world our priorities constantly change as we move through different motivational states during the day: at lunchtime, for example, our priority shifts from working to seeking food. Each change in goal poses a completely different problem for an algorithm to solve.
Imagine four priorities: eating, drinking, playing and sleeping. One solution is for an algorithm to learn a separate set of actions for each priority in every possible state of the world. This is extremely expensive, as the algorithm has to learn about the world four times over.
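The cost of this naive solution can be made concrete with a small sketch. The code below is an illustration, not the authors' implementation: it assumes a toy 5×5 grid world, standard tabular Q-learning, and the four priorities named in the article. The key point is that the agent keeps one entirely independent value table per priority, so both memory and the experience needed to fill those tables scale with the number of priorities.

```python
import numpy as np

N_STATES, N_ACTIONS = 25, 4          # toy 5x5 grid world (assumed sizes)
PRIORITIES = ["eat", "drink", "play", "sleep"]

# Naive approach: one independent Q-table per priority.
# The agent must explore and learn about the whole world once per
# priority, so the cost grows linearly with the number of priorities.
q_tables = {p: np.zeros((N_STATES, N_ACTIONS)) for p in PRIORITIES}

def update(priority, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Standard Q-learning update, applied only to the active priority's table."""
    q = q_tables[priority]
    q[s, a] += alpha * (r + gamma * q[s_next].max() - q[s, a])

total_entries = sum(t.size for t in q_tables.values())
print(total_entries)  # 4 priorities x 25 states x 4 actions = 400 values to learn
```

Even in this tiny world, four priorities quadruple the number of values the agent must estimate, which is the redundancy the paper's approach is designed to avoid.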
In a recent Frontiers in Systems Neuroscience paper, Shuvaev et al. implemented a new reinforcement learning algorithm with motivation and showed that it could infer motivational states and behave accordingly. Furthermore, they compared the algorithm to neural data from the Stephenson-Jones Lab and found that it worked in a similar way.
In the brain there is a region called the ventral pallidum (VP) which is responsible for motivation and drives the desire to eat, drink, sleep and so forth. Damage to VP can lead to apathy, a lack of motivation to do anything. Conversely, an overactive VP can cause addiction.
Previous research by Stephenson-Jones showed that two major populations of neurons in VP encode motivation: a positive population that drives the pursuit of reward (approach) and a negative population that drives the evasion of threat (avoid). The VP uses these two populations to compute the output of the motivational system. The inputs, i.e. the priorities, are encoded upstream of VP, while the outputs, i.e. the actions that fulfil the desire, are encoded downstream of VP.
Shuvaev et al. found that their artificial neural network model solved the problem in the same way as the brain, representing both positive and negative states in a hierarchical system. Because the learning algorithm was tweaked to feed motivation into the neural network as an additional parameter, the algorithm did not have to relearn the world for each goal but could flexibly adapt its behaviour to the current motivational state.
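The idea of motivation as an extra network input can be sketched as follows. This is a minimal illustration under assumed names and sizes, not the authors' architecture: a single small network receives the state and the current motivation (both one-hot encoded) and produces action values, so all its weights are shared across goals and switching motivation changes behaviour without any relearning.

```python
import numpy as np

rng = np.random.default_rng(0)
N_STATES, N_ACTIONS, N_PRIORITIES, HIDDEN = 25, 4, 4, 32  # assumed sizes

# One shared network: motivation is just another input, concatenated
# (one-hot) with the state encoding. The same weights serve all goals.
W1 = rng.normal(0.0, 0.1, (N_STATES + N_PRIORITIES, HIDDEN))
W2 = rng.normal(0.0, 0.1, (HIDDEN, N_ACTIONS))

def q_values(state, priority):
    """Action values for one state, conditioned on the current motivation."""
    x = np.zeros(N_STATES + N_PRIORITIES)
    x[state] = 1.0                   # one-hot state
    x[N_STATES + priority] = 1.0     # one-hot motivational state
    h = np.tanh(x @ W1)              # shared hidden representation of the world
    return h @ W2

# The same state yields different action values under different motivations,
# so behaviour can switch with the goal while the world model stays shared.
print(q_values(3, 0), q_values(3, 2))
```

Here the network's size is fixed regardless of how many priorities are added as inputs (apart from a handful of extra input weights), in contrast to the per-priority tables of the naive approach.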
By equipping artificial intelligence with motivation, the hope is that one day we will be able to create systems with general intelligence that can behave dynamically and perform multiple competing tasks like humans. While this goal is still a long way off, Shuvaev et al. show that motivation could be key to solving this problem.