What can be learned from computers playing Pac-Man

Self-learning computer programmes can do a great deal today: predict the weather, detect tumours in X-rays, play chess better than any human being. How the algorithms reach their conclusions, however, is often unknown even to those who programmed them. Researchers at the University of Augsburg and the Israel Institute of Technology (Technion) have now compared two approaches for shedding light on this "black box". The study shows which information helps users to assess the quality of artificial intelligence methods. It was published in Artificial Intelligence, one of the most prestigious international journals in AI research.

At the World Chess Championship in Dubai, the Norwegian Magnus Carlsen recently sacrificed a pawn in a way that surprised many observers, not least because the move had apparently never been played in a major game before. When computers compete against each other, however, it does appear from time to time. Carlsen is known to adjust his game based on insights gained from such machine contests. In an interview he once described the AlphaZero chess algorithm as his idol. AlphaZero is a self-learning programme that has acquired enormous skill in millions of games against itself. But even experts can scarcely work out why it makes the decisions it does in a given situation.

Example of an attention map: The agent focuses mainly on the areas highlighted in yellow. In this case, it focuses its attention mainly on the Pac-Man figure, since the figure is not under threat. © University of Augsburg

This is a general problem of artificial intelligence (AI) methods: even their programmers usually do not know how they arrive at their conclusions. Their behaviour is a black box, and the bigger the tasks we entrust to the programmes, the more of a problem this becomes. Who wants to blindly trust a machine with a life-or-death decision? And how can one judge which of several algorithms is best suited for a task? An important concern of current AI research is therefore to shed light on this black box. The task is anything but trivial, however, and the scientific community has been working on it for many years. The new study takes this research a significant step forward. Tobias Huber, Katharina Weitz and Elisabeth André from the University of Augsburg teamed up with researcher Ofra Amir from the Israel Institute of Technology (Technion). The problem on which they trained their self-learning method was also a game - not chess, though, but Pac-Man.

Eating biscuits for research

Pac-Man is a Japanese computer game that began its triumphal march around the world in 1980. "It is one of the most difficult arcade games for an AI," explains Tobias Huber, who is completing his doctorate under Prof. André at the Chair for Human-Centred Artificial Intelligence. The game character has to eat biscuits in a maze while being pursued by ghosts. It gets points for each biscuit; if it is caught, it dies. Like chess, the game is therefore ideally suited to a particular category of AI algorithms, namely those that learn through reinforcement. "We let our programme play Pac-Man thousands of times in a row," says Huber.

In this image, Pac-Man is running back and forth in the lower left corner. The yellow background shows that the agent is paying attention to the ghosts in the upper right corner. This indicates that it is "afraid" of the ghosts and therefore keeps Pac-Man on the other side of the maze. However, only a few participants in the study noticed this. The reason could be that the attention maps are only seen briefly within a video, which makes them difficult to interpret. © University of Augsburg

"The better the strategy, the more points it scores. Based on its previous experience, the algorithm learns over time how it should behave in each situation." But how can an observer judge which criteria the AI's behaviour is based on and how good its decisions are?

To assess this, the researchers devised a simple experiment. First, they trained the computer to play Pac-Man, but secretly modified the rules according to which points were awarded. In one case, for example, the character did not lose any points when it died. The algorithm trained in this way (the researchers also refer to it as an "agent") was therefore unfazed by nearby ghosts when making its decisions. For a second agent, they changed the value of the biscuits; a third played according to the normal rules.

"We then asked test subjects to assess the three agents," Huber explains. "They were not allowed to watch several complete games, though, but were only shown brief excerpts." On this basis, the test subjects were asked to indicate which of the agents they would most likely allow to play a game of Pac-Man on their behalf. They were also asked to briefly describe the strategy of all three agents in their own words. "We wanted to find out whether the test subjects had understood why the algorithm was performing certain actions," says the computer scientist.
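The effect of such rule changes can be illustrated with a minimal sketch. The point values below are hypothetical and chosen only for illustration; the article states only that one agent lost no points for dying, that a second was given a different biscuit value, and that a third played by the normal rules. The names `RewardScheme` and `episode_return` are likewise assumptions of this sketch, not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardScheme:
    """Points awarded per event. All numbers used below are hypothetical."""
    biscuit: float  # points for eating one biscuit
    death: float    # points (normally negative) for being caught by a ghost


# Three illustrative schemes mirroring the three agents in the article.
NORMAL = RewardScheme(biscuit=10.0, death=-500.0)
NO_DEATH_PENALTY = RewardScheme(biscuit=10.0, death=0.0)
CHANGED_BISCUIT_VALUE = RewardScheme(biscuit=1.0, death=-500.0)


def episode_return(scheme: RewardScheme, biscuits_eaten: int, died: bool) -> float:
    """Total points an agent collects in one game under a given scheme."""
    total = scheme.biscuit * biscuits_eaten
    if died:
        total += scheme.death
    return total


# Compare a reckless game (many biscuits, then caught) with a cautious one.
reckless = dict(biscuits_eaten=50, died=True)
cautious = dict(biscuits_eaten=30, died=False)
for name, scheme in [("normal", NORMAL),
                     ("no death penalty", NO_DEATH_PENALTY),
                     ("changed biscuit value", CHANGED_BISCUIT_VALUE)]:
    print(name, episode_return(scheme, **reckless), episode_return(scheme, **cautious))
```

With these made-up numbers, the cautious game scores higher under the normal scheme (300 versus 0), whereas the reckless game scores higher once the death penalty is removed (500 versus 300). This is one way to see why an agent trained without a death penalty would have little reason to avoid ghosts.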

Summary of the "most dramatic" game scenes helps the most

To this end, the participants were divided into four groups. Each group was shown five three-second excerpts from the games of each of the three agents. For the first group, these short clips were chosen at random. For the second group, a kind of "attention map" was additionally overlaid on the random clips. It showed which elements of its environment the agent was paying particular attention to at that moment. The third group, on the other hand, saw the most "dramatic" game scenes - those in which the agent's decision had a particularly large impact (for example, could lead to the death of the game character or to a particularly high score). In AI research, this is also referred to as a "strategy summary". For the fourth group, this summary was supplemented by the attention map.

The result of the experiment was clear. "Test subjects who saw the summary were most likely to develop a feeling for the strategy of the respective agent," explains Huber. "The attention maps, on the other hand, were of significantly less help to them. Even in combination with the summary, they provided only a small additional benefit." It was, he said, cognitively very demanding to watch the game excerpts and at the same time pay attention to the information in the attention maps. "We assume that their contribution would be greater if the information were better presented." The researchers now want to investigate how the attention maps could be optimised so that, together with the strategy summary, they make an AI agent's decisions even more comprehensible.
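How the most "dramatic" scenes might be picked out can be sketched as follows. This is a minimal illustration, not necessarily the study's exact procedure: it assumes the agent exposes an estimated value for each possible action in a game state, and it scores a state by the gap between the best and the worst action value, so that states where the choice matters most rank highest. The function names and the example numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def importance(action_values: Sequence[float]) -> float:
    """Assumed importance score: gap between the best and worst action value.

    A large gap means a wrong move would be costly, i.e. the scene is "dramatic".
    """
    return max(action_values) - min(action_values)


def strategy_summary(trajectory: List[Tuple[int, Sequence[float]]], k: int = 5) -> List[int]:
    """Return the frame indices of the k most important states, in playing order.

    `trajectory` is a list of (frame_index, action_values) pairs from one or
    more games. k defaults to 5 to match the five excerpts per agent in the article.
    """
    ranked = sorted(trajectory, key=lambda step: importance(step[1]), reverse=True)
    return sorted(frame for frame, _ in ranked[:k])


# Hypothetical example with four game states and three possible actions each.
example = [
    (0, [1.0, 1.1, 0.9]),    # all moves roughly equal: not dramatic
    (1, [5.0, -20.0, 4.0]),  # one move is disastrous: very dramatic
    (2, [2.0, 2.0, 2.0]),    # no difference at all
    (3, [8.0, 1.0, 3.0]),    # one move is clearly best
]
print(strategy_summary(example, k=2))
```

In this made-up example the summary would consist of frames 1 and 3, the two states in which the choice of action makes the largest difference. In practice, a short clip around each selected frame would be shown rather than a single frame.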

Publication

Tobias Huber, Katharina Weitz, Elisabeth André, Ofra Amir: Local and global explanations of agent behavior: Integrating strategy summaries with saliency maps. Artificial Intelligence, Volume 301, 2021, https://doi.org/10.1016/j.artint.2021.103571.

Scientific Contact

Prof. Dr. Tobias Huber

Former research assistant

Chair for Human-Centered Artificial Intelligence

Email: tobias.huber@thi.de

Media contact

Corina Härning

Deputy Media Officer

Communications and Media Relations

Phone: +49 821 598-2098
Email: corina.haerning@presse.uni-augsburg.de

Room 3002a

University of Augsburg
