A game of hide-and-seek does not look like scientific research. But by simulating billions of hide-and-seek games, researchers have shown that artificial intelligence can learn complex strategies, literally out of nowhere. OpenAI scientists consider this a perfect demonstration of artificial evolution.
Do you remember the first time you played hide-and-seek? Maybe you were two or three years old, maybe even younger. Probably no one even explained the rules to you. “Ready or not, here I come!” You hid around the corner, they found you right away, and that was it.
That wasn’t your last game, of course. With each repetition you found better hiding places and better strategies; maybe you even whispered with the other players so that not everyone hid in the same place. You learned from your mistakes. And without noticing it, you learned a lot about physics, camouflage, yourself and others.
Artificial evolution in computer simulation
A learning process similar to the one we all went through has now been simulated on a computer by OpenAI researchers. They programmed a simple environment in which independent agents move around and have the following at their disposal:
- Sight: the characters see what is in front of them, unless it is behind an obstacle.
- Spatial orientation: each character has a kind of “radar” that tells it how far away the nearest obstacle is.
- Moving objects: characters can grab, move and drop an object.
- Locks: characters can lock objects in place (the other team then cannot move a locked object).
The decision algorithm takes into account the character’s own position, the positions of surrounding objects, other characters, ramps and boxes, the viewing angle, etc.
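What each character perceives and can do, as described above, could be represented roughly as follows. This is only an illustrative sketch: the names `Observation`, `Action` and `visible_object_count`, and the choice of 2D coordinates, are assumptions made for this example and are not taken from the researchers' code.

```python
from dataclasses import dataclass, field
from enum import Enum, auto

# Assumption: positions are simplified to 2D points for illustration.
Point = tuple[float, float]


class Action(Enum):
    """Things a character can do, following the list in the article."""
    MOVE = auto()
    GRAB = auto()
    DROP = auto()
    LOCK = auto()


@dataclass
class Observation:
    """Inputs the decision algorithm takes into account (hypothetical layout)."""
    own_position: Point
    view_angle_deg: float
    nearest_obstacle_distance: float  # the "radar" reading
    visible_agents: list[Point] = field(default_factory=list)
    visible_boxes: list[Point] = field(default_factory=list)
    visible_ramps: list[Point] = field(default_factory=list)


def visible_object_count(obs: Observation) -> int:
    """Count everything the character currently sees."""
    return len(obs.visible_agents) + len(obs.visible_boxes) + len(obs.visible_ramps)
```

Anything hidden behind an obstacle would simply be absent from the `visible_*` lists, which is how this sketch models the limits of sight.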
And that was all. Then they just divided the virtual characters into teams (blue hiders and red seekers) and gave them incentives in the form of scoring. Seekers were awarded points for finding hiders; hiders, in turn, were rewarded for not being found.
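The scoring described above can be sketched as a small reward function. This is a minimal sketch under stated assumptions: the zero-sum values of +1 and -1, the rule that nobody scores during the hiders' head start, and the names `StepOutcome` and `team_rewards` are all illustrative choices, not details given in the article.

```python
from dataclasses import dataclass


@dataclass
class StepOutcome:
    """What happened in one simulation step (hypothetical structure)."""
    any_hider_seen: bool  # True if at least one seeker has a hider in view
    in_head_start: bool   # True while the seekers are still counting


def team_rewards(outcome: StepOutcome) -> dict[str, int]:
    """Return the reward each team receives for one step."""
    if outcome.in_head_start:
        # Assumption: no points are awarded before the search begins.
        return {"hiders": 0, "seekers": 0}
    if outcome.any_hider_seen:
        # Seekers score for finding a hider; hiders are penalized.
        return {"hiders": -1, "seekers": 1}
    # Hiders score for staying hidden; seekers are penalized.
    return {"hiders": 1, "seekers": -1}
```

Note that nothing in such a function mentions barricades, ramps or cooperation. Any strategy has to be discovered by the agents themselves through the score alone.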
The rest was done by reinforcement learning, repeated hundreds, thousands, millions of times. And where there was nothing at the beginning, intelligent behavior began to emerge.
From confusion to teamwork
At the beginning, the algorithm did not know what its goal was, so the hiders did not hide at all and just ran around randomly. The seekers soon discovered them.
However, since the algorithm received penalty points whenever its character was discovered, it “involuntarily” started looking for strategies to avoid penalty points, and hence discovery. Or, to put it another way, those versions of the algorithm that received fewer penalty points had a better chance of replicating and asserting themselves in the next generation. Researchers therefore call this process “artificial evolution”, because it resembles the natural selection that we know from biology.
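The selection idea in the paragraph above (versions with fewer penalty points are more likely to make it into the next generation) can be illustrated with a toy loop. This is not the researchers' actual training procedure: the single-number "strategy", the `penalty` function and every parameter below are hypothetical stand-ins chosen to keep the sketch short.

```python
import random


def penalty(strategy: float) -> float:
    """Hypothetical penalty: how far a strategy is from the ideal value 1.0."""
    return abs(1.0 - strategy)


def evolve(generations: int, population_size: int = 20, seed: int = 0) -> list[float]:
    """Return the lowest penalty in the population after each generation."""
    rng = random.Random(seed)
    # Start with poor strategies, all far from the ideal value.
    population = [rng.uniform(-1.0, 0.0) for _ in range(population_size)]
    best_per_generation = []
    for _ in range(generations):
        # Rank candidates: fewer penalty points is better.
        population.sort(key=penalty)
        survivors = population[: population_size // 2]
        # Survivors are kept unchanged and also produce slightly mutated copies.
        children = [s + rng.gauss(0.0, 0.1) for s in survivors]
        population = survivors + children
        best_per_generation.append(min(penalty(s) for s in population))
    return best_per_generation


history = evolve(generations=50)
```

Because the best candidates are always carried over unchanged, the lowest penalty in `history` can never get worse from one generation to the next; improvement comes only from the random mutations that happen to score better.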
It took two and a half million generations to discover the first successful strategy, and that was escape. The blue figures kept track of where the red seekers were and tried to run, even hiding behind objects and walls in the environment.
But such a basic tactic does not help much in a small space. Through machine learning and a large number of repetitions, the hiders' algorithm gradually developed the first truly effective strategy: the characters used objects in the room to barricade themselves against the seekers during the countdown, i.e. during their initial head start.
But the seekers' algorithm responded to this hiding strategy. Over the next million repetitions, the seekers discovered by trial and error how to use the available ramps to get over the barricades. The seekers began to have, literally, the upper hand.
But not for long. By trial and error, the hiders learned to hide the ramp, thus depriving the seekers of the opportunity to get over the barricade. The hiders even learned to cooperate and divide up roles when barricading. Similar behavior, spontaneous coordination between multiple actors, has been observed by researchers from the competing DeepMind team.
With that, the development was more or less over. The hiders had learned a strategy that was unbeatable in the long run. It took 14.5 million repetitions, and until the end of the experiment they only slightly refined this winning tactic and improved their coordination. Looking at the result, it is important to realize that the programmers did not give the characters any instructions or explicit goals beyond the scoring. They figured everything out by themselves.
Beware of bugs: the computer can exploit them
Even more interesting were the results in an open environment, where there was more space, more objects and in some variants even more characters. This also meant an order of magnitude more repetitions. Barricading appeared after 22 million generations.
The hiders learned to barricade themselves using long obstacles. The seekers responded by learning to use ramps, and the hiders then locked the ramps before barricading.
Originally, the scientists expected this strategy to be the final one. But the algorithm surprised its developers: it found unintentional errors in the simulation environment. After an unimaginable 388 million repetitions, the seekers found that they could climb onto an obstacle and then “surf” on it until they found the hiding players.
The hiders, for their part, found a trick to throw the ramp off the map and deprive the seekers of an important tool. In the end, however, the resulting strategy in the open environment was one in which the hiders first inspect the space, lock all objects, and then barricade themselves against the seekers. This strategy lasted until the end of testing, which the scientists concluded after 481 million generations.
Another variant of the game (not captured in the video) was also amusing. In it, the researchers tried to motivate the characters to move by placing “food” around the map. After many generations, the hiders developed a tactic of herding the counting seekers into a corner, where they trapped them behind a barricade. The hiders then had the food on the rest of the map to themselves. The researchers did not hide their amazement: “We were completely surprised. It shows how complex behavior can arise through self-learning.”
“We have found that algorithms are very adept at finding and exploiting small inaccuracies in a simulated environment,” the researchers warn at the end of the paper. “You need to take care to create an environment that does not suffer from errors and does not encourage this unwanted behavior.” Otherwise, the algorithm will learn to exploit these errors and its strategies will be unusable in practice.
Practical application is of course the holy grail of all similar projects. The traditional lesson “train hard, fight easy” takes on extreme dimensions in machine learning. The training can be repeated thousands, millions, even billions of times in various combinations and scenarios. Algorithms can compete and improve through reinforcement learning, and the best get the opportunity to advance to the next generation.
“Our results should show that multi-agent cooperation can emerge through self-learning and that the resulting behavior can be used in a real environment,” the researchers conclude. They concede that hide-and-seek is a very simple game, but they hope that further experiments will show more complex behavior in more complex situations. The researchers now want to apply similar learning to other, real-world tasks.
Let us hope we never end up hiding and barricading ourselves from autonomous robots in the real world. Even so, it is becoming increasingly clear that in the future we will not be able to hide from machine learning.