An artificial intelligence (AI) system has succeeded in mastering classic video games from the 1980s, including iconic Atari titles such as Montezuma's Revenge, Pitfall and Freeway. According to its creators, the algorithms behind it could one day be used to help robots navigate real-world environments, such as disaster areas. Like disaster zones, many "hard-exploration" games present a series of obstacles that must be avoided and paths that must be taken to reach a destination. Previous attempts to create AI capable of solving such games have failed because of the sheer difficulty of exploring without guidance.
For example, many AI systems use reinforcement learning, which involves rewarding successful actions so that the system learns to accomplish a task. The problem with this method is that rewards can be very sparse, making it difficult for the system to achieve its purpose. If a robot needs to perform a long series of actions to reach a certain destination and is rewarded only upon arrival, it receives no feedback on the many individual steps it must take along the way. Researchers can instead offer "denser" rewards, such as rewarding the robot for every step it takes toward the goal, but then the system makes a beeline for the goal and never learns to accept the short-term setbacks that a longer route may require.
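The sparse-versus-dense trade-off can be illustrated with a toy one-dimensional corridor task. The distance-based shaping term below is an illustrative assumption, not a formula from the researchers' work:

```python
# Toy 1-D corridor: the agent's position is an integer, the goal is at 10.
GOAL = 10

def sparse_reward(position):
    """Reward only on success: almost every step yields no feedback."""
    return 1.0 if position == GOAL else 0.0

def dense_reward(position, previous_position):
    """Shaped reward: each step toward the goal pays a little.
    This gives constant feedback, but an agent greedily following it
    makes a beeline and is penalized for ever stepping away from the
    goal, even when a detour would be necessary in a more complex map."""
    return abs(previous_position - GOAL) - abs(position - GOAL)

print(sparse_reward(3))    # 0.0 -- no signal mid-journey
print(dense_reward(4, 3))  # 1.0 -- moved one step closer
print(dense_reward(2, 3))  # -1.0 -- penalized for stepping away
```

The sparse version gives the agent nothing to learn from until it stumbles on the goal; the dense version gives feedback everywhere but discourages the temporary backward moves that harder layouts demand.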
The only way to solve this is to create an AI that can explore its environment effectively. However, as the makers of the new AI explained in the journal Nature, "two major issues hindered the exploration abilities of previous algorithms." The first of these, known as detachment, occurs when a system keeps no record of the promising areas it has passed by. For example, when a robot reaches a fork in the road, it must choose one path and abandon the other. Detachment refers to the system's inability to remember later that there was an alternative path that might still be worth exploring.
Even if an AI can recall such missed opportunities, it runs into a second problem, called derailment, in which the system's own drive to explore gets in its way: exploratory actions knock it off course as it tries to travel back, so it never actually returns to the promising state. To overcome both problems, the researchers created a "family of algorithms" that they call Go-Explore. In short, the system works by continually archiving every state it encounters, allowing it to remember the paths left unexplored at each point in a video game. It can then return directly to any of these promising archived states and resume exploring from there, thus overcoming both detachment and derailment.
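The archive-then-return loop described above can be sketched in a few lines. This is a heavily simplified illustration on a made-up deterministic grid world, not the researchers' implementation; the environment, the uniform state-selection rule, and the five-step exploration budget are all assumptions chosen for brevity:

```python
import random

GOAL = (4, 4)

def step(state, action):
    """Toy deterministic environment: move within a 5x5 grid.
    Reward is sparse: 1.0 only when the goal cell is reached."""
    x, y = state
    dx, dy = {"up": (0, 1), "down": (0, -1),
              "left": (-1, 0), "right": (1, 0)}[action]
    nx = min(max(x + dx, 0), 4)
    ny = min(max(y + dy, 0), 4)
    reward = 1.0 if (nx, ny) == GOAL else 0.0
    return (nx, ny), reward

def go_explore(iterations=500, seed=0):
    rng = random.Random(seed)
    start = (0, 0)
    # Archive maps each discovered state to a trajectory that reaches it,
    # so no fork in the road is ever forgotten (no detachment).
    archive = {start: []}
    for _ in range(iterations):
        # 1. Pick a previously archived state (here: uniformly at random).
        state = rng.choice(list(archive))
        trajectory = list(archive[state])
        # 2. First return: replay the stored trajectory exactly, with no
        #    exploratory randomness on the way back (no derailment).
        s = start
        for a in trajectory:
            s, _ = step(s, a)
        # 3. Then explore: take a few random actions from that state,
        #    archiving every new state discovered along the way.
        for _ in range(5):
            a = rng.choice(["up", "down", "left", "right"])
            s, r = step(s, a)
            trajectory.append(a)
            if s not in archive:
                archive[s] = list(trajectory)
            if r > 0:
                return archive[s]  # action sequence that reaches the goal
    return None

solution = go_explore()
```

The key design point is the separation of phases: returning is deterministic replay, so exploration noise can no longer derail the trip back, and exploration only begins once the promising state has been restored.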
As a result, Go-Explore was able to surpass the average human score on Pitfall, a game in which previous algorithms had failed to score any points at all. It also scored 1.7 million points on Montezuma's Revenge, breaking the previous record of 1.2 million points.