Following their huge success in predicting protein folds in 2020, Google's DeepMind has now released another AI. This one is less about solving complex biological problems and more about dominating its opponents in strategy games, and it doesn't even need to be told the rules. In a blog post describing their latest innovation, DeepMind shows off MuZero, a machine-learning AI that can master multiple different games without being given their rules, setting record-breaking scores along the way.
MuZero combines the ability of its game-playing predecessors to plan ahead with the ability to learn a model of its environment. As a result, it can devise strategies even when it is playing in a completely unfamiliar environment. The team's findings were published in Nature.
The authors explain in the blog post that planning algorithms based on lookahead search have had tremendous success in classic games such as checkers, chess and poker. However, these algorithms depend on knowledge of their environment's dynamics, such as the rules of the game or an accurate simulator. As the authors put it, "This makes it difficult to apply them to messy real-world problems, which are typically complex and hard to distill into simple rules."
So far, MuZero has been put to work on Go, chess, shogi and Atari games such as Ms. Pac-Man. Advances like this could still have a tremendous impact on algorithms that must adapt without a ruleset, which is a challenge people face every day.
The AI builds its game strategy around three quantities that it learns to predict:
- The policy: what is the next best step?
- The reward: how successful was the last action?
- The value: how good is the current position?
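The three questions above can be illustrated with a small Python sketch. Everything in it is a hypothetical stand-in. `ToyModel` and its hand-written rules replace what MuZero actually learns with neural networks, and `one_step_lookahead` is a deliberately simplified planner, not DeepMind's search. The sketch only shows how a predicted reward and a predicted value can be combined to pick a move without consulting the game's real rules.

```python
from dataclasses import dataclass


@dataclass
class ToyModel:
    """Hypothetical stand-in for MuZero's learned model.

    Hidden states are plain integers here. In MuZero they are vectors produced
    by neural networks, and none of these numbers come from the real system.
    """
    num_actions: int = 3

    def policy(self, state: int) -> list[float]:
        # "What is the next best step?": a prior over actions (uniform here).
        return [1.0 / self.num_actions] * self.num_actions

    def reward(self, state: int, action: int) -> float:
        # "How successful was the last action?": predicted immediate reward.
        return 1.0 if action == state % self.num_actions else 0.0

    def next_state(self, state: int, action: int) -> int:
        # Imagined transition: the model predicts the next state itself
        # instead of asking a rules engine.
        return state * self.num_actions + action + 1

    def value(self, state: int) -> float:
        # "How good is the current position?": predicted long-term return.
        return 0.5 if state % 2 == 0 else 0.0


def one_step_lookahead(model: ToyModel, state: int, discount: float = 0.99) -> int:
    """Return the action with the best predicted reward plus discounted value."""
    best_action, best_score = 0, float("-inf")
    for action in range(model.num_actions):
        nxt = model.next_state(state, action)
        score = model.reward(state, action) + discount * model.value(nxt)
        if score > best_score:
            best_action, best_score = action, score
    return best_action


print(one_step_lookahead(ToyModel(), state=4))  # prints 1
```

The policy goes unused here because the sketch simply scores every action. In a fuller search, such as the Monte Carlo tree search MuZero uses, the policy steers which branches get explored first.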
In other words, rather than modeling every rule of the game, MuZero reduces it to these few questions and uses the answers to plan its next moves. It keeps refining its predictions as it plays, and the results are extremely impressive. On the Atari suite benchmark, MuZero set a new record, outperforming all previous AI agents. In chess, shogi and Go, MuZero matched the leading performance set by its predecessor, AlphaZero. It also showed interesting results when it was allowed to perform more simulations.
MuZero performed better as the number of simulations it planned per step increased, which suggests that more planning helps it both act and learn more effectively. MuZero will now continue its quest for total gaming dominance, but it will likely find many other uses across scientific fields. AlphaZero has already been applied to complex problems such as optimizing quantum dynamics, where it can find solutions much faster than humans. An algorithm like this could be integral to building robots that can deal with the real world instead of being confined to predefined roles with limited flexibility.
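The role of the simulation budget can be sketched in Python under heavily simplified assumptions. The hand-written functions `policy`, `step` and `value` are hypothetical stand-ins for MuZero's learned predictions. The planner `plan` uses plain random rollouts inside that imagined model instead of the Monte Carlo tree search MuZero actually uses. The sketch shows only one narrow point: a larger `num_simulations` averages each action's estimated return over more imagined futures, so the estimate becomes less noisy.

```python
import random

# Hypothetical toy "learned model". In MuZero these would be neural-network
# predictions; here they are hand-written so the example runs on its own.
NUM_ACTIONS = 3


def policy(state: int) -> list[float]:
    # Prior over actions (uniform in this toy).
    return [1.0 / NUM_ACTIONS] * NUM_ACTIONS


def step(state: int, action: int) -> tuple[int, float]:
    # Imagined transition: returns (next_state, predicted_reward).
    reward = 1.0 if action == state % NUM_ACTIONS else 0.0
    return state * NUM_ACTIONS + action + 1, reward


def value(state: int) -> float:
    # Predicted return from this state onward.
    return 0.5 if state % 2 == 0 else 0.0


def plan(state: int, num_simulations: int, depth: int = 3,
         discount: float = 0.99, seed: int = 0) -> int:
    """Pick a first action by averaging num_simulations imagined rollouts."""
    rng = random.Random(seed)
    totals = [0.0] * NUM_ACTIONS
    counts = [0] * NUM_ACTIONS
    for i in range(num_simulations):
        first = i % NUM_ACTIONS  # spread the budget evenly over first actions
        s, ret, scale, action = state, 0.0, 1.0, first
        for _ in range(depth):
            s, r = step(s, action)
            ret += scale * r
            scale *= discount
            # Later actions are sampled from the policy prior.
            action = rng.choices(range(NUM_ACTIONS), weights=policy(s))[0]
        ret += scale * value(s)  # bootstrap the rest with the value prediction
        totals[first] += ret
        counts[first] += 1
    means = [t / c if c else float("-inf") for t, c in zip(totals, counts)]
    return max(range(NUM_ACTIONS), key=lambda a: means[a])


print(plan(4, num_simulations=300))
```

With `num_simulations=300`, each first action gets 100 rollouts. With only a handful, one lucky rollout for a weaker action can outweigh a stronger one, which loosely mirrors why a larger planning budget can help.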