MuZero: DeepMind’s New AI Mastered More Than 50 Games

Published by Jan Heaney on

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. Some papers come with an intense media campaign and a lot of nice videos, while other amazing papers are at risk of slipping under the radar because they lack such a media presence. This new work from DeepMind is absolutely amazing, as you’ll see in a moment, and yet it is not really talked about. So in this video, let’s try to reward such a work. In many episodes, you get ice cream for your eyes, but today, you get ice cream for your mind. Buckle up.

In the last few years, we have seen DeepMind’s AI defeat the best Go players in the world, and after OpenAI’s venture into the game of DOTA2, DeepMind embarked on a journey to defeat pro players in Starcraft 2, a real-time strategy game. This is a game that requires a great deal of mechanical skill and split-second decision making, and it offers only imperfect information, as we only see what our units can see. A nightmare situation for any AI. You see some footage of its previous games here on the screen.

And, in my opinion, people seem to pay too much attention to how well a given algorithm performs, and too little to how general it is. Let me explain.

DeepMind has developed a new technique that tries to rely more on its predictions of the future, and generalizes to many, many more games than previous techniques. These previous techniques include AlphaZero, an earlier method also from DeepMind that was able to play Go, Chess, and Japanese Chess, or Shogi, and confidently beat any human player at these games.

This new method is so general that it does as well as AlphaZero at these games, and it can also play a wide variety of Atari games. And that is the key here: writing an algorithm that plays chess well has been possible for decades. For instance, if you wish to know more, make sure to check out Stockfish, which is an incredible open-source project and a very potent algorithm. However, Stockfish cannot play anything else; whenever we look at a new game, we have to derive a new algorithm that solves it. Not so much with these learning methods, which can generalize to a wide variety of games! This is why I would like to argue that the generalization capability of these AIs is just as important as their performance. In other words, if I had to choose between a narrow algorithm that is the best possible Chess algorithm that ever existed and a somewhat below world-champion level AI that can play any game we can possibly imagine, I would take the latter in a heartbeat.

Now, speaking about generalization, let’s see how well it does at these Atari games, shall we? After 30 minutes of time on each game, it significantly outperforms humans on nearly all of these games; the percentages shown here tell you what kind of outperformance we are talking about. In many cases, the algorithm outperforms us several times over, and up to several hundred times. Absolutely incredible.

As you see, it has a more than formidable score on almost all of these games, and therefore it generalizes quite well. I’ll tell you in a moment about the games it falters at, but for now, let’s compare it to three other competing algorithms. You see one bold number per row, which always highlights the best performing algorithm for your convenience. The new technique beats the others, including the Recurrent Experience Replay technique (R2D2 for short), on about 66% of the games.
Yes, this is another one of those crazy paper names. And even when it falls short, it is typically very close. As a reference, humans triumphed on fewer than 10% of the games.

We still have a big fat zero on the Pitfall and Montezuma’s Revenge games. So why is that? Well, these games require long-term planning, which is one of the more difficult cases for reinforcement learning algorithms. In an earlier episode, we discussed how we can successfully infuse an AI agent with the curiosity to go out there and explore some more. However, note that these algorithms are more narrow than the one we’ve been talking about today. So there is still plenty of work to be done, but I hope you see that this is incredibly nimble progress in AI research. Bravo DeepMind!
What a time to be alive!

This episode has been supported by Linode. Linode is the world’s largest independent cloud computing provider. They offer affordable GPU instances featuring the Quadro RTX 6000, which is tailor-made for AI, scientific computing and computer graphics projects. Exactly the kind of work you see here in this series. If you feel inspired by these works and you wish to run your experiments or deploy your already existing works through a simple and reliable hosting service, make sure to join over 800,000 other happy customers and choose Linode. To spin up your own GPU instance and receive a $20 free credit, visit or click the link in the description and use the promo code “papers20” during signup. Give it a try today! Our thanks to Linode for supporting the series and helping us make better videos for you.

Thanks for watching and for your generous support, and I’ll see you next time!



Mr Artificial Intelligence · January 7, 2020 at 5:10 pm

It seems like DeepMind and OpenAI will be the big players in the quest towards AGI

QbsidianH20 · January 7, 2020 at 5:12 pm

Wow, never this early. thank you always for informing.

A B · January 7, 2020 at 5:13 pm

I like my own comments

A B · January 7, 2020 at 5:14 pm

Did you thumb up this video?

Yoddi · January 7, 2020 at 5:14 pm

Awesome stuff!

Marcus Diemand · January 7, 2020 at 5:15 pm

Generalized AI 😨

Vicente Vasquez · January 7, 2020 at 5:16 pm

What a time to be alive!

L Max · January 7, 2020 at 5:16 pm

Tenth !

Peter Wilkinson · January 7, 2020 at 5:22 pm

are there videos of it playing any of these games? I would love to watch it play 😀

FuZZbaLLbee · January 7, 2020 at 5:25 pm

Generalization is also very important for use of RL outside of gaming

Parsa Rahimi · January 7, 2020 at 5:27 pm

please stick to two minutes

Michael Pierce · January 7, 2020 at 5:29 pm

Have you looked at the Baritone AI for Minecraft? I don't think its authors are publishing papers, but they are distributing a lot of public threads. It uses a lot of exploration in order to learn behavior, and there is a private thread of a semi-related AI that's mastering PvP on anarchy servers

Pan Darius Kairos · January 7, 2020 at 5:30 pm

I want ice cream for my soul.

Shtev · January 7, 2020 at 5:32 pm

hey, big fan, have you seen this Spleeter thing? It's an AI that splits songs into parts (like vocals+accompaniment) IDK if there's a paper on it but it's real damn cool

Pan Darius Kairos · January 7, 2020 at 5:33 pm

What's up with Montezuma's Revenge? lol

PixelPhobiac · January 7, 2020 at 5:33 pm

Yang 2020

ـ ـ · January 7, 2020 at 5:34 pm

Will it lose performance on game A if you trained it on A then train the existing network on B?

LalA Jun · January 7, 2020 at 5:35 pm

We want to see alphazero playing AOE II! I want to see how it will destroy pro players there

Oluchukwu Okafor · January 7, 2020 at 5:35 pm

This didn't feel like 5 minutes.

OoOoO · January 7, 2020 at 5:37 pm

Your intonation is odd, but your vocabulary is very rich :))

kowy · January 7, 2020 at 5:39 pm

bravo for what ? u havent said anything, showing numbers that says nothing

Solve Everything · January 7, 2020 at 5:43 pm

But when can I use this AI in my indie game????

Damian Reloaded · January 7, 2020 at 5:45 pm

Looking forward to an "Alpha MedBot" that can diagnose you at home or send you to the hospital if it can't.

Warsin · January 7, 2020 at 5:49 pm

Quite simple really

Ecci Ecci · January 7, 2020 at 5:50 pm

Can they go on in a direction to learn first person shooters like Unreal Tournament 2k4 please? It's much more dynamic, requires some sort of orientation in a true 3D world, object recognition, long term and short term planning, and on top of that could keep old gems like UT2k4 alive for us old but good players. We need good opponents! It's the next step towards agents understanding the real world

kortizoll · January 7, 2020 at 5:56 pm

Who do you think is most likely to create true AGI first?

mastnejsalam · January 7, 2020 at 6:00 pm

I hope all that ice cream for my mind doesn't lead to a brain freeze.❤️ur vids

Henry AI Labs · January 7, 2020 at 6:02 pm

Great video! I am exploring MuZero this week as well in a series going from AlphaGo –> AlphaGo Zero –> AlphaZero –> MuZero! I hope attention around MuZero and these algorithms will also inspire more people to participate in Kaggle's Connect X 1st RL Competition!

After Arrival · January 7, 2020 at 6:06 pm

I can watch these videos even if they were 15 minutes 👌🏻


Dragos Manailoiu · January 7, 2020 at 6:16 pm

Well I guess that superintelligence is coming faster than we thought

LegoEddy · January 7, 2020 at 6:16 pm

You might want to have a look at the paper "Probabilistic AND-OR Attribute Grouping for Zero-Shot Learning" from Google Brain. They aim to have a (bird-) classifier that can detect objects (bird species) it has never seen before solely by a semantic description.

Armuotas · January 7, 2020 at 6:19 pm

It's not funny anymore when the AI outperforms humans in "Assault" by 276 times. Not funny at all! 🙂

Divy Bramhecha · January 7, 2020 at 6:20 pm

Thank you so much for these videos :3

Michael Hartman · January 7, 2020 at 6:28 pm

I haven't heard much about generalization levels of AI. Given we will not reach it fully overnight, is there a scale, or grading system for generalization? Did the learning time decrease, or ability decrease or increase?

Ree · January 7, 2020 at 6:30 pm

Please consider that the AI vs human statistics are kind of skewed, since the AI can react instantaneously, whereas humans have a reaction time of somewhere around 200-300 ms.

Tallywort · January 7, 2020 at 6:35 pm

As interesting as the results are, I can't help but feel that they are impossible without the massive computing power that Google has. Making it harder to replicate the results, or do much with it.

Like You · January 7, 2020 at 6:35 pm

Not an expert here, but wouldn't it be better to train one neural network for each game separately, set up a convolutional NN at the beginning of each game that would be run once to detect/classify the game it's playing from the picture, and have its outputs activate the appropriate NN responsible for playing that game?

Homo Sapiens · January 7, 2020 at 6:36 pm

I'm so glad this channel exists

Leo Staley · January 7, 2020 at 6:52 pm

Do you know of any other YouTube channels like this? I've been an educational YouTube junkie for a good 5 years and this is one of my favorite channels with frequent uploads. Most other channels that come close are channels that release longer-form content much less frequently. SciShow is good, but not this quality. In astronomy news, Anton Petrov and Fraser Cain have frequent release schedules for their high quality content, but are usually just talking, with much less visual information on the topic than you provide. Rob Miles makes terrific AI safety content, but very infrequently.

Brent Lewis · January 7, 2020 at 6:53 pm

I think that this is a much bigger deal than a lot of the other stuff we've seen. I'd even go so far as to say that it warrants more than two minutes! Unfortunately, there are no videos from the paper demonstrating game play. I'm looking forward to demonstrations in real-world domains.

Black Nigga · January 7, 2020 at 7:03 pm

Oh God. It's already 3am and I'm still watching this.

Ix Suomi · January 7, 2020 at 7:27 pm

Team Liquid shirt on that guy, makes this less scientific, they get always beaten by non-top 10 teams.

DeepSpace12 · January 7, 2020 at 7:33 pm

A few papers down the line: AI trained on Chess defeated Go without extra training.

Pavor · January 7, 2020 at 7:38 pm

Best way to beat it is to unplug its power cord first.

Tanvir Hassan · January 7, 2020 at 8:01 pm

SKYNET is learning how to beat humans in games now; in the near future it will learn how to beat humans in reality

What a time to be alive!

EctoMorpheus · January 7, 2020 at 8:10 pm

What happened to actually explaining how the algorithm works? Still, thanks for the video! I'll look it up myself

TimmacTR · January 7, 2020 at 8:31 pm

It would be interesting to find out what "kind" of games AI is inherently worse at and better at than humans. What are the criteria etc..

jossboloko · January 7, 2020 at 8:34 pm

Isn't that what AlphaZero was already achieving? Surpassing human capabilities on Chess, Shogi and Go and Atari games… I'm just saying this because I'm trying to see what's new about this MuZero algorithm compared to AlphaZero

Amanuel Temesgen · January 7, 2020 at 8:38 pm

So happy and proud of Deepmind

Benjamin Anderson · January 7, 2020 at 8:58 pm

I just want to point out that when you make the distinction between narrow and general algorithms and say that you'd take the less advanced general algorithm over a more advanced algorithm for a specific game, you're talking about it from a research perspective. You're making the assumption that "2 papers down the line" there will be an algorithm that will be just as general and more skilled than the current algorithm. However, if there were a game which for some reason was impossible to play or get good at using a generalized algorithm (I don't know if this is even possible but just consider it), then it would be necessary to have a specific algorithm for that game if humans wanted to create something more skilled than themselves. Yes, that algorithm would not advance AI research at all, but it would be very useful for people who only care about that game.

camden parsons · January 7, 2020 at 9:05 pm

i would not be calling this general. it is not general because it can only play the game its trained to learn. its narrow AI that trains on a model that can be used to learn different games. A step forward nonetheless


Bram · January 7, 2020 at 10:09 pm

Personally, I'd like to see AI's playing Planetary Annihilation

GodOfReality · January 7, 2020 at 10:13 pm

I really wish my dream of seeing something like AlphaZero play an old MS-DOS game from 1995 called Descent would come true. That would be something absolutely fascinating to observe, how would optimal play look in a game like that, especially to see things like teams or some of the other alternate game modes.

nnn reddie · January 7, 2020 at 10:24 pm

elmo is already considered last gen

Kevin Bl. · January 7, 2020 at 10:35 pm

"requires a great deal of mechanical skill, split-second decision making (and imperfect information)" sounds like EXACTLY what AIs should be better at than humans

Altair * · January 7, 2020 at 10:37 pm

Approaching artificial general intelligence.

jnalanko · January 7, 2020 at 10:47 pm

Thanks for making the video, though I wish you went into more detail on how MuZero differs from AlphaZero and why it is able to generalize so well. How does it work? That would be the real ice cream for the soul. You're making me pull up the paper myself :).

GrantClark1999 · January 7, 2020 at 11:08 pm

I so badly want to see an actual AI play Rocket League…

Tijs Maas · January 7, 2020 at 11:10 pm

Thanks for the video, love Károly Zsolnai-Fehér's enthusiasm for generalisation in this new year!
In case you want more detail, we covered this paper in our reading group:

EmmanuelMess · January 8, 2020 at 12:22 am

Hey! you didn't explain how it works! 🙁

DistortedV12 · January 8, 2020 at 12:56 am

If we could only figure out how to apply this thing to real world "games"…

SciFi Factory · January 8, 2020 at 1:26 am

I would love to see more about transfer learning. 🙂

Nisbah Mumtaz · January 8, 2020 at 4:21 am

So, RPG players are master race gamers confirmed?

AngelLestat2 · January 8, 2020 at 4:23 am

news: absolutely none.. "what a time to lose time".

c.j. · January 8, 2020 at 4:51 am

I would love to see this used in a rocket league bot.

Wild Animal Channel · January 8, 2020 at 5:37 am

It's good but can it do your homework? If I trick it and tell them my homework is a game.

depi zixuri · January 8, 2020 at 5:46 am

I can't wait for FPS games with AI enemies that plan and use tactics, where the players have to learn skills to defeat the AI NPCs, instead of simply filling bullet sponges.

Sourav Goswami · January 8, 2020 at 6:06 am

That's awesome @Two Minutes AI Papers

StupidityKiller - · January 8, 2020 at 6:11 am

Yeah but can it play Crysis ? .. wait

Nico Est. · January 8, 2020 at 7:48 am

Can this AI stop ww3?

Locut0s · January 8, 2020 at 8:22 am

I think generalization is actually THE key measure we should be focusing on and NOT skill level. After all a 5 year old probably can’t reliably learn to play chess or Starcraft all that well but can beat the pants off any AI in the world at general conceptual understanding of the world. We should be focusing on breadth and not depth.

Passe-Science · January 8, 2020 at 9:53 am

Performance and generalization, yes, but a very important aspect is also HOW it does things. So here are some additional details of what the paper is about:
Those AlphaGo-like agents used to have an external game simulator (to process the moves they actually commit to playing) and an internal simulator (to process the moves they are browsing during the search to decide on the best option). A game simulator does several things:
-you cannot play an illegal move with it.
-it gives you the dynamic response of the rules to a move, for example removing the stones your move actually captures.
-it gives you a terminal status: "this is a win, loss, draw state".
The breakthrough of MuZero is that it is given NO internal simulator of the game; it has to learn its own representation of a game state and of the game dynamics when "playing in its head during search" (but it still has an external simulator to process the moves it commits to playing, which is therefore called only once per move of a game in a training session).
So basically MuZero could:
-read illegal moves as a possibility.
-badly process the dynamics of its move (it could forget to capture the stones actually captured by its move, as long as this happens in its head during search).
-miss that the game has ended (keep reading after a terminal state of the game).
So it's basically what every beginner in chess goes through at the beginning of their learning process. Yet it still manages to learn a nearly perfect and optimized model of the game, and uses it to master the game itself.
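The split described above, between an external simulator that is consulted once per committed move and an internal model used during search, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not DeepMind's implementation: the toy Nim game and the names `ExternalSimulator`, `LearnedModel`, `plan` and `play` are hypothetical, the three model functions are hand-written stand-ins for what MuZero learns as neural networks, and a greedy one-step lookahead is swapped in for the Monte Carlo tree search that MuZero actually uses.

```python
from typing import List, Tuple


class ExternalSimulator:
    """Real rules of a toy Nim variant (hypothetical example game).

    Players alternately take 1 or 2 stones; taking the last stone wins.
    """

    def __init__(self, stones: int) -> None:
        self.stones = stones
        self.step_calls = 0  # how often the real rules were consulted

    def observe(self) -> int:
        return self.stones

    def step(self, action: int) -> Tuple[int, float, bool]:
        """Apply a committed move; returns (observation, reward, done)."""
        self.step_calls += 1
        if action not in (1, 2) or action > self.stones:
            raise ValueError("illegal move")
        self.stones -= action
        done = self.stones == 0
        return self.stones, (1.0 if done else 0.0), done


class LearnedModel:
    """Stand-in for the three functions MuZero learns.

    Assumption: these are hand-coded placeholders so the sketch runs
    without training; in MuZero they are trained neural networks.
    """

    def represent(self, observation: int) -> float:
        # Representation: observation -> hidden state.
        return float(observation)

    def dynamics(self, hidden: float, action: int) -> Tuple[float, float]:
        # Dynamics: (hidden state, action) -> (next hidden state, reward).
        # Nothing here checks legality or detects the end of the game.
        nxt = hidden - action
        return nxt, (1.0 if nxt == 0 else 0.0)

    def predict(self, hidden: float) -> float:
        # Prediction: value of a hidden state for the player to move.
        # Placeholder: in this Nim variant, multiples of 3 are losing.
        return -1.0 if hidden % 3 == 0 else 1.0


def plan(model: LearnedModel, observation: int, actions: List[int]) -> int:
    """Choose a move by searching purely inside the model."""
    root = model.represent(observation)

    def score(action: int) -> float:
        nxt, reward = model.dynamics(root, action)
        return reward - model.predict(nxt)  # the opponent moves next

    return max(actions, key=score)


def play(stones: int) -> Tuple[List[int], int]:
    """Self-play one game; returns the moves and the real-rule call count."""
    env = ExternalSimulator(stones)
    model = LearnedModel()
    moves: List[int] = []
    done = False
    while not done:
        action = plan(model, env.observe(), [1, 2])  # search: model only
        _, _, done = env.step(action)  # real rules: once per committed move
        moves.append(action)
    return moves, env.step_calls
```

Calling `play(4)` runs one game. The point to notice is that `step_calls` always equals the number of moves played, because the search itself never touches the real rules; any mistake in `dynamics` or `predict` would only show up as worse planning, not as a rule violation inside the search.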

koroko120 · January 8, 2020 at 10:26 am

Go-Explore is also an approach which scores well on Montezuma (and Pitfall). I don't know how well it generalizes compared to the other approaches, though

overfield18 · January 8, 2020 at 10:30 am

Isn't there an AI out there that can beat Montezuma's Revenge? I can't remember the name, but I saw it in a video

Joko Mandiri · January 8, 2020 at 11:00 am

Imagine that, there will be an AI able to combine Nasnet, Deepmind, GPT-2 and BERT

And suddenly muricans, russkies and chinese government wettest dream come true

Din Ding · January 8, 2020 at 12:30 pm

civ6 or wif next

Everett01 · January 8, 2020 at 12:37 pm

…but can it get 5.51 on Dragster?

Sir Sephy · January 8, 2020 at 1:03 pm

can it play fortnite? I don't think so

Charlie Caper · January 8, 2020 at 1:09 pm

What a time…

fischX · January 8, 2020 at 1:26 pm

It also fails the Turing test

Ondřej Vitík · January 8, 2020 at 1:48 pm

AlphaZero isn't the best in the world at playing chess any longer, right?

Hydra'sLair · January 8, 2020 at 1:51 pm

2:43 Apparently MuZero cannot play Montezuma Revenge at all. I think we found the ultimate game boys…
