Jabril: John Green Bot, are you serious?!
I made this game and you beat my high score?
Jabril: So John Green Bot is pretty good at Pizza Jump, but what about this new game we made, TrashBlaster?
John-Green-bot: Hey, that’s me!
Jabril: Yeah, let's see what you've got.
John-Green-bot: That’s not fair, Jabril!!
Jabril: It's okay, John Green Bot, we've got you covered.
Today we’re gonna design and build an AI program to help you play this game like a pro.
INTRO
Jabril: Hey, I’m Jabril and welcome to Crash Course AI!
Last time, we talked about some of the ways that AI systems learn to play games.
I’ve been playing video games for as long as I can remember.
They’re fun, challenging, and tell interesting stories where the player gets to jump on goombas or build cities or cross the road or flap a bird.
But games are also a great way to test AI techniques because they usually involve simpler worlds than the one we live in.
Plus, games involve things that humans are often pretty good at like strategy, planning, coordination, deception, reflexes, and intuition.
Recently, AIs have become good at some tough games, like Go or Starcraft II.
So our goal today is to build an AI to play a video game that our writing team and friends at Thought Cafe designed called TrashBlaster!
The player’s goal in TrashBlaster is to swim through the ocean as a little virtual John-Green-bot, and destroy pieces of trash.
But we have to be careful, because if John-Green-bot touches a piece of trash, then he loses and the game restarts.
Like in previous labs, we’ll be writing all of our code using a language called Python in a tool called Google Colaboratory.
And as you watch this video, you can follow along with the code in your browser from the link we put in the description.
In these Colaboratory files, there’s some regular text explaining what we’re trying to do, and pieces of code that you can run by pushing the play button.
These pieces of code build on each other, so keep in mind that we have to run them in order from top to bottom, otherwise we might get an error.
To actually run the code and experiment with changing it, you’ll have to either click “open in playground” at the top of the page or open the File menu and click “Save a Copy to Drive”.
And just an FYI: you’ll need a Google account for this.
So to create this game-playing AI system, first, we need to build the game and set up everything like the rules and graphics.
Second, we’ll need to think about how to create a TrashBlaster AI model that can play the game and learn to get better.
And third, we’ll need to train the model and evaluate how well it works.
Without a game, we can’t do anything.
So we’ve got to start by generating all the pieces of one.
To start, we’re going to need to fill up our toolbox by importing some helpful libraries, such as PyGame.
Steps 1.1 and 1.2 load the libraries, and step 1.3 saves the game so we can watch it later.
This might take a second to download.
The basic building blocks of any game are different objects that interact with each other.
There’s usually something or someone the player controls, and enemies that you battle. All these objects and their interactions with one another need to be defined in the code.
So to make TrashBlaster, we need to define three objects and what they do: a blaster, a hero, and trash to destroy.
The blaster is what actually destroys the trash, so we’re going to load an image that looks like a laser-ball and set some properties.
How far does it go, what direction does it fly, and what happens to the blast when it hits a piece of trash?
Our hero is John-Green-bot, so now we’ve got to load his image, and define properties like how fast he can swim and how a blast appears when he uses his blaster.
And we need to load an image for the trash pieces, and then code how they move and what happens if they get hit by a blast, like, for example, total destruction or splitting into 2 smaller pieces.
Finally, all these objects are floating in the ocean, so we need a piece of code to generate the background.
The shape of this game’s ocean is toroidal, which means it wraps around, and if any object flies off the screen to the right, then it will immediately appear on the far left side.
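The wrap-around behavior described above can be sketched in a couple of lines. This is a hypothetical illustration, not the actual TrashBlaster code, and the 800x600 screen size is an assumption:

```python
# Hypothetical sketch of a toroidal (wrap-around) world.
# The screen dimensions here are an assumption for illustration.
SCREEN_W, SCREEN_H = 800, 600

def wrap_position(x, y):
    """Wrap an object's (x, y) so that leaving one edge of the
    screen makes it reappear at the opposite edge."""
    return x % SCREEN_W, y % SCREEN_H
```

So an object that flies off the right edge at x = 810 reappears at x = 10 on the far left, and the same idea works vertically.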
Every game needs some way to track how the player’s doing, so we’ll show the score too.
Now that we have all the pieces in place, we can actually build the game and decide how everything interacts.
The key to how everything fits together is the run function.
It’s a loop of checking whether the game is over; moving all the objects; updating the game; checking whether our hero is okay; and making new trash.
As long as our hero hasn’t bumped into any trash, the game continues.
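The shape of that run loop can be sketched like this. The `Game` class here is a toy stand-in with made-up method names, just to show the loop structure, not the real TrashBlaster implementation:

```python
class Game:
    """Toy stand-in for the real game state, just to show the loop's shape."""
    def __init__(self, max_ticks):
        self.ticks = 0
        self.max_ticks = max_ticks
        self.hero_alive = True

    def game_over(self):
        return not self.hero_alive

    def step(self):
        # In the real game this would move objects, handle collisions,
        # and spawn new trash; here we just pretend the hero
        # eventually bumps into a piece of trash.
        self.ticks += 1
        if self.ticks >= self.max_ticks:
            self.hero_alive = False

def run(game):
    # The loop: keep stepping the world until the hero is hit.
    while not game.game_over():
        game.step()
    return game.ticks
```

As long as the hero hasn't bumped into any trash, the loop keeps going; the moment he does, the loop exits and the game is over.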
That’s pretty much it for the game mechanics.
We’ve created a hero, a blaster, trash, and a scoreboard, and code that controls their interactions.
Step 2 is modeling the AI’s brain so John-Green-bot can play!
And for that, we can turn back to our old friend the neural network.
When I play games, I try to watch for the biggest threat because I don’t want to lose.
So let’s program John-Green-bot to use a similar strategy.
For his neural network’s input layer, let’s consider the 5 pieces of trash that are closest to his avatar.
(And remember, the closest trash might actually be on the other side of the screen!)
Really, we want John-Green-bot to pay attention to where the trash is and where it’s going.
So we want the X and Y positions relative to the hero, the X and Y velocities relative to the hero, and the size of each piece of trash.
That’s 5 inputs for each of the 5 pieces of trash, so our input layer is going to have 25 nodes.
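Building that 25-value input vector might look something like this. The dictionary layout for the hero and trash is a hypothetical assumption, and for simplicity this sketch measures straight-line distance rather than wrap-around distance on the torus:

```python
import math

def trash_features(hero, trash_list):
    """Build the 25-value input vector: for each of the 5 nearest
    pieces of trash, its x/y position and x/y velocity relative to
    the hero, plus its size. The dict keys are assumptions."""
    def dist(t):
        # Simplification: straight-line distance, ignoring the
        # toroidal wrap-around of the real game world.
        return math.hypot(t["x"] - hero["x"], t["y"] - hero["y"])

    nearest = sorted(trash_list, key=dist)[:5]
    features = []
    for t in nearest:
        features += [t["x"] - hero["x"], t["y"] - hero["y"],
                     t["vx"] - hero["vx"], t["vy"] - hero["vy"],
                     t["size"]]
    return features  # 5 values x 5 pieces = 25 inputs
```

Each piece of trash contributes 5 numbers, so with the 5 nearest pieces we get exactly the 25 input nodes described above.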
For the hidden layers, let’s start small and create 2 layers with 15 nodes each.
This is just a guess, so we can change it later if we want.
Because the output of this neural network is gameplay, we want the output nodes to be connected to the movement of the hero and shooting blasts.
So there will be 5 nodes total: an X and Y for movement, an X and Y direction for aiming the blaster, and whether or not to fire the blaster.
To start, the weights of the neural network are initialized to 0, so the first time John-Green-bot plays he basically sits there and does nothing.
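The 25-15-15-5 architecture with zero-initialized weights can be sketched with NumPy. The tanh activation here is an assumption for illustration, not necessarily the lab's exact choice:

```python
import numpy as np

# The layer sizes described above: 25 inputs, two hidden
# layers of 15 nodes, and 5 outputs.
LAYER_SIZES = [25, 15, 15, 5]

def make_brain():
    """Weights start at zero, so an untrained bot does nothing."""
    return [np.zeros((m, n)) for m, n in zip(LAYER_SIZES, LAYER_SIZES[1:])]

def think(brain, inputs):
    """Feed the 25 inputs forward through the network."""
    a = np.asarray(inputs, dtype=float)
    for w in brain:
        a = np.tanh(a @ w)  # tanh squashing is an assumption
    return a  # 5 outputs: move x/y, aim x/y, and whether to fire
```

With all-zero weights, every output is zero no matter what the inputs are, which is exactly why the first John-Green-bot just sits there.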
To train his brain with regular supervised learning, we’d normally say what the best action is at each timestep.
But because losing TrashBlaster depends on lots of collective actions and mistakes, not just one key moment, supervised learning might not be the right approach for us.
Instead, we’ll use reinforcement learning strategies to train John-Green-bot based on all the moves he makes from the beginning to the end of a game, and we’ll evolve a better AI using a genetic algorithm (commonly referred to as a GA).
To start, we’ll create some number of John-Green-bots with empty brains (let’s say 200), and we’ll have them play TrashBlaster.
They’re all pretty terrible, but because of luck, some will probably be a little bit less terrible.
In biological evolution, parents pass on most of their characteristics to their offspring when they reproduce.
But the new generation may have some small differences, or mutations.
To replicate this, we’ll use code to take the 100 highest-scoring John-Green-bots and clone each of them as our reproduction step.
Then, we’ll slightly and randomly change the weights in those 100 cloned neural networks, which is our mutation step.
Right now, we’ll program a 5% chance that any given weight will be mutated, and randomly choose how much that weight mutates (so it could be barely any change or a huge one).
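That mutation step can be sketched like this. The 5% rate comes from the description above, but the perturbation range of -1 to 1 is a guess for illustration:

```python
import random

MUTATION_RATE = 0.05  # 5% chance that any given weight mutates

def mutate(weights):
    """Return a mutated copy of a flat list of weights. Each weight
    has a small chance of being nudged by a random amount; the
    range (-1 to 1) is an assumption, not the lab's exact value."""
    new = []
    for w in weights:
        if random.random() < MUTATION_RATE:
            w = w + random.uniform(-1.0, 1.0)
        new.append(w)
    return new
```

Because each weight mutates independently, most of a cloned brain stays the same, but a handful of connections shift, which could be barely any change or a huge one.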
And you could experiment with this if you like.
Mutation affects how much the AI changes overall, so it’s a little bit like the learning rate that we talked about in previous episodes.
We have to try and balance steadily improving each generation with making big changes that might be really helpful (or harmful).
After we’ve created these 100 mutant John-Green-bots, we’ll combine them with the 100 unmutated original models (just in case the mutations were harmful) and have them all play the game.
Then we evaluate, clone, and mutate them over and over again.
Over time, the genetic algorithm usually makes AI that are gradually better at whatever they’re being asked to do, like play TrashBlaster.
This is because models with better mutations will be more likely to score high and reproduce in the future.
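One full generation of that evaluate-clone-mutate cycle can be sketched as a short function. `fitness_fn` and `mutate_fn` are placeholders for whatever game-evaluation and mutation code you plug in:

```python
def next_generation(population, fitness_fn, mutate_fn):
    """One genetic-algorithm generation: score everyone, keep the
    top-scoring half (the '100 best' in a population of 200), then
    add a mutated clone of each survivor so the population size
    stays the same."""
    ranked = sorted(population, key=fitness_fn, reverse=True)
    survivors = ranked[: len(population) // 2]   # reproduction step
    mutants = [mutate_fn(s) for s in survivors]  # mutation step
    return survivors + mutants
```

Keeping the unmutated originals alongside their mutant clones is the safety net mentioned above: if a mutation turns out to be harmful, the parent is still in the pool.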
ALL of this stuff, from building John-Green-bot’s neural network to defining mutation for our genetic algorithm, is in this section of code.
After setting up all that, we have to write code to carefully define what doing “better” at the game means.
Destroying a bunch of trash?
Staying alive for a long time?
Avoiding off-target blaster shots?
Together, these decisions about what “better” means define an AI model’s fitness.
Programming this function is pretty much the most important part of this lab, because how we define fitness will affect how John-Green-bot’s AI will evolve.
If we don’t carefully balance our fitness function, his AI could end up doing some pretty weird things.
For example, we could just define fitness as how long the player stays alive, but then John-Green-bot’s AI might play "TrashAvoider" and dodge trash instead of playing TrashBlaster and destroying trash.
But if we define the fitness to only be related to how many trash pieces are destroyed, we might get a wild hero that’s constantly blasting.
So, for now, I’m going to try a fitness function that keeps the player alive and blasts trash.
We’ll define the fitness as +1 for every second that John-Green-bot stays alive, and +10 for every piece of trash that is zapped.
But it’s not as fun if the AI just blasts everywhere, so let’s also add a penalty of -2 for every blast he fires.
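Those three terms combine into a one-line scoring function. The numbers come straight from the description above; only the function and argument names are made up for this sketch:

```python
# The fitness function described above: +1 per second alive,
# +10 per piece of trash destroyed, -2 per blast fired.
def fitness(seconds_alive, trash_destroyed, blasts_fired):
    return seconds_alive * 1 + trash_destroyed * 10 - blasts_fired * 2
```

So a bot that survives 30 seconds, destroys 5 pieces of trash, and fires 10 blasts scores 30 + 50 - 20 = 60, and you can see how rebalancing any of the three coefficients would push the evolved behavior in a different direction.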
The fitness for each John-Green-bot AI will be updated continuously as he plays the game, and it’ll be shown on the scoreboard we created earlier.
You can take some time to play around with this fitness function and watch how John-Green-bot’s AI can learn and evolve differently.
Finally, we can move onto Step 3 and actually train John-Green-bot’s AI to blast some trash!
So first, we need to start up our game.
And to kick off the genetic algorithm, we have to define how many randomly-wired John-Green-bot models we want in our starting population.
Let’s stick with 200 for now.
If we waited for each John-Green-bot model to start, play, and lose the game… this training process could take DAYS.
But because our computer can multitask, we can use a multiprocessing package to make all 200 AI models play separate games at the same time, which will be MUCH faster.
And this is all part of the training.
This is where we’ll code in the details of the genetic algorithm, like sorting John-Green-bots by their fitness and choosing which ones will reproduce.
Now that we have the 100 John-Green-bots that we want to reproduce, this code will clone and mutate them so we have that combined group of 100 old and 100 mutant AI models.
Then, we can run 200 more games for these 200 John-Green-bots.
It just takes a few seconds to go through them all thanks to that last chunk of code.
And we can see how well they do!
The average score of the AI models that we picked to reproduce is almost twice as high as the overall average.
Which is good!
It means that the John-Green-bot is learning something.
We can even watch a replay of the best AI.
Uh… even the best isn’t very exciting right now.
We can see the fitness score changing as time passes, but the hero’s just sitting there, not getting hit, and shooting forward. We want John-Green-bot to actually play, not just sit still and get lucky.
We can also see a visual representation of this specific neural network, where higher weights are represented by the redness of the connections.
It’s tough to interpret what exactly this diagram means, but we can keep it in mind as we keep training John-Green-bot.
Genetic algorithms take time to evolve a good model.
So let’s change the number of iterations in the loop in STEP 3.3, and run the training step 10 times to repeatedly copy, mutate, and test the fitness of these AI models.
Okay, now I’ve trained 10 more iterations.
And if I view a replay of the last game, we can see that John-Green-bot is doing a little better.
He’s moving around a little and actually sort of aiming.
If we keep training, one model might get lucky, destroy a bunch of trash, earn a high fitness, and get copied and mutated to make future generations even better.
But John-Green-bot needs lots of iterations to get really good at TrashBlaster.
You might consider changing the number of iterations to 50 or 100 times per click… which might take a while.
Now here’s an example of a game after 15,600 training iterations. Just look at John-Green-bot swimming and blasting trash like a pro.
And all this was done using a genetic algorithm, raw luck, and a carefully crafted fitness function.
Genetic algorithms tend to work pretty well on small problems like getting good at TrashBlaster.
When the problems get bigger, the random mutations of genetic algorithms are sometimes… well, too random to create consistently good results.
So part of the reason this works so well is because John-Green-bot’s neural network is pretty tiny compared to many AIs created for industrial-sized problems.
But still, it’s fun to experiment with AI and games like TrashBlaster.
For example, you can try to change the values of the fitness function and see how John-Green-bot’s AI evolves differently.
Or you could change how the neural network gets mutated, like by messing with the structure instead of the weights.
Or you could change how much the run function loops per second, from 5 times a second to 10 or 20, and give John-Green-bot superhuman reflexes.
You can download the clip of your AI playing TrashBlaster by looking for game_animation.gif in the file browser on the left-hand side of the Colaboratory file.
You can also download source code from Github to run on your own computer if you want to experiment (we’ll leave a link in the description).
And next time, we’ll start shifting away from games and learn about other ways that