Jabril: So, John-Green-bot, you know when you let me use your computer the other day?
Well, I went on YouTube and it was like seeing a completely different website.
There were videos about restoring old VCRs and different kinds of cassette tapes, and ads for motor oil?!
John-Green-bot: Yes, Jabril!
I love learning about other machines.
Jabril: Okay, but do you even know what humans are watching these days?
John-Green-bot: What about those Boston Dynamics videos?
The humans in those videos are so mean to the robots!
What about Epic Computation Battles of History, or MKB-AI, or Robot Appétit?
Jabril: …… what?
INTRO
Hi, I’m Jabril and welcome to Crash Course AI!
Recommender systems are a type of AI that tries to understand our brains and make useful recommendations to us.
This kind of AI can guide the things we watch by recommending YouTube videos or shows on Netflix for example.
On Amazon, it’s recommending items to buy. When I search on Google, it’s recommending relevant and interesting links.
And everywhere online, advertisement servers are trying to recommend products and services.
Recommender systems combine supervised learning and unsupervised learning techniques to learn about us.
And because we’re so complicated, recommending stuff to us is a tough problem that can produce lots of unexpected results.
Maybe we get caught in an online bubble and only see tweets from our friends and people who think like us.
Maybe we miss a new TV show because streaming sites don’t think we’d like it.
Or maybe that creepy thing happens where you’re talking to your friends about supercomputers and then every single ad you see for the next day is for supercomputers?!?
AI that makes recommendations can really change what version of the internet we all see.
But to understand the benefits and drawbacks of these algorithms, we have to understand where they get their data and how they work.
As an example, let’s focus on an algorithm that could recommend YouTube videos.
Because “The Algorithm” is a really big deal if YouTube is your job, and everyone’s talking about the mysterious changes behind the algorithm anyway.
Three common approaches are content-based recommendations, social recommendations, and personalized recommendations.
Content-based recommendations look at the content of the videos, not the audience.
Like, for example, our algorithm may decide to recommend more recent videos, or videos that are made by someone on a list of “quality creators.” But this means someone has to decide who “quality creators” are, or program an AI that tries to predict creator quality.
On the other hand, social recommendations pay attention to the audience.
YouTube is on the internet so we can use social ratings such as “likes” or “views” or “watch time” to decide what people are watching and should be recommended.
But not everybody likes the same stuff, so maybe pure popularity isn’t the way to go.
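As a rough sketch of a purely social recommender, the snippet below ranks videos by a popularity score built from views, likes, and watch time. The video names, the numbers, and the weights are all made up for illustration; a real site would choose and tune its own signals.

```python
# Hypothetical engagement stats per video: (views, likes, watch_hours).
stats = {
    "video_a": (1000, 50, 120.0),
    "video_b": (400, 90, 200.0),
    "video_c": (5000, 10, 30.0),
}

def popularity(views, likes, watch_hours):
    # Assumed weights, chosen only for illustration.
    return 0.001 * views + 0.05 * likes + 0.01 * watch_hours

def rank_by_popularity(stats):
    # Most popular first. Every user would get this same list.
    return sorted(stats, key=lambda v: popularity(*stats[v]), reverse=True)

ranking = rank_by_popularity(stats)
```

Notice that nothing about the viewer goes into the score, which is exactly why pure popularity treats everyone the same.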
Different people have different preferences, so our AI can incorporate that with personalized recommendations.
If you like this Crash Course video, maybe we’d recommend other Crash Course videos or videos from my channel.
But the problem with personalized recommendations is that it might be difficult to stumble onto new interesting stuff.
So, to get the best of all worlds, recommender systems generally use collaborative filtering, which combines all three of these recommenders.
When we see a recommendation on YouTube, it could be because that video is similar to other videos that we’ve watched and liked and other people who have similar tastes watched and liked that video.
Or (especially if you’re new to YouTube) that video might be recommended because it’s popular and lots of people are watching and liking it.
Collaborative filtering combines several of the techniques we’ve already talked about in Crash Course AI.
It uses unsupervised learning to find similar people or content, and it tries to use data from those things to predict how we would feel about something we haven’t even seen yet.
To see how collaborative filtering works, let’s use a simple example.
In this table, YouTube channels are represented as columns.
So, here, one column represents CrashCourse, one is Jabrils, one is The Best of BattleBots, one is The Art Assignment, and so on.
Specific users that watch YouTube videos are represented as rows.
So this row is John-Green-bot, this one is me, these two are a couple random folks, this one is our producer Brandon, and so on.
Each cell in the table corresponds to whether the user subscribes to a specific channel or not.
1 means they’ve watched at least one video and subscribed, 0 means they’ve watched at least one video and didn’t subscribe, and the cell is empty if they haven’t seen any videos.
If we look at John-Green-bot’s row, he subscribes to CrashCourse and Jabrils, so those cells have a 1.
He saw The Best of BattleBots and did not subscribe, because of all the robot-on-robot violence, so that’s a 0.
And he’s never seen The Art Assignment so there’s no information in that cell.
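One way to picture this table in code is a dictionary of rows, with None standing in for an empty cell. This is only a sketch: John-Green-bot's row follows the description above, but the other rows are invented for illustration, and the helper name unseen_channels is hypothetical.

```python
# Columns are channels, rows are users.
# 1 = watched and subscribed, 0 = watched but did not subscribe,
# None = has not seen any videos from that channel.
channels = ["CrashCourse", "Jabrils", "The Best of BattleBots", "The Art Assignment"]

subscriptions = {
    "John-Green-bot": [1, 1, 0, None],
    # The rows below are invented for illustration.
    "Jabril": [1, 1, 1, 1],
    "Brandon": [1, None, 0, 1],
}

def unseen_channels(user):
    # Channels with an empty cell are the candidates for a recommendation.
    return [c for c, cell in zip(channels, subscriptions[user]) if cell is None]

candidates = unseen_channels("John-Green-bot")
```

The empty cells are the interesting ones, because those are the values a recommender has to predict.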
To recommend new channels for John-Green-bot, our collaborative filtering AI needs to predict how likely he is to subscribe to a channel he’s never seen before.
In this case, let’s see if The Art Assignment ends up in his recommendations.
To make a prediction, the algorithm needs to look at which other people have subscribed to The Art Assignment.
And because YouTube tastes are very personal, instead of looking at all other users, our algorithm will focus on finding the users who are most similar to John-Green-bot.
Finding similar things is a classic unsupervised learning problem.
Our AI can look at all the rows, cluster together similar users, and then pick some of those who are most similar to John-Green-bot and who have seen The Art Assignment.
Let’s say there are 1,000 of these similar users. There are also other clusters, each with thousands of users, that the recommender system takes into consideration.
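Here is a minimal sketch of the "find similar users" step. To keep it short, it skips real clustering and instead scores each user by how often they agree with John-Green-bot on channels both have seen, then keeps the closest matches who have an opinion on the target channel. The user rows, the agreement score, and the function names are assumptions for illustration, not the method any particular site uses.

```python
def similarity(row_a, row_b):
    # Compare only channels both users have seen (cells that are not None).
    shared = [(a, b) for a, b in zip(row_a, row_b) if a is not None and b is not None]
    if not shared:
        return 0.0  # nothing in common to compare
    agreements = sum(1 for a, b in shared if a == b)
    return agreements / len(shared)

def most_similar(target, others, column, k):
    # Keep only users who have seen the target column, then take the top k.
    seen = {name: row for name, row in others.items() if row[column] is not None}
    ranked = sorted(seen, key=lambda name: similarity(target, seen[name]), reverse=True)
    return ranked[:k]

john = [1, 1, 0, None]  # last column is The Art Assignment
others = {  # hypothetical users
    "user_a": [1, 1, 0, 1],
    "user_b": [0, 0, 1, 0],
    "user_c": [1, 1, 1, None],
    "user_d": [1, 0, 0, 1],
}
neighbors = most_similar(john, others, column=3, k=2)
```

With these made-up rows, user_a agrees with John-Green-bot on every shared channel, so it ends up first in the list of neighbors.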
Now, we have a classic supervised learning problem: training an AI to make predictions based on past examples.
In this case, we’re training an AI to predict a 1 or 0 (subscribe or not) for John-Green-bot based on other users.
We can re-adjust the results so that ratings from the cluster of 1000 most similar users are given more weight in the final prediction, compared to those other clusters.
And after the predictions are sorted, our AI does predict that John-Green-bot would subscribe to The Art Assignment, so it gets recommended to him… along with some other new channels.
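The weighting idea can be sketched as a similarity-weighted vote: each similar user's 1 or 0 counts in proportion to how similar they are to John-Green-bot. This is a simplified stand-in for a trained model, and the weights, ratings, and 0.5 threshold below are assumptions chosen for illustration.

```python
def predict_subscribe(neighbors, threshold=0.5):
    # neighbors: list of (similarity_weight, rating) pairs, where rating is 1 or 0.
    total_weight = sum(w for w, _ in neighbors)
    if total_weight == 0:
        return None, None  # no usable evidence
    # Weighted average of the ratings, between 0 and 1.
    score = sum(w * r for w, r in neighbors) / total_weight
    return score, int(score >= threshold)

# Hypothetical neighbors:
# (similarity to John-Green-bot, subscribed to The Art Assignment?)
neighbors = [(1.0, 1), (0.9, 1), (0.4, 0), (0.2, 0)]
score, prediction = predict_subscribe(neighbors)
```

Here the two most similar users both subscribed, so they outweigh the two less similar users who did not, and the sketch predicts a 1.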
Recommender systems that use collaborative filtering AI can take in lots of different data, not just a 1 or a 0, for whether a user subscribed to a YouTube channel or bought a product.
A movie rating site might use a one-to-five star rating system.
Or a social media AI could keep track of the number of milliseconds a user dwells on a post.
Regardless, the basic strategy is the same: use known information from users to predict preferences.
And this can get complicated on big websites that gather lots of user information using a combination of different algorithms.
The real world is full of a lot of data and there are three key problems that can lead to recommender systems making small or big mistakes.
First, datasets that recommender system AIs get are usually very sparse.
Most people don’t watch most shows or videos -- there just isn’t enough time!
And even fewer people give social ratings such as “likes.” Doing any kind of analysis with sparse datasets is very computationally intensive, which gets expensive, which means some companies are willing to lose some accuracy to reduce costs.
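To get a feel for sparsity, the sketch below stores only the cells that actually hold a rating and measures what fraction of the full table they cover. The users, channels, and ratings are hypothetical, and real datasets are usually far emptier than this toy one.

```python
# Store only the known cells as (user, channel) -> rating instead of a full grid.
known = {
    ("user_1", "channel_a"): 1,
    ("user_1", "channel_c"): 0,
    ("user_2", "channel_b"): 1,
    ("user_3", "channel_a"): 1,
}
users = ["user_1", "user_2", "user_3", "user_4"]
channels = ["channel_a", "channel_b", "channel_c", "channel_d", "channel_e"]

def density(known, users, channels):
    # Fraction of the user-by-channel table that actually holds a rating.
    return len(known) / (len(users) * len(channels))

filled = density(known, users, channels)
```

Only 4 of the 20 possible cells are filled in this example, and user_4 has no ratings at all.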
Second, there’s the cold start problem.
When we go on a website for the first time, for example, the AI doesn’t know enough about us to provide good personalized recommendations.
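One common workaround, sketched below under made-up assumptions, is to fall back on popular items until the system has seen enough of a user's history. The min_history cutoff, the video names, and the placeholder personalized function are all hypothetical.

```python
def recommend(user_history, personalized, popular, min_history=3, n=3):
    # With too little history, fall back to what is popular overall.
    if len(user_history) < min_history:
        return popular[:n]
    return personalized(user_history)[:n]

popular = ["video_b", "video_c", "video_a"]

def personalized(history):
    # Placeholder for a real personalized model (hypothetical).
    return ["more_like_" + v for v in history]

new_user = recommend([], personalized, popular)
returning_user = recommend(["x", "y", "z"], personalized, popular)
```

A brand-new visitor gets the popular list, while a returning user with enough history gets the personalized one.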
And third, even if an AI makes statistically likely predictions, that doesn’t mean those recommendations are actually useful to us.
Online ads run into this failure a lot, where we’ll be shown ads for sites we recently visited, or something we just bought.
Sure, that’s probably something I’m interested in, but I could’ve figured that out without a recommender system.
In a potentially more harmful way, recommender systems don’t understand important social context, so “statistically likely” recommendations can be worrying.
Recommendations may stereotype users in a socially uncomfortable way.
Like, for example, an AI might assume that because John-Green-bot is a robot, he really wants to watch WALL-E and Robocop.
Just because he’s a robot doesn’t mean he wants to watch robot stuff.
Or recommendations might be inappropriate for certain users, like recommending a video that a parent would consider too violent to children who had just watched a bunch of NERF War videos.
And, on social media, recommendations can trap us in ideological echo chambers, where we tend to only see the opinions from people that agree with us, which can skew our knowledge about the world.
This idea that we all see slightly different versions of the internet, and data is constantly being collected about us, can be a little concerning.
But understanding how recommender systems work can help us live more knowledgeable lives and coexist with AI.
When we don’t want data added to a recommender system’s model of us, we can use a private or incognito browser window and not log into sites.
If we open a news homepage this way, we might see what the average human (or robot!) is being recommended.
Of course, incognito browsers don’t mean total privacy, but this strategy prevents sites from connecting data -- like, for example, my Twitter account with my searches for tiny polo shirts on Google (because I needed to get John-Green-bot a birthday present).
Plus, since we spend so much time online, we might want to make the most of it with really personalized recommendations.
So… seriously… “like, comment, and subscribe” to your favorite creators because as we leave ratings, reviews, and other traces of online activities, recommender systems can learn better models.
Recommender systems are a part of the internet as we know it, whether we like it or not.
And as AI becomes a bigger part of our lives, these kinds of recommendations will be too.
So it’s on us to be aware of this technology, so that we know what kind of world we’re living in, and the ways AI might influence us every single day.
And if you’re here to learn how to build recommender systems, my advice would be to think explicitly about the trade-offs that are involved.
Deciding how to define the clusters of users or items can create more or less personalized spaces.
In our next episode, we’ll work together on some code to build a recommender system, and we’ll get some hands-on experience with weighing some of these trade-offs.