Making sense of my YouTube Watch History
For a long time, I have heard the phrase "You are what you eat". A variation I have heard less often is "You are what you read". And although I haven't heard anyone say it, I have at least pondered: "Am I what I see and read online?" The question mark is there because I wonder about freedom of choice, or at least the illusion of it.
Brushing those heavy topics aside, I wanted to at least be conscious of how I spend my time online. I don't use popular social media sites, but I do use content aggregators like Reddit and YouTube. Reddit is easy to quantify and make sense of because of how it is structured. Also, Reddit does its own version of Spotify Wrapped each year. That leaves me with just YouTube.
The Hard Part - Getting the Data
YouTube does have a stats page that tells you how much time you spent watching that week. You can also browse your watch history, but not in a machine-readable form. Even for something as simple as finding out which channels I watch most often, there isn't a straightforward way.
After some googling, I decided to use Google Takeout, which lets you download all the data Google has on you. I chose to export just my YouTube watch history, received a download link and, voilà, I had my entire watch history. Neat! Now all I had to do was fire up my IDE and analyze the data! Or so I thought, until I opened the file.
```json
{
  "header": "YouTube",
  "title": "Watched https://www.youtube.com/watch?v=0BzGlfm1wFo",
  "titleUrl": "https://www.youtube.com/watch?v=0BzGlfm1wFo",
  "time": "2022-04-10T18:13:36.349Z",
  "products": ["YouTube"],
  "activityControls": ["YouTube watch history"]
}
```
The above is an example of a single entry in the file. It gives me the URL of the video I watched and when I watched it, but nothing about the channel or the video's tags.
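Before any channel or tag lookup, the export has to be reduced to a list of video IDs. Below is a minimal sketch, not the script I actually used. It assumes the export was requested as JSON, saved as `watch-history.json` (a hypothetical path), and holds a list of entries shaped like the one above. The helper names are hypothetical, and "three months" is approximated as 90 days.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Optional
from urllib.parse import parse_qs, urlparse


def load_history(path="watch-history.json"):
    # Hypothetical file name; assumes a JSON list of entries like the sample above.
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def parse_time(stamp: str) -> datetime:
    # Timestamps look like "2022-04-10T18:13:36.349Z". Before Python 3.11,
    # fromisoformat rejects the trailing "Z", so swap it for "+00:00".
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


def video_id(url: str) -> Optional[str]:
    # "https://www.youtube.com/watch?v=0BzGlfm1wFo" -> "0BzGlfm1wFo"
    ids = parse_qs(urlparse(url).query).get("v")
    return ids[0] if ids else None


def recent_video_ids(entries, days=90, now=None):
    """Return (video_id, watched_at) pairs for entries in the last `days` days."""
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    result = []
    for entry in entries:
        url = entry.get("titleUrl")
        if not url or "time" not in entry:
            continue  # skip entries without a video URL or timestamp
        vid = video_id(url)
        watched_at = parse_time(entry["time"])
        if vid and watched_at >= cutoff:
            result.append((vid, watched_at))
    return result
```

Usage would be along the lines of `ids = [vid for vid, _ in recent_video_ids(load_history())]`. Each rewatch stays in the list as its own entry, which matters later when counting.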
More googling led me to the YouTube Data API. While it couldn't give me my own watch history, it could fetch publicly available data, such as the channel a video belongs to and its tags. After a couple of hours of setting everything up, I was able to get channel details for the last three months of my watch history. Only three months, as I didn't want to use up my free API quota.
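As a rough illustration of the lookup step, here is a sketch that calls the YouTube Data API v3 `videos.list` endpoint with `part=snippet` and reads `channelTitle` and `tags` from each result. It assumes you already have an API key, and it uses only the standard library rather than Google's client library. The function names are hypothetical. Because `videos.list` accepts up to 50 comma-separated IDs per request, batching keeps the number of calls, and therefore quota use, down.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "https://www.googleapis.com/youtube/v3/videos"
MAX_IDS_PER_REQUEST = 50  # videos.list accepts at most 50 IDs per call


def batches(items, size=MAX_IDS_PER_REQUEST):
    # Split a list into consecutive chunks of at most `size` items.
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_request_url(ids, api_key):
    query = urlencode({"part": "snippet", "id": ",".join(ids), "key": api_key})
    return f"{API_URL}?{query}"


def parse_response(payload):
    """Map video ID -> (channel title, tags) from a videos.list response body."""
    details = {}
    for item in payload.get("items", []):
        snippet = item.get("snippet", {})
        # "tags" is absent when the uploader set none; videos that are private
        # or deleted simply do not appear in "items".
        details[item["id"]] = (snippet.get("channelTitle"), snippet.get("tags", []))
    return details


def fetch_video_details(video_ids, api_key):
    # Look each video up once even if it was watched several times.
    unique_ids = list(dict.fromkeys(video_ids))
    details = {}
    for chunk in batches(unique_ids):
        with urlopen(build_request_url(chunk, api_key)) as response:
            details.update(parse_response(json.load(response)))
    return details
```

Deduplicating IDs before the requests is the main quota saver when a video was rewatched; the rewatches are still counted later from the original ID list.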
The Easy Part - Inference
With the data in hand, I wrote a few small scripts to parse it. I tried to infer what kind of videos I spent my time watching by counting how often each tag appeared. Although there was a lot of what I consider noise, I could still draw meaningful conclusions without much fuss.
Tag | Count |
---|---|
elden ring | 205 |
Gameplay | 92 |
Classics | 63 |
Literature | 63 |
Literary Analysis | 63 |
T20 | 62 |
Myths | 62 |
Cricbuzz | 61 |
linus | 56 |
rust programming language | 41 |
The table above does not reflect the entire dataset, as I removed duplicates and noise. Overall, though, it gives a good idea of how I spent my time on YouTube over the past three months. It is still not the complete picture, because a video has multiple tags and creators tend to spam tags for SEO purposes.
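One way to do the tag counting and cleanup described above is sketched below. It is not necessarily how I cleaned my data. It assumes a list of watched video IDs (one per history entry) and a mapping from video ID to `(channel title, tags)`. `NOISE_TAGS` and `count_tags` are hypothetical names, and which tags count as noise is a judgement call. Tags are lowercased so that case variants merge, and repeats within a single video's tag list are counted once.

```python
from collections import Counter

# Hypothetical noise list; extend it with whatever tags you consider noise.
NOISE_TAGS = {"video", "youtube", "funny"}


def count_tags(watched_ids, details, noise=NOISE_TAGS):
    """Count tags across watched videos.

    watched_ids: one video ID per watch-history entry (rewatches count again).
    details: mapping of video ID -> (channel title, list of tags).
    """
    counts = Counter()
    for vid in watched_ids:
        if vid not in details:
            continue  # video no longer available, so no tags to count
        _, tags = details[vid]
        # Normalise case and drop duplicates within this video's tags.
        normalised = {tag.strip().lower() for tag in tags}
        counts.update(normalised - noise)
    return counts
```

`count_tags(ids, details).most_common(10)` then gives a top-ten list. A video with ten tags still contributes to ten rows, which is exactly the SEO-spam caveat above.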
The next step was to find out which channels I watch the most. The table below lists my top 24 channels, along with how many of their videos appear in my history.
Channel Name | Count |
---|---|
PlayStation Access | 100 |
gameranx | 86 |
Linus Tech Tips | 78 |
Overly Sarcastic Productions | 70 |
Cricbuzz | 62 |
IGN | 58 |
LMG Clips | 54 |
WatchMojo.com | 50 |
Let's Get Rusty | 39 |
Jarrod Kimber | 38 |
The Grade Cricketer | 36 |
Fextralife | 35 |
Ethan Chlebowski | 35 |
Saturday Night Live | 32 |
The Graham Norton Show | 32 |
Ashwin | 31 |
The Late Show with Stephen Colbert | 29 |
Dan Murrell | 25 |
Jeremy Jahns | 24 |
videogamedunkey | 24 |
Marques Brownlee | 19 |
Chris Stuckmann | 18 |
MrMobile [Michael Fisher] | 16 |
Eurogamer | 15 |
This reflects how I spent the three months in question. Broadly, these channels fall into one of the following categories:
- Gaming
- Technology
- Entertainment & Stories (includes movies)
- Cricket
- Cooking (a lone cooking channel)
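Counting channels is simpler than counting tags, because each video belongs to exactly one channel. A sketch of producing a table like the channel list above follows. It assumes the same inputs as before: watched video IDs (one per history entry) and a mapping of video ID to `(channel title, tags)`. `top_channels` and `to_markdown_table` are hypothetical helpers, not part of any library.

```python
from collections import Counter


def top_channels(watched_ids, details, n=24):
    """Return the n channels with the most entries in the watch history."""
    channels = Counter(
        details[vid][0]  # channel title
        for vid in watched_ids
        if vid in details  # skip videos the API no longer returns
    )
    return channels.most_common(n)


def to_markdown_table(rows):
    # Render (channel, count) pairs in the same pipe-table style as this post.
    lines = ["Channel Name | Count |", "---|---|"]
    lines += [f"{name} | {count} |" for name, count in rows]
    return "\n".join(lines)
```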
While I didn't strictly need to go through the whole exercise of getting the data and parsing it, it was satisfying to back up an anecdotal understanding of myself with hard data. Or perhaps this was just an exercise in vanity.
Footnote of Sorts
My original intention was to highlight a few channels that I think are great and enjoy immensely, and to back that up with data. While I could pick a few from this list, the reality is that some smaller channels can't keep pace with the bigger ones in raw numbers. I will probably write a separate post about them.