I was playing around with Google Trends last week, trying to see how a few MMOs compared to each other on Trends. It was fun and all, but I missed a lot of games and I just thought it would be so much better if I could just compare EVERY MMO, past and present, automatically!
Google Trends just catalogs the searches people would do for a game, but I figured that had to have at least a loose correlation to popularity, right?
Well, it was an… interesting journey, to be sure.
I program in Java, JavaScript and TypeScript at work, but my language of choice for my own projects is Python.
Looking around, I found a lot of support for accessing Google Trends via Python — someone has even written a package, pytrends, which would do the job.
How Google Trends works
The code library works the same way the website itself does. Choose one to five search terms, optionally a category, optionally a time period. It returns a graph of trending searches over time, per search term. The maximum value of a data point is 100; the minimum is zero.
How sorting works
The problem here is that there isn’t one specific value you can grab that says, this value is how popular a search is, irrespective of what it’s compared to. All the values are relative to the other terms in the search. World of Warcraft, the most popular MMO on Google Trends, when compared to, say, Allods Online, will be hovering near 100 every day, and Allods will be zero. Compare Allods Online vs Albion Online, though, and things will be a lot more interesting, as they are in the same general range.
My bright idea was to write a function, in Python, that would take two game names and run them through Trends, average out the daily values for each game, and compare those two averages. I could pipe this into Python’s sort function, and, easy-peasy, all the games would sort nicely, and I could get a complete ranking of the 260+ games I’d culled from Steam. (I did a search for MMORPGs and Massively Multiplayer on Steam, then copied and pasted those results into a document, and used regular expressions to strip away anything that wasn’t a game name.)
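For the curious, here is a rough sketch of what such a comparison function could look like. It is not the code I actually ran. The names `fetch_pair_means`, `make_comparator` and `rank_games` are made up for illustration, and the `timeframe` value is an assumption. The pytrends calls (`TrendReq`, `build_payload`, `interest_over_time`) are the package's own, but the import is kept inside the function so the rest can be tried out with a fake fetcher and no network.

```python
from functools import cmp_to_key


def fetch_pair_means(game_a, game_b, timeframe="today 5-y"):
    """Ask Trends about two terms and return the mean interest of each.

    Needs network access and the pytrends package; the timeframe is an
    assumed default. An empty result (no data) is not handled here.
    """
    from pytrends.request import TrendReq  # imported lazily on purpose

    trends = TrendReq(hl="en-US", tz=360)
    trends.build_payload([game_a, game_b], timeframe=timeframe)
    frame = trends.interest_over_time()  # one column per term, values 0-100
    return float(frame[game_a].mean()), float(frame[game_b].mean())


def make_comparator(fetch=fetch_pair_means):
    """Build a cmp-style function: negative, zero or positive."""
    def compare(game_a, game_b):
        mean_a, mean_b = fetch(game_a, game_b)
        return (mean_a > mean_b) - (mean_a < mean_b)
    return compare


def rank_games(games, fetch=fetch_pair_means):
    """Sort games from most to least searched, one Trends query per comparison."""
    return sorted(games, key=cmp_to_key(make_comparator(fetch)), reverse=True)
```

Passing a different `fetch` function makes it possible to test the sorting logic without touching Google at all, which would have saved me some grief.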
Welcome to Google Jail
There are a few problems with that, I’d learn. For one, Google doesn’t like people hitting their servers with thousands of requests in a few seconds. They call that an attack. Which wounded me. I just loved their service too much. First they throttled me: if I did just one query a minute, I could stay. But that would take too long! So after a few queries, I would just go as fast as I could for a while.
Then, they shut me off completely. Any query I sent would be immediately returned with a Too Many Requests error. I was in Google Jail. I would get nothing more from them until I’d served my sentence.
A full day later — this afternoon — I tried again and I was very sorry and I spaced every query by two seconds and they let me through. I’d also trimmed that mega list down to only MMOs I recognized or at least cared about.
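In hindsight, the polite approach is easy enough to sketch. The wrapper below waits before every query and backs off for longer each time it gets refused. Everything here is hypothetical: `TooManyRequests` is a stand-in for whatever error your client raises on a Too Many Requests response, and the two-second delay and one-minute backoff are just the numbers from my own experience, not anything Google documents.

```python
import time


class TooManyRequests(Exception):
    """Hypothetical stand-in for the client's Too Many Requests error."""


def polite_call(query, *args, delay=2.0, max_retries=3, backoff=60.0,
                sleep=time.sleep):
    """Call query(*args), pausing before each attempt and backing off on refusal."""
    for attempt in range(max_retries + 1):
        sleep(delay)  # space out every query, even the first
        try:
            return query(*args)
        except TooManyRequests:
            if attempt == max_retries:
                raise  # out of retries: let the caller decide what to do
            sleep(backoff * (2 ** attempt))  # wait 60s, 120s, 240s, ...
```

The `sleep` parameter is only there so the waiting can be swapped out when testing.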
Garbage In, Garbage Out
It worked great! Well, it looked like it worked great.
Here’s the top ten results, according to my wonderful program. RuneScape… at the top? And MapleStory above World of Warcraft?
It doesn’t make any sense. And it’s just wrong.
I went through Google Jail for garbage.
The problem, I think, is this.
We learned in elementary school that if a > b and b > c, then a > c. Basic stuff. Alice has 10 apples. Betty has 5 apples. Alice has more apples than Betty. Carla has just one apple. Betty has more apples than Carla. Since Alice has more apples than Betty, and Betty has more apples than Carla, then Alice has more apples than Carla. It also follows c < a — Carla has fewer apples than Alice.
Python’s sort algorithm is based on these relationships. If we could get a definite number of searches — or apples — from Google Trends, then Python’s sort would work wonderfully.
Instead, we get the situation where Alice has a hundred billion apples, Betty has 20, and Carla has 3. Since the max value is 100, we can just divide all those values by a billion and now Alice is at 100, and Betty and Carla are at zero. With just these relative numbers, suddenly we cannot usefully compare Betty and Carla’s apples at all.
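The apple problem fits in a few lines of Python. This is a deliberately simplified model: the real Trends scaling involves more than dividing by the biggest number, and `trends_scale` is a made-up helper, but the rounding effect is the same.

```python
def trends_scale(raw_counts):
    """Rescale positive counts so the largest becomes 100, rounded to whole numbers."""
    top = max(raw_counts.values())
    return {name: round(100 * count / top) for name, count in raw_counts.items()}


apples = {"Alice": 100_000_000_000, "Betty": 20, "Carla": 3}

# With Alice in the mix, Betty and Carla both round down to zero.
print(trends_scale(apples))

# Leave Alice out and the difference between Betty and Carla reappears.
print(trends_scale({"Betty": 20, "Carla": 3}))
```

The first call puts Alice at 100 and both of the others at 0. The second puts Betty at 100 and Carla at 15.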
This puts little time bombs in the sort algorithm. When I ran the program, neither RuneScape nor MapleStory was ever actually compared directly with World of Warcraft. If they had been, they’d never have been ranked higher than WoW. Instead, their ordering was inferred from other comparisons, which were poisoned by comparisons that broke down because of the vast difference in interest between the top games and the bottom ones.
The only sure way to do the comparison is to compare every game against every other.
That wouldn’t be that bad. I have approximately 2⁶ (64) games in my short list. Comparing them all against each other would be 2¹² = 4,096 individual calls to Google Trends. They only allow 1,400 calls per four hours, so I could run this in twelve hours or so. And then we’d know.
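Here is the back-of-the-envelope math as code, assuming a short list of 64 games and taking the 1,400-calls-per-four-hours figure at face value. The function name is made up. It also works out the number of unique pairs, since comparing A against B tells you the same thing as comparing B against A, and no game needs comparing with itself.

```python
def comparison_budget(n_games, calls_per_window=1400, window_hours=4):
    """Estimate calls and hours needed for all-against-all comparisons.

    The rate limit defaults are assumptions taken from the post, not from
    any official documentation.
    """
    ordered_calls = n_games ** 2                 # every game against every game
    unique_pairs = n_games * (n_games - 1) // 2  # each distinct pair only once

    def hours(calls):
        return calls / calls_per_window * window_hours

    return {
        "ordered_calls": ordered_calls,
        "ordered_hours": hours(ordered_calls),
        "unique_pairs": unique_pairs,
        "unique_hours": hours(unique_pairs),
    }


budget = comparison_budget(64)
```

For 64 games that is 4,096 calls and a bit under twelve hours the naive way, or 2,016 calls and a bit under six hours if each pair is only queried once.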
But, doesn’t Google Trends allow five terms at once?
It sure does. Let’s talk about that.
I read an article that suggested using a search term that would rank higher, but not substantially so, and make that the first term in every query. Since it would always rank the same (being relatively higher than every other possible term), we would be able to extract an actual unchanging rank value for every other game in that query, and by extension, for every game in the list. With 63 games, it would take 16 such queries to sort the entire list, and the ordering would be… no, it wouldn’t be correct.
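The batching itself is simple. The sketch below is my own guess at it, not the code from that article. It puts the anchor term first in every query with up to four games behind it, and then expresses each game's average as a fraction of the anchor's average so results from different queries land on one scale. Both function names are invented, and it assumes the anchor's average is never zero.

```python
def anchored_batches(games, anchor, max_terms=5):
    """Split games into queries of at most max_terms terms, anchor first."""
    per_query = max_terms - 1
    others = [g for g in games if g != anchor]
    return [[anchor] + others[i:i + per_query]
            for i in range(0, len(others), per_query)]


def rescale_to_anchor(batch_means, anchor):
    """Express each game's mean as a fraction of the anchor's mean."""
    anchor_mean = batch_means[anchor]  # assumed to be non-zero
    return {game: mean / anchor_mean
            for game, mean in batch_means.items() if game != anchor}


games = [f"game{i}" for i in range(63)]
batches = anchored_batches(games, "anchor term")
```

With 63 games, `len(batches)` comes out at 16, matching the count above.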
It’s a data thing.
See, World of Warcraft is the obvious choice. We know it’s #1. But it’s the old hundred billion apples problem, again. It’s just so much more popular than the lowest ranked items that it would be WoW at top, a few competitors ranked more or less correctly, and then all the lower ranked games jumbled together as they are all essentially tanked at zero, compared to WoW.
Lemme think about it.
Of course, we could iterate. Run through once and find the top few games with enough showing in Trends to be reasonable. Add those to the top of the ranking, then remove them from the list and run it again, this time with the lowest of those removed as the trend setter for the others. And so on, until we were down at the level of the games with the least Trends. That would probably take only two or three runs, and we’d get an idea of the general tiers of popularity for the MMOs.
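A sketch of that tiering loop, under some loudly stated assumptions: `measure(anchor, games)` is a hypothetical function that returns each game's 0 to 100 score relative to the anchor (in practice it would wrap the batched Trends queries), and a score of 5 is an arbitrary cutoff for "enough showing to be reasonable".

```python
def tier_games(games, anchor, measure, floor=5):
    """Peel off tiers of games, re-anchoring on the lowest game of each tier.

    measure(anchor, games) is a hypothetical callable returning
    {game: score relative to anchor}. floor is an arbitrary cutoff.
    """
    tiers = []
    remaining = [g for g in games if g != anchor]
    while remaining:
        scores = measure(anchor, remaining)
        visible = sorted((g for g in remaining if scores[g] >= floor),
                         key=scores.get, reverse=True)
        if not visible:
            # Nothing registers against this anchor: give up and keep the
            # rest as one unresolved tier.
            tiers.append(list(remaining))
            break
        tiers.append(visible)
        remaining = [g for g in remaining if scores[g] < floor]
        anchor = visible[-1]  # lowest visible game sets the next scale
    return tiers
```

Each pass either resolves at least one game or stops, so the loop always finishes.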
I’ll probably try that next time.
Now that I’m thinking about it, I just might be able to quicksort my way out of this. Dammit. I was even thinking of using qsort but then made the conscious decision to use the built-in sort.
Quicksort only needs to know if each value in a set is higher or lower than a randomly chosen “pivot” value. Then these higher and lower sets are themselves quicksorted. Every game ends up being compared directly against a pivot, and as the sets shrink, those pivots get closer to its own level.
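Roughly, the idea would look like this. It is an untested-against-Google sketch: `beats_pivot(game, pivot)` is a hypothetical function that would run the pair through Trends and say whether the game averaged higher than the pivot, and game names are assumed to be unique.

```python
import random


def trends_quicksort(games, beats_pivot, rng=random):
    """Order games from most to least popular using only pivot comparisons."""
    if len(games) <= 1:
        return list(games)
    pivot = rng.choice(games)
    higher, lower = [], []
    for game in games:
        if game == pivot:
            continue
        # One comparison (one Trends query) per game per level of recursion.
        (higher if beats_pivot(game, pivot) else lower).append(game)
    return (trends_quicksort(higher, beats_pivot, rng)
            + [pivot]
            + trends_quicksort(lower, beats_pivot, rng))
```

Since Trends takes five terms at a time, the pivot could in principle share each query with four other games rather than one, which would cut the number of calls a good deal.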
I am not sure why you think RuneScape and MapleStory, two very popular, free to play since launch, MMORPGs that both pre-date WoW, shouldn’t top the list. That seems to fit.
The list, to my mind, doesn’t get odd until you get to Neverwinter. That MMORPG clearly isn’t that popular. But it probably gets mixed in with other titles, like Neverwinter Nights 1 & 2, that AOL title from the 90s, plus all the Forgotten Realms references.
Rift also seems unlikely but, like Neverwinter, that name also brings up a lot of other things, like the Oculus VR headset.
Anyway, that is just an observation on your first result set. I will be interested to see how it looks when you refine things.
I have no way of knowing how popular they really are. But when I put the top three + FFXIV into Google Trends directly, WoW is at top, and FFXIV — here WAAAAY down the list — is #2. So my program isn’t doing what I want it to do, which is to consolidate Trends information across more than five data points.
I think qsort will work here…
Runescape is WAY more popular than hardcore mmorpg players ever realize. It was the default social game for 8-12 year olds for years before Minecraft arrived. It wouldn’t surprise me in the slightest if more people have played it than ever played WoW (I’ve seen estimates of lifetime numbers exceeding 100m) so it also wouldn’t be a surprise if it had been searched for more often.
I guess?
Just goes to show how I can get into a bubble and forget about all those other MMOs, just because I missed them when they were new.