May 30, 2011

Why do so many reviewers use only the top half of their rating scale? Has the Metacritic age led to score inflation? And why do some reviewers refuse to give out perfect scores? Answers to all of these questions, plus why Metacritic no longer lists scores from X-Play.

“Score Inflation”

It’s no secret that most review magazines and websites today only use the top half of their rating scale when issuing a review score. Are reviewers being paid off by videogame publishers? Do magazines and websites give 7/10s to mediocre games out of a fear of losing advertising dollars? Other than a few isolated instances, the reality of the situation isn’t nearly so scandalous.

For those who are unaware of Metacritic, it’s a review aggregator that collects review scores from various sources, and displays an overall average score for the game, movie, or album. However, in order to make things easier to average, Metacritic converts every review score into a scale out of 100. So 9/10 would become 90/100, 4/5 would become 80/100, etc.

On the surface this seems simple enough, but things become trickier when you figure in letter grades. Traditionally, 90-100 = A, 80-89 = B, etc. The problem is, one person might give a mediocre game a 50/100, while another will give it a C, which would be a 75/100. Metacritic tries to compensate for this by changing the values that letter grades are given so that a C = 50/100, but in doing so they’re overlooking a larger problem: any time a scoring system is based on a scale of 1-100, there is a natural tendency for many people to view the score as the equivalent of a letter grade. That is why so many mediocre games are given a 75/100.
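To make the arithmetic concrete, here is a minimal Python sketch of the straight conversion and the letter-grade mismatch described above. The function name `to_100` and the two constants are my own illustrative choices, not anything Metacritic publishes. The only numbers taken from the examples above are 9/10, 4/5, and a C read as either 75 or 50.

```python
def to_100(score, scale_max):
    """Linearly rescale a score out of scale_max to a score out of 100.

    Hypothetical helper for illustration; not Metacritic's actual code.
    """
    if scale_max <= 0:
        raise ValueError("scale_max must be positive")
    return score * 100 / scale_max


# A "C" read the way a school report card would read it (roughly 75/100).
ACADEMIC_C = 75
# A "C" after the adjustment described above, where C is treated as 50/100.
ADJUSTED_C = 50

nine_of_ten = to_100(9, 10)   # 9/10 on the 100-point scale
four_of_five = to_100(4, 5)   # 4/5 on the 100-point scale

# The same "mediocre" verdict lands this many points apart,
# depending on which reading of the scale is used.
gap = ACADEMIC_C - ADJUSTED_C

print(nine_of_ten, four_of_five, gap)
```

The conversion itself is trivial. The trouble is entirely in that last number: two reviewers who agree a game is average can still end up 25 points apart.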

What makes videogames different from movies or music is that the videogame industry actually pays attention to Metacritic scores. Indeed, there are even reports of publishers denying developers a bonus if the game receives too low a Metacritic score. The website has become so embedded in the culture of the industry that reviewers have started adjusting their own scores to reflect how they want them to show up out of 100.

A “Perfect” Score

I’ve got a funny story for you. In college, I had this teacher who was like a female version of John Malkovich’s character from Art School Confidential. One day she had us participate in an exercise where we were to mix white and black paint to create a sequence of ten rectangles illustrating a perfect scale of shades of gray between white and black. The middle rectangle needed to be a perfect mix of the two, a “true” gray, and each of the other rectangles needed to be as close to a 10% difference in shade as humans are capable of.

We’d each show her what we had, and she’d say “ah, but see, this gray is closer to the one before it than to the one after it,” and things of that sort. We worked on this for two hours, constantly revising to try and achieve perfection, until she revealed to us a little factoid: never, in all her years of teaching, had she seen a gradient scale she thought was perfect. In fact, she never entirely “agreed with” even the ones printed in textbooks.

Thing is, she wasn’t saying this as some sort of lesson. In fact, she still expected us to spend the next few hours trying, and was still going to grade us on the results! She was merely telling us this anecdotally, as if we would find it fascinating, when in reality we were all looking at each other like “we are so screwed…”

There’s a belief among some gamers and reviewers that there’s no such thing as a perfect game, and thus a perfect score should be given only on rare occasion, if at all. But if a teacher revealed to you that they’d only given 100/100 on six papers over the course of a decade, because there’s no such thing as a “perfect” paper, you’d probably think they were senile or something. You never see a music critic refusing to give any album five stars because there’s no such thing as the perfect album, and Siskel & Ebert never refused to give a movie two thumbs up just because there’s no such thing as a perfect movie. So why should games be any different?

Metacritic Vs. X-Play

Imagine you’ve just been given 50/100—a failing grade—on a paper you’ve written. When you go to the teacher to ask for more clarification on what you did wrong, they tell you that the paper wasn’t actually terrible; it was merely average, and the teacher doesn’t believe in using only the upper half of the grading scale. If that has a negative impact on your grades, well, there’s always going to be some casualties when trying to enact change.

That teacher wouldn’t be X-Play. G4’s X-Play rates games on a five-star scale, and it would look kind of silly to use only the upper half of that. When a developer complained to Adam Sessler about how a grade of three stars or less from X-Play significantly drags down a game’s average (due in part to scores from reviewers of higher stature being given more weight), Sessler actually contacted Metacritic to see if they could work something out.

Unfortunately, Metacritic has no interest in adapting game scores (rather than converting literally) just because their own 100-point scale can easily be interpreted like an academic grading scale. And X-Play doesn’t want to switch from their long-established five-star scale just because of the way some website recently started affecting videogame culture. Ultimately, they came to an agreement to simply have X-Play scores no longer listed on the site.

In Conclusion

It might seem like I’m defending review scores that only use the upper half of the scale. In a perfect world, everyone would go back to using a 1-10 or five-star scale, rather than a 5-10 scale. Unfortunately, I don’t see that ever happening without Metacritic either falling out of popularity, or just changing their 100-point scale to a 10-point scale (to distance themselves from the similarity to an academic grading system).

In the meantime, I don’t see any point in punishing developers out of pure stubbornness, the way a few major review sites continue to do. Taking that approach isn’t going to change anything. On the other hand, perhaps if enough major sites were to ask to be delisted from Metacritic the way X-Play has, they’d start to get the message…

Critical Kate

Critical Kate was born on the same day as the Famicom. Primarily a console gamer, Kate has been playing videogames regularly since the NES days (excluding a hiatus during most of the "128-bit" era in college). She prefers her games arcade-y, enjoys a good story, and doesn't understand the appeal of recreational frustration. You don't want to encounter her in the Facility with Remote Mines.

  43 Responses to "Understanding Review Scores In The Metacritic Age"

  1. With at least 5 separate, public cases of developers/publishers calling editors and ‘expressing their belief that game X deserves a higher rating’, after which the editor fires the reviewer and has some unknown ‘reviewer’ give it a +2 (/10) bump, I have to think that you must be new to all this. Of course the larger publishers and game companies are putting pressure on reviewers, since they actually do tend to connect with quite a few people and influence their decisions. They cannot directly blackmail, but the call itself is the threat. No serious reviewing group ever listens to a company like this. The best they can do is to promise that they will fix the product and then ask for a re-review. That is not how the game industry works though, so the calls are threats to pull advertising. I would love to hear what the wrongful dismissal settlements go for, but of course they are all confidential.

    If the review is coherent, lists good and bad aspects, and compares to other games in the genre, then many readers will take it seriously. XBOX and PS3 magazines are widely derided for giving great scores to crappy games to drive up sales (which they get a part of). Some, like PC Gamer, can still be useful, since a static -2 adjustment to their score (on a big-ticket game) will bring them in line with reality. PC Gamer has been through the above pressure at least twice, which probably resulted in their adjustment for major releases from major companies. It is sad, since they used to be worth reading…

    With Metacritic, they have wandered towards a more statistically relevant system from their early days. I remember discovering about 4 years ago that a 14-year-old in Australia had posted his self-proclaimed, unprofessional review. The old Metacritic decided to use his 10/10 in their aggregate (probably since he was the only one on the planet that thought the particular game was perfect). Even though they no longer grab random forum posts as ‘professional reviews’, if you look at their aggregate scores and compare them to the average player’s scores (from various sites), you will find that they are still quite inaccurate. The best way to judge is to grab 10 facepalm games like Homefront, and see what Metacritic gave them. If fewer than three of them are scored 7 or higher, I would be surprised. Metacritic is still working on becoming relevant, but is not quite there yet. Any reviewer that never gives out less than 5/10 is actually only scoring on an X-Play scale anyway, and should be considered as such (round down so 7/10 = 3/5). The greatest joke about that is, of course, that they are suggesting that all the other sources they do use are scoring out of 100, which they are not. If they cannot multiply X-Play’s score by 20 then they should not be allowed to multiply a 10-point scale by 10 either.

    As to the industry relying on Metacritic, the TV industry bases all their decisions on the Nielsen rating system. The Nielsen system has been proven to be inaccurate by more than +/- 25% (and in some cases much more – especially with the advent of online access). Lazy people want simple numbers with no variables. Bean counters are generally lazy, more concerned with how much they can grow their bonuses by saving 5 cents on paperclips than with doing a proper analysis for themselves (data mining the forums). It is not a surprise that the gaming industry is becoming more business than entertainment. Just ask where the expansion packs went, and why a mini-mod can be sold if it is called DLC. Heck, I am surprised someone has not tried selling patches yet.

    Ignore Metacritic until they develop a properly balanced statistical system weighted for the correlative history of the reviewer (not the review group, but the individual) to the average player’s scoring of the game, and then they will be worth something. Newbie reviewers are often used on ‘blockbusters’, where they give out 9’s instead of the player’s average of 6, so they have no correlative power with reality, and that is what we all want: a reviewer that will give us an honest review and let us know if the next EAware sequel is going to be even worse than the last, or if they actually used the ‘good’ team this time.

    The best are X-Play and the websites with no game company advertising. X-Play generally calls it like it is, and definitely plays the whole game (while some internet reviewers apparently only play the first few hours – not directed at this site but many, many others). If X-Play gives it a 3/5 or lower I have never had reason to disagree (though I do find they give a 4 to some I would only give a 3). Find the reviews you trust and stick with them.



