Opinion: Measuring game quality
Game reviews used to serve one primary purpose: they were buyers' guides, giving the reader a summary that told them whether a game was worth their cash. A review typically concludes with a numerical assessment – and although it's usually found at the end of the article, it's often what players look to first.
Perhaps this is enough; if the game gets a high score and is in a genre the reader likes, it's an obvious purchase. Similarly, it's useful for readers to know which games to avoid – there are plenty of games to choose from, so there's no reason to buy ones which deliver a poor player experience.
So far so good; this seems like sensible consumer advice. Increasingly, however, review scores are being used to decide whether or not a developer should be paid a bonus. Obsidian, developer of Fallout: New Vegas, was promised a bonus by publisher Bethesda if the game received 85 on Metacritic. It didn’t, missing its target by a single point.
I can, to an extent, see the publisher’s logic in tying bonuses to scores rather than sales. After all, linking a payout to a quality metric encourages contracted developers to put quality first. The major issue with Metacritic as a measure of quality, however, is that the algorithm which calculates the all-important score is kept under wraps.
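The stakes of that opacity are easy to illustrate. The sketch below is purely hypothetical – the review scores and outlet weights are invented, and nothing here reflects Metacritic's real method – but it shows how the same four reviews can clear an 85 bonus threshold under a plain average and miss it under a weighted one.

```python
# Hypothetical illustration only: Metacritic's actual weighting is unpublished.
def plain_mean(scores):
    """Unweighted average of review scores."""
    return sum(scores) / len(scores)

def weighted_mean(scores, weights):
    """Average where some outlets count more than others (invented weights)."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

reviews = [90, 88, 80, 84]                    # four invented outlet scores
print(plain_mean(reviews))                    # 85.5 -- clears an 85 threshold
print(weighted_mean(reviews, [1, 1, 3, 1]))   # ~83.7 -- misses the same threshold
```

Without knowing which outlets are weighted, and by how much, a developer has no way of telling which of these two figures their bonus actually hangs on.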
But aren’t all game reviews based on hidden algorithms to a certain extent? After all, it’s impossible to know the internal state of a game reviewer’s mind as they were writing the review. Are they fans of the genre? How long did they play it before sitting down to write? Did they play the game themselves or base their review on someone else’s? You’ll never know.
You'd think the reviewer's internal review process would be revealed by reading the review, but it seems even this causes some readers confusion. Look at Edge's Skullgirls review: several readers posted comments stating that they couldn't see the relationship between the textual review and the corresponding numerical score. In other words, how the reviewer arrived at the score from what they wrote was not transparent.
It's not just the gap between reviewers and the public that's causing confusion; even professional reviewers seldom agree. Dead Island currently has scores ranging from 30 to 93 on Metacritic, so is it terrible or incredible? What processes are professional reviewers using to assess games, and how can the results vary so much? It seems that bringing clarity to the review process might be useful: if we had better insight into how the scores were calculated, perhaps we could have more confidence in them.
Objectifying the experience
One of the first commercial games review magazines was Computer and Video Games, first published in 1981, but a year before that a Stanford PhD student, Thomas Malone, wrote his thesis on what makes games captivating. He proposed three elements: challenge, fantasy and curiosity. Since then, these guidelines, or heuristics, have evolved and been used mainly by academics to add some objectivity to evaluating the quality of the gameplay experience. There now exist sets of heuristics for evaluating game usability, gameplay experience, mobile games, and even specific genres. Developers would do well to consult these while making their games: they help focus discussions within the team, and the result is a better game.
Although these game heuristics are useful for evaluating gameplay, or even comparing games, they don't convey how it feels to actually play the game. And it's here that the reviewer works their magic, shaping prose to communicate the actual gameplay experience. How are you likely to feel when playing this game? Will it move you emotionally, or challenge you? For these, the most important criteria, perhaps numbers just don't matter.