
Sunjammer


That Halo 4 1-star review

I’m sort of a statistics idiot. I am endlessly fascinated by statistics, especially as a game developer, where they power everything from animation to scoring systems, but I also have a lot of love for what statistics can tell you about the world at a glance. It’s a comforting feeling to see reality described as a value.

The industry has long searched for reliable metrics by which to gauge success. The traditional metric is “how much did it sell, and at what price, versus our investment”, but that statistic depends heavily on time as a factor. How do you gauge the success of a product when that product’s lifespan is considerable? The concept of the “indie darling”, where a game turns out to be a success where little was expected, is warm and fuzzy and fun to think about, but completely untenable as a reliable business model.

For analysts, predicting success becomes a progressively sexier concept as investment increases. In video games, I was recently made aware of so-called “mock reviews”, a practice in which games journalists write reviews of unfinished products for internal use by the publisher, to predict how the game will score at final release and, I assume, to determine the marketing budget. As a game developer, how much real change can you introduce into a game at a point where the game is already “reviewable”?

Finally, once a game is released, with the long tail of game sales these days, what determines success?

Unfortunately for everyone, consumer and industry alike, “success” is currently measured by the Metascore. It’s time for me to ramble.

Averages are wonderful in their need for reinterpretation. They are the most boring of statistics, existing only to convey rudimentary ideas, removing edges, peaks and valleys. The average of a triangle’s vertices gives you its centroid; a representation of the triangle for certain, but what a pitiful representation. Such a representation only has value through interpretation and contextualization. Consider the average CPU use of an application. It might idle and do nothing, and it might burn every core you have, and so your average, out of context, is almost completely useless. You’ll be sat at that dull mean, knowing even less than when you started. Averaging a system with lots of variation is, as far as I can tell, silly; the only thing you can measure is a tendency, and a tendency is not a precise value. The aggregate score of a game would permanently sit in the 80s or 70s. Only outlier games would diverge from this average.
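A toy sketch of the CPU example makes the point; the numbers here are made up purely for illustration:

```python
# Two hypothetical CPU-usage traces with identical averages but wildly
# different behavior: the mean alone cannot tell them apart.
spiky = [0, 0, 0, 0, 100, 100, 100, 100]  # idles half the time, pegged the rest
steady = [50] * 8                          # constant 50% load


def mean(xs):
    return sum(xs) / len(xs)


print(mean(spiky))   # 50.0
print(mean(steady))  # 50.0 -- same "average", nothing alike in practice
```

Both traces collapse to the same number; everything interesting about them lives in the variation the average throws away.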

Video games media is about as score-driven as media comes. They are a natural fit for scoring, after all. In playing games, especially of yore, success is eminently quantifiable, and this quantification of reality and success is a big draw for a lot of gaming as a whole. It only makes sense, I suppose, to quantify the success of the game as a product as well.

It turns out, however, that scoring of this sort is a little too complicated for its own good; you might as well task yourself with reviewing a human being, what with all the warts and beauty a game can bring to the table. How would you score your friends, and how would they measure up?

The topic of game review scoring is a hot one. I suppose the basic argument is about what exactly a score means. Is it to determine whether the consumer should make a purchase or not? If so, why not adopt a binary metric such as the thumbs up/down of Siskel & Ebert? Is it to determine how the purchase stacks up to other purchases? At that point you are in the domain of averages, and you end up with lists sorted by score; for a long time The Legend of Zelda: Ocarina of Time was “the best game in the world”, which, regardless of how you feel about that game, is a patently ludicrous notion to anyone who has any interest in the full spectrum of experiences games can offer.

So the choice appears to be between simpler scoring – “good” or “bad” – and more elaborate systems, often resulting in scores with decimals. While there are attempts at walking the middle ground between these two approaches, “guide” versus “data”, these attempts seem to reduce scoring ranges in the belief that this implies leeway for error and as such should be less contentious. It’s a noble endeavor, but still a compromise rather than a solution to a problem that goes further than the individual scoring mechanic.

Recently, after reading Destructoid’s 10/10 (in actuality 100/100, counting decimals) review of Halo 4, I was struck by how offensive I found that scoring mechanic versus Giant Bomb’s range of 1-5 stars. The implication was, I felt, that even at 5/5 the broad strokes of the star range left room for flaw, whereas the 100/100 score was too precise to allow any doubt or reservation, both of which are profoundly important to as subjective an art as video games. In a sense, the larger the range, the more I require the full range to be used, lest the values of that range boil away into a skewed average where none of it matters.

I couldn’t tell you when it happened, but at some point video game scores became practically homogeneous. I’m not opposed to the idea that games themselves have become homogeneous; look no further than the past decade’s love affair with the Modern Military Shooter, possibly the worst, blandest thing to happen to video games for as long as I have been playing them, though judging by the success of the genre that opinion clearly puts me in the minority.

That a game such as XCOM, a moderately simple turn-based tactics game (a genre as common as oxygen in the 90s), can appear as a rescuing angel of innovation in the year 2012 unfortunately speaks less to the merits of XCOM and more to the creative flatline of an industry where ballooning budgets and economic recession have put the fear of death into nearly every publisher in town.

With such enormous budgets, yet so much fear, predicting success is, again, intensely attractive. If X is 100, and Y is like X, Y should be 100 as well, right? Let’s do another one of those Xes.

The answer, it appears, is to guard our investments with aggregate scores. I’m not inherently opposed to score aggregation. As a consumer, I find it highly useful. Rotten Tomatoes is a wonderful thing, probably one of my favorite sites today. It works mostly because film reviews work. While there are scoring systems in place for movies, Rotten Tomatoes does not average the scores it gathers but rather converts every score into a basic thumbs up or down: fresh tomatoes versus rotten tomatoes. A gushing review and a middling-to-good review are both fresh; one does not skew the other. In the same sense, a vicious rage fest of a review and a merely disappointed one count for the same.

The real crux of the problem with statistics and game reviews is publishers’ willingness to base their business on this skewed aggregate Metascore. I wasn’t shocked to hear Obsidian’s developers would not receive a bonus payout if Fallout: New Vegas didn’t hit 85 on Metacritic, but it didn’t make me any less furious, knowing the very first thing about averages and statistics.

Because averages are painfully sensitive to extreme values (the extremes of a data set are how you gauge its span, were you to graph it for instance), so-called outliers can throw off entire ranges. Given 200 scores of 90, a single 10 might drag you down to 89, depending on your rounding. No bonus for you, developers! Why? Because a game reviewer dared to have a vigorously divergent opinion.
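The arithmetic is easy to check; here is that exact scenario, with the only assumption being how the aggregator rounds:

```python
# Two hundred scores of 90 and a single outlier score of 10.
scores = [90] * 200 + [10]
avg = sum(scores) / len(scores)

print(avg)         # 89.601... -- the outlier pulls the mean below 90
print(int(avg))    # 89 if the aggregator truncates: no bonus
print(round(avg))  # 90 with conventional rounding
```

One divergent opinion out of 201 is enough to move the aggregate across a contractual threshold, which is the whole problem with hanging bonuses on it.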

Rotten Tomatoes has eliminated the outlier problem by normalizing the range into a set of binary values. In one fell swoop it has made a range that is intuitive to the viewer yet insensitive to the personality traits of scoring mechanisms, or even of reviewers themselves. The resulting percentage score is less a precise metric than the answer to a question: out of so many reviewers, how many thought this movie was any good?
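A sketch of that normalization, with made-up critic scores and assuming the commonly cited cutoff of 60 on a 100-point scale for a “fresh” verdict:

```python
# Hypothetical critic scores, already converted to a 100-point scale.
reviews = [95, 72, 61, 88, 40, 65, 59, 90]

FRESH = 60  # assumed threshold: 60 and up counts as a thumbs up ("fresh")

# Each review collapses to a binary verdict before any aggregation happens,
# so a 95 and a 61 carry exactly the same weight.
fresh_count = sum(1 for s in reviews if s >= FRESH)
tomatometer = 100 * fresh_count / len(reviews)
print(f"{fresh_count}/{len(reviews)} fresh -> {tomatometer:.0f}%")  # 6/8 fresh -> 75%
```

Because the 40 and the 59 each count as one rotten verdict, no matter how vicious, a single outlier can never drag the percentage more than one vote’s worth.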

Metacritic instead embraces the whimsical granularity of the games press, adopting Destructoid’s to-me-problematic 100-point range, and as a result outliers are a cause of great concern. The website itself handles this fine, presenting up front the highest-scoring review, the lowest-scoring, and someone from the mid range. As a consumer looking through aggregated reviews, these are the ones I actually care about.

I am much more likely to read “bad” reviews of products, simply because they tend to be the more impassioned. It is easier to disagree with a bad review than with a positive one, though that might just be my personality. Regardless, I look to outliers to gauge where I sit on that spectrum. Games are not as easily quantifiable as film; I’ve been burned far too many times by trusting the common consensus (Metal Gear Solid 4 is still the biggest piece of shit in my collection; take that, Metacritic average).

A range is only useful when every value on it has a meaning. Some outlets pride themselves on their willingness to apply the full range, while others take the more politically inoffensive approach of skewing the range towards the positive – everybody knows a game scored 6/10 is pure garbage, right? Combined with the games press’s love affair with granular statistics, this further devalues an average, as nobody seems capable of agreeing on what range they are operating in, while quietly refusing to acknowledge that their scores are being aggregated and used to drive the industry.

Sigh.

There are numerous further issues with Metacritic, such as its normalization of disparate ranges. For instance, a 1/5 translates to a 20/100, which conflicts with sites that use the full 100-point range. I shudder to think how Metacritic would interpret a binary system.
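The conflict is easy to see with a naive linear rescale, which, as far as I can tell, is roughly what such normalization amounts to:

```python
def to_100(score, out_of):
    """Naively rescale a score onto a 100-point range."""
    return 100 * score / out_of


# A 1/5 -- the star system's absolute floor -- lands at 20/100...
print(to_100(1, 5))    # 20.0
# ...while an outlet using the full 100-point range can score far lower.
print(to_100(5, 100))  # 5.0
```

Under this scheme the harshest possible star review still outscores a merely bad full-range review, so two outlets saying “worst possible game” contribute very different numbers to the same average.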

Yet none of these issues with Metacritic as a platform would affect the industry if it weren’t for publisher analysts using the aggregate as a metric for success. Because it is not a metric for success. It is statistical guesswork built on opinionated guesswork, normalized and processed and skewed by a conflicted press. It barely qualifies as statistics.

And so, Tom Chick’s 1/5 review of Halo 4, actually a good and informative read, if a little personal, becomes controversial, with analysts and game developers up in arms about how he dares write such “look-at-me journalism” (in the words of an enraged David Scott Jaffe) knowing the real-world “value” of the Metascore, or, on the flip side, about how Metacritic, knowing the value of its metric, dares include such outliers in its measurement.

For as long as Metacritic’s score average is taken so seriously and given such real-world implications, nobody wins. Not the press, not the developers, and certainly not the consumers.

61 Comments


Edited By Little_Socrates

@Sunjammer said:

Metal Gear Solid 4 is still the biggest piece of shit in my collection; take that, Metacritic average.

Dunno if I'd go THAT far, but it was an extreme disappointment and easily one of the most frustrating games this generation. Of course, I buy more games than most people; if you buy five or ten games a year, I could see MGS4 being the worst one.

EDIT: Shit! I forgot to respond to the question at hand! Brain fart!

Okay, so I agree that critics should be allowed their divergent and activist voices. And if 343 provoked such a violent negative reaction out of someone that they felt the need to drop the MC score by three points, maybe they shouldn't get their metascore bonus. But I hope Chick was absolutely, 100% conscious of what his review was going to do to that metascore and that he was probably going to affect the Halo 4 pay bonuses when he went with activism over objectivism. Obviously, objectively, Halo 4 is not a 1-star game. It functions, you can play it online, there's quite a bit of content, and it's technically very impressive. But he didn't just go with subjectivism, which would be a low score based on the fact that he didn't have any fun. He went with activism, complaining about changes that make the game rote, routine, and similar to every other game out there. I totally understand that choice, and if he understood the consequences he would inflict upon himself and 343 by doing so, then go with God.

But, let's just say I'm not about to trust Tom Chick to review a game through any lens other than activism down the line.


Edited By EXTomar

As a side topic: I believe the practice of rewarding bonuses based on positive review scores is a terrible thing. It promotes creating games and investing a bunch of money to cater to a small and niche set of people instead of creating a game that they believe in and that the masses come to appreciate. I can see how enticing developers and artists to work on challenging projects with bonuses is a good thing, but rewards should be based on sales instead of review scores. Let's reward 343 for hitting 1 million and 2 million sales instead of for whether or not Tom Chick or anyone else is happy with them.


Edited By GERALTITUDE

@EXTomar: To some extent I agree (definitely about not rewarding for positive reviews), but how do you decide what good sales are? Enough to make a buck? More than the previous entry? And what if it's a brand new IP?

Doesn't rewarding sales just end up promoting mass appeal games like Madden and CoD and punishing good games that won't sell a lot like Ni No Kuni or even Darksiders II? And chasing dollars is still chasing dollars. Trying to make games that sell to the widest market possible is arguably the thought process that led to RE6 and the state of Modern Military Shooters.


Edited By Nonapod

The thing is, I don't believe there's enough evidence to even support the idea that higher Metacritic scores typically translate to higher sales. Someone did a little research on this a few years ago (only using vgchartz sales data unfortunately) and it seemed to indicate that there was no strong correlation between Metacritic scores and actual sales. Again, it's only vgchartz data so take it with a grain of salt, but I think it's silly for game publishers to assume a higher Metacritic score will automatically translate to more sales.


Edited By ProfessorEss

@SomeJerk said:

The sooner publishers realize they're digging their own graves by listening to metacritic the better, bring on the one-star reviews.

I'll take it one step further. Publishers shouldn't even stop at ignoring Metacritic. Go right to the source and straight up ignore the review process altogether.

The fact that a team of highly skilled and educated professionals can be affected so powerfully by an internet pack of video-game-playing big mouths with little to no tangible credentials totally blows my mind.


Edited By Vinny_Says

I thought sales were still the default measure for industry success. I mean look at THQ, some great games released and still that company is in the shitter.


Edited By Doctorchimp

@ProfessorEss said:

@SomeJerk said:

The sooner publishers realize they're digging their own graves by listening to metacritic the better, bring on the one-star reviews.

I'll take it one step further. Publishers shouldn't even stop at ignoring Metacritic. Go right to the source and straight up ignore the review process altogether.

The fact that a team of highly skilled and educated professionals can be affected so powerfully by an internet pack of video-game-playing big mouths with little to no tangible credentials totally blows my mind.

Yeah, jesus christ. I find it a little disheartening that a team of creators and thinkers give two shits about a website's estimate of what some smartass on the internet said.

Grow some balls, videogame duders. Do you think Hollywood honestly gives two shits about Rotten Tomatoes?


Edited By HiHarryArcher

Interesting read. For what it's worth, the review behind the statistic provides food for thought.

Tom Chick's review:

"...The last few Halos seemed to be stretching their legs. The jazz bar melancholy of ODST’s ruined city and the last-stand grimness of Reach had character. They seemed to realize you can only go so far with a faceless dude in a suit of armor. But now we’re back to Master Chief doing all over again what he’s already done before. There is literally nothing that happens in Halo 4 that hasn’t happened in one of the earlier games."

Presently, this nails exactly how I feel about Halo 4. I don't particularly think Chick is being overly "look at me" with his review's points (perhaps the score is a little hard to believe). Rather, it would seem that his personal experience with the game is very much similar to mine. I've yet to find any great innovative leaps in Halo 4; it just feels like a (lovingly) polished version of a pre-existing formula made to catch up with modern FPS expectations. For me, the game plays predictably and offers very little in the way of "new". It's all just a little boring, contrived and off-the-shelf manufactured. Perhaps a little too much attention was paid to the men featured in this article.

Also, they could have kept the Grunts funny. There's no need for the robotic voice 343i added.


Edited By EXTomar

I am using "sales" in very loose terms. Any modern game has a way to measure how many people are playing it without needing to ask anyone else. There are other metrics they can use as well. Did they deliver the game on time? Did they deliver the game under budget? There are ways to reward that, where the easiest is "you keep the money not spent".

Video games are a business, and when a game is successful teams should share and relish the rewards. To me, managing Metacritic scores and subsequently tying bonuses to them doesn't improve anything about the game and seems like a giant waste of resources. So why do they do it? They seem to do it only because it is one of the few things they have to measure anything at all, which is a pathetic way to look at it.


Edited By Antikythera

@fuzzypumpkin said:

@Antikythera said:

@Nightriff said:

MGS4 is one of the greatest games this generation of consoles. Period.

+1

+infinity.....I win

Thank you for your contribution.


Edited By steveurkel
@Nightriff said:

MGS4 is one of the greatest games this generation of consoles. Period.

How does it feel to be wrong on all levels?