Welcome to the Nindie Spotlight, your one-stop-shop for gaming opinions, previews, and reviews for anything Nintendo, but focused first on independent games and developers!

Tuesday, March 21, 2017

Editorial: The Positives and Pitfalls of Review Democratization

The release of The Legend of Zelda: Breath
of the Wild certainly brought out a lot of opinions. Many of them
were extremely positive, but there was also a reflexive
backlash of “artificial” reviews (created with the express
purpose of simply trashing the game) as well as “reviews” whose
goal may not have been to objectively evaluate the game but instead to
act as an expression of defiance for specific communities. While the
outright bogus reviews tended to live in the user community space,
disqualifying them from being counted in aggregate scores on
Metacritic, what created a little more of a stir was that a small number of less-than-stellar professional reviews also got counted, bringing down the
game's overall average in the all-time rankings. All of this
brings us to the point of this editorial: to explore where
things were at earlier points in time, where they are now, and some
of the potentially difficult questions that may need to be asked
about which reviews (both good and bad) merit being “counted”
at the end of the day.

Before getting too far into things, let's
get one thing out of the way very quickly. All people are entitled to
their opinions, no matter how they may be formed or influenced, and
they even deserve for those opinions to be shared and read (whether
you decide to care may be a different matter). In reviewing games, in
particular, the bar for what qualifies someone to review a game has
obviously gotten pretty low. People no longer need a journalistic
platform to publish their thoughts on; reviews are solicited almost
everywhere, and, again, that's fine. The trickier part is tied to
aggregation and whether you're interested in the raw mathematical
average or whether you want it set up in a way that at least
attempts to be accurate. If you read the content of a review you're
often able to quickly discern whether it is the ramblings of a
fanboy or a hater, and you can pretty well throw those out on both ends. If all
you're dealing in is the final numbers, that gets a bit hairier. Keep
in mind, as well, that how these scores are averaged matters because
people's careers and corporate decisions can literally be informed by
them; it has repeatedly been noted that some publishers look at
Metacritic scores to “grade” development efforts. So this
conversation is about more than just an abstract interest in accuracy.

Getting on to where we once were: back
in the stone age there were pretty well only mainstream print
publications to go by. You could check the scores in the latest EGM,
GamePro, Next Generation, and a host of others... and that was
roughly it. Sure, at some point word of mouth would get rolling, but
the overall lack of variety in reviews sometimes made it tough to
be confident. Worse, these publications would sometimes print reviews
without attribution, or a review would be a product of collective opinion,
robbing readers of an honest feel for a particular
reviewer's style or preferences. Even if you did know who wrote the
review, since almost all publications would only post one review per
game, even that connection could end up worthless if someone you
didn't know/like/trust was the one who reviewed the game that month. In
short, looking back, returning to that era wouldn't be preferred.

Progress into the early internet age
and things began to get a little more interesting, serving as a
sort of preview for where we are now. Independent game networks and fan
sites of various kinds began to crop up, some with more polish than
others, but the benefit was an increase in the volume and diversity of opinion.
Especially in the early days, most independent sites weren't getting
free games; the opinions being offered were from fellow gamers who, just like
anyone else, had spent their own money and were going to share their
thoughts whether good or bad, which in some ways improved their
authenticity. Of course, sometimes this would come at the cost of
consistency or perspective, so fan sites in particular could skew
heavily positive at times. But since most sites would post multiple
reviews, you at least had the advantage of diversity even within the
staff of a single website, and you could come to value the opinions of
specific reviewers as well.

This cascades into the modern day, and
the point where things get both complicated and a bit overwhelming.
There's just a load of opinion out there, plain and simple. Some of
it is still in the mold of the integrity exhibited in the classic
print publication space, whether amateur or professional, and a ton
of it isn't. The great thing is that this climate means there are
reviews to fit all tastes and temperaments. If you can't find
someone with a review style and track record you generally agree with,
you probably aren't looking hard enough. With the climate being what
it is, perhaps in that case you should be writing
your own reviews and establishing your own following... it really is just
about that crazy these days. But that same diversity and craziness is
where the question crops up of which reviews, at the end
of the day, deserve to be counted. This is where things can get a bit ugly.

We'll start with a reviewer who shall
not be named (if you read my opinion on the review it will be clear
why) and a particular score given for Breath of the Wild that
did get counted. Again, I won't question the overall principle that
people are entitled to stating their opinion, but I do disagree
with that specific review score being counted. The complication with this type
of reviewer is that rather than coming from a school of measured
thought first and foremost, they're focused on the persona they
project to their fan base. The new-ish creation of this generation is
the “personality” reviewer: people with a shtick and a following
driven more by adherence to that gimmick than by
being accurate. In this specific case I'd argue that it wasn't brave
to score the game low; it was actually self-serving, a
manufactured score that would appease a rabid community that loves
contrarian and anti-establishment grandstanding while not going
so low as to lose any hope of legitimacy. Let's face it, it
was a very “safe” score to give for all of the hubbub... bravery
would have been making it higher or lower.

To be fair, though, if we begin being
critical of the lower and possibly troll-ish end of the spectrum, we
also need to take a hard look at the people who may be skewing things
up unnaturally, in the end doing just as much damage (if not more,
since they're probably more prevalent). Are there really “legitimate”
reviews out there for 1-2-Switch that are over 8? Or even as high as
7? Really? Would this game be considered worthy of what people
traditionally think of as a “passing grade”?
I'll probably write something else in the future about how value
versus purchase price really needs to factor much further into modern
review scoring, but at the point this game gets higher reviews their
legitimacy should be pretty severely questioned. This all
ultimately means you're combating a problem from both the bottom and
the top.

That gets us to the last phase:
trying to determine what could and should be done to make
the aggregated scores more “accurate”. Even among publications or
sites that are considered “legitimate”, I think a strong
case could be made that rather than attempting to decide, across the
board, whether to add or remove a site or reviewer, it would be
easiest and best to do what many places do when determining averages:
throw out both the top x and bottom x scores (whether this should be a set
number or a percentage, and what that number should be, are up for
debate). In theory, if all reviews were 100% legitimate it would
mostly balance itself out, doing no real harm, but it would likely
prevent severe outliers from skewing things up or down. If enough
people think a game is great or stinks, removing a few scores wouldn't change a thing; you're only removing individual reviews from
the average, and if enough reviewers agreed on a specific numeric score, the majority of them would still stand.
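The trim-the-extremes idea described above is what statisticians call a trimmed mean. A minimal sketch of how it would behave, with made-up review scores and a made-up trim count for illustration:

```python
def trimmed_mean(scores, trim=1):
    """Average the scores after discarding the `trim` highest
    and `trim` lowest values (a simple trimmed mean)."""
    if len(scores) <= 2 * trim:
        raise ValueError("not enough scores to trim")
    ordered = sorted(scores)
    kept = ordered[trim:len(ordered) - trim]  # drop the extremes
    return sum(kept) / len(kept)

# Hypothetical review scores: one outlier low, one outlier high.
scores = [40, 90, 92, 95, 97, 98, 100]
print(round(sum(scores) / len(scores), 1))     # raw average: 87.4
print(round(trimmed_mean(scores, trim=1), 1))  # trimmed average: 94.4
```

Note how a single outlier drags the raw average down nearly seven points, while the trimmed version stays near the consensus; when all seven scores genuinely agree, trimming changes almost nothing.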

Barring some other type of
standard adjustment of this kind, the only option I could see is,
again, continuously evaluating whether
individual reviewers or outlets should be considered legitimate,
pretty well an impossible task and one that would generate far more
controversy than it is worth (especially given the tendency of
“personality” reviewers to be a tad dramatic, with a mentality of
“forget the games, just focus on me”). Also, though on any given
review an outlet or individual reviewer may skew up or down quite a
bit (I would likely have been eliminated back in the day because I
pretty well detested the original Tomb Raider games, scoring them 2 or
more points below the average), I would guess that most of the time
their scores would be a little closer to the norm. I
understand that Metacritic attempts to help with an adjustment based
on their “weighted” score of individual outlets, but honestly that
method is even more prone to issues: if one of their highly-weighted outlets somehow turns in a skewed review, the weighting makes the problem
worse. Besides, don't you think that if people found out how those weights were determined they would likely begin to nitpick even that? I know I probably would.

At the end of the day, the fact that all
games currently scored by Metacritic have had the same criteria
applied to them (sort of) makes their aggregated scores "fair".
However, looking over the top 5 ranked games of all time, the fact
that the most recent of them is from almost a decade ago likely
isn't a coincidence. The fact is that the more reviews are added to
the mix, especially considering the diverse voices out
there, the less certain it is that you'll break out from
the pack. With those added voices and numbers also comes the
probability of baggage, both high and low, coming along for the ride,
further complicating getting an accurate picture of things other than
by hoping that the sheer force of numbers will average things
out. But when you see a variance of 3 – 4 points or more between your
highest score and your lowest, perhaps something is up
that, in the end, may not be worth counting. It may be silly to worry
over it, but if the goal of an aggregated score is to be accurate,
this sort of adjustment would at least seem more honestly set up
for meeting that goal.