Monday, November 29, 2010

Background
While many corpora have been analyzed to create word frequency tables for use in lexical and content analyses, little has been done in the realm of user-generated content (UGC) because of the significant variation in its prose. However, to build more accurate processes for determining contextual sentiment in UGC, we believe one must invest the time to understand and create a UGC corpus. Moreover, to apply accurate analyses to UGC that is limited in length, such as that found among Twitter users, one must begin with the Twitter lexicon.

Having collected more than 30 million Twitter statuses related to the video gaming market, we decided to analyze a segment representing nearly two-thirds of our corpus: namely, those tweets dealing with game titles as their primary topic.

Methodology
Since UGC in general is characterized by enormous lexical variation, and microbloggers communicate in 140-character bursts with a proclivity to attach URLs and multiple hashtags, we analyzed several thousand individual statuses before proceeding with any data cleansing.

The first step was to allow for the use of single and double quotes in the escaped raw data, which we found were used quite frequently by our population. This affected 3.4 million of the statuses.

We ran several processes that targeted specific norms found in our user base by extracting all hashtags, at symbols (“@”), and URLs. This allowed us to segregate the conversational content from the normal clutter while providing valuable insight into how common this behavior is in the given population.
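A minimal sketch of this kind of extraction is shown here in Python. The regular expressions and the `split_status` name are illustrative assumptions, not the actual filters used on the corpus.

```python
import re

# Illustrative patterns (assumptions, not the original filters).
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")


def split_status(text):
    """Separate URLs, hashtags, and @-mentions from the conversational text."""
    urls = URL_RE.findall(text)
    # Remove URLs first so that '#' fragments inside them are not counted as hashtags.
    stripped = URL_RE.sub(" ", text)
    hashtags = HASHTAG_RE.findall(stripped)
    mentions = MENTION_RE.findall(stripped)
    stripped = HASHTAG_RE.sub(" ", stripped)
    stripped = MENTION_RE.sub(" ", stripped)
    # Collapse the whitespace left behind by the removals.
    conversation = " ".join(stripped.split())
    return {
        "urls": urls,
        "hashtags": hashtags,
        "mentions": mentions,
        "text": conversation,
    }


# Made-up status, for illustration only.
result = split_status("@bob loving #halo reach tonight http://example.com/x #xbox")
print(result)
```

Counting the lengths of the three extracted lists across all statuses would give the kind of usage insight mentioned above.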

We then created a word candidate frequency hash set, applying several filters to further clean the data. This allowed us to eliminate many lengthy word combinations, as we found most of these were of little contextual value. These process steps reduced our working dataset from 18 GB to 608 MB.

Having created a raw frequency dataset of 4.8 million word candidates representing more than 259 million occurrences, we then removed all remaining non-alphanumeric characters. This exposed many duplicate words, which had previously been surrounded by any number of non-alphabetic characters. Upon inspecting the data and running numerous elimination samples, we also decided to eliminate all numeric data at this time, as we found its continued inclusion not statistically meaningful. This reduced our word candidate frequency data to around 1.5 million.
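The cleanup described above might look roughly like the following Python sketch. It makes several assumptions that the text does not confirm: candidates are lowercased, only ASCII letters and digits are kept, and "numeric data" means tokens made up entirely of digits. The `normalize_counts` name and the sample counts are hypothetical.

```python
import re
from collections import Counter

# Anything that is not a lowercase ASCII letter or digit is stripped (assumption).
NON_ALNUM_RE = re.compile(r"[^a-z0-9]")


def normalize_counts(raw_counts):
    """Strip non-alphanumeric characters, merge the resulting duplicates,
    and drop candidates that are empty or purely numeric."""
    merged = Counter()
    for candidate, count in raw_counts.items():
        word = NON_ALNUM_RE.sub("", candidate.lower())
        if not word or word.isdigit():
            continue
        merged[word] += count
    return merged


# Made-up counts, for illustration only.
raw = {"halo": 5, "halo!!": 3, "(Halo)": 2, "2010": 7, "...": 1}
clean = normalize_counts(raw)
print(clean)
```

Here the three spellings of "halo" collapse into a single entry, which is the duplicate-exposing effect the paragraph describes.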

We then manually inspected and processed the ~3600 candidates that had a frequency greater than 3000, combining like words, removing nonsensical strings (e.g., “abababa”), and combining obvious slang with non-slang equivalents (e.g., “willin” with “willing”). These combinations were only done for a handful of obvious words, which typically had ratios of proper spelling to slang in excess of 4:1. This process was completed in three steps: f > 12,500, 5,500 < f <= 12,500, and 3,000 < f <= 5,500.
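Although this step was done by hand, the bookkeeping behind it can be sketched in Python. The band boundaries and the "willin"/"willing" pair come from the text; the function names and the sample frequencies are hypothetical.

```python
from collections import Counter

# The only mapping taken from the text; a real map would be built by hand.
SLANG_MAP = {"willin": "willing"}


def frequency_band(f):
    """Return the manual-review band for a frequency, or None if f <= 3,000."""
    if f > 12_500:
        return "f > 12,500"
    if f > 5_500:
        return "5,500 < f <= 12,500"
    if f > 3_000:
        return "3,000 < f <= 5,500"
    return None


def merge_slang(counts, slang_map):
    """Fold each slang spelling's frequency into its standard spelling."""
    merged = Counter()
    for word, count in counts.items():
        merged[slang_map.get(word, word)] += count
    return merged


# Made-up frequencies with a 5:1 proper-to-slang ratio, for illustration only.
counts = {"willing": 20_000, "willin": 4_000, "game": 6_000}
merged = merge_slang(counts, SLANG_MAP)
bands = {word: frequency_band(f) for word, f in merged.items()}
print(merged, bands)
```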

These manually processed datasets represented more than 110 million of the 136 million occurrences of our word candidates. As expected, the remaining 26 million occurrences resided in more than 2.1 million remaining word candidates.

All of the manually pre-processed frequency candidates were then combined forming a unique word set with a bit more than 80% of the total being represented by these ~3600 words. A final process that accumulated all of the remaining word candidate frequencies into their respective unique words yielded our final word count of 73,006.

We now have a very specific word frequency table for our corpus to use in our sentiment analysis. We were very pleased to find that our initial run against a common adjectives dataset yielded a 94.6% hit rate, showing that our user base is more verbose than not.

Partial Word Frequency Table

Some Simple Validation of Expected Values
In looking at the partial word frequency table above, we can walk through some examples that you would expect the data to support.

As all gamers know, Microsoft’s Halo franchise was and is a big hit. So, the word “halo” shows up 1,570,066 times. Well, is everyone still talking about the original title, or are they discussing Halo ODST or Halo Reach? If we search our table for the words “reach” and “odst”, we find 851,967 and 519,133 occurrences, respectively. Therefore, it is pretty safe to conclude that in nearly 1.4 million of the 1.57 million mentions of “halo” (87.3%), they were talking about one or the other. In addition, it would appear that Halo Reach was significantly more popular than Halo ODST.
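The arithmetic behind that 87.3% can be checked with a few lines of Python. Note that it assumes every occurrence of "reach" and "odst" appears in a tweet that also mentions "halo", so the result is best read as an upper bound.

```python
# Occurrence counts quoted from the word frequency table.
halo = 1_570_066
reach = 851_967
odst = 519_133

# Assumes each "reach" or "odst" occurrence accompanies a "halo" mention.
combined = reach + odst
share = combined / halo

print(f"{combined:,} of {halo:,} mentions = {share:.1%}")
```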

Well, we did it, and finally got around to publishing it here. We think it's pretty cool. Have fun in drawing your own conclusions.

Thursday, November 25, 2010

With the much anticipated release by Sony Computer Entertainment of Polyphony Digital’s Gran Turismo 5 finally here, we are seeing some very interesting trends in your collective chatter.

We haven’t seen too many folks wanting to speculate on this title’s sales potential, so we figured we’d throw down the challenge. It appears that Gran Turismo 5 may give Microsoft’s Halo: Reach a run for its money this year.

Assuming that the reported numbers for Halo Reach are correct (more than 3 million units sold in 24 hours, generating more than $200 million in sales, and more than 4 million units in the first week), our models say watch out, Halo Reach!

We are forecasting that Gran Turismo 5 will sell 2.3-2.5 million units worldwide on day one, for more than $140 million. We also estimate that first week sales will reach between 4.2 and 4.5 million units, for more than $250 million in sales. Will this installment in the series go on to reach the levels of games past? Like many of you, we don’t think so. Maybe it will top out at 8-9 million units or so, but we’re not so sure this genre has the same staying power as some others. That said, the PS3 has a very loyal fan base that may surprise us all!

Well, did they challenge Halo Reach? If pre-orders exceeded 1.6 million units, then they will match Halo Reach’s first day sales. Our little hedge against a three week delay in release. Regardless, the Gran Turismo franchise will certainly add to its lead over the Halo franchise, and both titles helped make 2010 a great year for gamers.

Let’s end with a mighty THANK YOU! and our best wishes to all of you and your families for a safe and happy holiday season! Enjoy your faves and keep on gaming!

Sunday, November 21, 2010

We know, we know...it’s not a fair comparison! They don’t really compete with each other, but we needed a title.

Assassin’s Creed: Brotherhood has really challenged our models. This title looks to be a smashing success, as we estimate first five days sales at 2.1-2.3 million units worldwide, generating more than $125 million. Moreover, we forecast that Brotherhood will finish this year selling 7.2-7.4 million units worldwide, raking in more than $432 million. Furthermore, we believe that this title has a very good chance of exceeding 10.2 million units within a year. Congrats to Ubisoft for another wonderful release!

Unfortunately for EA, Need for Speed: Hot Pursuit may not deliver this holiday season. We estimate the first five days sales at 1.1-1.3 million units worldwide, generating more than $65 million in sales. However, we believe that overall quarterly sales will accelerate more like an old, blue Ford Escort than a bright orange Lamborghini, and we forecast 2.8-3.2 million units worldwide, generating more than $160 million by EOY. This is quite a bit less than what Mike Hickey at Janco Partners has recently stated. While the headline reads 4.2 million units for the holiday season, he somehow discounts EA’s 4 million unit estimate by more than 20% when suggesting $185 million for the quarter. At $60 a pop, his dollar estimate would yield only 3.08 million units. What’s up with that, Mike? EA will need to quickly roll out the marketing brilliance seen with Medal of Honor to put this title into high gear. We actually hope we are wrong on this one.
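The unit math in that comparison is simple enough to verify in Python. The $60 price, the $185 million figure, and EA's 4 million unit estimate come from the text; the `implied_units` helper is a hypothetical name, and a flat price per unit is a simplifying assumption.

```python
PRICE_PER_UNIT = 60  # dollars, per "At $60 a pop"


def implied_units(revenue_dollars, price=PRICE_PER_UNIT):
    """Units implied by a revenue figure at a flat price per unit."""
    return revenue_dollars / price


hickey_units = implied_units(185_000_000)
# Shortfall relative to EA's 4 million unit estimate.
discount_vs_ea = 1 - hickey_units / 4_000_000

print(f"{hickey_units / 1e6:.2f} million units, {discount_vs_ea:.1%} below EA's estimate")
```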

While we missed forecasting Fallout: New Vegas’ debut, our model shows they probably did around 1.8 million in the first five days, as opposed to the 1.4 million units VGChartz and others were speculating. Granted, we are all just estimating figures here, but it will be interesting if our models have a 20-25% better accuracy rate than others’ methods. Then again, maybe we’ll be way off on some of these forecasts, and find ourselves taking it out on some unsuspecting n00bs in your favorite multiplayer FPS.

We predict that Black Ops will sell between 16.8 and 17.0 million units by EOY 2010, for more than $1 billion in worldwide sales. We expect Treyarch's blockbuster hit to sell more than 22.8 million units, surpassing $1.3 billion in worldwide sales, by November 2011. And, for you Wall Street types, don't forget we have been predicting margin expansion, to boot.

For the "and More" part of our title, we thought we would share a little more insight into what we are doing in our Analytics Lab with another chart.

These are just a few of the titles we have analyzed. We can certainly state without reservation that your collective voices truly represent the gaming industry. We appreciate your interest and game on!

Wednesday, November 10, 2010

Aggregame is getting a little pissed off about the politics involved in those that rate video games. While we certainly respect everyone's individual opinion, there has to be some common sense knocked into some of these so called critics. We're calling out Jim Sterling from Destructoid who gave a 6/10 for Call of Duty: Black Ops. Really Jim? Oh boy!

We actually agree with most of his assessment of the PC version of the game. However, as anyone in his position should know, the PC SKU represents far less than 10% of the folks that have, or will, buy Black Ops. Moreover, he waits until 42 other outlets that influence Metacritic's scoring, which incidentally yielded a 90 for Black Ops on 11/9, to publish his score. In his review, though, he states "Once the multiplayer is fixed, feel free to pop at least two more points, most likely even three, into this review's score."

As an obvious Modern Warfare 2 fanboy, when Infinity Ward 'essentially crapped all over the PC gamers' when they delivered a mediocre day one experience, he promptly provided it with a 9.5/10. If memory serves us correctly, the PC community was outraged by MW2 PC, IWNet, and the lack of Dedicated Servers. Jim must have evaluated the Xbox 360 version, eh? Or, could it be he just needs to try and drive the Black Ops Metacritic score below a 90 for his own agenda / publicity?

We all know the PC version of any game is the hardest to tune and get right due to the variations in target machines. We also know that Treyarch has a stellar record in "fixing" what isn't good on day one. And, they seem to do so pretty quickly. To presume anything else, or have it reflect your "score", is absurd. Hey Jim, how about giving Black Ops a 9 and dropping it 2 or 3 points if they don't fix anything? Is that too complicated for you?

Just be glad we're letting you off the hook this easily, Jim. We've since read your past COD franchise reviews, and could write a dissertation with all the contradictions and double-speak you portray from one year to the next. You are the truest practitioner of hypocrisy, and the furthest from a journalist we've ever seen.

If Jim's not a fanboy, and he did this for the attention - he got us. We took the bait and even sourced his articles. Doesn't matter; this needed to be said. Whether he's a fanboy, or just playing politics, Jim is what is wrong with the game industry media. He carries himself professionally like a 12 year-old carries himself on message boards. Shame on Destructoid for giving Jim a soapbox he can't handle.

Alright Alright, Sorry Jim.

Roasting Jim was fun, but we're actually calling out a lot more people than just Sterling. Jim just happened to be in the wrong place at the wrong time during the most popular entertainment launch of the year (and possibly history). Though, he did put himself in this position. He was "the guy" that had the balls to do it to such a high profile title this year. It happens every year, and actually happened more than once this year with more than one title. There are always "journalists" who want to stand out from the crowd, and do what they think will turn the most heads. They are running a business like any other. Like FOX News and the Obama "Terrorist Fist Bump" stories. They sure got viewers, didn't they?

But it didn't make it right. Neither is what Jim (and others) are doing within this industry. How do you think Obama felt to have such ludicrous accusations brought against him by such an influential media powerhouse? Presidential candidate or not, that had to take a toll on him, his family, and his administration. The fact is, some people actually believed those stories, and cast judgement on Obama because of those silly stories. What Jim did here is the same thing, proportionally. How do you think the hundreds of men and women at Treyarch feel seeing a 6/10? They don't give a crap what Sterling says or does on a personal level, but the audience Sterling commands, on the other hand, is of great concern.

Jim can't defend his review score. He may try, but deep down inside he knows he didn't give Black Ops a fair shake. He's a smart guy, and knows how these things go. He knows there are day-1 and week-1 patches, and he knows these issues will be addressed promptly. He knows that the PC SKU, as developed and tuned, ran perfectly on all of the test machines. He knows that, performance bugs aside, Black Ops is at least a 9/10 (he admits it within his own article). Worst case, he's just a fanboy that will never let go. Best case, he was trying to make some righteous point, a dramatic, valiant stand, but in doing so did more damage than it was worth. Most likely case, he knew it would get the most attention and discussion, in which case he should apologize to Gamers, Destructoid, Treyarch, Activision, and the rest of the Industry for his insincerity.

Our ultimate message to Jim, and the other "journalists" in our industry who want to pull this kind of stunt, is to cut it out. If you just can't help yourselves, or really want to be that controversial guy, then remove yourself from the Metacritics and GameRankings of the world and stop polluting the other good standing game critics that are still left in the industry.

While there has been ample speculation from Wall Street analysts and gaming industry pundits regarding the level of success Activision’s Call of Duty: Black Ops may realize, we felt compelled to make our own bold predictions based upon your Twitter chatter.

However, instead of playing it safe by saying “could be more than 11 million”, or “could exceed 18 million, but not match MW2”, or “estimating 7 million sold on day one”, we figured we would provide some rationale to our predictions and provide our thoughts on dollars and profits.

After all, what good is a Wall Street analyst if they can’t provide the latter two elements?

So, what we found is some pretty cool correlation between sales and the amount of chatter over very specific time periods, the number of unique Twitter IDs, and positive versus negative sentiment within the chatter. What?

In digging through a bunch of historical Twitter chatter corresponding to previously released titles for which we could find published sales data, we were able to construct some models that held up pretty well in our testing.

We also anticipate above normal margins being realized from these sales due to the number and mix of platforms supported, a suspected reduced/deferred royalty expense due to ongoing legal matters (could be a wash depending on reserves amount), and significant benefits from various partnerships.

Now, could the lofty 7 million units on day one that someone projected come true? Sure, depending on what and when you are counting. But we believe that Activision will probably be rather conservative in their accounting, and we base our estimates on the first 24 hours of sales (e.g., through midnight ET on 11/10/10) for the Xbox, PS3, PC, and Wii. We actually suspect that the 7 million figure was met around 7pm ET on 11/12/10.

And, just to provide you with some pretty pictures, you can get a sense how your collective launch chatter reflects the relative success of video games. Granted, finding a useful visual representation of our data set always seems to be the biggest challenge!

In addition, we thought we would share with you how many of your voices were heard. Can you say, wow! While these numbers are very impressive, didn’t it seem like the whole world was talking about Black Ops?

Hickey has severely miscalculated his prediction. How are we so sure? Well, let's start with some traditional methodologies: Black Ops has more pre-orders than Modern Warfare 2. By the numbers, on the 360 + PS3, Black Ops has 2.36 million pre-orders as of the week ending October 30th. MW2 had 2.24 million at the same time last year. Considering how much more positive Black Ops' PC outlook is than MW2's, we'll just assume Black Ops has higher pre-sales on that platform as well (although VG Chartz stopped counting PC pre-orders for some reason). Also, on multiple occasions this year, various retailers made very ambitious statements regarding Black Ops setting records in both pace of pre-sales and overall pre-order numbers. "Record Setting" translates directly to "Better than MW2", only more objective and less sensational. In other words, more sincere.

In light of the above information, what could Hickey's rationale possibly be? The weaker economy? If the economy really had an impact on a title like this, we'd have seen it reflected in the pre-sales...

Now, we're a Twitter-oriented site here, and much of what we base our predictions on comes from less traditional analysis of consumer behavior from Tweets. Later this week we'll reveal some brand-new, one-of-a-kind, can't-find-anywhere-else Twitter analytics to put the final nail in Mr. Hickey's coffin. We wanted to get it on the record before Black Ops launched, however, that we disagreed firmly with Mike Hickey's prediction.

If you're going to be an analyst, you need to A) get it right, and B) not just write what you think the media wants to hear. We all know it's fun to put the Modern Warfare titles ahead of the other COD entries, and we all know how much the mainstream gaming press likes to regurgitate those "findings", but it's just not the case this time around.

About Aggregame

Aggregame.com is a social media platform for Gamers. From user-submitted content to gaming-news aggregates for news discovery, Aggregame.com serves as a one-stop shop for all things gaming. The innovative new AggreTweet.com is a great way to stay on top of the latest real-time social news as it breaks over Twitter.