As a PhD student I acquire sustenance from the two major food groups: beer and coffee. I also like to pretend I’m an expert in those things, but, really, I’m probably just a snob about them.

But I’m not the only snob about these things. Dallas, clearly, is really into its beer. But a little less obvious is that Dallas appears to love its coffee much the same way — craft and/or local. In fact, the coffee scene in Dallas right now parallels the beer scene about 3-4 years ago: lots and lots of bars and restaurants catering to the craft beer enthusiasts (snobs), with a handful of breweries. We have countless places to get some great local or Texas-based coffee, and we have quite a few roasters: Cultivar, Noble Coyote, Oak Cliff Coffee Roasters, Full City Rooster, Novel, White Rock (and all the ones throughout the rest of the metroplex) with more on the way.

WHICH MEANS WE HAVE THE CHANCE TO GET NERDY — and statistically determine the best coffee places in Dallas. Like before, I aggregated the ratings from both Facebook and Yelp for all the local/independent coffee shops in Dallas. These primarily exist within the area bounded by 635, the Tollway, Loop 12, and Bishop Arts. Let’s take a look:

So, let’s get to some facts first. The map above only includes 24 coffee shops within Dallas. Four of those shops just don’t have enough ratings to be analyzed (Serj, Black Forest, Urban Blend, Weekend). Another shop was excluded because people often say “Oh, that’s actually in Dallas?” — Coffee House Cafe — and yes, it technically is. Dallas has an odd shape and city limits that make little sense. Coffee House Cafe is practically in the ‘burbs, and I’ll talk about those (and other) coffee shops some other time1. A few other shops have closed recently (e.g., Laguna’s). Some were not included because they focus more on something other than their coffee. And finally, I excluded roasters that don’t have a separate shop (e.g., Noble Coyote, Full City Rooster).

It wasn’t an easy list to curate, in part because we’re a lucky crowd with some great coffee options all throughout town.

Now let’s get to one other fact about coffee places in Dallas: they are almost entirely distinct from one another. Very few of these coffee shops have much in common, except that they generally use local (or at least Texas-roasted) beans and/or are not part of a conglomerate. So let’s take a look at some of these coffee shop categories:

Some of the coffee shops are also bars (e.g., State St/Alcove, Ascension, Mudsmith)

Some are super nerdy (in the good way) about their coffee techniques (e.g., Method, Cultivar)

Some also focus quite a bit on food (e.g., Oddfellows, Legal Grounds)

Some are actually simple coffee shops (e.g., Murray St., Café Silva)

Some are located in, or are, bookstores (e.g., Black Forest, Serj)

And then there’s The Wild Detectives which is almost all of the above. Plus a place for dogs in sidecars.

In sum — there’s a coffee shop for nearly any personality or occasion in this town. Like I said — it looks like the beer scene from about four years ago. Enough background… it’s stats time.

It’s pretty obvious that 1 star and 2 star ratings are rarely, if ever, used. Which is awesome, because we can just collapse 1, 2, and 3 star ratings into a single “3 star” category. We can pretend that 3 is “low”, 4 is “middle”, and 5 is “high”:

Now there’s something a little unfair here… some coffee shops have a ridiculous number of ratings (e.g., Oddfellows) and some have far fewer (e.g., Houndstooth). So let’s make these bars relative, that is, the total number of 5, 4, or [3, 2, 1] star ratings divided by the total number of ratings:
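If you want to play along in R (the language behind all of these analyses), that relative step is a one-liner. A minimal sketch, with made-up counts for two hypothetical shops since the raw data isn’t in this post:

```r
# Made-up star counts: one shop rated a lot, one rated rarely.
ratings <- matrix(
  c(1200, 310, 95,
      80,  25,  6),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Big Shop", "Small Shop"),
                  c("5 stars", "4 stars", "3/2/1 stars"))
)

# Divide each row by its own total, so shops with wildly different
# numbers of ratings become comparable.
round(prop.table(ratings, margin = 1), 3)
```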

That’s just much easier to interpret, too! And it even looks kind of like the right answer. But it’s not and you’re a fool for believing it!

Ratings systems like this tend to be a bit flawed. For example, the movie “50 Shades of Grey” has 4.1 out of 10 stars on IMDB.com. Does that mean it’s generally receiving middle responses from most people?

In order to understand how people really perceive Dallas’s coffee shops we need to get fancy with our stats. So let’s turn to one of my favorite statistical methods: Correspondence Analysis (CA). CA is a technique that takes a large table made up of a bunch of variables (ratings) and turns them into new variables that better represent what’s happening2. CA produces new variables called “components” — which are the horizontal and vertical axes (lines) in the following pictures. The other really nice thing about CA is that it handles data correctly when the totals differ. Here, the number of ratings per coffee shop is quite different. CA makes things fair between all these coffee shops — kind of like the relative percentages above.
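The actual analysis used the ExPosition package (see the notes at the end), but the mechanics of CA fit in a few lines of base R. A hedged sketch of those mechanics, not ExPosition’s implementation:

```r
# Correspondence Analysis from scratch (a sketch; the real analysis
# used the ExPosition package).
ca_sketch <- function(counts) {
  P  <- counts / sum(counts)             # correspondence matrix
  r  <- rowSums(P)                       # row masses (shops)
  cc <- colSums(P)                       # column masses (ratings)
  E  <- outer(r, cc)                     # expected proportions under independence
  S  <- (P - E) / sqrt(E)                # standardized (chi-square) residuals
  dec <- svd(S)
  list(
    # principal coordinates: where shops and ratings sit on the components
    rows    = sweep(dec$u, 1, sqrt(r),  "/") %*% diag(dec$d),
    cols    = sweep(dec$v, 1, sqrt(cc), "/") %*% diag(dec$d),
    # proportion of total inertia (variance) per component
    inertia = dec$d^2 / sum(dec$d^2)
  )
}

# e.g., ca_sketch(ratings)$inertia, using the made-up counts from the
# earlier sketch.
```

The row and column masses are exactly what makes the differing totals “fair”: each shop enters the analysis as a profile of its ratings, not as raw counts.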

Correspondence Analysis is a nifty technique that finds a boundary for us. The boundary, called a simplex, is defined by the variables — in this case the ratings. All of the coffee shops have to live inside this simplex — which is the triangle in the prior and next few images. Let’s color the simplex by regions. This will help us understand how these coffee shops really rank:

Now, let’s pause for a moment. All of these coffee shops, judging by their average ratings, would get at least a B or B+. None of these shops are bad at all — they’re all good or great (see the bar charts above). But, with CA we’re going to see which coffee shops are more likely to receive 5 star ratings than others, which coffee shops are more likely to receive 4 star ratings than others, and which coffee shops are more likely to receive 3 star ratings than others.

Note that repeated sentence: “which coffee shops are more likely to receive [some number] star ratings than others” — that means this is a relative interpretation. A shop that sits close to the 3 doesn’t necessarily get more 3s overall — just that, proportionally, it receives more 3s than other shops do.

So, the two images above show us that (more likely to receive) 5 star shops are on the left side, (more likely to receive) 4 star shops to the upper right, and (more likely to receive) 3 star shops to the lower right. Let’s see how the shops are configured:

The purple dots are coffee shops. Let’s zoom in and now look at them, labeled with their average rating:

Anyways. These ratings systems can still be informative. But they aren’t very informative when you just average the stars from a very broad and unrefined rating system.

In the picture above, we have 3 zones to describe our coffee shops: (1) the Red Zone is for coffee shops that have relatively more 3 (and 2, and 1) star and 4 star ratings than other places, (2) the Orange Zone is the “50 Shades of Grey” zone — these coffee shops get their average rating from a bimodal distribution: people who love the place (5 stars) and people who definitely don’t (3, 2, and 1 star ratings), and (3) the Purple Zone: these shops generally receive more 5 star and 4 star ratings, proportionally, than other shops.

Another small note: any coffee shops at the middle, where the horizontal and vertical lines cross, are essentially the average coffee shops.

So, which shops are in all these weird 50 Shades of Grey zones and whatnot…?

The red zone shows us the coffee shops that are, essentially, “B” or “B+” students. The orange zone: Davis St. and Method. So these two places have lovers and haters. But, in the case of Method — maybe that’s just because of their anti-Yelp leanings. They are essentially the A- students.

Now the purple zone is where we want to dive in. There appear to be two groups: the A students (closer to the origin) and the A+ students (the ones furthest to the left).

At this point you’re thinking “SHUT UP DEREK I’VE BEEN READING THIS FOR FAR TOO LONG TRYING TO FIND OUT WHICH COFFEE SHOP TO GO TO AND IT HAS DELAYED MY COFFEE CONSUMPTION AND THUS I AM IRRITATED AS IS EVIDENT THROUGH THE USE OF CAPITAL LETTERS RUN ON SENTENCES AND LACK OF PUNCTUATION.”

Well, you’ll just have to wait, because I have something important to show you. And I’m going to show you through the power of an animated .gif. The .gif below shows each coffee shop individually (purple dot), along with its Facebook rating (blue dot) and an arrow pointing towards its Yelp rating (red dot):

Remember — the arrow points from Facebook to Yelp. What we can generally see is that, again, Yelpers are generally more negative than Facebookers when it comes to ratings, except in two cases: Mokah (#23 in .gif) and Café Silva (#24 in .gif).

Both Mokah and Café Silva have overall positive ratings (they’re A to A+ students here). But they’re the only two where the ratings are better on Yelp than Facebook — completely counter to every other shop. And I even made sure to grab the hidden ratings from Yelp.

So how can we rank these coffee shops and give them a new rating? Well, that’s where a classic statistical technique comes in: linear regression.

All of the coffee shops will now get a new rating. This new rating is computed by using the original overall rating from above as the dependent variable, where the positions of the coffee shops from Correspondence Analysis3 are used as predictors4.
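A minimal sketch of that regression step, with invented numbers standing in for the real data: here `fs` plays the role of each shop’s CA factor scores and `avg_rating` its original overall rating.

```r
# Invented stand-ins: 24 shops, 2 CA components, and an overall rating
# loosely tied to the first component.
set.seed(1)
fs <- data.frame(Component1 = rnorm(24), Component2 = rnorm(24))
avg_rating <- 4.4 + 0.25 * fs$Component1 - 0.10 * fs$Component2 +
  rnorm(24, sd = 0.05)

# CA components are orthogonal, which makes this a particularly
# well-behaved (principal-components-style) regression.
fit <- lm(avg_rating ~ Component1 + Component2, data = fs)

# The fitted values become the shops' new ratings.
new_rating <- round(fitted(fit), 2)
```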

So, let’s get down to the important question: what are Dallas’s top 5 coffee shops, and what are their new ratings?

Stupid Good — 4.75

The Wild Detectives — 4.68

Cultivar — 4.65

Café Silva — 4.63

Flying Horse — 4.61

So, where are they?


Now back to all that distinctness between shops — you really couldn’t ask for a more diverse set of coffee shops to be the top 5. All have a unique personality, relatively unique locations, and a wide array of coffee beans (including 3 local roasters: Oak Cliff at Stupid Good and The Wild Detectives, Noble Coyote at Café Silva, and Cultivar at Cultivar).

Given how far apart these places are, now we can answer a bonus question: Which neighborhood has the best coffee? That is, if you had to be trapped in a particular neighborhood in Dallas, and the primary condition is that you just need to be surrounded by great coffee shops, where should that be?

Downtown. That’s right… Downtown. That sea of green surrounding Downtown (and parts of Uptown) means that’s the best place for you to be trapped.

Pretty much one of the most boring neighborhoods, where everything is closed tightly by 5pm, is actually the best neighborhood for coffee. Go figure.

And the final re-rankings of coffee shops in Dallas:

Coffee Shop       Rating
Stupid Good       4.8
Wild Detectives   4.7
Cultivar          4.7
Cafe Silva        4.6
Flying Horse      4.6
Union             4.6
Method            4.5
Sip Stir          4.5
Davis St.         4.5
Oak Lawn          4.4
Alcove            4.3
Opening Bell      4.3
Mokah             4.3
Mudsmith          4.3
White Rock        4.3
Crooked Tree      4.3
Ascension         4.3
Houndstooth       4.2
Espumoso          4.1
Murray St.        4.0
Drip              4.0
Oddfellows        4.0
Legal Grounds     3.9
Lil’ White Rock   3.9

All analyses performed in R. Correspondence Analysis was performed with the ExPosition package – a package created by particularly attractive and smart people. Maps were created in R with the RgoogleMaps and MASS packages. Some code was borrowed and adapted from Everyday Analytics and Stack Overflow.

Footnotes:

1 There are some great shops outside of Dallas: Avoca and Brewed in Ft. Worth, a few Buon Giorno locations, Generator in Garland, Pearl Cup in Richardson… the list goes on.

2 For the stats nerds: technically both the coffee shops and the ratings are variables. The observations (people making ratings) are kind of hiding. Each person simply helps increase the number of responses within a particular cell of this table. CA is analogous to a principal components analysis, but for data more suited for χ2 analyses.

3 These are called “Component Scores” or “Factor Scores”.

4 For the stats nerds: one lovely property here is that the components (axes, lines) are orthogonal, which makes for an easy regression! Furthermore, this is a components-based analysis where the components are used as predictors in a simple regression… You may be more familiar with this under a different name (with a different technique): Principal Components Regression.

You might be thinking: “Derek. That’s a stupid question that doesn’t require science. Given the abundance of overpriced Miller Lite at Jerry’s Dome of Eminent Domain, the answer is MillerCoors (which is produced by smashing frosty-cold bullet trains into mountains)”. While it might be technically correct (based on sales), that’s just gross and you should feel gross for having such gross thoughts.

Unlike before — I’m not going to tell you upfront which breweries are the best. You’re going to have to get nerdy with me (or just scroll to the end). So let’s continue the tradition1!

So how can we determine which of DFW’s breweries are the best? Well, you might be thinking, “don’t we have the (much reviled) Yelp average ratings?” or “I gave it 5 stars on Facebook so it’s clearly the best.” Yeah, sure. If you go to Facebook, you see how many people rate Lakewood Brewing Company with 5, 4, 3, 2, or 1 star. You can do the same with Yelp, but you need to make sure to go find the hidden ratings, too. So, for this venture into stats and beer nerdery, I aggregated all the ratings from Yelp and from Facebook for all the DFW area craft breweries2. This gives me a count of, for example, how many 5 star ratings a brewery has (per platform: Facebook or Yelp).

Before we go on, let’s get something quite obvious out of the way. The 5 star all-purpose rating system is… flawed. In fact, these types of systems are usually despised. It’s pretty well documented, especially here in DFW, that ratings systems need to be more elaborate — rating different aspects of something, instead of an all-purpose feel-goodery star system (as if it were kindergarten and you didn’t knock the blocks down today — 5 stars for not being a clumsy 4 year old).

So the average rating might be quite unfair for these breweries. Are people giving stars because they are architecture nerds and love the actual building? Was it the tour? General opinion on all the beers? Who knows. What we do know is that the 5 star all-purpose feel-goodery system is flawed. And some businesses are very anti-Yelp because of this all-purpose feel-goodery star system.

Sometimes, when averaged together, the stars tell you just enough. But when it comes to these breweries, as we’ll see, the average tells you very little. However, when we take a closer look — the distribution of stars speaks volumes. Let’s begin with just looking at the frequency of ratings for all the DFW breweries. We’ll also sort them (top to bottom) by the total number of ratings per brewery, with “average stars” on the right:

Here, we can see that Rahr & Sons and Deep Ellum Brewing have the most overall ratings in DFW. So, let’s sort this by average rating (average of Facebook & Yelp):

From the looks of both of these pictures, it really seems as though 3, 2, and 1 star ratings are rarely, if ever, used. This suggests that, for the most part, when people rate these breweries, 5 means “Great”, 4 means “Good”, and anything else means “Relatively Unsatisfactory”. So from here on out, I’m going to combine 3, 2, and 1 star ratings into a single category of “Not Good”.
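In code, that collapse is just summing columns. A sketch: the 5 and 4 star counts for 903 and Rahr & Sons match the table further down, but the split across 3, 2, and 1 stars is invented purely so there’s something to collapse.

```r
# 5 and 4 star counts come from the table below; the 3/2/1 split is
# made up for illustration.
stars <- rbind(
  "903"         = c("5" = 289,  "4" = 40,  "3" = 20,  "2" = 5,  "1" = 3),
  "Rahr & Sons" = c("5" = 2690, "4" = 726, "3" = 180, "2" = 60, "1" = 37)
)

# Fold 3, 2, and 1 star ratings into a single "Not Good" column.
collapsed <- cbind(
  stars[, c("5", "4")],
  "Not Good" = rowSums(stars[, c("3", "2", "1")])
)
collapsed
```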

But that still feels weird, so let’s look at things proportionally: that is, the percentage of ratings for each brewery:

From the looks of this, you’d probably think Peticolas is DFW’s favorite brewery. And then I would kindly interject and say “Your thought lacks science and is thus far incorrect!”

When we look at these ratings, we probably notice right away that all ratings exist between 4.45 and 4.8. In fact, 7 different breweries have ratings between 4.63 and 4.67. So if we go just by average ratings on a (fictitious) 5 point all-purpose feel-goodery kindergarten star scale — we’d conclude “they’re all pretty good, so let’s go party.”

So, how can we figure out which brewery really is the best? And how can we do that when the number of overall ratings are so different between breweries? By now you’re thinking the answer to that is “Science, duh”. So let’s science.

The data here look something like this:

Brewery        5 Stars   4 Stars   3, 2, or 1 Star
903            289       40        28
…              …         …         …
Rahr & Sons    2690      726       277

where each row is a brewery, and each ratings column holds the total number of stars of that type from both Facebook and Yelp3. One of the best ways to analyze this type of data is with Correspondence Analysis (CA). If you’re not into stats, avert your eyes for a moment…

For the stats nerds: CA is a technique that takes a large table made up of counts, and finds the best overall representations of those counts. Like PCA, CA produces components. These components explain the maximum possible variance in descending order, but they are derived under χ2 assumptions. Unlike other techniques, CA takes into account the total number of ratings (which is different for each brewery). That means we can more fairly analyze the ratings, even when the overall number of ratings is very different for each brewery. In this application of CA, we’re going to use the asymmetric version — where the columns are privileged. The privilege here is that we want the columns to define a maximum possible boundary of where the breweries can go. This is called a simplex.
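Still for the stats nerds: a hedged base-R sketch of that asymmetric scaling (the real analysis used ExPosition). The 903 and Rahr & Sons counts come from the table above; “Brewery X” is invented so the example has enough rows for two components:

```r
counts <- rbind(
  "903"         = c(289, 40, 28),
  "Brewery X"   = c(500, 200, 50),   # hypothetical filler brewery
  "Rahr & Sons" = c(2690, 726, 277)
)
colnames(counts) <- c("5", "4", "3/2/1")

P   <- counts / sum(counts)
r   <- rowSums(P); cc <- colSums(P)
S   <- (P - outer(r, cc)) / sqrt(outer(r, cc))  # chi-square residuals
dec <- svd(S)

# Columns privileged: the rating vertices use *standard* coordinates
# (no singular-value scaling) and define the simplex...
vertices <- sweep(dec$v, 1, sqrt(cc), "/")

# ...while the breweries, in *principal* coordinates, must fall inside it.
breweries <- sweep(dec$u, 1, sqrt(r), "/") %*% diag(dec$d)

plot(vertices[, 1:2], pch = 17, xlab = "Component 1", ylab = "Component 2")
text(vertices[, 1:2], labels = colnames(counts), pos = 3)
points(breweries[, 1:2], pch = 19, col = "purple")
```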

Back to beer business. So, with some statistical magic, let’s start to find out which breweries can lay claim to being the best. First, let’s look at the ratings:

The configuration of ratings here defines a boundary that can be broken into regions:

Those regions reflect 3 different traits of how a brewery receives its “average” rating. The purple region is due to breweries that pretty much get 5s and 4s. The orange region is due to breweries that get 5s and {3, 2, 1} ratings. And finally, the red region is due to breweries that are more associated with 4s and {3, 2, 1} ratings than the other breweries. So let’s put the breweries in:

All those purple dots are the breweries. Note how close they are to “5 stars”. Let’s pause a moment. We can already assume that the average ratings-type system is flawed — people love to love their favorite things. Because the 5s are being used a little too much, we can’t figure out which breweries are really the best just by average. We need to use the other ratings to find this out. Let’s pretty that last picture up a bit.

A little better. Now we can see the breweries’ logos and where they fall in these boundaries. If you’re here for beer… avert your eyes again.

For the R nerds: I searched high and low for a way to plot raster graphics onto a plot device. I found no obvious and simple way to do this (but plenty of advice on how to put a plot device on a raster image — painfully unhelpful). My current solution (pictured above and below) exists somewhere between “Neat trick” and “Disgusting hack”. See the attached code in the footnotes.
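A reconstructed sketch of that kind of trick (an illustration of the approach, not the post’s attached code): read a PNG with png::readPNG and place it on an existing plot with base graphics’ rasterImage. The logo file path is hypothetical.

```r
library(png)  # for readPNG()

# Draw a PNG logo centered at (x, y) on an existing plot; `width` is in
# plot units, and the height preserves the image's pixel aspect ratio.
add_logo <- function(path, x, y, width = 0.1) {
  img <- readPNG(path)
  half_h <- (width * nrow(img) / ncol(img)) / 2
  rasterImage(img,
              xleft = x - width / 2, xright = x + width / 2,
              ybottom = y - half_h,  ytop   = y + half_h)
}

plot(0, type = "n", xlim = c(-1, 1), ylim = c(-1, 1),
     xlab = "Component 1", ylab = "Component 2")
# add_logo("logos/rahr.png", x = 0.2, y = -0.1)  # hypothetical file path
```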

Back to beer business. Let’s zoom in on this area, which has all the breweries:

And bring back our magical boundaries:

Oh man, we are about to get scienced. Remember: all these breweries have a ridiculous amount of 5 star ratings. What’s important for figuring out which breweries are the best are the not-5 stars and how the stars are distributed. Instead of asking “which breweries get loved on the most?”, we’re really asking: “which breweries get hated on the least?”. Also remember that the red area means that these breweries get their average ratings from a higher number of 4 and {3, 2, 1} star ratings than any other breweries. While beloved, Deep Ellum, Firewheel, Cobra, and Community get hated on the most. But 903, Cedar Creek, and Grapevine live in the “love-hate” zone — they have their lovers giving them 5s and their haters giving them {3, 2, 1} ratings. Here in the orange “love-hate” zone there is no middle ground: these breweries are less likely to get a 4 star rating than the other breweries. That purple zone, though… that’s what we care about.

So now we know that the purple zone is, generally, the “zone of favored breweries” in DFW. But exactly which breweries are the best?… We’re so close to the big reveal. So close. Before the big reveal, let’s look at the breweries, but marked with their average ratings:

Now that’s fancy. Science just told us that not all 4.6whatevers are created equal! 903 and Grapevine’s “4.64” is because they have lots of 5s, but those 5s get dragged down by the {3, 2, 1}s, whereas Martin House’s 4.64 has its 5s dragged down by 4s! Making Martin House the best damn 4.64 in DFW! Likewise, Cedar Creek’s and Rahr & Sons’ 4.63s are different: Rahr’s 4.63 is the best damn 4.63 in DFW!

Now that we can see a lot more of what’s going on — let’s take a look at just those top ratings: Peticolas (4.80), Revolver (4.79), Rabbit Hole (4.76), and Franconia (4.72). With Correspondence Analysis (CA) — we can think of the dots for the star ratings (5, 4, {3, 2, 1}) as pulling the breweries towards their “star position” (in CA the terminology is “inertia” because we can think of this as a gravitational pull)4. So which star ratings are pulling which breweries towards them?

While Peticolas and Rabbit Hole are being pulled by 5 star ratings — they’re also getting pulled back towards the {3, 2, 1}s. While there’s no doubt that these are some of DFW’s favorite breweries — they are not, according to (my analysis of) Facebook and Yelp (ratings), #1 nor #2. Rabbit Hole is #4 and Peticolas is #3.

And then there were two. To find out the #2 and #1 breweries in DFW, we need to get extra nerdy: Facebook ratings vs. Yelp ratings.

First off — most of the ratings in this analysis come from Facebook; there are disproportionately more of them there than on Yelp. However, there is something quite insightful in how these ratings relate to the overall analysis:

Facebook ratings are generally very positive and include even more 5 star ratings. Note how, in the figure on the left, the blue Facebook dots are being pulled towards the 5 star ratings. Then look at the figure on the right and notice how far away all the Yelp ratings are. This would suggest an anecdote most of us are probably well aware of: Yelpers are mean-spirited jerks (or, rather, just tend to rate things more negatively).

This is actually really important to note: Facebook ratings are overly positive while Yelp ratings are overly negative. Now, there’s a bit of additional unfairness here… Franconia has no (business) Facebook page. That means it has no ratings from Facebook to help it out. Let’s look at one more picture: how Franconia and Revolver stack up on Yelp (with respect to their aggregated results):

From Yelp’s perspective, Franconia is closer to the 5 stars than Revolver. Revolver is getting pulled closer to the 4 star ratings. And given that we now know that Yelp ratings are generally more negative than Facebook we have but one conclusion:

Revolver is #2, and Franconia is DFW’s #1 brewery (based on two of the ubiquitous 5 star rating systems available).

But it’s quite important to remember: we have no idea why people are rating these breweries as they do5, simply that—when it comes down to ratings—Franconia gets lots of 5s and 4s, and very, very, very few {3, 2, 1} star ratings.

1 I don’t think 2 blog posts count as “tradition” yet.

2 Some breweries don’t have any ratings, and some have just a few, so they’ve been unfortunately excluded.

3 Some breweries only have ratings on Facebook and some only on Yelp.

4 I just rewatched Guardians of the Galaxy and Star Wars (in Machete Order) and am really emphasizing “star systems” and “star positions”. Space operas are the best.

5 For the stats nerds: there is actually another problem hiding here. Not all ratings are necessarily independent. In fact, it’s not unlikely that the same person provides a rating on both Facebook and Yelp. So, yes, there are some statistical assumptions that have been violated. But this is what happens sometimes — just do the best you can.

I didn’t present at this year’s Society for Neuroscience (but was an author on a talk). But I did go to SfN for two reasons: (1) Networking and (2) an informal survey of “MVPA”.

In the context of neuroimaging, what is “MVPA”?

Well, MVPA stands for Multi-voxel pattern analysis. Or Multivariate pattern analysis. So what do those mean? Let’s break the terms down a bit. Each has “pattern analysis” in it. Pattern analysis typically involves some sort of statistical analysis of patterns — where patterns are defined as a set of traits, features, or variables that describe a whole bunch of observations.

Sometimes, these patterns are used as the basis for separating different (often known a priori) groups of observations. Other times it is for finding ways to group observations together based on common patterns.

Pattern analysis (PA) is implicitly multivariate, thereby making one of the MVPAs (Multivariate Pattern Analysis) redundant in title.

Multivariate means that multiple dependent variables are modeled or analyzed in one go, as opposed to conducting many, many univariate tests. I’m stealing a quote from Haxby (2011) that succinctly gets to the advantages of multivariate:

MVP analysis can detect the features that underlie these representational distinctions at both the coarse and fine spatial scales, whereas conventional univariate analyses are only sensitive to the coarse spatial scale topographies.

or…

With multivariate analysis, you’ll get a similar (or the same) perspective as univariate approaches — but now with the added bonus of a unique perspective only multivariate approaches can give you.

Multi-voxel pattern analysis, though, is where things can get confusing. Multi-voxel does not imply multivariate — rather, it is explicit in its title: a whole bunch of voxels used in some way. One example of such a method is ridge regression. But… ridge regression is a univariate method. Thereby making the other MVPA — multi-voxel pattern analysis — sometimes contradictory in title.

There are some fantastic reviews on “MVPA” and multivariate analyses and pattern analyses for fMRI1, 2, 3, 4, so I won’t go into detail yet on what MVPA should be, but in general it is understood as (1) classification methods, (2) multivariate methods, or (3) a combination of the two.

So I “took to the streets”, if you will5, and conducted a small scale survey of what “MVPA” means to neuroimagers. I went to as many posters and talks as I could that explicitly used the term “MVPA”, or happened to just stumble across them in the vast oceans of the poster section. I would then take note of exactly which technique was used. However, in most cases (for posters), exactly which technique was used was never explicitly written; rather, only the 4 letters: MVPA. I would often have to ask “Which MVPA are you using?” (which in most cases was my sole question — and for that I probably seemed like a crazy person). Here’s what I found, broken down into 3 categories. Quotes are used to paraphrase responses.

Category 1: The Definitely Multivariate:

Support Vector Machines

“Multivariate pattern similarity analysis” (MPSA)

Representational Similarity Analysis (RSA)

I’d like to note that MPSA and RSA are the same technique that now falls under two (unnecessarily different) names — and because RSA is just multidimensional scaling (MDS), that makes three unnecessarily different names. This SfN is the first time I’d ever seen “MPSA” used in the imaging context.
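To make the RSA-is-MDS point concrete, a tiny sketch with made-up condition patterns: build RSA’s representational dissimilarity matrix, then hand it to classical MDS (base R’s cmdscale).

```r
set.seed(42)
# Made-up activity patterns: 4 conditions x 100 voxels.
patterns <- matrix(rnorm(4 * 100), nrow = 4,
                   dimnames = list(c("faces", "houses", "tools", "words"),
                                   NULL))

# RSA's representational dissimilarity matrix: 1 - correlation
# between condition patterns.
rdm <- 1 - cor(t(patterns))

# Classical multidimensional scaling recovers a 2D configuration.
coords <- cmdscale(as.dist(rdm), k = 2)
plot(coords, type = "n", xlab = "Dim 1", ylab = "Dim 2")
text(coords, labels = rownames(coords))
```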

I do find the phrase “Haxby style correlations” quite delightful. Why am I separating these techniques from the above? Well, these techniques usually rely on aggregating results from a series of univariate analyses. The aggregation usually happens across voxels.

Before we move on to the third and most hilarious (or upsetting) category, I have a small aside: I couldn’t find any case of regularization performed correctly. Regularization is a nifty technique. Usually, regularization is helpful when your sample is too small to properly estimate all of your variables. The nifty-ness comes in by artificially inflating particular values to, essentially, pretend you have a bigger sample size. To quote Takane: the inflation of these values “works almost magically to provide estimates that are more stable than the ordinary [Least Squares] estimates.”

But there is a danger to inflating: overfitting. Which is why, in regularization methods, you have to search for the regularization parameter that is a compromise between a more stable solution and not overfitting. Often, this is done through a train-test paradigm like k-folds.
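For illustration, a hedged sketch of doing that search properly, here with the glmnet package (one common choice, not necessarily what anyone at SfN used) on simulated data:

```r
library(glmnet)

set.seed(1)
x <- matrix(rnorm(50 * 200), nrow = 50)  # 50 observations, 200 "voxels"
y <- rnorm(50)                           # simulated outcome

# alpha = 0 is ridge (Tikhonov) regression; nfolds = 10 is the k in
# k-folds. The penalty is chosen by cross-validated error, not by fiat.
cv_fit <- cv.glmnet(x, y, alpha = 0, nfolds = 10)
cv_fit$lambda.min  # the data-driven regularization parameter
```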

At SfN, I found only the following case: a single arbitrarily chosen regularization parameter. Tikhonov would be furious.

Category 3: The Definitely Concerning (and ambiguously ambiguous)

“Regularized regression”

“MVPA Regression”

“The MVPA toolbox”

I would follow up with something along the lines of “Do you happen to know which type of analysis?”, to which the response was usually just “The MVPA Toolbox”. I didn’t bother asking which MVPA toolbox.

At this point, you’re probably thinking: “Derek,

what you’ve just said is one of the most insanely idiotic things I’ve ever heard. At no point in your rambling, incoherent response was there anything that could even be considered a rational thought. Everyone [on this internet] is now dumber for having [read] it.

And you’re right. This post is merely a spewing of complaints with no apparent direction nor solution. However, it will be the first in a series of posts over the coming months. There will be two types of posts: (1) examples of multivariate and similarly exotic neuroimaging analyses, in the hopes that (2) some sort of taxonomic structure can be derived — essentially a family tree of “MVPA” — with the hope that, some day, we can stop using those 4 letters in that particular sequence. So let’s hope I turn this complaint into something more useful!

Now that we’re hot off of 2014’s North Texas Beer Week… Have you ever wondered what Dallas’s favorite local craft beer is? You’re probably thinking “Yeah, it’s clearly Lone Star because it’s the ‘National Beer of Texas’”, or “duh – it’s the one in my hand right now, bro!”.

While valid guesses, they are clearly not correct (and you should feel bad about those guesses). The correct answer is: Lakewood Brewing Company’s “Temptress” – a milk stout. Now, Dallas – you’re probably thinking “Well, Lone Star and the beer in my hand are clearly the second and third favorite local craft beers.”

Well… this is the point where I ask you to stop thinking such terrible thoughts – those answers are also not correct (and you should continue to feel bad). The correct answers are: Peticolas Brewing Company’s “Velvet Hammer” — an imperial red — and Community Beer Company’s “Mosaic IPA” — an American-style IPA.

How do I know that Temptress, Velvet Hammer, and Mosaic IPA—in that order—are Dallas’s three favorite beers? These beers are on tap, or (for Temptress and Mosaic IPA) on shelves all across town. But just being available doesn’t make a beer Dallas’s favorite – or else those truly wretched thoughts you were having about Lone Star would have been true.

Well as a beer nerd and a stats nerd, I decided I just had to know: of all the local craft beers that are now produced and available throughout DFW – which are Dallas’s favorites? Let’s get nerdy.

I created a relatively simple survey on Google Docs. This survey listed 35 beers produced in (the broader) DFW area. For a beer to get on the list it had to meet the following criteria:

The brewery itself must have been in operation for at least 1 year

The beer itself must have been available for at least the past six months

The survey collected 201 responses. Of those respondents:

33 people professionally work with beer (brewer, bartender, waitstaff, etc…).

58 people consider themselves homebrewers.

The survey asked people to respond to each beer with one of the following 6 options3:

It is one of my favorite beers.

I like this beer.

This beer is OK.

I don’t like this beer.

I’ve never had this beer.

I have no opinion.

At this point, we can just count how many people, out of 201, had the answers above for each of the beers in the survey. So let’s get down to it:

What we’re looking at here are the beers (on the rows, listed vertically) and the proportion (out of 201) for each response. I reordered the beers so that they are listed from the most to the fewest “It is one of my favorite beers” responses.
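That counting-and-sorting step is simple enough to sketch in R. The two rows here are real (they appear in the table further down); the other 33 beers would stack the same way:

```r
responses <- rbind(
  "Lakewood Temptress"        = c(107, 62, 7, 4, 19, 2),
  "Four Corners' Block Party" = c(15, 83, 29, 6, 66, 2)
)
colnames(responses) <- c("FAVORITE", "LIKE", "OK", "DO NOT LIKE",
                         "Never Had", "No Opinion")

# Proportion of the 201 respondents giving each response...
props <- responses / 201

# ...then reorder the beers from most to fewest "FAVORITE" responses.
props[order(props[, "FAVORITE"], decreasing = TRUE), ]
```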

There are some clear favorites: Temptress, Velvet Hammer, Revolver’s Blood & Honey, and Mosaic IPA all have a lot of “Favorite” responses. You might be thinking, “Yo, Derek, you didn’t say a thing about Blood & Honey before—that’s my go-to crushable—so maybe you’re lying about Lone Star too?”. If I were inclined to respond to such accusations, I’d say that 1) I’m building suspense (or boring you to tears) and 2) I’ve grown really tired of you talking about Lone Star – but I’m above that, so I won’t say it.

As a stats nerd, though, this picture feels a bit… rudimentary. There are better ways to figure out and visualize Dallas’s favorite beer. So let’s turn to one of my favorite statistical methods: Correspondence Analysis (CA). CA is a technique that takes a large table made up of a bunch of variables (here: the responses) and turns them into new variables that better represent what’s happening4.

The data from above looks something like this:

Beer                        FAVORITE   LIKE   OK   DO NOT LIKE   Never Had   No Opinion
Lakewood Temptress          107        62     7    4             19          2
Four Corners’ Block Party   15         83     29   6             66          2

So what will CA do for us with a table of data like this? It tells us which beers are most similar to one another – based on all the different categories. It can also tell us if any of the categories are similar to one another, too. Most importantly, it tells us which beers are more related to particular responses than other beers are. Let’s take a look at what a CA would produce:

CA produces for us these new variables—these variables are called “components”—denoted by the axes (horizontal and vertical lines) in these pictures. There are 3 other axes besides these – but those aren’t very important. Just these first two explain 87% of the variance in the entire data.
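That “87%” is the cumulative explained inertia of the first two components. A sketch of where such a number comes from, with invented singular values standing in for the real CA output:

```r
# Invented singular values for a 5-component CA solution.
d <- c(0.50, 0.25, 0.13, 0.12, 0.10)

# Each component's share of the total inertia (chi-square variance)...
inertia <- d^2 / sum(d^2)

# ...and the cumulative share of the first two components.
cumsum(inertia)[2]
```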

With what we know about CA we can say some of the following:

Temptress, Velvet Hammer, Blood & Honey, and Mosaic are more associated with “A FAVORITE” than other beers (both figures)

The responses of “OK” and “DO NOT LIKE” are essentially the same – which probably means people are being nice when they say “OK”, or they’re being mean when they say “DO NOT LIKE”.

The lower left of the left figure shows Cedar Creek Scruffy’s, Cedar Creek Elliot’s Phoned Home, and Martin House XPA – which means they are nearly identical based on their responses; the responses being that most people haven’t had these beers. Sad times.

Let’s go a bit further. We know enough about this data to, perhaps, make it easier to understand. Let’s combine “OK” with “DO NOT LIKE” – because they are basically one and the same here. We’ll also combine “NO OPINION” with “NEVER HAD” – so that we can group together the responses that are basically non-responses. Let’s do another CA, and this time color each beer by the responses it is most similar to.
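The merge itself is just column addition before re-running the CA. A sketch, using the same two real rows from the table above:

```r
responses <- rbind(
  "Lakewood Temptress"        = c(107, 62, 7, 4, 19, 2),
  "Four Corners' Block Party" = c(15, 83, 29, 6, 66, 2)
)
colnames(responses) <- c("FAVORITE", "LIKE", "OK", "DO NOT LIKE",
                         "Never Had", "No Opinion")

# Merge the near-equivalent categories; the CA is then simply re-run
# on this narrower table.
combined <- cbind(
  FAVORITE               = responses[, "FAVORITE"],
  LIKE                   = responses[, "LIKE"],
  "OK/DO NOT LIKE"       = responses[, "OK"] + responses[, "DO NOT LIKE"],
  "Never Had/No Opinion" = responses[, "Never Had"] + responses[, "No Opinion"]
)
combined
```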

With the combined responses – we can see the general configuration is essentially the same. Except this time we can explain 92.5% of the data instead of just 87% (take that, Lone Star!). It’s also a little clearer that from right to left there is a gradient of liking (or ever having had) a beer. Now let’s take a look at the beers, colored by which response they are most similar to:

Now we have a much clearer idea of which beers people have never really had (in gray), which ones are not particularly cared for (in red), which ones are liked (in yellow), and which are Dallas’s favorites (in green).

The favorites are still Temptress, Velvet Hammer, Blood & Honey, and Mosaic. So why did I exclude poor ol’ Blood & Honey from the top 3? Let’s take a look at the responses in these 4 categories like we did initially. Beers are sorted by those with the most “A FAVORITE” responses:

Let’s also look at beers sorted by fewest responses of “OK/DO NOT LIKE”:

Now we have a bit of a different perspective – one that we can also get directly out of the CA results. Some beers are very related to “A FAVORITE” while at the same time rarely ever getting a “DO NOT LIKE”. Unfortunately for Blood & Honey – the responses for “A FAVORITE”, “LIKE”, and “DO NOT LIKE” are equally likely.

But for Temptress, Velvet Hammer, and Mosaic IPA – very few people would say they “DO NOT LIKE” these beers. Thus, these three beers—in that order—are Dallas’s favorite beers. And that’s just science.

So what’s next? In about a year I’ll try to re-do this survey. That’s because by then approximately 30,786 breweries are, apparently, going to be open in Dallas (thanks, urban sprawl!), and many of the breweries that are currently open—but didn’t qualify this time—will qualify in a year.

All analyses performed in R. Correspondence Analysis was performed with the ExPosition package – a package created by particularly attractive and smart people. Data available here5. Code to recreate these analyses here6.

I’m tired just writing this and I’m sure you’re tired just reading it. So let’s go get some Lone Stars.

Footnotes

1 I only realized after I sent out the survey that I had made 2 glaring errors. I mistakenly excluded Firewheel and Armadillo Ale Works. Woops – sorry!

2 They responded with “I’ve never had this beer” to all beers.

3 For the stats nerds: these are survey options not usually seen. Oftentimes when you get a survey, you’re asked to respond with a 1, 2, 3, 4, or 5 (or some similar numeric scale). Well, what if people have no opinion? What if they don’t want to answer the question? They need a way to opt out. Also, categories aren’t numbers, you dummy! For your (statistical) health!

4 For the stats nerds: technically both the beers and the responses are variables. The observations (people) are kind of hiding. Each person simply helps increase the number of responses within a particular cell of this table. CA is analogous to a principal components analysis, but for data more suited for χ2 analyses.

5 Some responses are decimals. This is because some people left their responses blank (instead of choosing the very comprehensive categories I outlined – jerks). When a response was blank, I just replaced it with the average response.

6 It’s in a text file, but change the extension to .R to use it more easily with R.