Gizoogle: Amusing tribute or racist caricature? – NSFW (part 2)

In my last post, I took up the question of whether the website Gizoogle (see the previous post for an explanation of it) is an amusing tribute to hip-hop artist Snoop Dogg, a racist caricature of African American English (AAE), or perhaps something in between. In that post, I discussed the issue from the point of view of the website’s creators, specifically what their stated intentions were or what the website means to them. In this post, I take up the question of what the website actually does, what the language it produces is like and especially how accurate it is as a representation of either Snoop Dogg’s speech or AAE more generally.

Before I get into the analysis, I want to briefly explain why it is important to look at the language itself (as opposed to just looking at what people say they think it means). Jane Hill, an anthropologist at the University of Arizona, has written extensively about a language phenomenon she calls “Mock Spanish”. Mock Spanish is basically the use of Spanish by non-Spanish-speaking non-Latino/as. In other words, it is used chiefly by white people who would not usually describe themselves as Spanish speakers. In fact, Mock Spanish is often not really Spanish at all but simply borrows aspects of Spanish like the word “el” (similar to “the”) and the ending “-o”. This is illustrated in the phrase “el cheapo”, the name of a chain of gas stations such as the one pictured below.

Rather than being a respectful, accommodating use of Spanish, Mock Spanish makes little attempt to conform to the general rules of Spanish and instead is usually perceived as “funny” or “a joke” by users (including the blog from which I borrowed the picture above). The only way the joke makes much sense, however, is if we assume that Spanish is an inherently funny or silly language or, worse (but probably more accurately), that Spanish speakers are inherently funny or silly. Otherwise, it’s simply very poorly used Spanish, which in other contexts would be shameful. Compare the laughter at the use of terrible Spanish, in which users seem to delight in the fact that the rules of Spanish grammar are ignored (or trivialized), with the mocking of English translations on signs, particularly in Asian countries. Photos of such signs have often been dubbed “Engrish” and appear on this website. An example appears below.

The difference in humor should be apparent. Whereas the users of Mock Spanish are not being chastised for their (intentionally) poor use of Spanish, Engrish users are being mocked for their lack of competence in English, which is often put forth largely as a courtesy to foreign tourists. We might think of this discrepancy as stemming from native English speakers’ more general sense of language privilege.

How does Mock Spanish relate to Gizoogle? Simply put, it’s possible that Gizoogle’s use of Snoop Dogg’s language and especially AAE could represent a type of Mock African-American English, in which the users care little about the accuracy of the form because the humor and entertainment stem from the generally low opinions of the language used by hip-hop artists like Snoop Dogg and other African-Americans. In other words, do we laugh at Gizoogle simply because it sounds like Black people talking and we think Black people are funny sounding? If this is the case, we should see a general disregard for accuracy and an exaggeration of the features that are being used to make it clear that we are mocking AAE or Snoop Dogg. As a more general example, consider that I personally use the phrase “in fact” very frequently. If you wanted to mock my language, you might exaggerate the extent to which I used the phrase including placing it in positions where it makes no sense. So that in fact you would in fact get phrases in fact in fact like this in fact. In fact.

In order to arrive at an informed opinion on whether Gizoogle represents a type of Mock AAE, we need to look more closely at the language it produces.

Analyzing Gizoogle’s linguistic accuracy

I decided to take a look specifically at some articles by Nobel Prize-winning economist Paul Krugman written for the New York Times. As an example, one article I looked at was a recent one titled “The Wonk Gap”, which was translated using Gizoogle as “Da Wonk Gap”. To give you some idea of the linguistic differences between Krugman’s original version and the Gizoogled version, I’ve presented the first paragraph from each here:

On Saturday, Senator John Barrasso of Wyoming delivered the weekly Republican address. He ignored Syria, presumably because his party is deeply conflicted on the issue. (For the record, so am I.) Instead, he demanded repeal of the Affordable Care Act. “The health care law,” he declared, “has proven to be unpopular, unworkable and unaffordable,” and he predicted “sticker shock” in the months ahead.

Does Gizoogle make errors in representing Snoop Dogg’s speech or AAE more generally? (I’ll address this question in this post)

In a broader sense, how representative of Snoop Dogg’s speech or AAE is Gizoogle? (I’ll address this one in the next post)

To what extent can we explain the linguistic features in Gizoogle as merely aspects of spoken English as opposed to written ‘standard’ English? (I will save this for a later post)

Question 1: Does Gizoogle make errors in representing Snoop Dogg’s speech or AAE more generally?

Ostensibly, the website creators are quite concerned with accuracy, claiming that they’ve “spent countless hours adding translations (currently over 4,000) and improving the algorithm to make sure translated text flows better than ever before”. They state that they use samples of Snoop Dogg’s actual speech as the basis of their translation system. In that sense, we might call any of the features, phrases, or words Gizoogle uses accurate, because they approximate things Snoop Dogg has actually said.

Despite these claims, however, there are many ways in which we can observe that Gizoogle fails at rendering Snoop Dogg’s speech and AAE accurately.

The first problem area involves the replacement of all instances of the word “in” with “up in”, which produces a number of unlikely uses of the preposition combination:

1. he predicted “sticker shock” up in tha months ahead
2. Barrasso’s remarks was straight-up interesting, although not up in tha way he intended
3. I’ll explain up in a minute

“Up in” in Snoop Dogg’s speech (and in the speech of others) is confined to instances where the preposition is used in a sense that more clearly indicates location. It’s unlikely to be used in the instances above, two of which deal with time (Examples 1 and 3).
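To make the mechanism concrete, here is a minimal sketch of the kind of blanket, context-free substitution described above. It is not Gizoogle’s actual code, which I have not seen; the rule table and the function name `naive_translate` are my own assumptions, and the two rules are simply the ones visible in Examples 1 and 3.

```python
import re

# Hypothetical rule table: NOT Gizoogle's real rules, only the two
# substitutions that can be read off Examples 1 and 3.
RULES = {"in": "up in", "the": "tha"}

# Match any rule word as a whole word, so "minute" and "explain" are left alone.
PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, RULES)) + r")\b")


def naive_translate(text: str) -> str:
    """Replace every whole-word match, without checking meaning or context."""
    return PATTERN.sub(lambda match: RULES[match.group(1)], text)


# Both sentences use "in" in a temporal sense, yet both are rewritten,
# because the rule never asks whether "in" refers to a location.
print(naive_translate("I'll explain in a minute"))
print(naive_translate("he predicted sticker shock in the months ahead"))
```

A rule of this shape has no way to limit “up in” to locative uses, which is exactly the restriction that speakers observe.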

Gizoogle also replaces instances of “am” with “be”, producing a number of nonsensical sentences like:

Example 4 above is typical of non-AAE speakers attempting to mock AAE by replacing forms of “to be” with the bare “be”. AAE does have a feature in which sentences like the one below (from an article by John Rickford) are grammatical:

In Example 5 the speaker is talking about habitual occurrences that take place every Christmas day. Example 4 does not seem to have any sense of a habitual occurrence, rather it describes a current state of affairs for the speaker.

One other area concerns homographs (that is, words that share a spelling but have distinctly different meanings). We can see an example of this where Gizoogle translates the word “party” as “thugged-out lil’ jam” in example 4 above, when it actually refers to a political party, as seen in example 6. It’s highly unlikely that Snoop Dogg would refer to the Republican Party as the “Republican thugged-out lil’ jam”.

6. He ignored Syria, presumably because his party is deeply conflicted on the issue.
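The homograph problem can be sketched the same way. Again, this is only an illustration under my own assumptions (a one-entry lexicon and a made-up function name, `substitute`), not a description of how Gizoogle is actually built: a lookup keyed on spelling alone cannot tell a celebration from a political organization.

```python
import re

# Hypothetical one-entry lexicon based on the substitution described above.
LEXICON = {"party": "thugged-out lil' jam"}

WORDS = re.compile(
    r"\b(?:" + "|".join(map(re.escape, LEXICON)) + r")\b",
    re.IGNORECASE,
)


def substitute(text: str) -> str:
    """Swap each lexicon word for its replacement; word sense is never consulted."""
    return WORDS.sub(lambda match: LEXICON[match.group(0).lower()], text)


# The social sense and the political sense share a spelling,
# so they receive the identical replacement.
print(substitute("We threw a party last night"))
print(substitute("his party is deeply conflicted on the issue"))
```

Telling the two senses apart would require looking at the surrounding words, which a spelling-based lookup like this one never does.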

This analysis only scratches the surface of the ways in which Gizoogle fails to accurately capture the rule-governed nature of Snoop Dogg’s (and other AAE speakers’) speech. However, I think it illustrates that Gizoogle deviates quite a bit from a grammatical rendering of Snoop Dogg’s speech or AAE more generally.

The question at this point is how we should interpret these inaccuracies. On the one hand, the type of work that Gizoogle’s creators have set out to do is quite complex. Essentially, they face many of the same challenges as computational linguists who work on machine translation. If you’ve ever used an automatic translator, you know just how fraught with difficulty this can be. In that sense, we might think of these errors as understandable and even expected. On the other hand, we might also ask ourselves to what extent both the Gizoogle team and users of the website are even aware of such inaccuracies, and what this lack of awareness means for the respect being shown to both Snoop Dogg’s language and AAE more generally. Indeed, as I pointed out above, many of the errors resemble what people do when they mock AAE: they replace all instances of a particular word with what they see as the corresponding AAE form, which underestimates the complexity of AAE and results in inaccurate forms that we might rightly dub Mock AAE.

We can take this question a step further and think about whether the features appear with exaggerated frequency (as in my “in fact” sentence above). I’ll do just that in my next post. If you’d like to be alerted to my updates on this topic, please subscribe to the blog (by clicking the yellow follow button above and to the right).

UPDATE (13 OCT 2013): I’ve posted part 3 of this series here. In it, I examine Snoop Dogg’s linguistic repertoire and the degree to which Gizoogle is fully representative of it.

UPDATE (27 OCT 2013): I’ve posted part 4 of this series. In it, I look at whether Gizoogle exaggerates the ‘uniqueness’ of Snoop Dogg’s speech (or AAE) by comparing the changes in Gizoogle to other varieties of English such as spoken ‘standard’ US English.

UPDATE (11 NOV 2013): I’ve posted part 5 of this series. In it, I look at the public response to Gizoogle and offer some final opinions on Gizoogle and racism.

In fact, Gizoogle faces a problem that’s more complex than most in machine translation: how to introduce tense and aspect distinctions that are marked in AAVE but not necessarily in Standard English, such as inceptiveness, without having a parallel corpus to statistically derive the relationships. I’d say that’s unsolvable with today’s NLP tools.