
In which an encounter with a crazy guy on the subway leads to a statistical analysis of French adverbs.

One evening I was riding the metro home when a guy got into the car with some used books to sell. A man sitting across the aisle from me asked to see them. He flipped through one of them, then took a pen out of his jacket pocket and began circling words–in this book that the other guy was trying to sell. Are you going to buy that?, the would-be bookseller asked the guy with the pen. They exchanged words–the bookseller was not happy about having his books marked up. The bookseller said something that Mr. Pen apparently thought was obvious or stupid. Il est fort, lui, he snorted–he’s a sharp one.

The central meaning of fort/forte is “strong,” but it can also be used adverbially. You hear it a lot that way, and I’ve been trying to figure out exactly when you can use it in that way–it’s often the case that there are word combinations that are possible in a language, but that don’t sound right, because particular words are conventionally used in very specific combinations. Violeta Seretan of the University of Geneva gives some examples of English words that are used to describe the magnitude of various nouns. The semantics of each of these is the same, but the words that are typically used are quite different. We talk about big problems, heavy rain… How about injury? (Answer below.) It would certainly be possible to say large problem, but it’s nowhere near as likely, and to me as a native speaker, it sounds odd. I wanted to be able to demonstrate that this corresponds to some actual statistical tendency, not just my intuitions, so I searched the enTenTen corpus, a collection of almost 20 billion words of written English, looking for big problem and large problem. Here are the frequencies:

big problem: occurs 6 times per million words.

large problem: occurs 0.5 times per million words.

Big problem occurs twelve times more often than large problem–the latter is possible, but it’s not really what you would expect to hear from a native speaker. We call things like big problem “collocations”–combinations of words that occur together more often than you would expect by chance.
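For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The two per-million frequencies are the ones reported above; the corpus size is an assumption (I round "almost 20 billion" up to exactly 20 billion), and the function names are just ones I made up for illustration.

```python
# Frequencies per million words, as reported above for the enTenTen corpus.
FREQ_PER_MILLION = {"big problem": 6.0, "large problem": 0.5}

# Assumption: "almost 20 billion words" is rounded to exactly 20 billion.
ASSUMED_CORPUS_WORDS = 20_000_000_000


def frequency_ratio(freq_a: float, freq_b: float) -> float:
    """Return how many times more frequent phrase A is than phrase B."""
    return freq_a / freq_b


def approx_raw_count(per_million: float, corpus_words: int) -> float:
    """Convert a per-million rate back into an approximate raw count."""
    return per_million * corpus_words / 1_000_000


ratio = frequency_ratio(
    FREQ_PER_MILLION["big problem"], FREQ_PER_MILLION["large problem"]
)
print(f"big problem is {ratio:.0f} times as frequent as large problem")

for phrase, rate in FREQ_PER_MILLION.items():
    count = approx_raw_count(rate, ASSUMED_CORPUS_WORDS)
    print(f"{phrase}: roughly {count:,.0f} occurrences (under the assumed size)")
```

Comparing per-million rates rather than raw counts is what makes it possible to compare numbers across corpora of different sizes.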

You can find collocation dictionaries for English, and they’re quite useful for second-language learners. I don’t know of any for French, though, or at least not where to find them in the US, which is where I am at the moment. (I’ve seen similar things in Canada.) I additionally want to know how these adverbial uses of fort should be translated into English, so I need a way to figure this kind of thing out for myself.

First step: find a whole lot of French text in some easily searchable form. I started with the French section of EUROPARL–a collection of documents from the European Parliament, translated to/from a wide variety of languages. The French section of EUROPARL contains about 59 million words–so, a whole lot–and you can access it through the Sketch Engine web site–so, easily searchable. A quick search showed me that fort is quite common in that data set:

Fort shows up 17,130 times in the French section of the EUROPARL corpus–257 times per million words. That’s pretty frequent.
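A raw count and a per-million rate together tell you how big the corpus must be, so here is a small sanity check, sketched in Python. The two numbers are the ones reported above; the function name is hypothetical.

```python
def implied_corpus_size(raw_count: int, per_million: float) -> float:
    """Back out the corpus size implied by a raw count and its per-million rate."""
    return raw_count / per_million * 1_000_000


# Figures reported above for fort in the French section of EUROPARL.
fort_size = implied_corpus_size(17_130, 257)
print(f"implied corpus size: {fort_size / 1_000_000:.1f} million")
```

The implied size comes out somewhat above 66 million, which is more than the "about 59 million words" that I quoted earlier. One possible explanation, which I have not verified, is that the per-million figure is computed over tokens (which include punctuation marks) rather than over words.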

Once I know that, I know that there will be enough data to calculate the collocations–recall that this is a statistical thing, so you need plenty of data. The Sketch Engine interface gives me a number of options for how to do the calculations (scroll down to get past the screen shot):

…which I show you just so that you’ll see that there are a lot of approaches to doing this. I just went with the defaults.

The calculations yielded quite a few possibilities. Here are some of them:

If you’re a stickler for data, you might have noticed that the collocations are ordered by the log of the Dice coefficient, which you could think of as a measure of how strongly the two words are associated with each other. I am really looking for the most common collocations involving fort, though, so I’ll reorder by the cooccurrence count, i.e. the raw count of how often the collocations occurred:
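For the curious, here is a rough sketch of the score in Python. As I understand it, the logDice score is 14 plus the base-2 log of the Dice coefficient, where the Dice coefficient is twice the cooccurrence count divided by the sum of the two words' individual counts. The count for fort (17,130) is the one reported above; every other count in the example is a hypothetical number that I made up purely for illustration.

```python
import math


def dice(f_xy: int, f_x: int, f_y: int) -> float:
    """Dice coefficient: 2 * cooccurrences / (count of x + count of y)."""
    return 2 * f_xy / (f_x + f_y)


def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """logDice as I understand it: 14 + log2(Dice). The maximum is 14."""
    return 14 + math.log2(dice(f_xy, f_x, f_y))


F_FORT = 17_130  # reported above

# HYPOTHETICAL counts, invented for illustration only:
# (cooccurrences with fort, total count of the other word)
candidates = {
    "a moderately common word": (700, 100_000),
    "a very frequent function word": (5_000, 3_000_000),
}

for label, (f_xy, f_y) in candidates.items():
    score = log_dice(f_xy, F_FORT, f_y)
    print(f"{label}: raw count {f_xy}, logDice {score:.2f}")
```

With these made-up numbers, the function word wins on raw count but loses on logDice, because its enormous overall frequency swamps the cooccurrences. That is the general reason that the two orderings can look so different.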

Crap–that basically tells me nothing. Why not? Zipf’s Law. Remember that Zipf’s Law tells us not only that most words are pretty rare, but also that some words are really, really common, and in French, that certainly includes de (“of”), et (“and”), une (“a”), and the rest of what we’re seeing here. (Moral of the story: don’t expect the most frequent things in a language to necessarily be the most revealing things in a language.) If I scroll down a bit, though, I see bien on the list. 683 examples of this–a frequency of 10.25 per million words. Bien is usually an adverb (and sometimes an adjective), which would presumably make fort adverbial in these cases, so we’re on to something now. Let’s check out some of those examples:

So, now I have some cases where it would make sense to use fort, but I want to know how they would correspond to English, too. This requires that I have access to the corresponding English text. No problem–recall that the EUROPARL corpus is multilingual. In particular, it is what is known as a parallel corpus, which means that it contains the same contents in multiple languages, not just similar contents (although that kind of corpus can be useful, too). I searched for the phrase fort bien. Here’s an example of the output:

Ils les connaissent fort bien et un par un. They recognise each and every one of them very well.

I’m feeling good about how to use fort bien now, but I want to know about other ways that fort could be used with an adjective. So, I’ll do another search of the parallel corpus (i.e. the matched French and English texts), but this time I’ll just search for fort, and I’ll specify that I want it to be an adverb. Here are some of the results:

Now I have some general examples of how to use fort:

Nous estimons fort positif que… We see it as a very positive sign that…

I don’t know every adjective with which it would be OK to use fort, but I know one more than I did when I got out of bed this morning, and I’m cool with that–one less time when I’ll have to use très, which is all that they teach us in school.

A colleague had some observations on this:

On top of being used in collocations, it also marks a style / genre which is somewhat formal or elevated (“soutenu”). This might explain why it remains frequent mostly in collocations and is less frequent (or more marked) in freer combinations. This gives the expression a literary turn or a pretense to a higher register. Both in speech and in writing, it is “soutenu.”

Another native speaker had this to say about it:

“Fort” is used as a synonym of “très”, before adjectives or adverbs. You can use it in about any case, it’s just more elegant than “très”, but not really literary.

The Mr. Pen guy on the subway turned out to be pretty crazy, as far as I could tell. At one point he snapped at my adorable cousin, who happened to be visiting, and I told him to cut it out. This was followed by an initially amusing conversation between him and me that at some point degenerated into a loud tirade on his part. I kept telling him that my French wasn’t that good and I couldn’t understand him, but he just kept going and going. Eventually French people around us began telling him to stop being an asshole and words to that effect, so I assume that it wasn’t very nice, but honestly, I couldn’t tell you. At some point a large and very drunk French guy got on the subway car, and started seriously getting in Mr. Pen’s face–it was clear that this was going to turn violent. Mr. Pen was a very diminutive Haitian man, and I wasn’t going to watch him get the shit beaten out of him no matter how bizarre he was being, so I got involved. The train stopped, Mr. Pen jumped out, and Mr. Drunk Guy launched into an animated discussion with me about American heavy metal, punctuated by snatches of Metallica songs. All in all, an unusual evening on the metro, but not an unpleasant one by any means–just part of life in The Big City, as we say in English.

In the US, politics and judo have some things in common. Here’s some English vocabulary for talking about them.

Ronda Rousey has one of the best ground games in the world. Here she arm-bars Miesha Tate. Go to Google Images to find pictures of what Tate’s arm looked like afterwards. Picture source: http://www.mmamania.com/2012/5/4/2998793/miesha-tate-arm-injury-update-ronda-rousey-strikeforce-ufc-video

France is the #2 judo country in the world, after Japan. The population of France is about 66 million people, and about 550,000 of them do judo. (For comparison: the population of the US is about 330 million people, and about 20,000 of them do judo.) The first person I met in France was a diminutive, beautiful woman in her 50s or so who I ran into at a judo practice. She’s nowhere near my size, but can arm-bar me every 7 minutes or so, on average. She’s a great example of French judo: she beats me (over and over) not with strength, but with a subtle, contemplative approach to the sport that relies on imagination and on a deep understanding of how to move in three dimensions and apply basic principles of leverage and physics efficiently–and gently. (Sorta like the famous French diplomacy, I guess.) In judo, we would say that she has a great ground game–the ability to fight on the mat, off your feet, where we use not the throws of standing judo, but arm-bars, chokes, and pins.

The phrase ground game has been in the news quite a bit lately. We often hear about what a great ground game Bernie Sanders has, or about how Trump keeps winning state primaries despite not having a good ground game. In the context of politics, your ground game is how good your campaign is at the very local tasks that require actual personal involvement–particularly, getting your supporters to the polls. A good ground game requires two things.

You have to know who your supporters are.

You have to have engaged, committed volunteers everywhere.

Regarding the first: today, this is mostly a matter of data science. Sasha Issenberg’s book The Victory Lab does a very good job of telling the story of the development of today’s personalized, data-driven politics. Once, politicians and political parties put a lot of effort into trying to convince people to get behind their ideas. Today, it’s generally thought that trying to change people’s minds is expensive and inefficient; on the other hand, getting the people who already support you to actually go to their polling place and vote is relatively inexpensive, and it’s quite effective. In 2008, the Obama campaign was able to develop pretty good guesses about who was going to vote for their candidate (how they did it is really interesting, but somewhat sobering–see the above-mentioned book), and they focussed their get-out-the-vote effort on those people.

Regarding the second: this is the essence of the ground game. Cruz’s win in the Iowa caucuses this nominating cycle was widely attributed to his strong ground game. One of the many, many mysteries of the Republican race for the nomination has been that Trump has done quite well despite not having much of a ground game anywhere.

In many languages, nouns belong to groups that affect the forms of the words with which they occur. French is such a language. You can more or less put French nouns into two groups, as follows:

For one group, the singular definite article (“the”) is le, the singular indefinite article (“a”) is un, the adjective “big” is grand, and the adjective “boring” is ennuyeux.

For the other group, the singular definite article (“the”) is la, the singular indefinite article (“a”) is une, the adjective “big” is grande, and the adjective “boring” is ennuyeuse.

When a language has two or three of these classes, the language is typically said to have a gender system. So, French has two of these classes, and we call the nouns in these classes masculine and feminine nouns. German has three of these classes, and we call them masculine, feminine, and neuter nouns. Lithuanian Yiddish has three of these classes, but most other dialects of Yiddish have two. English has basically no such classes–we have words that are sort of intrinsically masculine, like father, and words that are sort of intrinsically feminine, like mother, but since they don’t affect the forms of the words with which they appear (you say the mother and the father, with no differences in the word the), linguists wouldn’t call it a gender system. On the other hand, Old English (spoken from around 450 to around 1100) had three noun classes. (Look at the different forms of the word the in these three Old English nouns, taken from Wikipedia: sēo sunne (“the sun”), se mōna (“the moon”), þæt wīf (“the woman/wife”).) A language on which I did research in graduate school only has two such classes, but referring to anything by the wrong class is a way to insult it. It doesn’t matter which of the two classes it belongs to–if you use the wrong modifiers, it’s an insult. I was terrified to ever open my mouth, and don’t speak it at all. (My son often played in the corner of the office while I collected data. It’s quite amazing to hear dô páráná come–correctly–out of the mouth of that blond-haired, blue-eyed, video game addict today.)

There’s nothing magic about the numbers two and three–languages can have more or less arbitrary numbers of these classes. We tend to refer to them as genders when there are just two or three, and to refer to them as noun classes when there are more than that, but there is no difference between what we call the gender system in French, with two noun classes, and what we call the noun class system in Shona, which has twenty noun classes. It’s a difference of numbers, not of kind–in both cases, you have this more-or-less arbitrary slicing up of the nominal lexicon (noun vocabulary) of the language into groups of nouns that affect the forms of articles, adjectives, etc. in various and sundry ways.

I say “various and sundry” because gender/noun class systems can work out in lots of different ways. In Semitic languages, verbs agree with the gender of their subjects. For example, in Hebrew, he studied is lamad, while she studied is lamda. In the first case, it’s the pattern of having the two a-a vowels that makes it the masculine form of the verb, and in the second case, it’s the a in the middle, the md coming together (versus mad in the masculine form), and the a at the end that make it feminine. Different verbs, tenses, and numbers (that is, singular versus plural) have different forms, so don’t get excited about the fact that there’s an a at the end of the third person singular past tense feminine form of the verb–it’s not that way all the time. For example, he goes is holekh, while she goes is holekhet.

Does having classes of nouns in your language–or not having them–make a culture more or less sexist? I only have anecdotes here, and–counter to what you might hear–anecdote is not the singular of data. For what it’s worth: my undergraduate advisor always used to point out that Hebrew is about as gendered of a language as you can get (see above–even verbs have to have gender in Hebrew), and probably close to everyone in Israel speaks either Hebrew or Arabic (which has the identical system), but Israel was one of the first countries in the world to have a woman as head of government (Golda Meir became prime minister in 1969). In contrast, Finnish has no grammatical gender whatsoever, but Finland did not elect a woman as head of state until 2000. (This is not to imply anything bad about Finland–there are a bazillion countries with genderless languages that have never had a female head of state. I don’t know why my professor picked on the Finns.)

English note concerning the title of this post: using the word got (or gots) as the present tense of the verb to have is a social marker of class–that’s “class” in the sense of couche sociale. Lower class, specifically. Other speakers might use it for humorous effect. “To have class” means something like to have elegance of style or manners. So, if you say you got no class, man, part of the flavor of the expression comes from the fact that you’re using a “low (social) class” verb form to talk about “class” in the sense of elegance.

English has a number of words that are made of numbers. Here are some of them.

No French in this post–this is all about obscure English vocabulary that you can bet Zipf’s Law will bring into your life sooner or later.

I recently wrote a post about what we call in the US 3x5s (pronounced “three by fives”), and that got me thinking about words in English that are formed in similar ways. There are a number of them, and if you can use them, it will definitely add an American flavor to your English.

2×4 (pronounced “two by four”): a kind of wooden board that measures about 2 inches by four inches, and about six feet in length. In America, 2x4s are commonly used in the construction of homes and the like.

4×4 (pronounced “four by four”): a kind of truck or similar vehicle that can provide power to all four wheels simultaneously. (More traditionally, cars would only power the front axle or the back axle.)

24/7 (pronounced “twenty-four seven”): absolutely constantly. 24 hours in a day, and seven days in a week, so 24/7 is all the time.

7-11 (pronounced “seven eleven”): originally the name of a convenience store that was once open from 7 AM to 11 PM. Today you can use it to refer to pretty much any 24-hour convenience store, I think.

69 (pronounced “sixty-nine”): a verb referring to a specific sexual act. Consider the relationship between the numbers 6 and 9 and you can probably figure it out for yourself, which will save me from feeling like I have to put a trigger warning on a blog post about numbers.

soixante-neuf: same thing. Yes, we can use it in English, and if you’re sleeping with people who are educated enough to know what it means, then you probably already know what it means yourself.

10-4 (pronounced “ten four”): I heard you, I got your message; there’s also some implication that you agree. Often heard in the contexts 10-4, good buddy, or that’s a big 10-4, or, if you wear a cap with the name of a feed company on the front, or just listened to a lot of AM radio in the 1970s, that’s a big 10-4, good buddy. That’s how we said it when I was a little tyke, at any rate. OK: a teenager.

.45 (pronounced “forty-five”): a kind of pistol, known for its “stopping power”–that is, if someone is charging you, a shot from one of these things will keep them from moving forward if it hits them. The projectiles are short, but very big around, and heavy. It’s not very accurate at a long distance, but at a short distance, it’s very effective at what it’s intended for.

.32 (pronounced “thirty-two”): another kind of pistol. They’re also not very accurate, as they usually have a pretty short barrel, and they really don’t have any use other than killing people at close range, as far as I know.

.36 (pronounced “thirty-six”): another kind of pistol. They sometimes have longer barrels, in which case you can use them to kill people at close range, and also a bit further away.

.22 (pronounced “twenty-two”): another kind of pistol, and also a small-caliber rifle. The bullet is quite small, and unless you get shot in some place really vital–head, heart, an artery–it may not do that much damage. On the other hand, I did once see a young guy who shot himself in the head with one of these–he didn’t die immediately, but he sure as hell died eventually. Again, there is not much that’s actually useful about these things…

You can use the number-by-number construction productively (in the linguistic sense of the word productive, which means that the construction can be used to produce new things) to talk about the sizes of pieces of wood in general. However, if you’re not talking about 2x4s specifically, the context needs to be clear if you want to be understood. The assumption is that the pieces of wood in question will be 6 feet long unless otherwise specified, so if you ask for a 3×6 (pronounced “three by six”) in a lumber yard, people will know what size board to give you. For a version of this kind of construction that is also productive, although quite obscure, see this entry from the Urban Dictionary for a description of how it is used to refer to pairs of male characters in the Gundam Wing anime series.

Having gotten the basics out of the way, here’s a useful expression: to get/be hit with a 2×4. You know what a 2×4 is by now–a solid wooden board. If someone smacks you upside the head with it, you will have been smacked really, really hard. To get/be hit with a 2×4 means to be stunned by something that you’ve learnt.

Here are some real-life examples. This woman wrote on her blog about needing to be forced to face the facts that she was (a) eating too much, and (b) not exercising enough:

Here’s a story from the Washington Post (a reputable and very famous American newspaper known for its coverage of national politics) about Chris Christie, describing the experience of the explosion of the Bridgegate scandal during the period before his unsuccessful run for the Republican presidential nomination:

So, now you see why it would, in fact, probably be better to get shot in the leg with one of those tiny little .22s than to get hit in the head with a 2×4. Native speakers of English (or non-native speakers who just like to collect funny words), do you have any other all-number words to add to the list?