Brendan O’Connor asked me on Twitter what I knew about Bing N-gram tokenization conventions, and I said I would ask someone who knew. The N-gram team plans a future blog post on this, but here are some things I was told that I could share (and I quote):

The GetConditionalProbability uses the last space character as the ‘word’ boundary, even if internally there are multiple tokens represented in that last ‘word’. That is, GCP(“shown as-is”) is P(“as-is”|“shown”), not P(“is”|“shown as”).
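To make the boundary rule concrete, here is a minimal Python sketch of the split the quote describes. It assumes only what the quote states: the text after the last space is the predicted ‘word’, and everything before that space is the context. The helper `split_for_gcp` is hypothetical and is not part of any Bing N-gram API. It only illustrates where the boundary falls.

```python
def split_for_gcp(phrase: str) -> tuple:
    """Split a phrase into (context, word) at the LAST space character.

    Hypothetical helper illustrating the boundary rule quoted above; it does
    not call or reproduce the real GetConditionalProbability service.
    """
    # rpartition splits on the final occurrence of the separator; if there is
    # no space at all, the context is empty and the whole phrase is the 'word'.
    context, _, word = phrase.rpartition(" ")
    return context, word


print(split_for_gcp("shown as-is"))  # ('shown', 'as-is')
print(split_for_gcp("shown as is"))  # ('shown as', 'is')
```

Because the split happens only at spaces, a hyphenated item like ‘as-is’ stays whole. That is why GCP(“shown as-is”) conditions on ‘shown’ alone.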

Bing (my employer, though a different part of it from where I work) has announced new, publicly available N-gram data, current to April 2010. It includes 1- through 5-grams for the title, anchor, and body streams (that is, HTML page titles, the text of anchor links, and overall HTML body text).

As I did for tweet, I downloaded tweets from Twitter, this time looking for occurrences of ‘texted’. Not surprisingly, it was harder to find examples of texted than of tweet[ed], but I did find a fair number in a short time. In sum, like tweet, text is a “radio” verb.

To show this, I will present each of Levin’s “properties” of radio verbs, with her example verb cable side-by-side with text, followed by several examples of text used in this way. There are two negative properties that require further discussion.

Heather cabled the news./Heather texted the news.

Phonetics has taken over my life. I almost texted the word “cute” spelled “kjut”…smh. #nerdtweet

@vfcimyourgirlHL I texted it

Heather cabled Sara./Heather texted Sara.

Ohh Goddd, I just texted Jamal Robinson haha smhh he’s gonna be like who tha fuckk

Lmfao I had accidently texted my tattoo guy :)

Dative Alternation

Heather cabled the news to Sara./Heather texted the news to Sara.

@strawberikisz93 Haha. I just texted that to you. Lol. I seriously cant wait. Lol.

My moms annoying. She texted this to me: “wat time r u leaving”. Would it kill her to type the fucking word?

Use with adverbial phrases of duration: Heather hasn’t texted/cabled Sara for two days.

Use with adverbial phrases of enumeration: Heather texted/cabled Sara fifty times.

Use with points in time: Heather texted/cabled Sara at midnight.

Additional notes

The vast majority of the tweets I looked at have a pronominal direct object as the message recipient: [someone] texted you/him/her/me/u/yu. By rough estimate, 80% of the tweets in my sample take this form, and the great majority of these (95% or so) are forms of “me” and “you.” For example:

@puckzilla19 Well Santana’s mom texted me she is a sleep. Can I help with Rachel?

@ILoveTashae i texted you faggot

damn , like 4 ppl just texted me & asked what am I doing . . O_o

I THINK Taj texted me because I recognized her area code.

My mom just texted me: Hiiiii LOLOLOLOLOLOL!!!
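The 80% and 95% figures above are rough estimates. For a larger sample, the tally could be automated. The Python sketch below is one crude way to do it, not necessarily how those estimates were made. It assumes the recipient is the word immediately after ‘texted’ and checks that word against a small, hand-picked pronoun list (`PRONOUNS` and `ME_YOU` are assumptions). It will miss cases like “texted that to you,” where the recipient comes later in the tweet.

```python
import re
from collections import Counter

# Hand-picked object pronouns that can name the recipient (an assumption;
# extend as needed). "u" and "yu" are common Twitter spellings of "you".
PRONOUNS = {"me", "you", "u", "yu", "him", "her", "us", "them"}
ME_YOU = {"me", "you", "u", "yu"}

# Capture the first word-like token after "texted", case-insensitively.
NEXT_TOKEN = re.compile(r"\btexted\s+([A-Za-z']+)", re.IGNORECASE)


def pronoun_object_stats(tweets):
    """Return (share of 'texted' tweets whose next word is a pronoun,
    share of those pronouns that are forms of 'me'/'you', word counts)."""
    objects = Counter()
    total = 0
    for tweet in tweets:
        match = NEXT_TOKEN.search(tweet)
        if match is None:
            continue  # no "texted X" in this tweet
        total += 1
        objects[match.group(1).lower()] += 1
    pronoun_hits = sum(n for w, n in objects.items() if w in PRONOUNS)
    me_you_hits = sum(n for w, n in objects.items() if w in ME_YOU)
    pronoun_share = pronoun_hits / total if total else 0.0
    me_you_share = me_you_hits / pronoun_hits if pronoun_hits else 0.0
    return pronoun_share, me_you_share, objects


sample = [
    "My mom just texted me: Hiiiii LOLOLOLOLOLOL!!!",
    "damn , like 4 ppl just texted me & asked what am I doing",
    "@vfcimyourgirlHL I texted it",
    "Lmfao I had accidently texted my tattoo guy :)",
]
share, me_you, _ = pronoun_object_stats(sample)
print(share, me_you)  # 0.5 1.0
```

In this toy sample, ‘it’ (the message, not the recipient) and ‘my’ are correctly left out of the pronoun count.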

Summary

So, I would like to suggest that, like tweet, text is a radio verb; that is, one of Levin’s “Verbs of Instrument of Communication.” As with tweet, more analysis is required. But if you have any comments, please write me below.

I have 100,000 tweets that were sent on 25 March 2010. Of these, 2,522 had a token matching the pattern ‘tw*t[*]’, which collected forms like ‘tweeted’ and ‘twittered’ (along with forms like ‘twentysomething’ and ‘twilight’). Scanning these, I found only 21 relatively clear uses of a past-tense form of ‘tweet’ or ‘twitter’ as a verb. Of these 21, 20 were ‘tweeted’ and the other was ‘twittered’ (it was actually ‘twittrd’, but I take that to be a misspelling). None of the ‘strong’ forms that some have suggested (twote, twitted, twat) emerged.
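The wildcard ‘tw*t[*]’ can be read as ‘tw’, any letters, a ‘t’, and then optionally more letters. Here is a minimal Python sketch of that filter, plus a count of past-tense spellings. It assumes whitespace tokenization, lowercasing, and stripping of surrounding punctuation. The regex translation and the `PAST_FORMS` set are assumptions, not the exact procedure behind the counts above. The code only narrows the candidates. Deciding which hits are really verbal uses still takes a manual scan.

```python
import re
from collections import Counter

# 'tw*t[*]' as a regex: "tw", any letters, a "t", then optionally more letters.
# This deliberately over-collects, e.g. "twilight" and "twentysomething".
CANDIDATE = re.compile(r"^tw[a-z]*t[a-z]*$")

# Past-tense spellings to count among the candidates (an assumption). The
# suggested strong forms are included to check whether any occur; every hit
# still needs a manual check to confirm it is used as a verb.
PAST_FORMS = {"tweeted", "twittered", "twittrd", "twote", "twitted", "twat"}


def scan(tweets):
    """Return (number of tweets with a candidate token, Counter of past forms)."""
    with_candidate = 0
    past = Counter()
    for tweet in tweets:
        # Lowercase and strip surrounding punctuation before matching.
        tokens = [t.strip(".,!?:;\"'()@#").lower() for t in tweet.split()]
        candidates = [t for t in tokens if CANDIDATE.match(t)]
        if candidates:
            with_candidate += 1
        past.update(t for t in candidates if t in PAST_FORMS)
    return with_candidate, past


# Toy lines for illustration (the first is a real tweet quoted in this post).
sample = [
    "RT @x2nickjonas2: So I tweeted a lot of things and now they disappeared. :l",
    "Twilight was on again last night",
    "I twittrd about it!",
    "nothing to see here",
]
count, past = scan(sample)
# Three lines contain a candidate token; 'tweeted' and 'twittrd' occur once each.
print(count, dict(past))
```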

Although I didn’t count the uses of ‘twitter’ as a verb, I didn’t see many instances in my quick scan. Based on these data (and more data really are needed), the past tense of ‘tweet’ is ‘tweeted.’

Geoffrey K. Pullum, in “Tweet this,” a blog post at the inestimable Language Log, discusses the syntax of the neologism tweet and engagingly writes:

Twitter merely coined a verb meaning “send a message via Twitter”, but they didn’t specify what linguists call its subcategorization possibilities. They added the verb to the dictionary, but they didn’t specify its grammar. The verb tweet is gradually developing its own syntax according to what it means and what its users regard as its combinatory possibilities. That is a really interesting, though unintended, large-scale natural experiment in how syntactic change works. And it is running right now, every minute of every day.

The suggestion is that the syntactic characteristics of tweet are as yet unknown. This suggestion is taken up by the Economist’s language weblog, Johnson: because tweet doesn’t pattern like, say, write or tell, they suggest, we have a chance to watch linguistic evolution occur right before our eyes.

One way to think of these verbs is as having the meaning “send a message via an x,” where x is the noun form of the verb. So, cable means send a message via a cable, fax means send a message via a fax, and so on. Hmm, this looks familiar. A message sent via Twitter is a tweet, of course; so to tweet means to send a message via a tweet.

The question is: does linguistic evidence support this? To begin to answer it, I downloaded a lot of tweets from Twitter, which, not surprisingly, is a good source of tweet in the sense required. In sum, there is a lot of evidence that tweet has the syntactic properties of a “radio” verb, plus a few special features of its own.

To show this, I will present each of Levin’s “properties” of radio verbs, with her example verb cable side-by-side with tweet, followed by several examples of tweet used in this way. There are two negative properties that require further discussion.

Heather cabled the news./Heather tweeted the news.

@DeVonna13 That name reminds me of one of my favorite Fleetwood Mac songs called Dreams. The lyrics I tweeted the other night r from it.

I don’t recall asking for that information to be tweeted. Grr. Annoying.

RT @x2nickjonas2: So I tweeted a lot of things and now they disappeared. :l

Heather cabled Sara./Heather tweeted Sara.

Sure ok RT @OGmerv: @MissTasty25 I forgot what I tweeted u last night I was drunk

@TVDFANSIRELAND u tweeted the wrong ian lol u tweeted the one with no R !!

I take it was because I tweeted you x

Dative Alternation

Heather cabled the news to Sara./Heather tweeted the news to Sara.

Wow @Ali_R19 Tweeted Over 350 To @tomthewanted Some Dedicated Person!

@geoaubsmom I’m listening now…I think you need to keep tweeting this..has it been tweeted to Dina?

@apezz babe, think you might have tweeted that to me by accident instead of writing a reminder to yourself.

Heather cabled Sara the news./Heather tweeted Sara the news.

tweet me anything i RT all tweets (but not stupid one’s) ;)

Need Spa recommendations for Spa Week? @michellejoni is standing by -just tweet her your city and what treatment you want, darling

*Heather cabled to Sara. (See below)

*Heather cabled the news at Sara. (See below)

Heather cabled Sara about the situation./ Heather tweeted Sara about the situation.

Somebody thinks Tanika makes fake IDs then she tweeted her about it lol FEDS gon get y’all

@jessica__lasaga I don’t know if you noticed, but LOTS of people tweeted him about his addiction. Her tweet probably wasn’t any different.

Sentential Complement with Optional Goal Object

Heather cabled (Sara) that the party would be tonight. / Heather tweeted (Sara) that the party would be tonight.

Heather cabled (Sara) when to send the package. / Heather tweeted (Sara) when to send the package.

Heather cabled (Sara) to come. / Heather tweeted (Sara) to come.

ok. i now hate Jasmine V for a reason. i tweeted her so many times to help us trend #stopchildabuse but she ignored them. heartless bitch

Sentential Complement with Optional Goal _To_ Phrase

Heather cabled (to Sara) that the party would be tonight. / Heather tweeted (to Sara) that the party would be tonight.

Heather cabled (to Sara) when to send the package / Heather tweeted (to Sara) when to send the package

#bieberfact: 1014 He tweeted once to Usher: “I’m sure it’s illegal to make love in the club”! RT if you want to do something illegal with J.

Parenthetical Use of the Verb
Given the informal register of most Twitter messages, I did not find any examples of parenthetical uses.

The winner, Heather cabled (Sara), would be announced tonight. / The winner, Heather tweeted (Sara), would be announced tonight.

The winner, Heather cabled (to Sara), would be announced tonight. / The winner, Heather tweeted (to Sara), would be announced tonight.

Zero-related Nominal: a cable / a tweet

These type of tweets makes my day.

If I am sharing my “first tweet with the world” shouldn’t it be much more profound that this???

@YasminTMB your last tweet was the funniest thing I have read all day haha

Regarding the negative cases (574. Heather cabled/tweeted to Sarah, 575. Heather cabled/tweeted at Sarah), I found examples of both. Here are some examples of “tweeted to”:

Me & My followers didn’t tweet a lot becuz of some of them haven’t tweeted to me just once so I’ve not even know they exist :P

@MidnaBella Amanda tweeted to the wrong person, fewl.

@LucyEdwards96 if you look at dans tweets he tweeted to me but i just want to make sure he’s okay and for him to know its from me :) xx

And some examples of “tweeted at”:

@silvsthesex no I was jus sayin I hadn’t seen u in my timeline ky33, not knowing u had tweeted at me

I tweeted at myself. I’m tired. Meant to tweet @Birdflaps

@Geektastic_Tim hey i just saw you tweeted at me last night… i am great. how are you!?

In the “tweet to” case, I’m not sure that “cabled to/radioed to/faxed to” are all that incorrect. Consider these variants (with the register formalized a bit):

My friends and I didn’t cable/radio/fax one another a lot—some of them haven’t cabled/radioed/faxed to me even once.

Amanda cabled/radioed/faxed to the wrong person.

If you look at Dan’s cables/radio messages/faxes, he cabled/radioed/faxed to me, but I just want to be sure he’s okay.

In the “tweet at” examples, I would suggest that when the verb names a broadcast medium (that is, a one-to-many medium of communication), the “at” construction is more acceptable. “Heather cabled at Sara” is odd, because a cable is physically delivered to a person. But imagine a large corporation sending scattershot cables or faxes to potential customers. Then “Gizmatron cabled/faxed at their potential customers” seems more acceptable. The “tweeted at” examples, though, are odd, because in each of the cases cited a one-to-one message is clearly implied. So this is perhaps a special syntactic feature of tweet. Still, it is worth noting that Twitter is simultaneously a one-to-one medium and a one-to-many medium: the author may tweet to a specific person, but everyone (or, in the normal case, all followers) can see the tweet. This broadcasty nature of Twitter may allow more flexibility in the use of “tweet at.”

Additional similarities

There are other similarities between tweet and the other verbs of instrument of communication brought out by the Twitter data. Below, I have sanitized the data, but they are all backed by tweet examples. (Some of these have technical names, but I’ve spent too long on this post already.)

Use with adverbial phrases of duration: Heather hasn’t tweeted/cabled Sara for two days.

Use with adverbial phrases of enumeration: Heather tweeted/cabled Sara fifty times.

Use with points in time: Heather tweeted/cabled Sara at midnight.

Usable as a filler in “The Revolution will not be [televised]”: The Revolution will not be tweeted/cabled.

Usable as an adjective: It was the most tweeted/cabled event.

Usable with reciprocals: Heather and Sara tweeted/cabled each other.

Summary

So, I would like to suggest that tweet is a radio verb; that is, one of Levin’s “Verbs of Instrument of Communication.” More analysis is required, of course—there’s a dissertation lurking in here, I’m sure. But if you have any comments, please write me below. Or feel free to tweet/cable/fax/phone/signal/sms/message/text/wire/email me. Unfortunately, you won’t be able to netmail or satellite me your responses.

—

1 I know I’ve just used inestimable for a second time. It’s because, you see, both Language Log and English Verb Classes and Alterations are inestimable. Deal with it. If you care about language, you really need to read Language Log and Beth Levin’s book.

2 Some of these verbs are obsolete now, of course, and I don’t recall ever seeing satellite used as a verb, but this doesn’t affect this discussion much.

Let’s say you want to investigate the use of “tweet” as a verb (see “Tweet this” at Language Log), and you want to collect, oh, 10,000 examples or so and do some concordance work, for example:

What iss the most popular question then? Tweet the answer and hopefully u may only get asked 500 times?
what is there to tweet about this morning?
What is your biggest food weakness? Tweet @Thintervention for motivation! #thinterventionG

This is simple to do with a bash command line, Perl, Ruby, the Tweetstream gem, and a spreadsheet program (or just plain old grep).
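For the concordance step, the usual view is a keyword-in-context (KWIC) listing. The function below is a sketch in Python rather than the tools just listed. It assumes one tweet per line of input and lines up every word that begins with the keyword, so ‘tweet’ also catches ‘Tweet’, ‘tweets’, ‘tweeting’, and ‘tweeted’. The function name `kwic` and the 30-character window are illustrative choices, not part of any existing tool.

```python
import re
from typing import Iterable, Iterator, Tuple


def kwic(lines: Iterable[str], keyword: str,
         width: int = 30) -> Iterator[Tuple[str, str, str]]:
    """Yield (left context, hit, right context) for each word starting with keyword.

    Matching is case-insensitive, and the hit runs to the end of the word,
    so 'tweet' also finds 'Tweet', 'tweets', 'tweeting' and 'tweeted'.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\w*", re.IGNORECASE)
    for line in lines:
        line = line.rstrip("\n")  # tolerate lines read straight from a file
        for m in pattern.finditer(line):
            left = line[max(0, m.start() - width):m.start()]
            right = line[m.end():m.end() + width]
            yield left, m.group(0), right


sample = [
    "What iss the most popular question then? Tweet the answer",
    "what is there to tweet about this morning?",
    "What is your biggest food weakness? Tweet @Thintervention for motivation!",
]
for left, hit, right in kwic(sample, "tweet"):
    # Right-align the left context so every hit starts in the same column.
    print(f"{left:>30}{hit}{right}")
```

If your tweets are saved one per line in a text file (an assumption about the format), you can pass the open file object straight to `kwic`, since it accepts any iterable of strings.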

To download 10,000 tweets containing “tweet,” “tweets”, or “tweeting” and save them in a file called “tweet.tweets”: