
Internet anonymity just got tougher with the creation of a new algorithm. Researchers have developed a way to predict gender from the text in a comment or social network post.

Researchers with the MITRE Corporation have developed a method to accurately guess gender by isolating specific words in a tweet. Twitter does not collect gender in user profiles, making it a perfect testing ground for the algorithm. The team first collected the location, description, profile name and real name of all Twitter users in the sample. Most of the Twitter users in the sample had posted only once on the social service. An opening test was to see whether the algorithm could detect a person’s gender from the name alone, and the computer guessed correctly 89 percent of the time.

By analyzing only the content of a single tweet, the algorithm was able to guess gender correctly nearly 66 percent of the time. Analyzing all the tweets in a user’s stream increased accuracy to a bit over 75 percent. Other results included about 71 percent accuracy on the description alone and 77 percent accuracy on the screen name alone. When all four fields were combined with the tweets, the computer reached 92 percent accuracy.

Punctuation often turned out to be an indicator of gender. Use of a smiley face or an exclamation point typically indicated that the author was female. Females were also more likely to use words like “love”, “cute”, “happy”, “mommy”, “sleep”, “school”, “baby”, “bed”, “chocolate” and “hate” as well as Internet slang like “LOL” and “OMG”. Only a couple of terms were attributed to males, including “http” and “google”.

The study also showed clear gender lines for “possessive bigrams”, two-word phrases that start with “my” or “our”. Phrases attributed to males included “my wife”, “my gf” and “my beer”. Females most commonly used “my yogurt” and “my husband”. These phrases were also analyzed to identify political affiliation. Tweets about yoga, vegetarians and the Los Angeles Lakers are most likely to come from Democrats, while tweets about Walmart, weapons and LSU are most likely to come from Republicans.
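
To make the idea of a possessive bigram concrete, here is a minimal Python sketch that pulls such phrases out of tweet text and counts them. It is an illustration only, not the researchers' code: the function names, the simple regex tokenizer and the sample tweets are all assumptions made for this example.

```python
import re
from collections import Counter

# Words that open a "possessive bigram", following the article's definition.
POSSESSIVES = {"my", "our"}


def possessive_bigrams(text):
    """Return (possessive, next_word) pairs found in one piece of text.

    Hypothetical helper: tokenization is a plain lowercase regex split,
    which is an assumption and not the method used in the study.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return [(a, b) for a, b in zip(tokens, tokens[1:]) if a in POSSESSIVES]


def count_possessive_bigrams(tweets):
    """Count how often each possessive bigram appears across many tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(" ".join(pair) for pair in possessive_bigrams(tweet))
    return counts


# Made-up sample tweets, for illustration only.
sample = ["My wife loves my beer", "Our yogurt is gone, my husband ate it"]
print(count_possessive_bigrams(sample))
```

Counts like these could then be fed to a classifier as features, alongside single words and punctuation marks.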

This algorithm would be useful to anyone attempting to reach a specific audience on Twitter, namely brands and businesses attempting to market themselves to the Twitter audience.