
In this article about our collectively terrible password habits, I discuss some reasons why we constantly use ‘password’ and ‘123456’ even though we know it’s a terrible idea, as well as some fixes that work with human memory and mathematical complexity rather than against them.

Earlier I showed how to extract the postings from a given Facebook page. Here, I will show you how to do some basic text mining on the posts you found. For practice, I will use the messages of a local neo-Confederate group called ACTBAC (“Alamance County Taking Back Alamance County”). Their antics have been covered in local media, but with their re-branding in light of the Trump election and the rise of the alt-right, many people in our area are still wondering just what this group is all about. Perhaps text mining can help illuminate some of their beliefs and strategies for us.

Overview

I ran the script on their “ALAMANCEOURS” Facebook page, and it yielded 1017 messages beginning in June 2015. Here is the spreadsheet (actbac.csv) in case anyone wants to play around with it.

Top 50 words most used in their FB posts

I wrote a program to count frequencies and remove stopwords (stopwords are boring words like ‘a’, ‘to’, ‘it’, ‘is’). Then I highlighted the most interesting words (to me) in yellow. Each word is shown with its count next to it.
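The original counting program isn’t included in the post, so here is a minimal sketch of the same idea: lowercase each message, split it into word tokens, drop stopwords, and tally what remains. The stopword list below is a small hypothetical one for illustration; a real run would more likely use a fuller list such as NLTK’s English stopwords corpus.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not the author's list).
STOPWORDS = {
    "a", "an", "and", "the", "to", "it", "is", "of", "in", "for", "on",
    "that", "this", "be", "are", "was", "we", "our", "you", "your", "with", "as",
}


def top_words(messages, n=50, stopwords=STOPWORDS):
    """Return the n most frequent non-stopword tokens across all messages."""
    counts = Counter()
    for message in messages:
        # Lowercase, then keep runs of letters (allowing one inner apostrophe, e.g. "don't").
        tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", message.lower())
        counts.update(t for t in tokens if t not in stopwords)
    return counts.most_common(n)


if __name__ == "__main__":
    posts = [
        "Stand up for the county and stand for the state.",
        "Our county, our state, our flag.",
    ]
    for word, count in top_words(posts, n=5):
        print(word, count)
```

Counting with `collections.Counter` keeps the whole pass to a single loop over the messages; `most_common(50)` then gives the same kind of “top 50” list discussed here.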

From the top 50 list, we can see many words you would predict for a county-based neo-Confederate group (county, state, southern, cause, carolina). However, I was most intrigued by the prominence of the word ‘stand’.

Usage of the word ‘stand’

Stand can be both a noun (“take a stand”) and a verb (“stand up for yourself”). With this group, ‘stand’ is the most common verb used in their messages (not counting stopwords like ‘be’ or ‘is’). My hypothesis is that, as a verb, this word ‘stand’ conveys a lot of the power of their movement. Why?

To help understand how they use ‘stand’, I wrote a program to generate a concordance to show how the word is used in their messages. The first few lines of the concordance look like this:

The word of interest (shown in red) is placed in the center of each line, and the concordance shows the words that surround it on either side.
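The concordance program itself isn’t shown, so here is a minimal keyword-in-context (KWIC) sketch of the layout described above, assuming plain-text messages. It finds each whole-word, case-insensitive match and pads the left context so every keyword lines up in the same column. The `width` parameter is a hypothetical name for the amount of context kept on each side; NLTK’s `Text.concordance` method offers a ready-made alternative.

```python
import re


def concordance(messages, keyword, width=30):
    """Return keyword-in-context (KWIC) lines with the keyword in a fixed column.

    `width` is how many characters of context to keep on each side. Matching is
    whole-word and case-insensitive, so 'stand' does not match 'standing'.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for message in messages:
        text = " ".join(message.split())  # collapse newlines and repeated spaces
        for match in pattern.finditer(text):
            left = text[max(0, match.start() - width):match.start()]
            right = text[match.end():match.end() + width]
            # Right-align the left context so every keyword starts at column `width`.
            lines.append(f"{left:>{width}}{match.group(0)}{right}")
    return lines


if __name__ == "__main__":
    posts = ["Stand up and take a stand.", "The old oak tree will stand for years."]
    for line in concordance(posts, "stand", width=20):
        print(line)
```

Because the match is whole-word only, forms like “stood” and “standing” need their own searches, which is one way to get separate counts for each form.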

From the concordance, I learned that the word ‘stand’ is used 291 times in 1017 messages, most commonly as follows:

In addition, there are another 41 uses of “stood” and 86 uses of “standing”.

It would be interesting to compare this usage to other Confederate and non-Confederate groups to see whether this is a uniquely ACTBAC thing (I doubt it), or – more likely – it is a rhetorical device used more broadly by all Confederate groups. I would guess that their defensive “stand up for your beliefs, no matter how unpopular” plea has great power in a neo-Confederate setting. After all, the “Lost Cause” narrative also describes a heroic, virtuous South fighting against all odds, and ultimately unfairly defeated in the American Civil War.

Topic modeling

Next, just for fun, I wrote a program to build a topic model of the postings. A topic model looks for words that frequently co-occur within the same documents and tries to group those words into possible “topics”. Inside the program, you can adjust the number of topics and the number of words shown for each.
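The post doesn’t say which topic model or library was used; Latent Dirichlet Allocation (LDA), available in gensim (`LdaModel`) or scikit-learn (`LatentDirichletAllocation`), is a common choice. Assuming LDA, here is a dependency-free sketch using collapsed Gibbs sampling, where each post is one document of tokens with stopwords already removed. The hyperparameters `alpha`, `beta`, and `iterations` are illustrative defaults, not values taken from the original program.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics=3, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit a Latent Dirichlet Allocation model with collapsed Gibbs sampling.

    docs: list of token lists (one list per post, stopwords already removed).
    Returns one word->count dict per topic.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]                 # tokens per (doc, topic)
    topic_word = [defaultdict(int) for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                              # tokens per topic

    def update(d, w, k, delta):
        doc_topic[d][k] += delta
        topic_word[k][w] += delta
        topic_total[k] += delta

    # Start by giving every token a random topic.
    assignments = []
    for d, doc in enumerate(docs):
        assignments.append([rng.randrange(num_topics) for _ in doc])
        for w, k in zip(doc, assignments[d]):
            update(d, w, k, +1)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take this token out of the counts, then resample its topic from
                # p(k) proportional to (n_dk + alpha) * (n_kw + beta) / (n_k + V*beta).
                update(d, w, assignments[d][i], -1)
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                new_k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = new_k
                update(d, w, new_k, +1)
    return topic_word


def top_words_per_topic(topic_word, n=4):
    """Return the n highest-count words in each topic (ties broken alphabetically)."""
    return [
        [w for w, c in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])) if c > 0][:n]
        for counts in topic_word
    ]


if __name__ == "__main__":
    toy_docs = [["flag", "battle", "flag"], ["stand", "state", "people"],
                ["flag", "battle"], ["stand", "people"]]
    for i, words in enumerate(top_words_per_topic(lda_gibbs(toy_docs, num_topics=2), n=4), 1):
        print(f"Topic {i}: {', '.join(words)}")
```

The two knobs mentioned above map onto `num_topics` and the `n` argument of `top_words_per_topic`. On a corpus this small the topics are unstable, so in practice you would rerun with a few seeds and topic counts before settling on one.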

After running a few experiments, I settled on 3 topics with 4 words each. These topics weren’t especially revealing, as you can see below, but we can still learn a few things from them. First, when ‘stand’ is mentioned, it often appears with ‘southern’ and ‘state’, and it seems to be ‘people’ who are doing the standing (which makes sense). Additionally, a topic we could call ‘Confederate battle flag’ emerges (labeled Topic 3 below):

Text Difficulty

FKRE is the Flesch-Kincaid Reading Ease score, which measures how “easy” a document is to read; higher scores mean easier text. The score for these posts (71.55, or “fairly easy”) corresponds to roughly a 7th-grade reading level. I also ran an overall readability summary, which combines several other difficulty measures with FKRE, and it also puts this text at around the 6th or 7th grade.
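To show where a number like 71.55 comes from, here is a minimal sketch of the standard Flesch Reading Ease formula, 206.835 − 1.015 × (words/sentence) − 84.6 × (syllables/word), alongside the related Flesch-Kincaid Grade Level formula and Flesch’s commonly cited score bands. The syllable counter is a crude vowel-group heuristic (an assumption), so its numbers will differ somewhat from tested packages such as textstat, and this is not necessarily how the author’s tool computed its score.

```python
import re

# Flesch's commonly cited interpretation table: (minimum score, label, school level).
FLESCH_BANDS = [
    (90, "very easy", "5th grade"),
    (80, "easy", "6th grade"),
    (70, "fairly easy", "7th grade"),
    (60, "standard", "8th-9th grade"),
    (50, "fairly difficult", "10th-12th grade"),
    (30, "difficult", "college"),
    (float("-inf"), "very difficult", "college graduate"),
]


def count_syllables(word):
    """Crude heuristic: count vowel groups, dropping a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text):
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level) for a text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = sum(count_syllables(w) for w in words) / len(words)
    ease = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    grade = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return ease, grade


def flesch_band(score):
    """Map a Reading Ease score to its descriptive label and school level."""
    for floor, label, level in FLESCH_BANDS:
        if score >= floor:
            return label, level


if __name__ == "__main__":
    ease, grade = readability("The cat sat on the mat.")
    print(round(ease, 2), round(grade, 2), flesch_band(ease))
```

With this table, a score of 71.55 lands in the 70 to 80 band, “fairly easy”, which Flesch associated with a 7th-grade reader.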

I hope you enjoyed this quick tour of text mining – perhaps you will find some interesting techniques to use on your own projects!

I was playing around today with some code from Mastering Social Media Mining with Python (by Marco Bonzanini, published by the same company that published my last two books), and I came up with this snazzy set of scripts (postGetter.py, fileParser.py). They mine the last X posts from any public Facebook page, create a clickable FB URL for each, sort them by number of interactions (shares + likes), and write the results to a spreadsheet.
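The scripts themselves aren’t reproduced here, so this is only a sketch of the sort-and-export step, assuming the posts have already been fetched and flattened into plain dicts. The keys (`id`, `message`, `shares`, `likes`), the `PAGEID_POSTID` id format, and the permalink pattern are all assumptions for illustration; raw Graph API responses nest these fields differently, so a real version would need its own parsing step first.

```python
import csv
import io

FIELDS = ["url", "message", "shares", "likes", "interactions"]


def posts_to_csv(posts, out):
    """Sort posts by interactions (shares + likes) and write them to `out` as CSV.

    Assumes each post is a plain dict with hypothetical keys 'id' (in
    'PAGEID_POSTID' form), 'message', 'shares' and 'likes'.
    """
    rows = []
    for post in posts:
        page_id, _, post_id = post["id"].partition("_")
        shares, likes = post.get("shares", 0), post.get("likes", 0)
        rows.append({
            # Assumed permalink pattern for a page post.
            "url": f"https://www.facebook.com/{page_id}/posts/{post_id}",
            "message": post.get("message", ""),
            "shares": shares,
            "likes": likes,
            "interactions": shares + likes,
        })
    rows.sort(key=lambda row: row["interactions"], reverse=True)  # most popular first
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return rows


if __name__ == "__main__":
    sample = [
        {"id": "123_1", "message": "first post", "shares": 2, "likes": 10},
        {"id": "123_2", "message": "second post", "shares": 40, "likes": 95},
    ]
    buffer = io.StringIO()
    posts_to_csv(sample, buffer)
    print(buffer.getvalue())
```

To write a real spreadsheet instead of an in-memory buffer, pass a file opened with `open("timesNews.csv", "w", newline="", encoding="utf-8")`; the `newline=""` argument is what the `csv` module documentation recommends to avoid blank rows on Windows.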

Here are the results when run for the last 1000 posts by the Times-News of Burlington, our local newspaper: timesNews.csv.

Findings?

Not that surprising or shocking, but here goes. The last 1000 posts only go back to about August (modify the parameters at the top of the code to scrape more), but the top five posts for August–December, based on interactions, seem to be:

The other day on This American Life, Dallas Woodhouse, executive director of the North Carolina GOP, dismissed evidence that vote fraud in the state was basically non-existent:

Don’t show me studies. Academics, I mean, a bunch of knuckleheads, pointy-headed professors. We deal in the real world.

Since I’ve done prior academic work on insults, I was very intrigued at the possibility of my being simultaneously pointy-headed and knuckle-headed, living in an unreality where I and my cranially-challenged colleagues churn out reams of useless studies in order to retain Total World Domination.

As it turns out, the word ‘knucklehead’ originated with R.F. Knucklehead, a Goofus-type character (like Goofus in the old Highlights magazine feature “Goofus and Gallant”) from a U.S. Army PR/recruitment program. He was never portrayed as smart, and he was always making bad decisions. Here is a cartoon showing Aviation Cadet Knucklehead struggling to sign his own name:

He sneered from the campaign podium at the “long-haired men and short-skirted women” of the 1960s and derided “pointy-head college professors who can’t even park a bicycle straight.”

I wonder what happened in the bicycle parking lot between Wallace and some unlucky academic. We’ll never know. But The New York Times brings back the “pointy-head” quote for our times, comparing Wallace’s use of anti-intellectual populist insults to Trump’s.

Interestingly, Google users seem to think differently about professors (for those wondering if these results were influenced by my login, they weren’t: this was an incognito browser window).

Anyway, I hope this little etymological excursion shows what a professor does when she hears something idiotic: we brush aside the insult and instead we ask lots of questions, look up the answers, synthesize the results into a conclusion, maybe ask additional questions, cite our sources, then teach what we learned to others.