"data mining" entries

Surprising social media stats

I’ve been filtering Twitter’s firehose for tweets about “#Syria” for about the past week in order to accumulate a sizable volume of data about an important current event. As of Friday, the tally had surpassed one million tweets, so it seemed like a good time to apply some techniques from Mining the Social Web and explore the data.

While some of the findings from a preliminary analysis confirm common intuition, others are a bit surprising. The remainder of this post explores the tweets with a cursory analysis addressing the “Who?, What?, Where?, and When?” of what’s in the data.

Technology has changed the way we understand targeting and contextual relevance. How will marketing adapt?

Over the past five years, marketing has transformed from a primarily creative process into an increasingly data-driven discipline with strong technological underpinnings.

The central purpose of marketing hasn’t changed: brands still aim to tell a story, to emotionally connect with a prospective customer, with the goal of selling a product or service. But while the need to tell an interesting, authentic story has remained constant, customers and channels have fundamentally changed. Old Marketing took a spray-and-pray approach aimed at a broad, passive audience: agencies created demographic or psychographic profiles for theoretical consumers and broadcast ads on mass-consumption channels, such as television, print, and radio. “Targeting” was primarily about identifying high concentrations of a given consumer type in a geographic area.

The era of demographics is over. Advances in data mining have enabled marketers to develop highly specific profiles of customers at the individual level, using data drawn from actual personal behavior and consumption patterns. Now a brand can tailor the story it tells so that each potential customer finds it personally relevant. Users have become accustomed to this kind of sophisticated targeting; broad-spectrum advertising on the Internet is now essentially spam. At the same time, there is still a fine line between “well-targeted” and “creepy.”

Notes and links from the data journalism beat

It seems that new data journalism tools are being released every day. The latest include CivOmega, a modular prototype for government data that lets developers plug in their own APIs, and Fact Tank, a new data journalism platform from the Pew Research Center. Also, in news for US journalists concerned about protecting their own personal data: government investigators now face more hurdles when seeking a reporter’s records. And for a little data journalism levity, check out the latest project from Noah Veltman, a data journalism fellow at the BBC. Veltman used the GovTrack bulk data API, SQL, and Python to conduct a self-described “overly in-depth analysis” of Congressional Acronym Abuse from 1973 to the present.

Introducing Fact Tank: An Interview with Pew Research Center President Alan Murray (Data Driven Journalism)
“Obviously, we collect vast amounts of data, about demographics, about a variety of issues – we are basically a data shop. In the past, most of the dissemination of our data has been done through existing media. But we also felt it was important for us to get our own data relating to news events out to the public more quickly and more directly. Additionally, we also felt it was important for us to play a role in aggregating data sets which we can then present ourselves.”

Response to NSA data mining and the troubling lack of technical details, Facebook's Open Compute data center, and local police growing their own DNA databases.

It’s a question of power, not privacy — and what is the NSA really doing?

Pew Research Center national survey

In the wake of the leaks about the NSA’s data-collection programs, the Pew Research Center conducted a national survey to measure Americans’ response. The survey found that 56% of respondents think the NSA’s telephone-record tracking program is an acceptable way to investigate terrorism, and 62% said the government’s investigations into possible terrorist threats are more important than personal privacy.

Rebecca J. Rosen at The Atlantic took a look at legal scholar Daniel J. Solove’s argument that we should care about the government’s collection of our data, but not for the reasons one might think: the collection itself, he argues, isn’t as troubling as the fact that the government holds the data in perpetuity and that we don’t have access to it. Rosen quotes Solove:

“The NSA program involves a massive database of information that individuals cannot access. … This kind of information processing, which forbids people’s knowledge or involvement, resembles in some ways a kind of due process problem. It is a structural problem involving the way people are treated by government institutions. Moreover, it creates a power imbalance between individuals and the government. … This issue is not about whether the information gathered is something people want to hide, but rather about the power and the structure of government.”

Inaugural 2013 app has plans for your data, the “unprecedented” security issues of the Internet of Things, and optical switches speeding up data centers.

Here are a few stories from the data space that caught my attention this week.

Inaugural 2013 app takes as much as it gives

The Presidential Inaugural Committee (PIC) launched the first official inaugural smartphone app, Inaugural 2013 (for iOS and for Android), on Monday. Daniel Strauss reports in a post at The Hill that inauguration attendees can use the app to locate and RSVP to events, watch events via livestream, and navigate the event with an interactive map.

What isn’t front and center in the pomp and circumstance of the shiny new app are the terms of service and the privacy statement. Steve Friess at Politico points out that in the fine print, users are giving the PIC permission to share their data — phone numbers, email, home addresses, and GPS location data, for instance — “with candidates, organizations, groups or causes that [the PIC] believe have similar political viewpoints, principles or objectives.”

Gregory Ferenstein reports at TechCrunch that “privacy advocates find it troubling that the fine-print on the PIC’s website says it can use activity data ‘without limitation in advertising, fundraising and other communications in support of PIC and the principles of the Democratic party, without any right of compensation or attribution.’”

Linguist Geoff Nunberg, explaining his choice of “big data” as 2012’s word of the year, says that though “it didn’t get the wide public exposure given to items like ‘frankenstorm,’ ‘fiscal cliff’ and ‘YOLO,’” and might not have been “as familiar to many people as ‘Etch A Sketch’ and ‘47 percent’” were during the election, big data has become a phenomenon affecting our lives: “It’s responsible for a lot of our anxieties about intrusions on our privacy, whether from the government’s anti-terrorist data sweeps or the ads that track us as we wander around the Web.” He also notes that big data has transformed statistics into “a sexy major” and predicts the term will long outlast “Gangnam Style.” (You can read Nunberg’s full case for big data at NPR.)

The Benefits of Poetry for Professionals (HBR) — Harman International Industries founder Sidney Harman once told The New York Times, “I used to tell my senior staff to get me poets as managers. Poets are our original systems thinkers. They look at our most complex environments and they reduce the complexity to something they begin to understand.”

3D Printing Popup Store Opens in NYC (Makezine Blog) — MAKE has partnered with 3DEA, a pop-up 3D printing emporium in New York City’s fashion district. The store will sell printers and 3D printed objects as well as offer a lineup of classes, workshops, and presentations from the likes of jewelry maker Kevin Wei, 3D printing artist Josh Harker, and Shapeways’ Duann Scott. This. is. awesome!


How big data is transforming just about everything

Professor John Naughton took a look this week at how big data is transforming various industries that affect our daily lives.

He highlights finance, of course, which he says has been “pathologically mathematised”; marketing, for which there is more data about human behavior than we’ve ever had; and the very broad category of science. Naughton notes that researchers used to conjure up theories and then look to data to support or refute them; now, researchers turn to data to find patterns and connections that might inspire new theories. Naughton also looks at medicine, which is just on the brink of delving into the big data realm. He writes:

Naughton addresses the use of big data in sports as well, speculating that baseball has been the sport most transformed by data. He’ll likely find agreement there. Over at TechCrunch, Barry Eggers goes into depth on the dramatic effect big data is having on baseball. He notes that the simple statistical analysis baseball has embraced since its beginnings has evolved into gathering mountains of unstructured data and using Hadoop to extract new and better insights from data that isn’t part of the structured game information. Eggers writes:

“By having his data scientist run a Hadoop job before every game, [San Francisco Giants manager] Bruce Bochy can not only make an informed decision about where to locate a 3-1 Matt Cain pitch to Prince Fielder, but he can also predict how and where the ball might be hit, how much ground his infielders and outfielders can cover on such a hit, and thus determine where to shift his defense. Taken one step further, it’s not hard to imagine a day where managers like Bochy have their locker room data scientist run real-time, in-game analytics using technologies like Cassandra, Hbase, Drill, and Impala.”


Presidential candidates are mining your data

Data is playing an unprecedented role in the US presidential election this year. The two presidential campaigns have access to personal voter data “at a scale never before imagined,” reports Charles Duhigg at the New York Times. The candidate camps are using personal data in polling calls, accessing such details as “whether voters may have visited pornography Web sites, have homes in foreclosure, are more prone to drink Michelob Ultra than Corona or have gay friends or enjoy expensive vacations,” Duhigg writes. He reports that both campaigns emphasized they were committed to protecting voter privacy, but notes:

“Officials for both campaigns acknowledge that many of their consultants and vendors draw data from an array of sources — including some the campaigns themselves have not fully scrutinized.”

A Romney campaign official told Duhigg: “You don’t want your analytical efforts to be obvious because voters get creeped out. A lot of what we’re doing is behind the scenes.”

That “behind the scenes” secrecy may itself be enough to creep people out. Practices like these are starting to tarnish the image of the consumer data-mining industry, and a Manhattan trade group, the Direct Marketing Association, is launching a public relations campaign, the “Data-Driven Marketing Institute,” to smooth things over before government regulators get involved. Natasha Singer reports at the New York Times:

“According to a statement, the trade group intends to promote such targeted marketing to lawmakers and the public ‘with the goal of preventing needless regulation or enforcement that could severely hamper consumer marketing and stifle innovation’ as well as ‘tamping down unfavorable media attention.’ As part of the campaign, the group plans to finance academic research into the industry’s economic impact, said Linda A. Woolley, the acting chief executive of the Direct Marketing Association.”

One of the biggest issues, Singer notes, is that people want control over their data. Chuck Teller, founder of Catalog Choice, told Singer that in a recent survey conducted by his company, 67% of people responded that they wanted to see the data collected about them by data brokers and 78% said they wanted the ability to opt out of the sale and distribution of that data.


The Internet of Things That Do What You Tell Them: Cory Doctorow passionately explains how computers are already entwined in our lives, which means laws that support lock-in are much more than inconveniences.