Next week I am off to Grenoble to present a new paper in Session 3, "New Ideational Turns", as part of Panel 84, "New Directions in the Study of Public Policy", convened by Peter John, Hellmut Wollmann and Daniel A. Mazmanian, at the 1st International Conference on Public Policy, Grenoble, France, 26-28 June. Friday 28 June, 8.30-10.30, Sciences Library Auditorium.

This paper argues that the discussion of public policy online offers new and exciting opportunities for public policy research exploring the role of policy ideas. Although considerable work focuses on political ideas at the macro or mid-range level, specific policy ideas and initiatives are overlooked, thought to be "too narrow to be interesting" (Berman, 2009, p. 21). This paper argues that the prolific use of social media among policy communities means it is now possible to systematically study the micro-dynamics of how policy ideas are coined and fostered. Policy ideas are purposive, branded initiatives that are launched with gusto; flourish for around a thousand days; and then disappear with little trace as attention shifts to the latest and loudest. At best, media reports will document that Birmingham's Flourishing Neighbourhoods initiative has been "scrapped", Labour's Total Place programme has been "torn up", or the Coalition's big society policy is "dead". Save for a return to the policy termination literatures of the late 1980s, our impotence in conceptualising such death notices reveals how little effort has been invested in understanding and theorising the lifecycle of policy ideas. In response, this paper conceptualises policy ideas and their life, death and succession. The paper draws on the case of the recent Police and Crime Commissioner elections held across England and Wales in November 2012, and the attempts of the Home Office to coin and foster the hashtag #MyPCC.

For the last few months we have been collecting discussion of public policy on social media using DiscoverText. We are trying to understand how public policy is discussed online. To date we have collected just under half a million Facebook posts and YouTube comments on 26 different policies and issues.

The work of understanding the shape of the debate starts with de-duplicating exact and near duplicates; we then check that the tweets are on-topic, and not just opportunist hashtag spam. We then identify those that express opinion about the topic and divide them up by theme. We draw on a dispersed team of human coders who code portions of the datasets, and we check for inter-coder agreement and validity. We use the human coding to train custom machine classifiers to classify large portions of the datasets, reducing the need for human coding. One further way of getting a sense of the emerging shape of the discussion is to ask a group of people to Q sort a diverse sample of items using crowdsortq.com. The analysis identifies shared viewpoints and informs further rounds of coding.
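The de-duplication step can be sketched in a few lines. This is a minimal illustration only, not the algorithm of any particular tool: it assumes near duplicates are retweets that differ only in case, punctuation or appended links, and the function names (`normalise`, `deduplicate`) are invented here.

```python
import re
from collections import OrderedDict

def normalise(text):
    """Collapse case, links, punctuation and whitespace so that
    near-identical retweets map onto the same key."""
    text = text.lower()
    text = re.sub(r"https?://\S+", "", text)   # strip links, which vary per retweet
    text = re.sub(r"[^\w#@ ]", " ", text)      # drop punctuation, keep hashtags/handles
    return re.sub(r"\s+", " ", text).strip()

def deduplicate(tweets):
    """Keep only the first occurrence of each exact/near duplicate."""
    seen = OrderedDict()
    for tweet in tweets:
        key = normalise(tweet)
        if key not in seen:
            seen[key] = tweet
    return list(seen.values())
```

Anything the normaliser cannot catch (rephrasings, partial quotes) would still need the near-duplicate clustering and human checking described above.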

“We’re pulling in every tweet, every post, every outward link, every update… It is a huge challenge – the platforms make it difficult for us, they keep changing how we can get hold of it… The APIs don’t give us enough – especially when things trend – but there are ways around it. We find a way around it… We’ve got big data”.

There is a new kind of social scientist: the big data social scientist. Top of their Desert Island Discs choices might be Queen’s “I Want It All” (you know the one – “I want it all, I want it all, I want it all, and I want it now!”).

There is something in the “I want it all” aspiration that reminds me of when I travel on the train and see the man at the end of the platform, notebook and SLR camera around his neck, sandwiches and flask of coffee packed carefully in his knapsack. He has in his hand a book listing the numbers of the rolling stock currently in service. He has photographed and recorded 50% of them; he knows he has another 50% to go. He also wants it all. So that’s him, the new generation of big data social scientists, and Freddie Mercury: they all want it all.

When I recently started to capture (or harvest, or, some say, ‘scrape’) tweets about the Police and Crime Commissioner elections, I found myself with a spreadsheet of 100,000 rows and ten fields of metadata – that’s 1m data points. As a first-timer in this world, I had myself big data. It was exciting. I had it all.

Then I started to learn more about the mechanism I was using to pull in these tweets. Blogs and websites were warning me that using Twitter’s API for this sometimes gives you as little as 1% of the actual tweets. Rate limits of around 1,500 tweets an hour mean that you can’t get everything. For people who like to collect tweets about Obama or Occupy, there are times of the day when you could easily end up with a tiny sample of a huge volume of tweets. But there are solutions, these people tell you – pay us a few hundred dollars and we will get it all for you. Yes, all of the tweets. No restrictions. You can have it all.
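The arithmetic behind that 1% figure is easy to sketch. The function below is a hypothetical back-of-envelope estimate, not Twitter's actual sampling rule: if an hourly cap lets you pull 1,500 tweets while 150,000 are being posted, you are seeing 1% of the stream.

```python
def api_coverage(tweets_per_hour, hourly_cap=1500):
    """Rough fraction of an hour's tweets a rate-capped API can return.

    A back-of-envelope sketch only: real sampling depends on how the
    platform applies its limits, which is opaque to the researcher.
    """
    if tweets_per_hour <= 0:
        return 1.0  # nothing posted, nothing missed
    return min(1.0, hourly_cap / tweets_per_hour)
```

For a niche hashtag producing a few hundred tweets an hour the cap is irrelevant; for an Obama- or Occupy-scale event it quickly becomes the whole story.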

Meanwhile, imagine the big data social scientist, iPod strapped to his arm, out for a 5K run to relieve some pressure, singing along, mulling the proposition over…

“Not a man for compromise and where’s and why’s and living lies,
So I’m living it all, yes I’m living it all,
And I’m giving it all, and I’m giving it all,
It ain’t much I’m asking, if you want the truth,
Here’s to the future, hear the cry of youth,
I want it all, I want it all, I want it all, and I want it now”,

And you can have it now my friend. Just enter your card details and you can have it all.

Looking back at my spreadsheet of big data, it doesn’t seem as big any more. I’ve just got an unknown sample of tweets. And to add to that I don’t have their Facebook activity, or their LinkedIn or what they wrote on the Guardian article or BBC news site, or their blog post. I really don’t have very much. I really only have a little bit.

The fantasy of having “it all” seems like a possibility because we have the technology, or at least we have come close to it. The new generation of big data social scientists will tell you it was easier a couple of years ago – the platforms were less protective, whereas now they are becoming risk averse, or enlightened to how they can monetise and exploit their big data. But they battle on. “If you don’t know how to hack or code, or have the means to pay”, they will tell you, “then you need to think carefully before getting involved with the world of big data. You might be better suited to just regular ‘data’.”

But hang on. Let’s look at that spreadsheet again – the one with the 1m data points. There’s quite a bit in there. We should stop judging our data by what we don’t have and instead ask what we can learn from what we’ve got. It is a simple point, but the quality of your data depends on the questions you are asking and the claims you want to make. There are as many unanswered questions in this spreadsheet as there are tweets. The key is not to try to answer them all, nor to be led completely by the availability of data, but to be creative with our questions and to exploit what we have.

It’s time to be happy with our lot – time to change the playlist – what’s that tune by Bobby McFerrin?

This blog post describes an ongoing research project, sponsored by the British Academy, called “The Shape of Ideas to Come”.

This project studies Tweets that express opinion about policy ideas. By Tweets I mean those 140-character messages that people send over Twitter. By policy idea I mean anything from ‘climate change’ to ‘big society’. In most cases they are deliberate policies invented by governments, policy makers or organisations. The interesting thing about policy ideas is that they tend to end up being discredited, usually within three years. The focus of this research is how users of Twitter express opinion about policy ideas.

The first job is to capture discussion around a particular idea. Let me give you an example. In November 2012 the Home Office was responsible for the election of 41 Police and Crime Commissioners. The voter turnout in the election was pretty poor, but nevertheless the election went ahead and there are now serving PCCs in every police area of England and Wales. The bit that is of interest to this project is how the Home Office developed the hashtag #MyPCC to focus Twitter discussions of the election. Within a few hours users on Twitter were using the hashtag to criticise the election and the rationale for the policy of having elected commissioners. For the purposes of this project we collected 100,000 Tweets that included either #MyPCC or #PCC. We started the collection three weeks before the election. As you would imagine, most of the Tweets and discussion came in the final few days before the election, with almost half coming on the day after the election, while the results were being announced and the issue was prominent in the news cycle.

When we sat down to examine the 100,000 Tweets, the first thing we noticed is that many of them expressed opinion about the policy idea – but, importantly, not all. Many of the Tweets were conversations between Twitter users or, alternatively, factual, with candidates and bloggers publicising meetings or directing users to their websites. But alongside all of this conversation and broadcasting were relatively clear expressions of opinion. These opinionated Tweets included phrases like: “I think this policy is a waste of time”; “In my opinion this is privatisation by the back door”; “I imagine this will end up costing more than the previous approach”; “It is clear that nobody has a firm grip of what needs to be done”; “I think this is an important step forward and we need to embrace it”.
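A crude first pass at spotting such Tweets can be sketched with a phrase list built from the examples above. This is only an illustration – the marker list and function name are invented here, and the project relies on human coders and trained classifiers rather than keyword matching.

```python
# Stock phrases drawn from the example Tweets above; a real coding
# scheme would be far richer and refined through coder discussion.
OPINION_MARKERS = (
    "i think", "in my opinion", "i imagine", "it is clear",
    "waste of time", "we need to",
)

def looks_opinionated(tweet):
    """Crude first-pass flag: does the tweet contain a stock opinion
    phrase? A human coder (or trained classifier) makes the final call."""
    text = tweet.lower()
    return any(marker in text for marker in OPINION_MARKERS)
```

A filter like this misses sarcasm, implication and novel phrasing entirely, which is exactly why the human-coding stage described next matters.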

Although these opinionated Tweets vary, initial categorisation reveals overlapping themes, repeated phrases and the use of metaphor and cliché. Although much can be learnt from isolating the opinionated Tweets from the others, how to separate them out for analysis is a major challenge facing this project. Thankfully there are software tools available that can automate much of the process, but because we are dealing with subjectivity it also needs human intervention. It requires analysts.

How it works is this. The analyst signs in to a secure website. They are given a coding scheme – usually something simple like “1. Opinion” and “2. Not” – and a batch of Tweets. Once underway, the first Tweet flashes up full screen. Hit “1” for Opinion, “2” for Not. Once it is coded, the remaining Tweets flash up one by one until all items are coded or the analyst presses the Stop button. Because everybody signs in from their own device, several analysts can work on the same set of Tweets at any one time. Not everybody will agree on the categorisations, but it is through discussing this disagreement that we can clarify our working definitions. Armed with clearer definitions we can move on to code new batches of Tweets with greater accuracy. Throughout the process the software is learning about the nuanced distinctions between opinionated and non-opinionated Tweets. Following further rounds of coding and review, the process of classification can then be handed over to the machine. This automation opens up the potential to classify thousands of Tweets in a matter of seconds.
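Inter-coder agreement of the kind described above is commonly measured with Cohen's kappa, which discounts the agreement two coders would reach by chance. A minimal sketch, assuming two coders have labelled the same batch with 1 (Opinion) or 2 (Not):

```python
def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two coders' labels on the same batch of Tweets.

    kappa = (observed agreement - chance agreement) / (1 - chance agreement);
    1.0 is perfect agreement, 0.0 is no better than chance.
    """
    assert len(codes_a) == len(codes_b), "coders must label the same batch"
    n = len(codes_a)
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    labels = set(codes_a) | set(codes_b)
    # Chance agreement: product of each coder's marginal label proportions.
    expected = sum(
        (codes_a.count(label) / n) * (codes_b.count(label) / n)
        for label in labels
    )
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)
```

Low kappa on a batch is the signal to stop, discuss the disagreements, and tighten the working definitions before coding on.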

Once the software is trained, the role of the analysts becomes one of devising a coding scheme to categorise the opinionated Tweets. This is an iterative process but the aim is to identify key themes and overlaps and remove duplicates. The aim is to represent the range and diversity of debate.

If you are interested in getting involved in the role of coding and classifying tweets about policy ideas please contact the Principal Investigator Dr Stephen Jeffares, University of Birmingham.

The other day I spotted a blog post by Joe Senior from the customer insight company Clarabridge. Entitled “Work the Stock Market with Sentiment and Text Analysis – We’ll Show You How at GAIM USA 2013”, it reported how he was about to talk to hedge fund managers about the potential of “using technology to analyze Twitter data to figure out what is going on in the market – and offering trade-related insight into whether you should buy or sell”. The author continued: “we’ll be analyzing Tweets about some major organizations to try to ferret out sentiment and correlate it with stock prices and issues, all in real time. This is really game-changing stuff for traders and hedge funders”.

I bring this up for two reasons. The first is that it captures the excitement around analysing social media sentiment and the insight, even predictive potential, it brings. The second is that I think it sums up where much of the innovation in social media and sentiment analysis is at the moment. It is driven by the potential of making money. The innovation, it seems, is following the money.

This was highlighted in a recent National Centre for Research Methods research funding call, which suggested the innovation and growth of social media analysis was “driven by the demands of the commercial sector” and that academic capacity was “some way behind” (NCRM 2012: p.6).

As a social scientist working in a university, with a background in policy analysis, I am excited by the fast-moving developments in the world of text analytics, social data analysis and sentiment analysis, but I don’t share the profit motivation. If all we use this stuff for is to be one step ahead of the market, then we are missing a trick. But with so much at stake, much of the innovation will take place behind commercial smokescreens and intellectual firewalls.

In search of what else is going on, I did a rapid review of peer-reviewed journal articles gaining insight from Twitter. I was relieved to find that, alongside the burgeoning commercial literature, there are three other kinds of literature emerging: the democratic, focusing on social movements and the potential of Twitter in social change; the political, seeking to predict electoral patterns and turnout, spurred on by the recent social media frenzy of Obama versus Romney; and the practice literatures, around how this is changing public service engagement with publics, journalism, policing and research itself, particularly for mapping and the possibilities of big data. So not all social scientists are dedicated to the study of Twitter for the purposes of profit and commercial gain.

Although there is acknowledgement of the sentiment, meaning and subjectivity of Tweets, the data is increasingly massified. There is a drive to speed up the analysis and show the weight of opinion as it shifts in almost real time: David Cameron up 3%, the big society down 2%. But as political scientist Stu Shulman said in a recent screencast, “It is very hard to know what one ambiguous tweet means, much less what they all add up to”. He takes issue with what he calls hollow sentiment meters and the rise of the misleading infographic. His antidote is a tool called DiscoverText. It allows the user to import Tweets by word mentioned or hashtag. You can then visualise the set as a word cloud, remove duplicates and cluster near-duplicate tweets. Unlike a lot of tools designed to map social networks, the emphasis is on interpreting meaning. You can code the tweets alone, or allocate batches to peers and colleagues to code in the cloud. You can use this coding to train a classifier to automatically code up batches of 1,000, 10,000 or, I suppose, a million tweets. The point is that the classifier is contextualised and able to detect the nuance and idiosyncratic nature of a tweet.
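The near-duplicate clustering step can be illustrated with a simple word-overlap measure. This is a sketch of the general idea, not DiscoverText's actual method; the 0.8 threshold and the function names are assumptions made for the example.

```python
def jaccard(a, b):
    """Word-overlap similarity between two tweets, from 0.0 to 1.0."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not (words_a | words_b):
        return 1.0  # two empty strings count as identical
    return len(words_a & words_b) / len(words_a | words_b)

def cluster_near_duplicates(tweets, threshold=0.8):
    """Greedy single-pass clustering: attach each tweet to the first
    cluster whose exemplar is similar enough, else start a new cluster."""
    clusters = []  # list of lists; clusters[i][0] is that cluster's exemplar
    for tweet in tweets:
        for cluster in clusters:
            if jaccard(tweet, cluster[0]) >= threshold:
                cluster.append(tweet)
                break
        else:
            clusters.append([tweet])
    return clusters
```

Collapsing each cluster to its exemplar is what stops a heavily retweeted message from drowning out the range of distinct viewpoints in the set.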

I’m using it to look at how policy ideas – ideas like total place, flourishing neighbourhoods and big society – live and die in the world of social media. Most government departments and local councils have Twitter feeds, and on Twitter policy ideas become hashtags. During the recent Police and Crime Commissioner elections I collected 100,000 tweets that referenced #PCC and #MyPCC. I am currently designing a classifier in DiscoverText that can distinguish between tweets that express sentiment towards the policy and those that report facts, promote external weblinks or are just plain spam. From this I then use Q methodology to map the inter-subjective viewpoints that emerge around the policy idea. My aim is to show the emergence of subjectivity around policy ideas from the point of launch, and how these micro-concourses evolve daily.

There’s more work to do, but I hope this is an example of how you can use large Twitter datasets for things other than deriving quantities. It is the subjectivity and shape of ideas that matters. So with big datasets of social data, the motive needs to be more than predicting stock market performance. We can do so much more.