
Dave Whiteland

Our EveryPolitician project makes data on the world’s politicians available in a useful, consistent format for anyone to use. If you’ve been following our progress, you’ll know we’ve already collated a lot of data (over 72,000 politicians from 233 countries). The work on adding to the depth and breadth of that data is ongoing, but EveryPolitician data is already being used to do interesting things.

In that case, the useful data for Politwoops was the politicians’ party affiliation. But our team (a handful of humans and one very busy bot) collects richer data than just that. EveryPolitician data includes contact information for politicians.

At mySociety, we know how powerful this particular kind of data can be. For example, our WriteToThem site makes it easy for UK citizens to contact their representatives (WriteToThem grew out of the earlier online service FaxYourMP, and uses the now more common technology of email).

Of course, there’s nothing especially radical about collecting email addresses of politicians… or phone numbers, Twitter handles, or Facebook pages. Indeed, many individuals and groups do just that. But an important difference with EveryPolitician is that we’re not just collecting data (which happens to include those things, as well as a host of others) but also making it available so it’s easy to use. We do that by putting it out in consistent, useful formats.

For many projects, downloading a CSV of current politicians from EveryPolitician will be enough. That can be opened as a spreadsheet, and if one of those columns is called email, you’re good to go.
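To give a feel for how little code that takes, here is a minimal sketch in Python. The sample rows and the column names (name, email, twitter) are invented for illustration; check the headers in the file you actually download.

```python
import csv
import io

# Hypothetical sample standing in for a downloaded term CSV; the column
# names (name, email, twitter) are assumptions made for this sketch.
SAMPLE_CSV = """name,email,twitter
Alice Example,alice@example.org,alice_example
Bob Sample,,bobsample
"""

def emails_from_csv(csv_text):
    """Return (name, email) pairs for every row with a non-empty email."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["name"], row["email"]) for row in reader if row.get("email")]

contacts = emails_from_csv(SAMPLE_CSV)
for name, email in contacts:
    print(f"{name}: {email}")
```

Rows without an email address are simply skipped, which matters in practice because contact data is rarely complete for every politician.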

Opening a spreadsheet is just one way of accessing the data. Our own use of EveryPolitician data to power the “Write in Public” MajlisNameh site for ASL19 (see this blog post for more about that) demonstrates a more programmatic approach.

But the whole point of making data available like this isn’t so that we can use it. It’s for other people, other groups. Anyone can build more nuanced or complex services with this data too.

For example, the people at Represent.me have built a sophisticated platform for gathering opinions and votes that can be shared with politicians and constituency MPs. It’s a system of information-gathering that has a network of citizens at one end feeding into their political representatives at the other. They use EveryPolitician’s data to populate their system with information about those representatives, including contact details, for each country they operate in.

And, because we make sure our data is consistently formatted, it’s a good general solution. As they cover more areas, they can expect the code they’ve written to ingest the EveryPolitician data in the countries they’re already operating in to also work as they expand into others.

If you’re running a project that needs such data, you could invest time and effort finding and collecting it all yourself. But it’s almost inevitable that you’d be using the same public sources that we are anyway (after all, we try to identify and use all the sources we can, merging them together into one collated whole), so really it makes sense to simply take the data from EveryPolitician. Remember, too, that once our bot has been told about a source, it checks it daily for changes and updates. So instead of replicating the effort we’re already making to gather the same data you need, you’re free to focus on developing the way your project uses that data… while we hunker on down and get on with collecting it.

Inevitably, as with all software projects, there’s always lots more to do, but already the value of providing useful data — and especially contact information — in a consistent format is clear.

Politwoops tracks politicians’ tweets, and reports the ones that are deleted.

Often those tweets are deleted because of a typo: everyone makes simple mistakes with the buttons on their devices, and politicians are no less human than the rest of us.

But Politwoops’s targets are public servants who use Twitter to communicate with that public. And sometimes the contents of the tweets they delete are not simply the result of bad typing. Those tweets can be especially interesting to people whom those politicians are representing: sometimes they may be evidence of a usually-suppressed prejudice, or an attempt to remove evidence of a previously held opinion that is no longer convenient.

In effect, Politwoops is a public archive of direct quotes that would otherwise be lost.

And also… EveryPolitician

Our EveryPolitician project is an ever-growing collection of data on every politician in the world (we’re not there yet, but we’re over 230 countries and 72,700 politicians in, and counting).

Like Politwoops, our data includes politicians’ Twitter handles. But also a lot more besides.

We make that data useful by putting it out in consistent, simple formats — the simplest of which is a comma-separated value (CSV) file for each term of a legislature. In practice, that means if you want a spreadsheet of the current politicians in your country’s parliament, then EveryPolitician is probably the place for you.

Put them together…

Now, Politwoops predates EveryPolitician by several years, and they’ve been doing their thing without needing our data just fine. In fact, Politwoops has been happily politwooping since 2010 (Politwoops is a project of the Open State Foundation, based in the Netherlands).

But… who doesn’t want to add something extra for free? Our data also includes Twitter handles (mostly but not entirely from the same public sources Politwoops were using). So that meant they could take our CSVs and match each line—all that extra data!—with the Twitter handle.

Better, for free

So last year, they augmented their data with ours for one very simple win: they get to know party affiliation for the politician associated with each of those Twitter handles. Well, actually they get to know lots of other things besides party (gender, date of birth, or… well, all our other data, if they wanted it). But just party? That’s also fine.

This all means that Politwoops now shows the party of each tweet’s deleter, just because they merged our CSV with theirs. Lovely!
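We haven’t seen the Politwoops code, so what follows is only a sketch of the general idea: a join on the Twitter handle. Both sample files, their column names (twitter, group, handle, tweet) and the helper functions are assumptions made up for this illustration.

```python
import csv
import io

# Both samples are invented for illustration; real column names may differ.
EVERYPOLITICIAN_CSV = """name,twitter,group,gender
Alice Example,Alice_Example,Green Party,female
Bob Sample,bobsample,Blue Party,male
"""

DELETED_TWEETS_CSV = """handle,tweet
@alice_example,Oops wrong account
unknown_user,Nothing to see
"""

def normalise(handle):
    """Lower-case a handle and strip any leading @ so both files compare equal."""
    return handle.strip().lstrip("@").lower()

def add_party(tweets_text, politicians_text):
    """Attach a party to each deleted tweet by joining on the Twitter handle."""
    party_by_handle = {
        normalise(row["twitter"]): row["group"]
        for row in csv.DictReader(io.StringIO(politicians_text))
        if row["twitter"]
    }
    merged = []
    for row in csv.DictReader(io.StringIO(tweets_text)):
        # party is None when the handle is not in the politicians file
        party = party_by_handle.get(normalise(row["handle"]))
        merged.append({**row, "party": party})
    return merged

merged = add_party(DELETED_TWEETS_CSV, EVERYPOLITICIAN_CSV)
for row in merged:
    print(row["handle"], row["party"])
```

Normalising the handles before comparing them is the one design choice worth noting: the same account can be written with or without the @, and in different letter case, depending on the source.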

A tiny detail… party affiliation (arrowed) added to @deletedbyMPs tweets [screenshot of https://politwoops.co.uk/]

Although party affiliation was the detail Politwoops went for, it turns out the other data from EveryPolitician was a little too tempting for them to ignore… So recently they’ve been doing some playful analysis on their statistics using the gender breakdown that EveryPolitician data makes possible too. You can see more on the Politwoops website.

You can too

To be clear: Politwoops did this, not us. We’re committed to doing the groundwork of finding, collecting and collating the data, and making it available (and, additionally, endlessly checking for updates… if you’re interested in how this all works, you can read our bot’s own blog). We do this so people who want to get on with using the data can do just that. As did, in this case, Breyten and his team at Politwoops.

EveryPolitician’s data is available as plain CSVs for this kind of thing, but we also provide a richer JSON version too if that’s more useful to you. All the files are downloadable from the website. If you’re a coder who wants to dive in, there are libraries to make it even easier for you (the EveryPolitician team works in Ruby, so we wrote the everypolitician gem, but there are also ports to Python and PHP).
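If you’d rather work with the JSON directly than go through one of the libraries, a rough sketch might look like the following. The tiny document here is invented, and the field names (persons, organizations, memberships, person_id, on_behalf_of_id) are our assumption of a Popolo-style layout; check them against a real file before relying on them.

```python
import json

# Invented sample in a Popolo-like shape; field names are assumptions.
SAMPLE_JSON = """
{
  "persons": [
    {"id": "p1", "name": "Alice Example", "gender": "female"},
    {"id": "p2", "name": "Bob Sample", "gender": "male"}
  ],
  "organizations": [
    {"id": "green", "name": "Green Party", "classification": "party"}
  ],
  "memberships": [
    {"person_id": "p1", "on_behalf_of_id": "green"}
  ]
}
"""

def party_names_by_person(popolo_text):
    """Map each person's name to the party named in their membership."""
    data = json.loads(popolo_text)
    org_names = {org["id"]: org["name"] for org in data.get("organizations", [])}
    person_names = {p["id"]: p["name"] for p in data.get("persons", [])}
    result = {}
    for membership in data.get("memberships", []):
        person_id = membership.get("person_id")
        party_id = membership.get("on_behalf_of_id")
        # Skip memberships that point at a person or party we don't know.
        if person_id in person_names and party_id in org_names:
            result[person_names[person_id]] = org_names[party_id]
    return result

parties = party_names_by_person(SAMPLE_JSON)
print(parties)
```

The richer format keeps people, parties and memberships as separate lists linked by ids, which is why the sketch builds two lookup tables first.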

The Open Government Partnership (OGP) was launched in 2011 to provide an international platform for domestic reformers committed to making their governments more open, accountable, and responsive to citizens. Its goals are, therefore, pretty much in line with the objectives of civic tech, the field we at mySociety have been ploughing for over a decade. So we’ve been supporters of the OGP since it started, as have many of our partners and collaborators.

Today, there are 70 participating countries. Inevitably, some member countries have governments whose enthusiasm for the OGP goals is, perhaps, greater than their willingness to actually implement the changes that attaining such goals requires. But even where that is the case, the OGP is a catalyst for getting accountability and transparency onto their agenda, and we’re all for that.

All of mySociety’s projects fall somewhere within the remit of the OGP. For example, our Freedom of Information platform Alaveteli encourages citizens to make requests, and our work with parliamentary monitoring organisations is all about opening up the activity of parliaments in a way that OGP very much promotes.

Even official records aren’t as safe as you might think they are. The archive of a country’s political history might be wiped out in a single conflagration.

Take the example of Burkina Faso, a beautiful West African country that is, sadly, perhaps best known to the rest of the world for its troubled political past.

The uprising in Burkina Faso in 2014 led to a fire in the National Assembly building and archives office. Nearly 90% of the documents were lost. Now the National Assembly is working to reconstruct the list of its parliament’s members before 1992.

This means that the data EveryPolitician has on Burkina Faso has nothing from terms before 1992. We’ve got some data for six of the seven most recent terms from the National Assembly so far, of which five are live on the site. Even though that data is not very rich (there’s little more than names in many cases; and the 6th term was transitional so data on that one’s membership might remain elusive) it’s a beginning.

We know from experience that data-gathering often proceeds piecemeal, and names are always a good place to start.

As Tinto finds new data, whether that’s more information about the politicians already collected or membership lists of the missing terms before 1992, we’ll be adding that to EveryPolitician too.

A vast collection

When people ask what EveryPolitician is, we often say, ‘The clue’s in the name’. EveryPolitician aims to provide data about, well … every politician. In the world.

(We’ve limited our scope — for the time being — to politicians in national-level legislatures).

The project is growing. Since our launch last year, we’ve got data for legislatures in 233 countries. The amount of data we’ve collected currently comprises well over three million items. The number of politicians in our datafiles is now in excess of 70,000.

Seventy thousand is an awful lot of politicians.

In fact, if you think that might be more politicians than the world needs right now, you’re right: as the Burkina Faso example shows, EveryPolitician collects historic data too.

So as well as the people serving in today’s parliaments, our data includes increasing numbers of those from the past. (Obviously, if you have such data for your country’s legislature, we’d love to hear from you!)

More than just today’s data

The Burkina Faso fire is an illustration of the value of collecting and preserving this historic data.

Of course, we’re fully aware of the usefulness of current data, because we believe that by providing it we can seed many other projects — including, but in no way limited to, parliamentary monitoring sites around the world (sites like our own TheyWorkForYou in the UK, or Mzalendo in Kenya, for example).

Nonetheless, we never intended to limit ourselves to the present. By sharing and collating historic records too, we hope to enable researchers, journalists, historians and who-knows-who-else to investigate, model, or reveal connections and trends over time that we haven’t even begun to imagine. We know this data has value; we look forward to discovering just how much value.

But it turns out we’re providing a simpler potential benefit too. EveryPolitician’s core datafiles are an excellent distributed archive.

Future-proofing

What Burkina Faso’s misfortune goes to show is that, as historians know only too well, data sources can be surprisingly fragile.

In this case the specific situation involves paper records being destroyed by fire. That is a simple analogue warning to the digital world. Websites and their underlying databases are considerably more volatile than the most flammable of paper archives.

Database-backed sites are often poor catalogues of their pasts. Links, servers and domain registrations all expire. Access to data may be revoked, firewalls can appear.

Digital data doesn’t fade; instead it is so transient that it can simply disappear.

Of course, we cannot ourselves guarantee that our servers will be here forever (we’re not planning on going anywhere, but projects like this have to be realistic about the longer view).

There is an intriguing consequence of us using GitHub as our datastore. The fact is, the EveryPolitician data you can download isn’t coming off our servers at all. Instead, we benefit from GitHub’s industrial-scale infrastructure, as well as the distributed nature of the version control system, git, on which it is based. By its nature, every time someone clones the repository (which is easy to do), they’re securing for themselves a complete copy of all the data.

But the point is not necessarily about data persisting far into the next millennium — that’s a bit presumptuous even for us, frankly — so much as its robustness over the shorter cycles of world events. So, should any nation’s data become inaccessible (who knows? for the length of an interregnum or civil war, a natural disaster, or maybe just a work crew accidentally cutting through the wrong cable outside parliament), we want to know the core data will remain publicly available until it’s back.

Naturally there are other aspects to the EveryPolitician project which are more — as modern language would have it — compelling than collecting old data about old politicians. But the usefulness of the EveryPolitician project as a persistent archive of historical data is one that we have not overlooked.

mySociety’s EveryPolitician project aims to make data available on every politician in the world. It’s going well: we’re already sharing data on the politicians from nearly every country on the planet. That’s over 68,652 people and 2.9 million individual pieces of data, numbers which will be out of date almost as soon as you’ve read them. Naturally, the width and depth of that data varies from country to country, depending on the sources available — but that’s a topic for another blog post.

Today the EveryPolitician team would like to introduce you to its busiest member, who is blogging at EveryPolitician bot. A bot is an automated agent — a robot, no less, albeit one crafted entirely in software.

First, some background on why we need our little bot.

Because there’s so much to do

One of the obvious challenges of such a big mission is keeping on top of it all. We’re constantly adding and updating the data; it’s in no way a static dataset. Here are examples — by no means exhaustive — of circumstances that can lead to data changes:

Legislatures change en masse, because of elections, etc.
We try to know when countries’ governments are due to change because that’s the kind of thing we’re interested in anyway (remember mySociety helps run websites for parliamentary monitoring organisations, such as Mzalendo in Kenya). But even anticipated changes are rarely straightforward, not least because there’s always a lag between a legislature changing and the data about its new members becoming available, especially from official national sources.

Legislatures change en masse, unexpectedly
Not all sweeping changes are planned. There are coups and revolutions and other unscheduled or premature ends-of-term.

Politicians retire
Or die, or change their names or titles, or switch party or faction.

New parties emerge
Or the existing ones change their names, or form coalitions.

Areas change
There are good reasons (better representation) and bad reasons (gerrymandering) why the areas in constituency-based systems often change. By way of a timely example, our UK readers probably know that the wards have changed for the forthcoming elections, and that mySociety built a handy tool that tells you what ward you’re in.

Existing data gets refined
Played Gender Balance recently? Behind that is a dataset that keeps being updated (whenever there are new politicians) but which is itself a source of constantly-updating data for us.

Someone in Russia updates the Wikipedia page about a politician in Japan
Wikidata is the database underlying projects like Wikipedia, so by monitoring all the politicians we have that are also in there, we get a constant stream of updates. For example, within a few hours of someone adding it, we knew that the Russian transliteration of 安倍晋三’s name was Синдзо Абэ: that’s Shinzo Abe, in case you can’t read kanji or Cyrillic script. (If you’re wondering, whenever our sources conflict, we moderate in favour of local context.)

New data sources become available
Our data comes from an ever-increasing number of sources, commonly more than one for any given legislature (the politicians’ twitter handles are often found in a different online place from their dates of birth, for example). We always welcome more contributions — if you think you’ve got new sources for the country you live in, please let us know.

New old data becomes available
We collect historic data too, not just the politicians in the current term. For some countries we’ve already got data going back decades. Sources for data like this can sometimes be hard to find; slowly but surely new ones keep turning up.

So, with all this sort of thing going on, it’s too much to expect a small team of humans to manage it all. Which is where our bot comes in.

Hello bot

To be honest with you, the bot doesn’t really look like this because, being software, it’s entirely non-corporeal. Sorry.

We’ve automated many of our processes: scraping, collecting, checking changes, submitting them for inclusion — so the humans can concentrate on what they do best (which is understanding things, and making informed decisions). In technical terms, our bot handles most things in an event-driven way. It springs into action when triggered by a notification. Often that will be a webhook (for example, a scraper finishes getting data so it issues a webhook to let the bot know), although the bot also follows a schedule of regular tasks too. Computers are great for running repetitive tasks and making quantitative comparisons, and a lot of the work that needs to be done with our ever-changing data fits such a description.
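For the technically curious, the event-driven pattern can be sketched in a few lines of Python. This is not the bot’s actual code: the event names, the handler functions and the payload fields are all hypothetical, chosen only to show how a notification gets routed to the task that should react to it.

```python
from collections import defaultdict

# Registry mapping an event name to the handler functions for it.
_handlers = defaultdict(list)

def on(event_name):
    """Decorator that registers a function as a handler for event_name."""
    def register(func):
        _handlers[event_name].append(func)
        return func
    return register

def dispatch(event_name, payload):
    """Run every handler registered for the event and collect the results."""
    return [handler(payload) for handler in _handlers[event_name]]

@on("scraper_finished")  # hypothetical event, e.g. sent by a webhook
def check_for_changes(payload):
    return f"compare new data for {payload['legislature']}"

@on("daily_schedule")  # hypothetical event fired by a regular schedule
def refresh_sources(payload):
    return f"re-check {len(payload['sources'])} sources"

results = dispatch("scraper_finished", {"legislature": "Example Assembly"})
print(results)
```

Webhooks and scheduled tasks both end up as named events here, so adding a new job for the bot means registering one more handler rather than rewriting the dispatcher.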

The interconnectedness of all the different tasks the bot performs is complex. We originally thought we’d document that in one go — there’s a beautiful diagram waiting to be drawn, that’s for sure — but it soon became clear this was going to be a big job. Too big. Not only is the bot’s total activity complicated because there are a lot of interdependencies, but it’s always changing: the developers are frequently adding to the variety of tasks the bot is doing for us.

So in the end we realised we should just let the bot speak for itself, and describe task-by-task some of the things it does. Broken down like this it’s easier to follow.

We know not everybody will be interested, which is fine: the EveryPolitician data is useful for all sorts of people — journalists, researchers, parliamentary monitors, activists, parliamentarians themselves, and many more — and if you’re such a person you don’t need to know about how we’re making it happen. But if you’re technically-minded — and especially if you’re a developer who uses GitHub but hasn’t yet used the GitHub API as thoroughly as we’ve needed to, or are looking for ways to manage always-shifting data sets like ours — then we hope you’ll find what the bot says both informative and useful.

We’re helping our friends at Citizen Beta by sponsoring (and co-hosting) an international-themed event next Monday (25th April 2016, 7pm–10pm). There will be three presentations, one from OneWorld about their work overseas, one by our own Jen about some of our recent international projects, and then there’s the spotlight: a talk from the Kuala Lumpur-based Sinar Project.

Citizen Beta is a roughly-monthly meetup for civic tech people. Next Monday’s event will follow the usual format of Q&As after each presentation with (deliberately) lots of meeting, mixing, and chatting too. It happens in Newspeak House in Shoreditch, and refreshments will be provided.

We’re very pleased to have been able to make this event happen. It was precipitated because Khairil from Sinar is coming over to our side of the world for mySociety’s forthcoming TICTeC 2016 conference in Barcelona. (If you haven’t already got a ticket to TICTeC then, sorry, you’re too late; but of course keep an eye on this blog because we’ll be sharing photos, videos and accounts from the event).

OneWorld, like mySociety, are based in the UK, but work all over the globe. Their projects often depend upon the classic civic tech components of web and mobile, but they’re also actively involved in addressing health and rights issues in the countries in which they operate. They have a wealth of experience from running projects in developing countries, especially when it comes to understanding both the capabilities and the limitations of the technologies they use.

The Sinar Project, which the aforementioned Khairil heads, is a civic tech group based in Kuala Lumpur. They first came to our attention because they were using two of our codebases (FixMyStreet and MapIt). We’ve been friends ever since, and it’s always been a delight when our international paths have crossed. In many ways Sinar perfectly epitomises the philosophy of civic tech/open source reuse that mySociety, and specifically Poplus, is so passionate about: when you have a small team of super-focused developers (shout-out to Sinar’s Motionman and Sweester!), you simply can’t afford to waste time reinventing the wheel.

On Monday evening Khairil will be showing some of the impressive ways they have been combining their tech skills and tools in a political environment which is considerably more hostile than that in the UK.

Oh, one last thing — sorry; we do realise that this is a London event. mySociety itself is a remote-working organisation, which means we’re spread all around the UK, so we know that not everything happens in London. But that’s where Citizen Beta is, so that’s where this particular meetup is happening.

It’s going to be great — so if you can come, please sign up. See you there!

As ever with Mozilla’s annual, hands-on festival, there was a lot going on in London’s Ravensbourne, a venue that’s especially conducive to mixing and meeting.

MozFest attracts an active and positive crowd of digital people, ranging from junior-school coder kids right through to hoary old digital campaigners. So we were delighted to meet up with old friends and make new ones, especially as some of them had travelled from afar to be there. London was fortunate once again to be hosting the event, since Mozilla is of course an international organisation. And as our main focus at this year’s event was EveryPolitician (“data about every national legislature in the world, freely available for you to use”), that international aspect was especially welcome.

As a result of our being there, we hope that lots more people know about EveryPolitician’s data, and that some of them are going to build or do amazing things with it. We’re still adding to our data, so we’d love your help: we have data on at least the current term of the top-level legislatures of most of the countries in the world. But we’d still love your help with finding good sources for the remaining few, as well as our ongoing task of going wider (adding more details about the politicians we do have) and deeper (adding historic data from previous terms).

If, in the spirit of digital do-ism that infuses MozFest, you do make something useful or funky with EveryPolitician’s data, do please let us know. We make sure all this lovely data is available to you in a consistent way (that not only means the delivery formats of CSV or JSON Popolo, but also that we adopt reliable conventions about the way we use them). This maximises the likelihood that, when you share that thing you’ve built using the data for your country, people in other places will be able to easily adapt it to work with the data for theirs. And that’s why, if you’ve made something amazing, we’d like to know, so we can shout about it.

Finally: thanks to the people who made MozFest run so smoothly this year, and the spirit of the open web. See you next year!

Outside of her work for mySociety, our very own marketing and communications manager Myf somehow finds time to illustrate and sketch. She runs a popular personal blog (Myf Draws Apparently) which often includes sketch diaries of her adventures. The most recent one she’s just published is a lovely record of a trip to Madrid earlier this year.

Only, it wasn’t just a trip to Madrid — she was there as part of the mySociety team at AlaveteliCon 2015.

Alaveteli is mySociety’s Freedom of Information (FoI) platform, and AlaveteliCon was the second conference we’ve held on Freedom of Information technologies. It brings together people from all over the world who are involved in running sites like the UK’s WhatDoTheyKnow.

One of the pages Myf drew is this wonderful flow diagram of the way platforms like Alaveteli actually experience Freedom of Information.

Not everyone at the conference was running an Alaveteli-based site, which was sort of the point: the business of running FoI sites like these is not about the technology, it’s about the culture of citizens’ Right to Know. There’s a lot of variety in the FoI laws around the world, and they are regarded with wildly differing levels of enthusiasm by the authorities who ought to be bound by them. The groups running FoI sites invariably consist of individuals who are passionate and articulate on all aspects of Freedom of Information. At the conference, they shared tales of frustration in the face of intransigent or occasionally devious public authorities; anecdotes demonstrating beatific levels of patience in the face of obdurate official departments; and, of course, some wonderful stories of success.

Myf has captured some of this in her sketch diary. Although she didn’t create it for mySociety, some of us think that flow chart is too delightful and too relevant not to share here.

MozFest14. If you know where to look, a mySociety human is in there. Somewhere.

We were at the Mozilla Festival again this year. In practice, this meant we had a table at the Friday night Science Fair, ran a session in the Build and Teach the Web track about “Reusable Civic Tech”, and spent a lot of time meeting old friends and making lots of new ones (technically, we call that “networking”). This blog post is a shout-out to all the fabulous people we talked to, demonstrated with, learnt from, and perhaps even drank a cheeky beer with. It was excellent to meet you all.

Because we’re based in the UK, we’re especially lucky that the Mozilla Foundation’s annual festival was once again held in London. It’s good for us because our friends from the London Mozilla Space are there, and also because this makes it easy for us to get to (unlike so many of the attendees, we didn’t need to travel all the way into the country first). In fact, the unique and lofty Ravensbourne venue is an excellent location for such an event — it’s easy to see what’s happening on the other floors, and it’s easy to wander up and down between them. There is a lot going on to see and do, and, just like last year, even the stairs are productive: we had some serendipitous encounters on our way between floors.

Our primary activity at the festival was spreading the word about the Poplus federation and its reusable civic tech components. If you bumped into any of us, or if what we were demonstrating tickled your fancy, or if you are even now wearing one of the T-shirts we generously gave out: do please remember to get involved!

Photo shows Paul and Jen working on a budget document (no, really!) using the free wifi available at the Shwedagon Pagoda, Yangon.

Here’s a little more from the international team’s trip to Southeast Asia: we’ve already written about Myanmar’s first hackathon, but that wasn’t the only event we were able to attend.

We were in Yangon at just the right time to be invited to the launch of the Open Myanmar Initiative (technically, this was the launch event for diplomatic circles, which made us feel rather grand).

At the hackathon, second prize had gone to “Team Garlic” for their election fraud reporting app. So it was great to meet the OMI dev team, and discover that they were, in fact, one and the same. Go Team Garlic!

The team has done remarkable work, since the source data is from 22 thick volumes of printed, not digital, records of six parliamentary sessions.

We met lots of accomplished, dedicated and interesting people while we were in Myanmar, and we know some of them will be putting bits of our open source code to good use.

It was good to talk about technical matters too. It’s always informative to hear how people have approached the same kind of problems we’ve encountered (we’ve been running TheyWorkForYou since before 2006, of course, but we also actively work on parliamentary monitoring sites elsewhere in the world too).

We know from meeting groups like this that the obstacles they face are always a combination of unique problems, entirely specific to their own parliament, and more general difficulties that apply to every jurisdiction on the planet.

Or to put it another way, some people may go to Myanmar just to look at the pagodas — and although we did that too, even as we did so, at the back of each of our minds was the question, “I wonder if the administrative boundary data is available in KML format for this city?” Um, so… maybe we’re not quite like other tourists.

After Myanmar, Dave went on to Malaysia to meet up with our friends at the Sinar Project. Sinar and mySociety have been in contact since they started running AduanKu.my (a FixMyStreet instance for Kuala Lumpur) last year and it was marvellous to see them again.

They’re doing great stuff with their stretched resources, and recently have been working on getting Poplus components into their own parliamentary monitoring work. In fact, we’re delighted that they will be represented at the PoplusCon conference in Chile at the end of April.

mySociety

mySociety is a not-for-profit social enterprise, based in the UK but working with partners internationally. We build and share digital technologies that give people the power to get things changed, across the three areas of Democracy, Freedom of Information, and Better Cities.