The Free Software Foundation Europe referred to this post in its comments on the October 2017 Tallinn Declaration for e-government proposed by the Estonian presidency of the Council of the EU.

Governments have been flocking to GitHub.

Their reasons are plenty: the promise of “private sector” tools, a conviction that publicly-funded code should be public, the company’s evangelism (and stickers), etc. Whatever the case, GitHub now hosts at least 600 government organizations, with over 9,000 public repositories between them.

I had a notion of the global ecosystem this activity has sprouted—the players and their interactions—but wanted to back it up with data.

So, using GitHub’s API, I compiled a database of government GitHub organizations, their repositories, members, and contributors and dove in.

Overall, reuse within the government GitHub “ecosystem” is uneven and limited.

Nearly all popular repositories (inside and outside of government) were created by US and UK national organizations. The bulk are standards or frameworks. Modular products, like data.gov.uk’s CKAN extensions, also seem relatively reusable.

Collaborative work and reuse are most concentrated within the large US and UK national-level networks. This may point to the importance of scale, “real world” interactions (e.g. talks, meet-ups, employees switching between organizations), and the alignment of policy priorities, timelines, licensing, and tech stacks.

14% of repositories have no further activity after being posted to GitHub. 46% remain under development a year after they were created.

I didn’t find a license file for half of the repositories. At least 13% use the MIT license. At least 8% use some version of the GPL. License choice varies geographically.

Government GitHub organizations are bringing some new users to the platform along with them. But 45% of the users predate the government organizations they contribute to.

Estonia has the most government repositories per capita at 72.8 per million residents (hover over and click to zoom in on the map up top).

323 organizations are “loners,” with no contributors shared with other organizations. These don’t appear in the graph.

The thicker edges represent more contributors in common. Nodes are sized by the number of other nodes they’re connected to (their “degree”). If you view the full size graph, each node links to its GitHub site.
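For readers who want to build a similar graph, here is a minimal sketch of the idea, not the exact code behind the graph above. It assumes you already have a mapping from each organization to the set of usernames that contributed to its repositories. The function name `build_org_graph` and the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_org_graph(contributors_by_org):
    """Build a shared-contributor graph between organizations.

    contributors_by_org: dict mapping organization name -> set of usernames.
    Returns (edges, degree), where edges maps an (org_a, org_b) pair to the
    number of contributors the two have in common (the edge thickness), and
    degree maps each connected organization to the number of other
    organizations it shares contributors with (the node size).
    """
    edges = {}
    for org_a, org_b in combinations(sorted(contributors_by_org), 2):
        shared = contributors_by_org[org_a] & contributors_by_org[org_b]
        if shared:
            edges[(org_a, org_b)] = len(shared)

    degree = defaultdict(int)
    for org_a, org_b in edges:
        degree[org_a] += 1
        degree[org_b] += 1
    return edges, dict(degree)


# Hypothetical sample data, not taken from the real dataset.
sample = {
    "org-a": {"alice", "bob", "carol"},
    "org-b": {"bob", "carol"},
    "org-c": {"carol"},
    "org-d": {"dave"},
}
edges, degree = build_org_graph(sample)

# Organizations with no shared contributors are the "loners" left off the graph.
loners = sorted(set(sample) - set(degree))
```

Comparing every pair of organizations is quadratic in the number of organizations, which is fine at the scale of a few hundred.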

Two main clusters stick out. Up top in bright green are UK national organizations. On the right in purple are the US federal organizations.

The City of Philadelphia has the most prominent non-national organization. You might also notice the DC Government in the mix, as well as the USGS/NOAA, Brazil, Canada, and Australia sub-networks.

It’s likely some of the connections aren’t real, but are the artifacts of cloned repositories. These may retain the original repository’s contribution history in addition to any new commits, but the API won’t mark them as forks.

This turns the contributor graph into a mish-mash of genuine collaboration of one organization’s members with another’s, non-members who contribute code to multiple organizations, and reuse.

This graph of 96 nodes (organizations) is tied together using 148 edges. Behind these edges are 327 individual member connections from 137 unique users (see user statistics).

Again, we find the highly inter-linked US federal sub-network. There are also the smaller UK and Canada membership sub-networks.

GitHub makes membership private by default, so there are likely more member connections in reality. But, in general, it makes sense that this graph would be much sparser than the contribution graph.

Why is the US federal sub-network comparatively dense? Many of its organizations have large memberships, so there are more potential connections to be made. Some (like 18F) have a policy requiring that staff make their membership public. A number operate as consultancies to other federal agencies. And, from what I’ve seen, many “techies” enter the US federal government through one of these agencies and then hop around.

Organization, Repository & User Statistics

There are certainly many other questions to look into. Check out this repository to generate your own database (or reuse the one there).

Organizations

The list I used included 600 government GitHub organizations. You can see their geographic distribution on the map up top.

No. repositories

Of note, not only does the UK's Government Digital Service make the top 10, but so does its GitHub organization for retired repositories! Neat appearances from the Norwegian Meteorological Institute, the Gemeinsamer Bibliotheksverbund in Germany, and the National Library of Finland.
This includes only repositories that are not themselves forks.

Member counts are likely quite a bit higher in reality. GitHub defaults to private membership and, in my experience, many users never switch their membership to public. I bet that this isn't intentional in most cases. GitHub would do well to make this option more obvious when you first join an organization.

Repositories

Listed below are the license types, how often each appears, the regions that use each license most frequently, and the percentage of each region's repositories that carry it.
I only include regions with at least 10 repositories and 2 organizations.
Note: The GitHub API only looks for a license file at the root of the repository, so licenses embedded in the README or stored in a subfolder are marked as having no license. Italy's not really that bad!
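A tally like this can be sketched in a few lines. The example below is an illustration under assumptions, not the code used for the table: it assumes each repository is a dict shaped like a GitHub REST API repository object, where `license` is either `None` or an object with an `spdx_id` field. The function name `license_shares` and the sample repositories are hypothetical.

```python
from collections import Counter


def license_shares(repos):
    """Return the share of repositories carrying each detected license.

    repos: list of dicts shaped like GitHub API repository objects.
    A repository whose "license" value is missing or None is counted
    under "No license found".
    """
    if not repos:
        return {}
    counts = Counter(
        (repo.get("license") or {}).get("spdx_id") or "No license found"
        for repo in repos
    )
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


# Hypothetical sample, not taken from the real dataset.
sample_repos = [
    {"name": "repo-1", "license": {"spdx_id": "MIT"}},
    {"name": "repo-2", "license": {"spdx_id": "MIT"}},
    {"name": "repo-3", "license": None},
    {"name": "repo-4", "license": {"spdx_id": "GPL-3.0"}},
]
shares = license_shares(sample_repos)
```

Running the same function on each region's repositories separately would give the per-region percentages.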

Many government GitHub repositories have a fairly short development lifespan. 14% show no further development after they were first pushed to GitHub. However, 46% were under development a year in, 18% two years in, and 6% three years in.
11 repositories have Git histories earlier than their initial push date. Seems like a lot of development happens locally and without version control, then the repository is dumped on GitHub.
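One way to approximate these lifespans is to compare a repository's creation and last-push timestamps. The sketch below makes several assumptions: it uses the `created_at` and `pushed_at` fields of the GitHub API repository object, treats a zero-day gap as "no further development", and counts a year as 365 days. The function names and sample data are hypothetical, and the figures above may have been computed differently.

```python
from datetime import datetime

# GitHub API timestamps look like "2015-06-01T00:00:00Z".
GITHUB_TS = "%Y-%m-%dT%H:%M:%SZ"


def lifespan_days(repo):
    """Whole days between a repository's creation and its last push."""
    created = datetime.strptime(repo["created_at"], GITHUB_TS)
    pushed = datetime.strptime(repo["pushed_at"], GITHUB_TS)
    return (pushed - created).days


def share_active_after(repos, years):
    """Share of repositories whose last push came at least `years` after creation."""
    if not repos:
        return 0.0
    threshold = 365 * years  # assumption: one year is 365 days
    active = sum(1 for repo in repos if lifespan_days(repo) >= threshold)
    return active / len(repos)


# Hypothetical sample, not taken from the real dataset.
sample_repos = [
    {"created_at": "2014-01-01T00:00:00Z", "pushed_at": "2014-01-01T00:00:00Z"},
    {"created_at": "2014-01-01T00:00:00Z", "pushed_at": "2014-03-01T00:00:00Z"},
    {"created_at": "2014-01-01T00:00:00Z", "pushed_at": "2015-06-01T00:00:00Z"},
    {"created_at": "2014-01-01T00:00:00Z", "pushed_at": "2016-06-01T00:00:00Z"},
]
lifespans = [lifespan_days(repo) for repo in sample_repos]
one_year = share_active_after(sample_repos, 1)
two_years = share_active_after(sample_repos, 2)
```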

Users

The data showed 7887 public contributors to government repositories (that are not forks) and 1512 public members of government organizations.

User join date vs. organization creation date

Are government GitHub organizations spurring new users to join the platform? We don't know the date users joined an organization, so we'll have to proxy.
The histograms below show the difference (in days) between 1) when a user joined GitHub and 2) the earliest creation date of all the government organizations to which they contribute or belong.
There's clearly a bump in the center. That spike shows users who joined GitHub at the same time as the government organization with which they're involved. Some of those users who came to the platform later may also have joined a government organization the same day as their arrival.
In each case, about half of users predate their organization. This says to me that public, social coding is generally new at an individual level—not just at an institutional one.
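The proxy described above can be sketched as follows. This is a minimal illustration, assuming the `created_at` timestamps that the GitHub API returns for users and organizations. The sign convention (negative means the user predates the organization) and the function name `join_offset_days` are my own choices for the example.

```python
from datetime import datetime

# GitHub API timestamps look like "2013-01-01T00:00:00Z".
GITHUB_TS = "%Y-%m-%dT%H:%M:%SZ"


def join_offset_days(user_created_at, org_created_ats):
    """Days between a user's GitHub join date and their earliest organization.

    user_created_at: the user's "created_at" timestamp string.
    org_created_ats: "created_at" strings for every government organization
    the user contributes to or belongs to.
    A negative result means the user joined GitHub before the earliest
    of those organizations was created.
    """
    joined = datetime.strptime(user_created_at, GITHUB_TS)
    earliest_org = min(datetime.strptime(ts, GITHUB_TS) for ts in org_created_ats)
    return (joined - earliest_org).days


# Hypothetical examples, not taken from the real dataset.
predates = join_offset_days(
    "2012-01-01T00:00:00Z",
    ["2013-01-01T00:00:00Z", "2014-06-01T00:00:00Z"],
)
arrived_later = join_offset_days("2013-01-11T00:00:00Z", ["2013-01-01T00:00:00Z"])
```

Plotting a histogram of these offsets for all users would show a bump near zero like the one described above.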

No. repositories contributed to

The top contributors (in terms of government repository count, at least) are all members of the UK's Government Digital Service, 18F, or the Consumer Financial Protection Bureau.
A majority of users are one-off contributors (see percentiles).

All but one (mattbostock of the UK) of the top ten members are part of the web of US federal programs: the USDS-18F-PIF-CFPB-White House connection.
It's quite rare for a user to be a member of more than one organization.