About the Open Data Census

What is the US City Open Data Census?

The US City Open Data Census is an ongoing, crowdsourced measure of the current state of access to a selected group of datasets in municipalities across the United States. Any community member can contribute an assessment of these datasets in their municipality at any time. Census content will be peer-reviewed periodically by a volunteer team of Census librarians coordinated by the Sunlight Foundation.

How can the results of the US City Open Data Census be used?

The US City Open Data Census does not aim to create a comprehensive list of open datasets around the United States, nor does it aim to define what datasets are the most important to open. Instead, the Open Data Census seeks to be a benchmarking tool, which people can use to ignite conversations with their government about open government data.

What is the history of the US City Open Data Census?

The Open Knowledge Foundation created the Global Open Data Census in 2012 to provide a clear measure of available open data: not what is claimed, but what data is actually available and how open it is. This original Census was designed by open data experts, including the Open Government Working Group, and undergoes a process of peer review and evidence checking to ensure high-quality results. In early 2014, Open Knowledge announced the bespoke Local Open Data Census. The US City Open Data Census was launched on Open Data Day in conjunction with Code Across 2014 in a partnership between Code for America, the Sunlight Foundation, and Open Knowledge. Ongoing reporting on local open data access by Code for America and the Sunlight Foundation creates a focus for debate and review.

What datasets are included in the city Census?

There are 18 initial datasets currently considered in the 2014 US City Open Data Census:

Dataset

Details

Asset Disclosure

Top-level government officials’ financial assets, including: name of top-level government officials, title, investment information, prior and current business relationships, real estate interests, and personal income (including gifts and travel or speaking payments). (More info)

Budget

Municipal budget at a high level (e.g. planned budget by unit of appropriation with a programmatic description of each unit of appropriation). This category is about budgets which are plans for expenditure (not actual expenditure in the past). (More info)

Business Listings

A directory of all licensed businesses in the municipal area, including key information such as: name, address, contact information, business type. (More info)

A complete list of city expenditures at a detailed transactional level (including: tax breaks, loans, contracts, grants, and operational spending). Records of actual (past) municipal spending at a detailed transactional level, for example, at the level of month to month expenditure on specific items (usually this means individual records of spending amounts at a fairly granular level - e.g. $5-50k rather than at the $1m+ level). Note: a database of contracts awarded is not considered sufficient. This data category refers to detailed ongoing data on actual expenditures. (More info)

Detailed discussions of the data categories relating to submissions and review
Issues and challenges for submitters and Open Data Census librarians (reviewers) are discussed on the Census discussion list. The full Census discussion list archive is available here. Beginning in June 2016, Sunlight Foundation staff will no longer regularly check the discussion lists. Any questions or comments for Census administrators should be addressed to usopendatacensus@gmail.com.

How reliable is the US City Open Data Census?

The information in the Census is collected by open data experts and enthusiasts around the world, including Code for America brigades, the Sunlight Foundation Local Policy Team, and the Open Government Working Group. The Census data undergoes a process of peer review and evidence checking to improve the quality of results. That said, we rely on the contributions of local community users of government datasets, so if you see a problem please submit a comment. Contributors and editors are also cited on each dataset submission.

Submitting information to the Census

The US City Open Data Census is a survey of the state of open data around the United States, focusing on the availability and openness of a specific set of key datasets.

What's the US City Open Data Census data collection and review process?

It works like this:

Contributors submit information about the availability (or not) of key datasets in their city (for example, Budgets in San Francisco).

For edits to submissions, contributors may Propose Revisions. These revisions then appear under a city's Census page as "awaiting review," until a librarian reviews the submission.

Open Data Census librarians either approve (with or without amendments) or reject the Proposed Revisions.

If approved, the new submissions become an official entry in the Census and are displayed in the main table of the website.

How can I improve the Census information about a US City?

If you have information about a dataset which isn't in the Census yet, you can add it! Anyone can submit new information to the Census.

Find your city in the list of Census cities and click on it.

Click the blue “Submit Information” button on the right next to the appropriate category.

Complete the submission form based on the dataset you have found (there are detailed instructions on the form).

Click Submit. Your submission is now waiting for review, and will be visible as "awaiting review" after a few minutes.

How can I correct an existing entry in the Census?

We welcome corrections to the US City Open Data Census. Anyone can submit corrections to the Census.

Find your city in the list of Census cities and click on it.

On the city overview page, click the blue “Submit Information” button on the right next to the appropriate category.

Complete the form based on the changes you want to make to the existing data.

Click Submit. Your submission is now waiting for review, and will be visible as "awaiting review" after a few minutes.

How can I add a new US city to the Census?

Reach out to usopendatacensus@gmail.com and we will work to add your city to the Census template. Each city should also choose a community point person to act as Open Data Census librarian (see below).

I am a public official. Can I become a US Open Data Census librarian for my city?

Community volunteers are welcome to step up to serve as Open Data Census librarians! However, to avoid potential conflicts of interest, we do not permit employees of the city, county, or other government agency being evaluated to become librarians. Individuals who work for their city are encouraged to engage with the Census and make Census submissions about their city, but a separate librarian must review those submissions for accuracy. A short overview of the Census for city officials is also available here.

How do I become my city’s Open Data Census Librarian?

Open Data Census librarians are the reviewers and point persons for the Census assessment in their community. They are responsible for filling out a profile page, becoming familiar with this FAQ and the Dataset Explainers, and periodically reviewing open data in their city.

Open Data Census Librarian Basic Responsibilities

By adopting a US City Census page, you agree to do the following:

Become familiar enough with the US City Open Data FAQ and Dataset Explainers so that you can serve as a front-line resource on these materials to your community.

Periodically review community datasets to keep your community up-to-date, i.e. review new datasets quarterly (and/or after coordinated national hackathons), and all datasets annually.

Stay in touch:

Contribute feedback to the international Census discussion list or join one of Open Knowledge International's Open Data Index discussion boards to hear from other active Census users.

What do all the questions about the datasets mean?

When filling in information about a dataset, there's a list of questions to answer about the availability and openness of the datasets. The answers then appear in the city overview page for the Census.

Question

Details

Weighting

Openly licensed?

The licence must comply with the Open Definition which allows data to be freely used, reused and redistributed. The Open Definition provides a list of conformant licences. If the data uses one of these licences, it is openly licensed.

Licences are commonly found in:

the web page footer

a link to Terms & Conditions

the About section

Some licences may allow re-use and redistribution but have not been assessed as conformant with the Open Definition. In this case, seek feedback on the Open Data Index discussion forum.

30

Is the data machine readable?

All files are digital, but not all can be processed or parsed easily by a computer. In order to answer this question, you would need to look at the file type of the dataset. As a rule of thumb the following file types are machine readable:

XLS

CSV

JSON

XML

The following formats are NOT machine readable:

HTML

PDF

DOC

GIF

JPEG

PPT

If you have a different file type and you don’t know whether it’s machine readable, ask in the Open Data Census forum.

15

Is the data available for free?

The data is free if you don’t have to pay for it.

15

Available in bulk?

Data is available in bulk if the whole dataset can be downloaded easily. It is considered non-bulk if the citizens are limited to getting parts of the dataset through an online interface.

For example, data is non-bulk if users are restricted to querying a web form and retrieving a few results at a time from a very large database.

10

Is the data provided on a timely and up to date basis?

Is the data current for the census year? You can determine or estimate when the data was last updated and its update frequency by reviewing:

the metadata displayed for the data in an open data portal or web page

the dataset title or filename e.g. Budget 2013-14 or Election_4July2015.csv

metadata tags embedded in the web page that contains the data

date values within the data to find the most recent date value

the timestamp on the data file (although this may not be accurate)

Some data is not updated on a regular basis. For example, pollutant emissions may be updated daily, while postal codes may not change for many years.

You may need to use your judgement to determine if the data is timely and up to date. Document your rationale in the comments section.

If you cannot determine a date, answer "NO", i.e. the data is not timely or up to date.

10

Is the data available online?

Data is online if it can be accessed via the Internet (e.g. a website or open data portal). If the data has been emailed to you but is not accessible via the Internet, it is not considered to be available online.

5

Is data in digital form?

Data can be in a digital format, but not accessible online. For example: A country budget can be stored on a spreadsheet or otherwise on a private government network, but not on the Internet. This means that the data is digital, but not publicly available. If you know that the data is digital somewhere inside the government (e.g. a government official tells you so), then you should answer “YES” to this question and note in the comment section how you discovered the data is in digital form.

5

Publicly available?

Can the data be accessed by the public without restrictions? Data is considered publicly available when:

It can be accessed online without the need for a password or permissions.

It is in paper form, can be accessed by the public, and there are no restrictions on the number of photocopies that can be made.

Data is NOT publicly available when:

It is only made available after making a request.

It was made available only because of a FOIA request.

It can only be accessed by government officials.

5

Does the data exist?

Data must come from an official resource either issued directly by the government or by a third party officially representing the government. Data offered by companies, citizen initiatives or any non-governmental organisation do not count for the Index.

If the government has given the right to publish the data to third parties, a submission with a link to a third-party site is allowed. The third-party site must explicitly state that the data has been commissioned by the government. Check whether the organization has an agreement with the government to be the official source and make a note in the comment section.

5

How should I use the comments/details field when submitting and reviewing?

Comparing datasets between local governments is, as mentioned, a complex and often difficult task. This is why the comments/details field is public, so that submitters and Open Data Census librarians can explain the reasoning for their choices. In other words, the comments/details field is your main tool to ensure that your city’s entries and scores can be compared to those of other cities. We therefore strongly encourage you to be thorough in your comments, as that will reflect on how your city is perceived and compared.

Tip: Look at the comments for cities with a similar score in the given category, or for cities whose data systems and governance structure may be similar to those of your city.

Questions about the assessment of openness

Are data to be considered publicly available if a right-to-know request, such as a freedom of information (FOI) or public records request, is needed to retrieve them?

For Census purposes, publicly available means without having to put in a right-to-know request, so the data should be available online without further ado.

What about cities where there is no official mention of licensing attached to the data in question?

What formats can generally be considered machine readable?

Since machine readability is not strictly a matter of data format, here are some further points to consider:
HTML, even when well structured, will only sometimes count as machine-readable and is, by default, not machine-readable, because it most often needs parsing and therefore is not directly reusable.

In general, we suggest treating machine readability as a combination of fact and objective judgement rather than declaring a particular format automatically machine-readable or not machine-readable. Data is machine-readable in the sense that you can extract it and directly reuse it.
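As a rough illustration of the file-type rule of thumb from the question table, the sketch below classifies a file name by its extension, using several of the file types named there. The function name `rule_of_thumb` and the two set names are hypothetical and not part of the Census site. Because machine readability is partly a judgement call, the function returns `None` for any extension it does not recognise, and even a `True` or `False` result is only a starting point for your own assessment.

```python
from pathlib import PurePosixPath

# Extensions taken from the rule-of-thumb lists in the question table.
MACHINE_READABLE = {"xls", "csv", "json", "xml"}
NOT_MACHINE_READABLE = {"html", "pdf", "doc", "jpeg", "ppt"}


def rule_of_thumb(filename):
    """Return True, False, or None when the extension is not on either list."""
    suffix = PurePosixPath(filename).suffix.lower().lstrip(".")
    if suffix in MACHINE_READABLE:
        return True
    if suffix in NOT_MACHINE_READABLE:
        return False
    return None  # unknown: use judgement or ask on the forum


print(rule_of_thumb("Election_4July2015.csv"))  # True
print(rule_of_thumb("budget.PDF"))              # False
print(rule_of_thumb("parcels.geojson"))         # None
```

A three-way result is used here on purpose: an unrecognised extension is left open instead of being forced into a yes or no.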

I want to help, but I'm not sure where to start!

You can read more about the 19 categories of data that we are focusing on in the About section of the site. Every entry for every city has been sent in by community members. They used Google or other search engines to find out which datasets are available (simply finding the URL) and under which circumstances (are the data openly licensed, can they be downloaded in bulk, etc.), then made a submission via the form on the site by answering a handful of questions and entering the URL for the data. All in all, it is an easy (and fun) task that helps put a city on the open data map, and it's easy to get started!

You can do some research yourself! Pick a city where the Census shows there is data missing or where there are comments showing that there's uncertainty (perhaps the licence hasn't been specified, for example), or pick a city that you know well.
A targeted search or working with others is most fun and helpful. Get together with friends, colleagues, your local open data community, your local Code for America brigade, or your Open Knowledge local group and dig into data on a given topic or for a given city together.

Understanding the US City Open Data Census results

How does the scoring system work?

The US City Open Data Census measures the openness of 19 datasets for each city. The overall score for a dataset is based on the response to specific questions with varying weightings (the weighting for each question is listed in the question table above). The overall city score is then calculated from the score on each dataset.

The score algorithm is:

If an answer is "yes" to a question, add the weighted value to the score for that dataset.

Add up the weighted values for each "yes" answer for a dataset to get the total score for each dataset.

Add up the total scores for each dataset to get a city score.
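The three steps above can be sketched in a few lines of Python. The weights are copied from the question table; the dictionary keys, the function names, and the sample answers are hypothetical and chosen only for illustration. This is a minimal sketch of the arithmetic, not the code that runs the Census site.

```python
# Weights from the question table; the key names are illustrative labels.
WEIGHTS = {
    "openly_licensed": 30,
    "machine_readable": 15,
    "free": 15,
    "bulk": 10,
    "timely": 10,
    "online": 5,
    "digital": 5,
    "public": 5,
    "exists": 5,
}


def dataset_score(answers):
    """Add the weight of every question answered "yes" (True)."""
    return sum(w for question, w in WEIGHTS.items() if answers.get(question, False))


def city_score(datasets):
    """Add up the scores of all datasets for one city."""
    return sum(dataset_score(answers) for answers in datasets.values())


# Made-up answers for two datasets in an imaginary city.
budget = {"exists": True, "public": True, "digital": True, "online": True, "free": True}
spending = dict.fromkeys(WEIGHTS, True)  # "yes" to every question

print(dataset_score(budget))    # 35
print(dataset_score(spending))  # 100
print(city_score({"budget": budget, "spending": spending}))  # 135
```

Since the nine weights add up to 100, a dataset that earns a "yes" on every question scores 100.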

One of the aims of the questions for each dataset is to provide an increasing set of requirements leading up to full openness. It should be noted that this does not mean each question directly builds on the previous one, since some questions are parallel (for instance, the digital form and publicly available questions). In general, though, there is a progression in the questions, so a "no" on an earlier question may well imply "no" on a later question.

If you are intrigued by Open Data...

Learn more

If the US City Open Data Census has caught your interest, there's lots more open data and open government to learn about.