Like most newsrooms, we make extensive use of government data, some downloaded from "open data" sites and some obtained through Freedom of Information Act requests. But much of our data comes from our developers spending months scraping and assembling material from websites and out of Acrobat documents. Some data requires months of labor to clean or requires combining datasets from different sources in a way that's never been done before.

In the Data Store you'll find a growing collection of the data we've used in our reporting. For raw, as-is datasets we receive from government sources, you'll find a free download link that requires only that you agree to a simplified version of our Terms of Use. For datasets that are available as downloads from government websites, we've simply linked to the sites to ensure you can quickly get the most up-to-date data.

For datasets that are the result of significant expenditures of our time and effort, we're charging a reasonable one-time fee: In most cases, it's $200 for journalists and $2,000 for academic researchers. Those wanting to use data commercially should reach out to us to discuss pricing. If you're unsure whether a premium dataset will suit your purposes, you can try a sample first. It's a free download of a small sample of the data and a readme file explaining how to use it.

The datasets contain a wealth of information for researchers and journalists. The premium datasets are cleaned and ready for analysis. They will save you months of work preparing the data. Each one comes with documentation, including a data dictionary, a list of caveats, and details about how we have used the data here at ProPublica.

We've long worked informally with people interested in purchasing our datasets; some of our apps have provided downloads of the data used to build them. We hope that providing a clearinghouse for all of our datasets will help this material reach a broader community and will support, in spirit and financially, our journalistic mission.

The Data Store is a bit of an experiment. We don't know for sure how much interest there is in the data. For now, there are only a few datasets available and it's a manual process to buy them. We'll add more data over time; you can see some of the datasets we'll be releasing in the next few weeks under Coming Soon. We're paying close attention and expect to learn a lot in the first few weeks after launch.

If you have suggestions for datasets we should make available, or features we should add, please don't hesitate to contact us at scott@propublica.org.

Republish This Story for Free

Thank you for your interest in republishing the story. You are free to republish it so long as you do the following:

You can’t edit our material, except to reflect relative changes in time, location and editorial style. (For example, "yesterday" can be changed to "last week," and "Portland, Ore." to "Portland" or "here.")

If you’re republishing online, you have to link to us and include all of the links from our story, as well as our PixelPing tag.

You can’t sell our material separately.

It’s okay to put our stories on pages with ads, but not ads specifically sold against our stories.

You can’t republish our material wholesale, or automatically; you need to select stories to be republished individually.