Episode 1: Scavenger hunt!

Let’s kick things off with something a bit unusual: a virtual scavenger hunt.

At some point, nearly every web geek gets a chance to hack on some open data, usually from a government source. The buzzword here is “mashup,” but knowing how to find and consume openly available data will remain a valuable skill long after the buzzword has faded.

Unfortunately, governments, and especially the US government, are often incredibly awful at providing this data. Sure, it’s available — but you’ve got to find it first.

So this question is all about finding that data. Since I’m most familiar with the USA, the questions below are USA-specific, but I’d love to see answers covering other nations, too.

In each case, the answer should be a URL where you can either download the data in question or at least find a direct link to it. There may be multiple sources for each, including ones that could be screen-scraped for the data. I’m not looking for those sources, however; I want the ones with directly downloadable data in a format a computer can parse easily (e.g. CSV, XML, plain text). “Friendly” formats, in other words.
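To illustrate what “friendly” means in practice, here is a minimal Python sketch, using only the standard library, that turns a CSV snippet and an equivalent XML snippet into the same list of records. The function names (rows_from_csv, rows_from_xml), the record tag, and the tiny name/value samples are hypothetical placeholders, not any agency’s actual schema.

```python
import csv
import io
import xml.etree.ElementTree as ET


def rows_from_csv(text):
    """Parse CSV text (first row = header) into a list of dicts."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def rows_from_xml(text, record_tag):
    """Parse XML into a list of dicts, one per <record_tag> element.

    Assumes each record's fields are simple child elements with text content.
    """
    root = ET.fromstring(text)
    return [
        {field.tag: (field.text or "") for field in record}
        for record in root.iter(record_tag)
    ]


# Hypothetical placeholder samples carrying the same two records (not real data).
CSV_SAMPLE = "name,value\nalpha,1\nbeta,2\n"
XML_SAMPLE = (
    "<records>"
    "<record><name>alpha</name><value>1</value></record>"
    "<record><name>beta</name><value>2</value></record>"
    "</records>"
)

if __name__ == "__main__":
    # Both formats should yield identical records.
    print(rows_from_csv(CSV_SAMPLE) == rows_from_xml(XML_SAMPLE, "record"))
```

Either way you end up with plain records in a few lines of code, which is the whole appeal compared with scraping HTML pages.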

So, where can I download data to:

Analyze the nutritional content of foods?

Find the population (and other basic demographics) of my city?

Analyze the latest SEC filings by public companies?

Look at historical gas prices?

Look for trends in juvenile arrest rates?

Post your answers in the comments. For extra brownie points, tell us how you located each piece of data: did The Google serve you well, or were you forced to turn elsewhere?

If you really want to stretch your brain, try to write a tool to import each chunk of data into your favorite relational database. There will be a related question in a couple of weeks involving modeling one of these pieces of data, so you overachievers can start thinking about it now…
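For anyone taking up the import challenge, here is a minimal sketch in Python that loads CSV text into SQLite using only the standard library. It assumes the first row is a header and stores every column as TEXT; import_csv, quote_ident, and the sample table are hypothetical, and a real importer would want proper column types, keys, and handling for each dataset’s quirks.

```python
import csv
import io
import sqlite3


def quote_ident(name):
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def import_csv(conn, table, csv_text):
    """Create `table` from the CSV header row and insert the remaining rows.

    Every column is stored as TEXT (a simplifying assumption).
    Returns the number of rows inserted.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join(f"{quote_ident(name)} TEXT" for name in header)
    conn.execute(f"CREATE TABLE {quote_ident(table)} ({columns})")
    placeholders = ", ".join("?" for _ in header)
    rows = [row for row in reader if row]  # skip blank lines
    conn.executemany(
        f"INSERT INTO {quote_ident(table)} VALUES ({placeholders})", rows
    )
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    # Placeholder sample, not real data; an in-memory database for demonstration.
    conn = sqlite3.connect(":memory:")
    count = import_csv(conn, "sample", "name,value\nalpha,1\nbeta,2\n")
    print(count, conn.execute("SELECT * FROM sample").fetchall())
```

Storing everything as TEXT keeps the sketch generic; once you know a dataset’s schema, casting numeric columns to INTEGER or REAL makes the data far more useful for queries.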

I feel so slow now. :( I only just found the nutritional content data. :P Same URL as you have. There are other sources, however. I took time to read through some things (which slowed me down), and even the USDA originally gets its information from three or more sources.

Whoops, forgot to mention that The Google helped with “latest SEC filings.”

malikyte

I can only have a go at these once every couple of hours since I’m at work. For the SEC filings (I googled “SEC filings” and browsed into a subcategory of the first hit…), this is what I’ve found so far:

http://www.sec.gov/cgi-bin/browse-edgar?action=getcurrent
Allows a search of the most recent filings, about as real-time as you can get. I haven’t found any way to get a full listing of this data without scraping. The following links may help with that, but my five-minute search break is over. :)
– Link 1
– Link 2
I think these point in the right direction, but again, my time’s up.

As an aside, when using a search engine, advanced filters help quite a bit when you know what kind of information you’re looking for, especially if a government organization is likely involved. With Google, for instance, you can add a site restriction to the search terms, such as site:.gov “sec filings” or site:.org “sec filings”. Limiting your search results this way goes a long way toward filtering out irrelevant pages.