More Data Than You'll Know What To Do With | MIT Center for Civic Media

Matt Stempeck

Research Assistant

Matt's a Research Assistant at the Center. He has spent his career at the intersection of technology and social change. He graduated with high honors from the University of Maryland College Park, where he wrote a thesis on the disruptive role of political blogs in journalism. He went on to join the strategy team at EchoDitto, a boutique consulting firm building cool technology for nonprofits, startups, and socially responsible businesses.

Then Matt attempted to save democracy by directing new media at Americans for Campaign Reform, a bipartisan grassroots effort to enact voluntary public financing of federal campaigns. Right before Citizens United v. FEC hit, he joined the New Organizing Institute, where he helped to train the next generation of organizers. For most of this time, he also ran one of the most popular NetSquared groups in the world.

Matt's interested in pretty much everything, particularly the everything taking place at the Media Lab.

More Data Than You'll Know What To Do With

I've truly drunk from the MIT firehose this week. They say it's not possible, but I think I actually managed to consume an unhealthy amount of information. Luckily I had a strong Clover food truck coffee Friday morning, because the Introduction to Numeric Data Resources session at Harvard opened my eyes to the seemingly endless amount of data available on the internet. Fortunately, Data Reference Librarian Diane Sredi was there to explain what makes each collection interesting and/or useful.

She provided an overview of data resources and where to start looking. One common challenge is that data is spread out everywhere, so you can't look in just one place. The flip side is that there's more data available than ever, and Harvard and other schools and libraries subscribe to a lot of these resources.

US Census Bureau
Has demographic information as well as business and industry surveys, mapping information, and genealogy. Click the subject list to see an index.

The annual Statistical Abstract of the United States (the National Data Book), published since 1878, is a great starting point that aggregates government data, except that the government has decided to stop funding it. They have annual PDFs of the Abstract going back to 1878.

If you open the government's Excel data files, you'll see a link to the original source of the data.

They also list state data here.

American Fact Finder
People are constantly changing data interfaces to make them better; sometimes they succeed, sometimes they don't.
This site has economic surveys and population data, including a population clock estimating the US population, currently at over 312,000,000.

FedStats.gov
Covers a wide range of topics, each linked to the agencies responsible for that subject area. The Agencies by Subject drop-down on the right gives summaries of what each agency covers and links to contact information, which is helpful because the agencies are generally very willing to help you.

Data.gov
My friend Carroll pointed out that Data.gov and its 390,000 datasets didn't make the list, somehow. From their homepage:

The Lamont Library at Harvard also has access to a lot of foreign statistical services.

World Bank
The World Bank is a great example of a previously paid resource that is now free. It's arranged by topic, country, indicator, and data catalog. There's also a new link to Microdata, which lists surveys from different countries. In the Data Catalog, World Development Indicators is their main database, covering over 200 countries. You can browse countries either alphabetically or in groupings, like region and income level.

They have over 1,200 variables covering a wide range of topics, from education to the environment to poverty and the public sector.

Some of the information goes back to 1960, but not for all of the variables or all of the countries.

You can export, view, or format the layout of the report to your liking. Click the little 'i' icon to get a quick look at what the source includes without having to download the complete dataset.

They have lots of economic and business data, a Knowledge Economy Index, and lots more. If you're doing any sort of international research you should start at the World Bank.

Social Science Data Services - great for "finding, understanding, and managing statistics or numeric or tabular data in the social sciences, and management."

MIT affiliates also have access to many of the resources here:

Harvard-Accessible Resources
Start at the library portal and click Find E-Resources in the Articles & More section. It defaults to the Title tab, but if you're not sure what you're looking for, click the Keyword or Subject tabs. Under the Subject tab there's a list of categories, because librarians love to categorize things. If you click a topic you're interested in, you'll see a breakout of the resources listed by type. You're looking for Statistics and Data, Indexes to journal articles, and Research guides.
Click on one and click Go. It's not an exhaustive list, but it's a great place to start. Click on the 'i' icon for an expanded description, including the dates it covers, how often it's updated, and any access restrictions.

Economist Intelligence Unit
A great place to start for economic data about a country. Type in your country or search by specific reports. If you just want data, click the link for Data Tool underneath the Reports dropdown on the right of the homepage.

Includes CityData, with varying but interesting data from cities around the world.

CountryData is the broad-based option and covers a lot of economic information (key indicators, exchange rates, etc.) back to 1980, and even forecasts up to 2030.

ProQuest Statistical Insight
AKA Lexis Nexis Statistical
Has US and international information. You can search for something broad like 'employment' and then use the interface on the left to drill down by source, file format, region, date published, date covered, and targeted subject area (like Women's Employment).
The date-covered filter runs all the way to 2089, and a significant number of results project that far into the future. How accurate these projections end up being is up for debate. More resources are starting to use drill-down tree interfaces like this one to help you target what you're looking for.

OECD iLibrary
Access books, papers, and a dedicated Statistics section, with lots of energy and economic databases. You can search by theme. The OECD.Stat tool allows you to compare data across multiple datasets. Their Country tables include key statistics, and you can build pivot tables from this data. Click the 'i' button as usual for some nice metadata.

ICPSR
Great archive housed at the University of Michigan since 1962, and one of the oldest and largest social science archives in the world. It holds government surveys, researchers' data, polls, and panel studies across a broad range of topics, including anthropology, communications, health, and political science.

Click the Find & Analyze Data tab. If you know what you want, the Search is great. If you'd like to browse, you can search for keywords or scroll down to the Browse By Topic section. You can also browse by geography, either by world map or a country index with the number of studies for each nation listed.

The ICPSR Thesaurus is a nice tool if you aren't getting the results you want with your current search terms. Try the terms listed there and you'll get much better results, and you'll be able to control how broad or narrow you'd like to go.

IQSS Dataverse
Largest collection of social science research data, housed at Harvard. You and other researchers can create your own dataverse, backed up by Harvard but owned by you. If you want access to someone's dataverse and they're not allowing it, it sometimes helps to get in touch with that researcher directly.

You can filter dataverses by type of organization, like large institution or educational institution. If you're not on campus, go to the Login screen, leave the username and password blank, and select the university affiliate you're with (MIT).

We look at an example dataverse. "Unit of information: children." Creepy.

Dataverses include documentation. You can run data analysis and download data as text, R data, S-Plus, and Stata files. You can recode and case-subset right on the website, as well as run advanced statistical models (Diane recommends you understand a model before running it on the data).
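If you'd rather recode and case-subset after downloading a study as tab-delimited text, here's a minimal sketch using only Python's standard `csv` module. The column names (`age`, `country`, `employed`), the 0/1 coding, and the sample rows are all hypothetical stand-ins, not taken from any real dataverse; with an actual download you'd read the file instead of the inline string.

```python
import csv
import io

# Hypothetical tab-delimited export; a real download would be opened from disk instead.
SAMPLE = (
    "id\tage\tcountry\temployed\n"
    "1\t34\tUS\t1\n"
    "2\t17\tUS\t0\n"
    "3\t52\tJP\t1\n"
    "4\t29\tJP\t0\n"
)

def load_rows(text):
    """Parse tab-delimited text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))

def recode_employed(row):
    """Recode the hypothetical 0/1 'employed' column into readable labels."""
    labels = {"1": "employed", "0": "not employed"}
    return {**row, "employed": labels.get(row["employed"], "missing")}

def subset_adults(rows, country):
    """Case-subset: keep respondents aged 18 or older from one country."""
    return [r for r in rows if int(r["age"]) >= 18 and r["country"] == country]

rows = [recode_employed(r) for r in load_rows(SAMPLE)]
for r in subset_adults(rows, "US"):
    print(r["id"], r["employed"])  # prints "1 employed"
```

The website's built-in tools will usually be quicker for a one-off subset; a script like this mainly helps when you want the recoding steps written down and repeatable.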

Unfortunately there's no one-stop shopping, but clearly lots of good places to go look.

Public Opinion Data
The Roper Center for Public Opinion Research
Half a million public opinion survey questions, including international data, though heavily focused on the US. You can search by keyword and country, and also by specific polling sources. The search engine will give you nice graphic summary statistics; if there's an 'X' icon, you can download the data. You need to register, which is free if you go through Harvard libraries. They have specific Latin American and Japanese opinion archives (more on Japanese data).

Government Documents Department in Lamont Library at Harvard
Lots of US and international census data and other statistical publications. The foreign documents specialist is very open to purchasing what you need for your research.