Usable data sets and public health data are in short supply. Whether your interest is biomedical engineering, health informatics, data mining, or public health analysis, this annotated bibliography should contain something that will aid your search for knowledge. It is my pleasure to compile this resource for you, and I hope that you find it as useful as I have in my work as a health informaticist and data scientist. Thank you for using this research in your work, and I wish you the best in your data endeavors.

For this first edition, the bibliography is organized alphabetically. As this work grows, a different shape will certainly emerge. At the same time, the basic concept still holds true: keep it simple. If you are looking for a database, data set, visualization tool, or government health data fact, you can probably find it in one of these sources. Please feel free to contact me through http://velluminformation.com with any specific data requests or questions, and I will be happy to help if possible.

“The CMS Center for Strategic Planning produces an annual CMS Data Compendium to provide key statistics about CMS programs and national health care expenditures. The CMS Data Compendium contains historic, current, and projected data on Medicare enrollment and Medicaid recipients, expenditures, and utilization. Data pertaining to budget, administrative and operating costs, individual income, financing, and health care providers and suppliers are also included. National health expenditure data not specific to the Medicare or Medicaid programs is also included making the CMS Data Compendium one of the most comprehensive sources of information available on U.S. health care finance. This CMS report is published annually in electronic form and is available for each year from 2002 through present.”

This report contains statistical data for the Urban Indian Health Institute’s research. Topics include sociodemographics, mortality, access to care, alcohol use, and environmental, heart, mental, and maternal/child health. The data are compiled from the national service areas located within the USA.

Includes data tools and data sets: for example, fiscal data for public schools and universities, common data core sets, educational progress, and primary/postsecondary data. Data sets also include legal data, Federal resources, and trends in science and mathematics for students. Data sets are available in a variety of formats, including XML, CSV, and XLS.

“You’ve found a public resource designed to bring together high-value datasets, tools, and applications using data about health and health care to support your need for better knowledge and to help you to solve problems. These datasets and tools have been gathered from agencies across the Federal government with the goal of improving health for all Americans. Check back frequently because the site will be updated as more datasets and tools become available.”

Key elements include a massive index of health data sets: Medicare, geographic data, medical record system adoption, child welfare, and assisted reproduction data. There is also a health apps repository/demo site, and a small collection of other data sources that is worth a look, especially (1) California’s health data and (2) the Gallup Poll Well-Being Index.

FastStats has data for nearly any illness or major life complication that could arise for a resident of the USA. A small sample includes: American Indian or Alaskan Native health, assault/homicide, cancer, deaths/mortality, emergency department visits, immunizations, kidney disease, life expectancy, marriage, Mexican American health, obesity/overweight, pertussis, smoking, and teen pregnancy. If it is a life-changing event, chances are good that FastStats has at least basic data for it.

“The IT Dashboard is a website enabling federal agencies, industry, the general public and other stakeholders to view details of federal information technology investments. The purpose of the Dashboard is to provide information on the effectiveness of government IT programs and to support decisions regarding the investment and management of resources. The Dashboard is now being used by the Administration and Congress to make budget and policy decisions.”

Note that the site offers analysis tools and data feeds rather than a data set as such. The source code for the IT Dashboard is also available.

The Healthy People 2020 Initiative is dedicated to creating a healthy environment for everyone, and it provides data and publications that support this goal. It has a specific focus on health disparities and prevention efforts.

“Publishing high-value datasets that increase accountability and responsiveness improve public knowledge of the Department of Justice and our operations, create economic opportunity, and respond to need and demands of the public are a core component of our efforts to fulfill The Open Government Directive.”

Donated data sets, combined with an information visualization application, create real-time displays from an almost endless supply of data. The collection covers everything from average Canadian household expenses to London’s air quality to Kobe Bryant’s game scoring, and quite a bit in between. The application is also relatively simple to use, which means that any given data set can be visualized with little effort.

A huge repository of open data sets from the state of Massachusetts: economic, education, geography, health, population, public safety, and technology are all covered, as well as quite a few other subjects.

“Welcome to the National Center for Health Statistics’ website, a rich source of information about America’s health. As the Nation’s principal health statistics agency, we compile statistical information to guide actions and policies to improve the health of our people. We are a unique public resource for health information – a critical element of public health and health policy.” Data cover diseases, health care and coverage, injuries, life stages, populations, lifestyle factors, and more.

Open data sets for everything from subway data to open-access WiFi networks, park maps, SAT scores, and filming locations. The collection is too much of a hodge-podge to define precisely: apart from the fact that everything relates to New York, there is no strict boundary or catalog.

“The Open Data Initiative is a Web 2.0 site for disseminating public data.” Includes visualized data sets for suburb safety, Australian criminology tracking, and the Saudi Arabian census. The site may bear further watching, or it may prove transitory.

“The Open Government Data Initiative (OGDI) is an initiative led by Microsoft Public Sector Developer Evangelism team. OGDI uses the Windows Azure Platform to make it easier to publish and use a wide variety of public data from government agencies. OGDI is also a free, open source ‘starter kit’ with code that can be used to publish data on the Internet in a Web-friendly format with easy-to-use, open API’s. OGDI-based web API’s can be accessed from a variety of client technologies such as Silverlight, Flash, JavaScript, PHP, Python, Ruby, mapping web sites, etc.”

Hosted by Microsoft’s Cloud App servers, this data initiative displays visualized data sets and has a section for data developers as well.

This is the mother lode of all data banks. It provides access to over 7,000 indicators for global statistics, including economic, health, education, and environmental indicators, searchable by country, year, and topic. It also has a microdata library.