10 Bits: The Data News Hot List

This week’s list of data news highlights covers August 23-29 and includes articles about Indiana’s efforts to use data to reduce infant mortality and a machine learning system that generates knowledge automatically from large internet data sources.

Indiana has a higher-than-average infant mortality rate, and the governor’s office has launched an analytics project with SAP to use data to help fix it. Part of the effort involves simply joining data sets from different agency sources, such as pairing cause of death data from the Department of Health with socioeconomic information from the Family and Social Services Administration. But the state also hopes to employ more sophisticated predictive modeling to identify the factors that both contribute to infant mortality and might be relatively inexpensive to change.
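
As a rough illustration of the data-joining step described above, the sketch below pairs two small sets of records on a shared key. Everything in it is an assumption made for illustration: the field names (`county`, `cause_of_death`, `median_income`), the sample values, and the `join_on` helper are hypothetical and do not reflect the state's actual data or SAP's tooling.

```python
# Hypothetical records; field names and values are illustrative only,
# not real Indiana data.
health_records = [
    {"county": "A", "cause_of_death": "prematurity"},
    {"county": "B", "cause_of_death": "SIDS"},
    {"county": "C", "cause_of_death": "prematurity"},
]
social_records = [
    {"county": "A", "median_income": 41000},
    {"county": "B", "median_income": 52000},
]


def join_on(left, right, key):
    """Inner-join two lists of dicts on a shared key."""
    # Index the right-hand rows by key so each lookup is constant time.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    joined = []
    for row in left:
        # Rows with no match on the right (county "C" here) are dropped.
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined


joined = join_on(health_records, social_records, "county")
```

Here `joined` holds one combined record for each county that appears in both sources. A real project would more likely join on a finer-grained identifier and would need to handle mismatched or missing keys.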

Software company Magpi makes a data collection app that survey-takers can use on mobile phones equipped solely with SMS messaging. The app, also called Magpi, allows public health researchers to collect data digitally using technologies that are widely available even in developing countries. Although the app was designed primarily for public health and international development applications, the company notes it is being used in a wide variety of fields, including safety data collection on Australian natural gas drilling rigs.

IBM announced this week that it would make Watson Discovery Advisor, a research acceleration product based on the company’s artificial intelligence technology, available as a cloud service. The company also released research, published with Baylor College of Medicine, detailing several applications of the service for cancer research. The researchers used Watson to comb through 70,000 scientific articles related to a protein linked to many cancers and were able to identify six promising targets, or molecules that modify the protein and could be used in future treatments. While researchers typically find around one such target per year, Watson identified its six in only a few weeks. The researchers hope Watson will help accelerate research in a variety of biomedical and pharmaceutical subfields.

The Massachusetts Institute of Technology and Marriott Hotels are working together on a social matchmaking system for connecting hotel guests with similar interests. Guests can place their mobile devices on an RFID-connected table, which pulls data from the guests’ LinkedIn accounts to identify people with professional ties or interests in common. Some of this information gets displayed on a nearby visualization panel, where guests can see general facts about guest demographics and interests. The setup is being tested at the Marriott in Cambridge, MA.

The Houston Astros baseball team is in the middle of an ambitious experiment to remake its organization, in-game strategy, and minor league system using data. Leading the charge is general manager Jeff Luhnow, a former quantitative advisor for the St. Louis Cardinals who reshaped that team’s recruiting strategy. The data-driven investments have come at the expense of hiring star players in the short term, leaving the Astros with an underperforming team, but Luhnow expects the data to begin paying off over the next couple of years.

The International Center for Missing and Exploited Children has developed Project Vic to help child safety investigators automatically identify previously unknown images of child pornography so they can focus their investigations on new victims. Of the nation’s 62 task forces on Internet crimes against children, 50 are now participating in the project. Third-party developers have also created Autopsy, an open source digital forensics platform that helps speed up Project Vic analysis by prioritizing images that are most likely to be new.

Heat Seek NYC, a volunteer organization building connected temperature sensors, is using the Internet of Things to help make sure all New Yorkers have heat in the winter. The group, which met its funding goals in a Kickstarter campaign this week, hopes its sensors can help provide evidence for heating code abuse claims and help landlords heat their buildings more efficiently. In its first large-scale pilot, the group wants to place its sensing systems in 1,000 New York apartment buildings.

The Robo Brain project, a collaboration between several top U.S. universities, seeks to automatically derive knowledge from massive Internet data sources, including about one billion images, 120,000 YouTube videos, and 100 million how-to documents and manuals. So far, Robo Brain has learned, among other things, how to recognize chairs and how to use drinking fountain buttons to dispense water. The researchers hope Robo Brain will be able to share its insights with other robots, such as household devices, to help them function more smoothly in human interactions.

After the global financial crisis began in 2008, fast food companies began adopting a data-driven approach to opening new locations. At a recent conference, representatives from several fast food companies discussed using geographical information systems (GIS) to identify and prioritize real estate purchases where future demand growth might be high. Starbucks Coffee, Chick-fil-A, and Wendy’s weighed in, detailing their efforts modeling foot traffic, nearby construction, and other factors that affect new stores’ success. Combined with existing data on consumer demographics, auto traffic, and safety, these modeling efforts can help fast food companies identify sure bets in a time of financial uncertainty.

The National Nuclear Security Administration conducted its first nationwide test of a wireless, cloud-based radiation data collection system called RadResponder earlier this year. During the drill, over 200 state and local emergency responders from 38 states collected 21,835 radiation measurements, along with field samples and recorded observations. According to the agency, the system was built in response to the 2011 Fukushima reactor disaster in Japan, when government officials from different jurisdictions had trouble coordinating efforts and sharing data rapidly.