10 Bits: the Data News Hotlist

This week’s list of data news highlights covers February 6 – 12, 2016 and includes articles about a smartphone app that uses machine learning to track air pollution in photos and details about big data projects in President Obama’s proposed 2017 budget.

A group of 31 research institutes, scientific journals, and philanthropic organizations has signed an agreement to share data related to the Zika virus and future public health crises. The group, which includes the New England Journal of Medicine, the U.S. National Institutes of Health, and the Chinese Academy of Sciences, has committed to share this data as openly as possible, such as by making all relevant data freely accessible. Funding signatories, such as the Bill and Melinda Gates Foundation, have agreed to ensure their research funding stipulates that researchers must include data sharing mechanisms in their work so that they can disseminate their findings as quickly and as widely as possible.

Researchers at the Nanyang Technological University in Singapore are developing AirTick, a smartphone app that mines crowdsourced smartphone photos to predict air pollution levels around the world. AirTick will use machine learning algorithms to identify the air quality in users’ smartphone photos and combine this information with photo metadata that indicates when and where each photo was taken. In a 100-person pilot in November 2015, AirTick identified air quality levels in photos with 90 percent accuracy. By increasing the number of users after AirTick’s launch later this year, the researchers expect to be able to provide air pollution levels for specific locations in real time and warn users about hazardous levels of pollution in their area.

The U.S. National Highway Traffic Safety Administration has agreed to treat the artificial intelligence system that operates Google’s self-driving cars as a driver, removing the need for the cars to have human drivers. Treating an artificial intelligence program as a driver for legal purposes allows Google and other companies to develop self-driving car systems that do not require human input, such as systems without steering wheels or brake pedals, and test them in real-world environments.

Pittsburgh’s Carnegie Museum of Art (CMOA) is implementing Art Tracks, an open data initiative to make information about its artwork more accessible. Art Tracks will convert CMOA’s information on provenance (records of an artwork’s origins and ownership) into publicly available machine-readable data, and has already developed a suite of open source tools, available on GitHub, for users to work with and contribute to this data. Any museum will be free to adopt Art Tracks’ standards and software tools for its own record-keeping systems.

Researchers at Carnegie Mellon University have developed a machine learning system capable of predicting the outcome of certain types of drug testing without the need for large numbers of time-consuming experiments. The system automatically detects particular patterns of proteins in cells with the help of a computerized microscope and analyzes how different drugs interact with these proteins. After observing enough of these interactions, the system can predict how a drug will interact with similar arrangements of proteins without the need to carry out the experiment. Researchers tested their system by having it analyze the effects of 96 different drugs on 96 different cells, a task that would require a human researcher to conduct 9,216 tests to analyze completely. After just 2,697 experiments, the system was able to predict the outcome of every pairing with an accuracy rate of 92 percent. The researchers expect this and similar approaches could dramatically reduce the amount of time and resources required to develop new drugs.
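The scale of the savings can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in this item (96 drugs, 96 cells, 2,697 experiments); the variable and function names are illustrative and do not come from the researchers' system.

```python
# Figures quoted in the item above; all names here are illustrative only.
NUM_DRUGS = 96
NUM_CELLS = 96
EXPERIMENTS_RUN = 2_697


def total_pairings(num_drugs: int, num_cells: int) -> int:
    """Number of tests needed to try every drug on every cell."""
    return num_drugs * num_cells


total = total_pairings(NUM_DRUGS, NUM_CELLS)
share_run = EXPERIMENTS_RUN / total   # fraction of pairings actually tested
avoided = total - EXPERIMENTS_RUN     # tests the system did not have to run

print(f"Total pairings: {total}")
print(f"Share of experiments run: {share_run:.1%}")
print(f"Experiments avoided: {avoided}")
```

In other words, the system ran a little under 30 percent of the exhaustive set of tests and predicted the rest.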

The Open Banking Working Group, a financial industry-led working group convened by the UK government in 2015, has published its recommendations for banks to adopt the Open Banking Standard, a framework for improving the utility of bank data for individuals and businesses. Under the framework, banks would make their data on financial products and services available as open data, which would support the development of services that benefit consumers, such as price comparison tools. Banks would also adopt an open application programming interface (API) standard to improve how they share transaction data with their customers and each other.

The U.S. Department of Transportation (DOT) has received applications from 77 cities to participate in its Smart City Challenge, which will award $40 million to a medium-sized city that creates the best plan to implement connected technologies to improve public safety and transportation and to protect the environment. DOT will award $100,000 to five finalists on March 12, 2016, to further develop their proposals, and it will select the winning city in June 2016.

Researchers have created Autoscope, an automated microscope that uses an artificial intelligence algorithm to rapidly detect the presence of malaria parasites in a sample. By analyzing the visual features of objects in a sample, Autoscope can determine whether the malaria parasite is present with 90 percent accuracy, a rate lower than that of well-trained humans but higher than that of other types of rapid diagnostic testing. Additionally, other types of rapid diagnostic testing can only detect the presence of the parasite, whereas Autoscope can determine the severity of a malaria infection by quantifying the number of parasites present.

The UK’s Met Office, the national weather service, has announced it will rebuild its Weather Observation Website (WOW) to allow members of the public to contribute their own environmental data to improve weather forecasting, as well as to make it easier for the public to use the agency’s weather data. The Met Office initially launched WOW in 2011 but struggled to develop user-friendly ways of helping members of the public extract data for their own use. The new version of WOW, which will launch in April 2016, is also designed to incorporate data collected by businesses and smart city projects that deploy environmental sensor networks.

President Obama’s proposed $4.1 trillion fiscal year 2017 budget includes substantial investments in using data to solve social and economic challenges. The budget allocates $500 million to create a Workforce Data Science and Innovation Fund to improve the quality of workforce data and to create open source analytics tools, including a scorecard to provide outcome metrics for federally funded worker training programs. The Department of Transportation would also receive $3.9 billion over 10 years to support the development and testing of autonomous vehicles. The budget also provides $309 million to support the Precision Medicine Initiative, which aims to develop personalized medical treatments based on genetic data.