10 Bits: The Data News Hot List

This week’s list of data news highlights covers February 1-7 and includes articles on a Department of Defense initiative to apply machine reading to cancer research and a Twitter program to give researchers access to the social network’s historical data free of charge.

The Defense Advanced Research Projects Agency (DARPA) has announced plans for a data mining program to track advances in cancer research. DARPA issued a solicitation this week for advanced machine reading methods that can extract meaning from enormous troves of research papers. The agency hopes work from the program, which is projected to cost $45 million, will ultimately be applied to help researchers immediately integrate new scientific discoveries across fields.

Johnson & Johnson is partnering with Yale University’s Open Data Access Project (YODA) to give academics access to its clinical trial data. In addition, the company hopes to open up medical device and consumer product information to the research project. YODA leaders hope the company’s commitment will prompt other drugmakers to release their data.

Music executive Lyor Cohen announced a partnership between Twitter and his company, 300, that will use data from the social network to dowse for promising music acts. The partnership gives 300 full access to Twitter’s music-related data, including non-public information such as location tags. In exchange, the company will help Twitter develop software that artists and record labels can use to glean insights about their own online buzz.

The U.S. Department of Transportation announced this week that it wants vehicles to be able to communicate with one another wirelessly, in hopes of reducing collisions on the road. The agency released a proposal to require all car manufacturers to install vehicle-to-vehicle communications systems in cars and other light vehicles, which would allow cars to rapidly broadcast speed, location and direction data to other vehicles in their immediate surroundings. Vehicle-to-vehicle communications could prevent up to 80 percent of accidents that do not involve drunk drivers or mechanical failure, the agency estimates.

Since January 1, the Chinese government has required 15,000 factories, including large state-run facilities, to publicly report real-time details on air emissions and water discharges. The move came as a surprise to environmental groups, given the nation’s past wariness of open data. With air pollution in China’s largest cities reaching unhealthy levels, however, political leaders may be hoping transparency will help spur reform among the worst polluters.

Soon, hundreds of Walgreens clinics will be equipped with software designed to help clinicians through the checkup process, recommending certain tests or requiring that certain questions be asked given a patient’s medical history. The software, called ePASS, uses predictive algorithms derived from data on over 100 million past patients to infer what conditions a new patient might have.

A new rule announced this week by the Department of Health and Human Services will require clinical labs to give patients access to their own results upon request, without needing to go through the physician who ordered the tests. Patient advocacy groups hope the new rule will help patients be more active in managing their own health care and tracking their progress on health-related goals.

Twitter has launched an initiative to provide selected research institutions with access to all its public and historical data, a trove that would normally be highly costly to access. The initiative, called Twitter Data Grants, is expected to help social scientists, epidemiologists, and others study social phenomena on a larger scale than has been traditionally possible, and thereby develop models that better reflect reality.

The World Bank has launched a database of Indonesian economic indicators at the sub-national level. With data on around 200 economic indicators going back 20 to 30 years, the Indonesia Data for Policy and Economics Research database includes province- and district-level data and will be accessible to development researchers and the public from anywhere in the world.

A report released by the Government Accountability Office (GAO) strongly critiques the federal government’s property database, arguing that different agencies’ disparate approaches to defining and inventorying federal property have rendered the aggregate data useless. Federal agencies reported in 2012 that they operate over 480,000 federally owned structures, but methodologies for defining what counts as a structure differed among agencies, making cross-agency analysis of the properties extremely difficult.

Travis Korte is a research analyst at the Center for Data Innovation specializing in data science applications and open data. He has a background in journalism, computer science and statistics. Prior to joining the Center for Data Innovation, he launched the Science vertical of The Huffington Post and served as its Associate Editor, covering a wide range of science and technology topics. He has worked on data science projects with HuffPost and other organizations. Before this, he graduated with highest honors from the University of California, Berkeley, having studied critical theory and completed coursework in computer science and economics. His research interests are in computational social science and using data to engage with complex social systems. You can follow him on Twitter @traviskorte.