10 Bits: The Data News Hot List

This week’s list of data news highlights covers September 20-26 and includes articles about Germany’s new open data action plan and epidemiologists’ efforts to fight the West African Ebola outbreak with predictive models.

A Copenhagen neighborhood is the setting for a major experiment in smart infrastructure called the Danish Outdoor Lighting Lab. The project has filled the streets with energy-efficient light-emitting diode (LED) streetlights, along with sensor systems and power supplies, and its organizers aim to track each of these devices remotely and manage them automatically. For example, the system will brighten the area around a pedestrian or bicyclist while leaving the rest of a street dimmed. The organizers hope the project will cut the city’s energy costs and serve as a proof of concept for future data collection applications based on city lighting infrastructure.

The government of Germany released an open data action plan this week, pursuant to the Open Data Charter that the G8 countries agreed to in 2013. The charter called for the member countries (Germany, the United States, the UK, Canada, France, Italy, Japan, and Russia, which was suspended from the group in 2014) to make openness the default standard for government data, implement open standards for publishing data, increase the quality and quantity of released data, and release data to drive innovation and improve governance. Germany’s action plan describes how it will make more data sets on its national data portal machine readable and encourage cities and states to conform to the expectations set in the charter.

San Francisco-based health data startup Iodine launched its flagship product this week, a searchable database of drug information that lets users see how other patients feel about the efficacy of certain drugs. The company created its data set by sending out Google Consumer Surveys, of which 100,000 have been completed, and integrating those with information on adverse events from the Food and Drug Administration and other data sources. The data set also includes information on co-pays, alternative drugs, and side effects.

Kaiser Permanente has invested over $4 billion in the last 10 years to build the private sector’s largest database of electronic health records. Now the provider has partnered with the University of California, San Francisco (UCSF) to link those records with DNA samples from more than 210,000 of its patients. Kaiser Permanente and UCSF hope the database can help researchers study how genetic factors influence diseases such as glaucoma and prostate cancer. In the future, the database could let patients take a genetic test before doctors prescribe a medicine, in order to predict whether the patient might experience adverse effects. Kaiser Permanente is also granting select researchers from external organizations access to the data.

Australia’s University of Western Sydney is using advanced analytics and data visualization to track student performance and curb course failures. The university will use the analytics to inform support and intervention programs for its students, many of whom come from lower socio-economic backgrounds and are the first in their family to attend a university. Using business analytics software Tableau along with student surveys and course information, the university is prioritizing groups of students in greatest need of intervention.

WhoSampled is a website and app that lets users explore the complex network of samples in popular music. Built on a database of over 270,000 songs along with the samples they incorporate, WhoSampled lets users look up a sample and watch a YouTube or Vimeo clip pinpointing the location of the sample in its original context. WhoSampled, which launched its Android app this week, provides data services to music identification app SoundHound, and the company’s founder says he is open to adding WhoSampled capabilities to other apps like Spotify.

With the West African Ebola outbreak worsening every day, health organizations are turning to predictive models to help stop the virus’s spread. A paper released this week by the Centers for Disease Control and Prevention shows that projections of Ebola cases depend largely on how accurate the current case data is. For example, if there is no significant increase in the impact of interventions, the number of cases could spike to 550,000 by January 2015, but if cases are being underreported, the number could be as high as 1.4 million. The paper’s authors then used these models to estimate the scale of intervention needed to bring the disease under control and found that around 70 percent of patients would have to be quarantined, up from 10 percent at present.
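
The gap between the two projections can be illustrated with a simple underreporting adjustment. The short Python sketch below is only an illustration, not the paper's model: the function name is hypothetical, and the correction factor of 2.5 is an assumption chosen because it makes the two figures quoted above line up.

```python
def corrected_cases(reported_projection, correction_factor):
    """Scale a projection based on reported cases by an underreporting factor."""
    return reported_projection * correction_factor


# Projection based on reported cases, as quoted in the text.
REPORTED_PROJECTION = 550_000

# Assumption: for every reported case, 2.5 cases actually occur.
# This value is not stated in the text above.
CORRECTION_FACTOR = 2.5

CORRECTED_PROJECTION = corrected_cases(REPORTED_PROJECTION, CORRECTION_FACTOR)
print(f"{CORRECTED_PROJECTION:,.0f} cases after correcting for underreporting")
```

Under that assumed factor, the corrected figure comes to roughly 1.4 million, which shows how sensitive the projection is to the quality of the underlying case counts.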

Fashion moves quickly, and retailers who order stock weekly or even daily want to know as soon as possible when trends change, but traditional industry tracking tools can have an unacceptably long lag time in their analysis. London-based startup Editd helps solve the problem by providing retail brands with data and analysis on their industry in near real time. Drawing on stock movements, product offerings, and other data sources, Editd lets companies compare themselves with competitors and rapidly determine when a particular product is going viral and should be ordered.

Twitter recently open sourced DIMSUM, software that makes it easier to run recommendation algorithms on massive data sets. Recommendation systems, such as those deployed by Netflix, Amazon, and other web companies, are often burdened by the amount of data they have to parse through, attempting to recommend the perfect product among millions of possibilities to a particular user. DIMSUM eases recommendation systems’ burdens by efficiently pre-processing the large amounts of data and returning a subset of potential recommendations that is much smaller than the original data set. DIMSUM is up and running at Twitter, where it matches promoted ads to users and recommends people to follow, but the data scientists who created DIMSUM hope it will be applicable to other companies that have to navigate through large data sets as quickly as possible.
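
The idea of shrinking a large data set to a short list of candidates can be sketched in a few lines of Python. This is not DIMSUM itself, which is built to avoid the exhaustive pairwise comparison shown here. It is a minimal illustration, using hypothetical toy data and function names, of keeping only the item pairs whose cosine similarity clears an assumed threshold.

```python
from itertools import combinations
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as {user: weight} dicts."""
    dot = sum(w * b[u] for u, w in a.items() if u in b)
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def candidate_pairs(items, threshold):
    """Return only the item pairs whose similarity reaches the threshold."""
    kept = {}
    for (name_a, vec_a), (name_b, vec_b) in combinations(sorted(items.items()), 2):
        score = cosine(vec_a, vec_b)
        if score >= threshold:
            kept[(name_a, name_b)] = score
    return kept


# Hypothetical toy data: which users interacted with which item.
ITEMS = {
    "item_a": {"u1": 1.0, "u2": 1.0, "u3": 1.0},
    "item_b": {"u1": 1.0, "u2": 1.0},
    "item_c": {"u4": 1.0},
    "item_d": {"u3": 1.0, "u4": 1.0},
}

# The threshold of 0.5 is an arbitrary value chosen for illustration.
CANDIDATES = candidate_pairs(ITEMS, threshold=0.5)
for pair, score in sorted(CANDIDATES.items()):
    print(pair, round(score, 3))
```

Of the six possible pairs in the toy data, only two survive the threshold. A downstream recommender would then rank just those candidates instead of every possible combination.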

Los Angeles-based startup ConnectX wants to make big data storage and processing more efficient with a space-based supercomputing platform. Space, which offers cheap real estate and low temperatures that can cool electronics without using extra power, could provide an ideal setting for ConnectX’s systems. In addition, the company plans to create processors that do not rely exclusively on traditional binary processing, instead using a novel computing model that the company’s founder hopes will be able to store data more efficiently. ConnectX hopes to launch its first test satellite in about a year.