10 Bits: The Data News Hot List

This week’s list of data news highlights covers December 6-12 and includes articles about how the Internet of Things is transforming agriculture and how mining Twitter posts can provide insights about mental illness.

Retailer Marks & Spencer is relying on predictive analytics to ensure its stores are stocked during the holiday season rush. The company uses the technology to better predict demand and forecast sales based on factors such as weather and changing fashions. This is the latest in a series of data-driven business changes for Marks & Spencer, which has been implementing data visualization and analytics software for its stores since February 2013.

MIT researchers have developed a machine learning model designed to help humans make better sense of the patterns it discovers. Recognizing that humans rely on previous experiences and examples to conceptualize things, the team of researchers set out to create a machine learning model that categorizes data and provides a representative example of the dataset. Called the Bayesian Case Model, this approach not only performs more accurately than other machine learning models, but also allowed human testers to perform tasks significantly faster than they could with other models.

The Obama administration is expected to issue a rule soon, possibly by March 2015, to give sensitive data from government agencies a standard classification. Currently, there are about 120 different designations for sensitive government data that does not warrant a “classified” status. These categories, ranging from “Law Enforcement Sensitive” to “For Official Use Only,” will be dubbed “Controlled Unclassified Information.” The new rule is expected to reduce the cost and complexity of managing government data.

The Tennessee Economic Council on Women, a state agency devoted to addressing the economic needs of women, has created a new online database to connect Tennesseans with information about sexual assault, domestic violence, and human trafficking resources. The Council created the database to provide valuable information to women and children in need, as well as to act as a tracking system for the availability of helpful services in each county, highlight weak areas where service providers can focus expansion efforts, and give policymakers an accurate assessment of the state of abuse victim care in Tennessee.

Belgium-based startup Smappee launched its device of the same name to help people monitor how much energy their electronic devices are using. Smappee detects the electronic signals created by all electronic devices, from a refrigerator to a lightbulb, and displays how much power each device is consuming on a corresponding phone app. The app can send alerts if energy use for certain devices is abnormally high or if devices are on and drawing power when they should not be. Smappee’s founders hope the energy data collected will encourage people to be more energy conscious by providing them with data they would otherwise have had a hard time accessing.

The agriculture industry is turning to the Internet of Things to increase efficiency and sustainability. Agricultural companies are using networks of sensors to monitor and analyze soil, climate, and weather data to predict crop yields and create planting plans. Farmers can visualize their fields on a smartphone and develop a plan tailored specifically to their land’s unique characteristics. The International Food Policy Research Institute predicts that these technologies could increase crop yields by up to 67 percent, cut food prices in half, and reduce food scarcity by as much as 36 percent.

New research suggests that Twitter posts can be a cheap and timely source of health information and provide insight into mental illness. Using the same approach that was used to predict flu outbreaks from Twitter data, a group of Johns Hopkins researchers is using this data to gain insights into some common mental illnesses. The technique involves analyzing tweets from users who have publicly indicated their diagnosis to discover language-based cues linked to certain disorders, such as post-traumatic stress disorder, depression, and bipolar disorder. The researchers hope these insights can be tapped by treatment providers and public health officials to better understand and treat certain mental illnesses.

In an effort to modernize government services, India is building a database to consolidate public records for its 1.2 billion citizens. The database will house things like police and court records, as well as serve as a platform for property registration and social assistance systems. The database is being developed as the Indian government rolls out its “Digital India” project to consolidate government services online. The developers of the database cite the need for a modern, centralized system to replace the patchwork of decade-old, unintegrated systems.

Thanks to grants from the National Science Foundation’s Advancing Digitization of Biodiversity Collections Program, a group of universities is collaborating to build a database of over 1.7 million specimens of plant and fish life. The universities will be populating the database with data such as genetic information and geolocation data from the specimens, some of which date back 200 years. This database will provide researchers with valuable access to a wider pool of samples for study and comparison and could even help fight the spread of invasive species.

A new study suggests that images on social media websites could aid in conservation efforts. Data tied to images that show when and where the photos were taken could be used to better manage certain ecosystems, says the study’s author from the National University of Singapore. For example, by analyzing how the subjects of the photos were interacting with their environment, environmental stewards could gain better insights into how to protect certain areas and how to disseminate more information about the ecosystem.