4 Big Data Challenges that Universities Face

University presidents grapple with how to advance research in an era where big data and big science place increasing demands on networks.

The rise of big data and big science has given universities an opportunity to work together on the technology needed to manage worldwide research projects.

Studies of the misfolding proteins that cause Alzheimer's disease and of the mysteries of dark matter and dark energy are just a few of the projects that generate massive amounts of data. To unlock the answers that the data holds, researchers will have to work together across different disciplines and countries.

"In this era of big data and big science, universities must serve as a crossroads for collaboration more than they ever have," said Shirley Ann Jackson, president of Rensselaer Polytechnic Institute, during a general session at the 2014 Internet2 Global Summit in Denver on Tuesday, April 8.

This crossroads for collaboration doesn't just mean that researchers should talk to each other. It also means collaboration between research and IT in a way that doesn't always happen, said Michael McRobbie, president of Indiana University. Researchers need the support of IT leaders who take the time to understand what's needed technologically and can then provide it.

Ten years ago, IT leaders weren't always sure what researchers wanted, so McRobbie decided that his campus would find out. Brad Wheeler, Indiana University's CIO and vice president of IT, pulled together a community of about 15 people and asked what they wanted. Ultimately the group was looking for the ability to store and preserve data. So that's what IT gave them.

"It is absolutely essential to ask and continually ask the researchers what it is that they want," McRobbie said.

As university leaders support their campuses' missions, they face four major challenges on the road to unlocking the potential of big data and big science.

1. Volume

The sheer amount of data coming out of big research projects is staggering. The research and education network from Internet2 allows researchers to share large amounts of data, and they're doing so to the tune of nearly 50 petabytes a month. While the Internet2 network is lightning fast, the explosion of data has made it challenging to keep up, Jackson said.

That leads to another challenge: Networks and the computers attached to them don't have the same capacity to handle these large volumes of data. For example, the Internet2 network provides 100-gigabit Ethernet technology, but servers may only allow applications to use 1 gigabit per second, according to SURFsara, which supports researchers in the Netherlands.

So while the network may be fast, the applications can't keep up, which is why researchers are sending their data via snail mail on discs, Jackson said. She suggested that cognitive computing systems could help address this problem by collecting and interpreting data for researchers.
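To put the gap between network and application speeds in perspective, here is a rough back-of-the-envelope sketch in Python. The 1-petabyte dataset size, the 30-day month and the assumption of zero protocol overhead are illustrative choices, not figures from the speakers; only the 100-gigabit and 1-gigabit rates and the 50-petabyte monthly total come from the article.

```python
# Back-of-the-envelope transfer times. Inputs marked "assumption" are
# illustrative and do not come from the article.

PETABYTE = 10**15          # decimal petabyte, in bytes
GIGABIT = 10**9            # bits
SECONDS_PER_DAY = 86_400


def transfer_seconds(size_bytes: float, link_bits_per_second: float) -> float:
    """Ideal time to move size_bytes over a link, ignoring all overhead."""
    return size_bytes * 8 / link_bits_per_second


dataset_bytes = 1 * PETABYTE  # assumption: a hypothetical 1 PB dataset

# Time at the network's rate versus the rate an application may actually get.
days_at_100g = transfer_seconds(dataset_bytes, 100 * GIGABIT) / SECONDS_PER_DAY
days_at_1g = transfer_seconds(dataset_bytes, 1 * GIGABIT) / SECONDS_PER_DAY

# Average rate implied by 50 PB a month (assumption: a 30-day month).
monthly_bytes = 50 * PETABYTE
average_gbps = monthly_bytes * 8 / (30 * SECONDS_PER_DAY) / GIGABIT

print(f"1 PB at 100 Gbps: {days_at_100g:.2f} days")
print(f"1 PB at 1 Gbps:   {days_at_1g:.2f} days")
print(f"50 PB/month averages about {average_gbps:.0f} Gbps")
```

Under these assumptions, a petabyte moves in a little under a day at the full network rate but takes roughly three months at 1 gigabit per second, which is the kind of gap that can make shipping discs look attractive.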

2. Velocity

With real-time data and an abundance of information coming at them quickly, researchers must determine how to handle data that's at rest and in motion. One of the questions is whether IT leaders can embed more artificial intelligence inside networks so they can figure out what data to move and how to do it.

3. Variety

Along with volume and velocity, a variety of data from numerous, if not unlimited, sources and geographic locations poses a research challenge. With researchers collaborating around the world, not everyone knows who has the data or the tools to work with it. Internet2 is working on this worldwide collaboration problem by partnering with the National Knowledge Network in India to improve research and education, among other things.

Another way to deal with this problem is through a Yellow Pages-style directory for data that's powered by a tool such as Watson, a cognitive technology from IBM. At Rensselaer Polytechnic Institute, researchers are teaching Watson to be a data adviser that can help guide them to treasure troves of relevant information in the 1 million open government data sets available around the world.

4. Veracity

Because they bring together data from different sources, researchers now have to determine which information to trust and use. Jackson suggested that artificial intelligence can also help in this area. But whatever universities do, they need to shore up their networks so these different connections and sources don't compromise research.

"We are connected by our exposures, and we are exposed by our connections," Jackson said. "Therefore it is of importance that greater resilience be built into our networks, both for the security of Internet of Things, as well as for avoiding disruption of important collaborative research efforts."

The importance of people networks

Ultimately, connections between researchers, university leaders and IT staff around the world prove the most challenging. But they are also the most valuable.

"The most important networks in discovery and innovation are human," Jackson said. "But unlocking human potential depends not only on the technology we put in place, but on how we are able to use them."