Microsoft official looks at changing data trends

Executive analyzes a new way to gather and arrange data

Tony Hey explains that the fourth paradigm is increasingly important due to the speed at which information is processed.

According to some researchers, simply performing experiments to collect data is no longer enough in today’s scientific world. Nor, they say, is simulating theories and models on computers.

In response, researchers are shifting toward a fourth paradigm: acquiring the skills to manage, organize, share and archive a great influx of data.

To illustrate the challenges researchers will face under this new model, the Rutgers Discovery Informatics Institute invited Tony Hey, vice president of Microsoft Research Connections, to speak about the rising phenomenon yesterday at the Computing Research & Education Building on Busch campus.

Dr. Manish Parashar, director of the institute, said Hey’s professional experience earned him an invitation to the University.

Tony Hey, vice president of Microsoft Research Connections, speaks yesterday at the Computing Research & Education Building on Busch campus on the importance of database management.

“Hey highlights the importance of data in every aspect of science, engineering and society as a whole … It would be a great opportunity for us to learn from his experiences and insights as a whole,” Parashar said.

Hey said he originally worked as a physicist but became fascinated by the world of computer science, particularly the deluge of data and the ways in which it is organized.

His colleague Jim Gray, a Turing Award winner, coined the term “fourth paradigm.” It builds on the third paradigm, the age of computers, in which students needed to learn how to input data and use algorithms.

But researchers no longer focus on gathering data. In fact, they have too much. The fourth paradigm describes the new set of skills needed to mine, visualize and archive this constant stream of data, Hey said.

The fourth paradigm is important because it is multidisciplinary and benefits many different fields, such as bioinformatics and environmental informatics, he said.

“Traditional statistics says that we have to repeat things many times to get a concrete answer, but that’s not always possible … now we use data to update our beliefs,” Hey said.

Hey said he believes that a cycle of stages will help researchers understand and learn the fourth paradigm: acquisition, collaboration, analysis, sharing and archiving.

Acquisition is the process of obtaining the necessary data, Hey said. He also strongly believes in collaboration: the idea that data should be universally available so anybody can analyze it and draw conclusions.

“I remember having to go to meetings to analyze the data where we would go, and not much would be accomplished … scholarly enterprise is extremely inefficient,” he said. “We spend a large amount of time reinventing the wheel. We need to step away from that and move forward.”

Universal collaboration lets research disciplines with different interests in, and uses for, the same information save time and money, he said. This approach also suggests that data should be published before a research group or corporation analyzes it.

Hey said sharing the data with the international community will create interactive storytelling.

“One piece of data means many things and is very multidisciplinary. You can look at something and tell a story based upon it, and the story can be different to anybody,” he said.

The fourth paradigm also asserts that the key to analyzing data is picking out the common data points that are necessary to the hypothesis or goal, a process Hey calls “data mining.”

After the key pieces of data are chosen, the last step is to archive and preserve them. Regulations such as the National Science Foundation Data Sharing Policy already require researchers and investigators to archive their data publicly, whether in the cloud or elsewhere online, so anybody can access it, he said.

Hey said the fourth paradigm is increasingly important today due to the speed at which information is processed and collected.

“There’s some applications of real relevance, and it differs from region to region, especially in New Jersey where there’s lots of opportunities — so having universities, especially the state university, training people in the skills to do that is really what New Jersey needs,” he said.

Chris Galanis, a graduate student, said he could apply this lecture to his Biotechnical Genomics course.

“We were talking about how research is changing and how nowadays we need a lot of energy to store things. It has some real-life applications,” he said.

Hey said creating classes in data science and analysis will not be easy, but he believes they are necessary.

“It’s a tough call, but it’s an imperative for the University and for the state,” he said. “I think it’s exciting, but then again I would, wouldn’t I?”