Turning Big Data into Big Knowledge

By Alex Russell - The major challenge of big data for social scientists today is in figuring out how to turn this wealth of information into knowledge, according to Martin Hilbert, an assistant professor of communication at UC Davis. Hilbert studies information technology, big data and what it means for human societies.

Hilbert spent his early career working for the United Nations Secretariat on digitalization in Latin America. Since then the amount of digital data generated by humans worldwide has exploded. The problem, he says, is that we still lack a theoretical framework to explain the role of information in social evolution.

Here he is discussing big data, information theory and social evolution in the digital age:

What project are you working on right now?

The big project is to hammer out a theory to explain how information converts into economic growth. We say we are in an information age, yet we use theories from the industrial age, framed in terms of capital and labor, to explain it. We are in the information age and don’t even know how to measure the information capacity we have. You can count computers, but that tells you nothing about how information drives social evolution.

Theories on communication today come out of information theory and computer science. These theories show how society processes information. We have to study the theories because in the end if we live in an information society, information theory should have something to say about it.

What brought you to this project?

The first 15 years of my career were in digitalization. It was 1999 and the internet was becoming big. I was just a backpacker hanging out at a university in Chile. I thought that these technologies were a big opportunity for international development, but people were laughing at those ideas. “The people can’t eat computers,” they said.

It became much bigger, and I ended up working for the United Nations for 15 years to promote digitalization in Latin America. We have spent millions of dollars on information technology for development but these projects lacked the fundamental theory of information. I resigned from my permanent appointment at the UN to join UC Davis to have time to think about how information converts to knowledge and how this drives social systems.

What is a surprising recent finding from your work?

It’s possible to show that social evolution is an information process. It can be looked at in formal mathematical terms as a communication channel between the environment and the evolving social system. The more information is communicated over this channel, the better the fit achieved between the social system and its environment. This fit translates into fitness, which represents the growth rate of the social population.

One way to think about it is that the social system becomes well adapted to its environment. It’s an ecological argument. Fitness hinges on the informational fit. It can be shown that in the case of a noiseless communication channel between the social system and the environment, the social population achieves optimal growth. Information and growth potential are equivalent. The good thing is that information theory allows us to quantify information. It’s surprising that theories of communication can explain how societies evolve.
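The equivalence between information and growth potential can be illustrated with a textbook bet-hedging model (Kelly's proportional betting). This is a minimal sketch under stated assumptions, not necessarily the model Hilbert uses. It assumes an environment with two equally likely states, a population that splits its resources across strategies suited to each state, and a cue that reports the state correctly 90 percent of the time. The payoff values, the joint probability tables and all function names are hypothetical, chosen for illustration only.

```python
import math


def marginals(joint):
    """Return (P(X), P(Y)) for a table where joint[x][y] = P(X=x, Y=y)."""
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    return p_x, p_y


def mutual_information(joint):
    """Mutual information I(X;Y) in bits between environment X and cue Y."""
    p_x, p_y = marginals(joint)
    return sum(
        p_xy * math.log2(p_xy / (p_x[x] * p_y[y]))
        for x, row in enumerate(joint)
        for y, p_xy in enumerate(row)
        if p_xy > 0
    )


def growth_rate_without_cue(joint, payoff):
    """Expected log2 growth per period when resources are split as P(x)."""
    p_x, _ = marginals(joint)
    return sum(p * math.log2(p * payoff[x]) for x, p in enumerate(p_x) if p > 0)


def growth_rate_with_cue(joint, payoff):
    """Expected log2 growth per period when resources are split as P(x | y)."""
    _, p_y = marginals(joint)
    return sum(
        p_xy * math.log2((p_xy / p_y[y]) * payoff[x])
        for x, row in enumerate(joint)
        for y, p_xy in enumerate(row)
        if p_xy > 0
    )


# Hypothetical numbers: a matching strategy doubles, a mismatched one is lost.
payoff = [2.0, 2.0]
noisy = [[0.45, 0.05], [0.05, 0.45]]    # cue is right 90% of the time
noiseless = [[0.5, 0.0], [0.0, 0.5]]    # cue always reveals the state

for name, joint in [("noisy", noisy), ("noiseless", noiseless)]:
    gain = growth_rate_with_cue(joint, payoff) - growth_rate_without_cue(joint, payoff)
    print(name, "growth gain:", round(gain, 4), "bits of information:", round(mutual_information(joint), 4))
```

In this setting the gain in growth rate from observing the cue equals the mutual information between cue and environment, and the noiseless cue yields the largest gain, which mirrors the point made above about a noiseless channel.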

What are some opportunities for interdisciplinary research in the social sciences?

Over the past 20 to 30 years we have created a vast amount of information about ourselves that can help societies evolve. We are now in the second stage of the digital age, which is the process of converting this information into knowledge.

The social sciences are unique, because we used to be the poorest in data among all the sciences. Now we are the most data-complete science. Because of information technology, we collectively have seven billion digital footprints. People’s mobile phones tell us where they are, their plastic cards tell us what they consume, and their social networks tell us who they are and whom they are with. This happened over just 15 years and now we are in data overload. We have more data than we can possibly analyze.

Not all data is available for social science research, but there is also plenty of free data. Wikipedia offers a terabyte download. That’s plenty of data to play with for years. Download the terabyte and you’ll see.

So we are the most data-complete science right now, but we are still lacking methods to analyze the data. Right now, it’s obligatory for a graduate student to learn how to do small-scale surveys and lab experiments, because that’s what social scientists have done all along. But programming is not part of the curriculum. However, if you want to collect and analyze big data, you’ll need to program.

We are not prepared for the amount of data we can now access. Collaborations with scientists who have programming skills can help us. Collaboration is also important so that researchers in other fields don’t reinvent the social sciences.

What other UC Davis researchers’ work are you most excited about?

I’m very excited about the opportunities for interdisciplinary collaborations in the field of complex systems. We have faculty who are leading experts in the field, several linked to the Santa Fe Institute. I’m collaborating with Jim Crutchfield in physics on a joint satellite event at the 2015 Conference on Complex Systems.

I’m also collaborating with him and others on a joint hybrid online/offline course on complex social systems. The group includes engineering and computer science professor Raissa D’Souza, who presented at the ISS conference recently, and Jeffrey Schank from psychology, who builds agent-based models and computer simulations to understand the behavior of individuals and populations in social and evolutionary settings.

As you can see, I’m also very excited about the opportunities of online education. Our profession will look very different only a few years from now. Having studied digitalization for enough years, I’m very aware of the risks and threats, but if done correctly, it can be the most tangible solution for many of the problems faced by today’s education systems.