CUSP’s research and educational programs are centered on urban informatics: “the acquisition, integration, and analysis of data to understand and improve urban systems and quality of life.” Big Cities + Big Data and Bringing Urban Data to Life are prominently displayed on its website. This coming September, it will start offering two new programs in Applied Urban Science and Informatics: a 30-credit Master of Science and a 12-credit Advanced Certificate.

Given the central role of big data at CUSP and in other initiatives I’m involved with, I’d like to step back and reflect on what this all means.

Wikipedia defines big data as “a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.” Over the past several years we have seen big data applied to a variety of initiatives in business, government and academia. But a number of experts have been questioning whether it is big data per se that we should be excited about, or the emerging data science disciplines, like urban informatics in CUSP’s case, which aim to leverage big data to discover new insights, enhance decision making, improve the effectiveness of people-oriented processes and optimize the overall management of social organizations like cities, companies and economies.

“I like everything about the big data concept except for the term itself. The concept is revolutionary and holds transformational possibilities for almost every business. The term itself, however, is a bad one, for a variety of reasons. . . because the term is so imprecise, organizations need to deconstruct it a bit in order to refine their strategies and signal to stakeholders what they are really interested in doing with these new types of data. . . I’m convinced that deriving real value from this messily-named resource derives from going several levels deeper into it.”

A similar sentiment was expressed about a year ago by technologist and journalist Dan Woods in Forbes:

“The idea that big data is going to change business is gradually taking hold in the ranks of senior management, and budget dollars are starting to flow. But there is a massive mismatch between how money is being spent and the support needed for activities that will create business value from data. The problem is an enduring fetish to store big data without making plans for how it will be used.”

“More data is arriving in all sorts of forms and it is clear that some of it, if properly analyzed, can have a huge impact. . . When used properly big data analysis leads to a deeper understanding of both the customer and the business processes used to serve them. Making customers happier and optimizing the processes used to serve them are where the biggest gains come from.”

In other words, institutions need to balance their focus on big data with increased investment in the data scientists needed to create value for the institution by putting the data to work.

According to Woods, data scientists should have a hybrid set of skills: the IT skills that are necessary to deal with and analyze vast amounts of data; and the subject matter skills needed to know which valuable business insights can be extracted from the data, and how to best frame the questions and build the right model that will reveal these insights.

Similarly, Davenport writes: “making big data work requires very smart people with some unusual and impressive backgrounds. I was initially somewhat skeptical of the data scientist label, but after speaking with a number of them, I see that it’s warranted. Before becoming data scientists, most of these people were scientists. . . [But,] they are not typical scientists, however, but rather hybrids of science and computation. Somewhere along their career journeys they became interested in, and good at, the manipulation of data. In fact, many of them really have ‘computational’ in front of their scientific specialties: computational biology, computational ecology, etc. And I’m not sure about this, but I think ‘applied mathematician’ is a synonym for ‘computational mathematician.’”

When used by itself, as it often is, big data might imply that the emphasis is on the data. But with data science, it is clear that the emphasis is on the science.

Scientific disciplines seek to develop testable explanations and predictions based on applying scientific methods to their particular areas of research: “To be termed scientific, a method of inquiry must be based on empirical and measurable evidence subject to specific principles of reasoning.”

This has long been the case with mature disciplines like physics, chemistry and biology. Every time a new measuring instrument or technology is developed (e.g., a new kind of telescope, an advanced microscope, a better technique for genomic sequencing, a more powerful particle accelerator), lots of new information gets collected that validates theories and predictions and/or leads to new questions, and sometimes to whole new areas of research.

For the past couple of decades, we have turned our measuring instruments on ourselves. We’ve been using the ubiquitous digital technologies and devices all around us to both create and collect massive amounts of information on who we are, what we do and how we interact as individuals, communities and institutions. And, just like the established disciplines, the emerging data-science-oriented disciplines like urban informatics and information-based medicine aim to leverage all these sources of data to develop new scientific methods of inquiry “based on empirical and measurable evidence subject to specific principles of reasoning.”

This is all in its very early stages. “What is data science?” by Mike Loukides of O’Reilly Media is one of the most comprehensive articles I have seen on the subject. “Merely using data isn’t really what we mean by data science,” writes Loukides. “A data application acquires its value from the data itself, and creates more data as a result. It’s not just an application with data; it’s a data product. Data science enables the creation of data products. . . The thread that ties most of these applications together is that data collected from users provides added value. Whether that data is search terms, voice samples, or product reviews, the users are in a feedback loop in which they contribute to the products they use. That’s the beginning of data science.”

Loukides contrasts the holistic approach employed by data scientists with those used in data mining, business analytics and other applications that apply statistical analysis to large data sets. “[Data scientists] are inherently interdisciplinary. They can tackle all aspects of a problem, from initial data collection and data conditioning to drawing conclusions. They can think outside the box to come up with new ways to view the problem, or to work with very broadly defined problems: ‘here’s a lot of data, what can you make from it?’”

According to experts Loukides interviewed for his article, the best data scientists tend to be physicists and other scientists. “Physicists have a strong mathematical background, computing skills, and come from a discipline in which survival depends on getting the most from the data. They have to think about the big picture, the big problem. When you’ve just spent a lot of grant money generating data, you can’t just throw the data out if it isn’t as clean as you’d like. You have to make it tell its story. You need some creativity for when the story the data is telling isn’t what you think it’s telling.”

In addition, a lot of the innovation in physics and other sciences involves knowing how to break large, complex problems into smaller ones, as well as how to attack a large, difficult problem that appears intractable by making the necessary approximations and by finding a more tractable problem whose solution can be related to the larger problem’s solution.

Over the past few centuries, we have significantly increased our understanding of the natural world around us by learning how to collect large amounts of data and by developing disciplined ways to study, analyze, model and make sense of all that data. We have similarly applied our scientific methods in the social sciences to enhance our understanding of societies and human behavior. Given the explosion of new data we can now gather with our ubiquitous digital technologies, let us hope that a whole new set of data-science-oriented disciplines will emerge to help us better understand and deal with our increasingly complex lives and human organizations, like cities.

Comments

This is a very interesting article, and it is related to a program evaluation that I am writing for school. Currently, I am researching higher adult education and low-income students. In particular, I am interested in answering the following question: How have financial aid requirements impacted students over the past 5 years?

I understand there are several factors. One of them is salary: college tuition prices in some higher education institutions have gone up while salaries have remained the same (Geiger & Heller, 2011). This issue is affecting students' ability to pay back their loans.

I would like to know if anyone could provide ideas for further research on this subject.