If a pile of data falls in a forest and nobody sees it, does it really exist? "From my point of view, data do not exist if you cannot see them." Those were Andrew Pandre's words, heavily accented, underscoring data as the plural of datum. Appropriately so, and not just because it's grammatically correct, I should add, since the topic of the morning was "big data" visualization.

Linda Tucci

The Russian émigré -- a data scientist of many years, author of a popular blog and a principal at Sears Holding Cos. -- was among a panel of technologists at a local seminar last week on the value of visualization in understanding big data. Human understanding, to be clear, for the recurring theme of the panel (which included IBM Fellow Irene Greif, director of the IBM Center for Social Business; Martin Leach, CIO of the Broad Institute; and Richard Dale of Big Data Boston Ventures, an early stage investment firm) was that we humans like to look. Data is best apprehended through the visual system, our strongest sense.

If we can't see the data, get a feel for it, if we can't detect some cluster or outlier, big data is not worth talking about, the panel said in so many words. "People have behavioral changes when they have visualizations in front of them. They interact and get drawn in, and start to learn things they wouldn't have learned if they were just told what others found," said Greif, based on her experience with IBM's Many Eyes data visualization tool. Visualization gives people a common language to probe complex ideas. "It creates a place where you can talk and communicate over the data," said Leach, who, with 13 petabytes of spinning disks under his purview at the Broad Institute, is no stranger to the problems posed by big data.

Visualizing data gives people a sense of ownership of the data, Pandre said. Bosses especially. Business people will ignore all the mathematical and statistical models unless they see "a freaking pie chart." Seeing is believing. "My role is to make people see the data and sense the data and make them act on what they see."

But who can do this, and how, when, according to Pandre, "80% of big data visualization is data, 10% is the story behind it and only a very small part is the actual visualization"? Understanding how to abstract and represent abstraction is probably the biggest challenge in big data visualization, said IBM's Greif, and it is as much art as science. The ability to combine art with science is the key to visualizing data, but those talents rarely reside in one person. The Broad Institute's Leach told the story of a professor of dermatology at Yale who used to send his students to the Peabody Museum on campus and assign them an artwork to describe. If they mastered that, they could describe "what the zit looked like."

Most big data visualization is confusing or, worse, misleading: "85%, 90% of visualization is bad," Pandre said. And bad on many levels, the panel made clear, from choosing the wrong colors to represent findings (stick to green, red and yellow for good, bad and transitional data) to serious abuses. One practice that horrified Leach in his days working in big pharma was the researchers' habit of leaving off the error bars that show significant statistical differences between the bits of data in order to get their point across. "We need to caution people that visualization should be used for hypothesis formulation," Greif said. "We need to teach people to look at these charts and to know whether they are really learning something or getting a good idea to check on later." People don't necessarily have to be mathematicians, but they need to understand what led to the visualization.

And that becomes harder to do as the data gets bigger. Big data visualization summarizes and aggregates data, hopefully in illustrative ways, panel moderator Dale said, but that raises the question of what gets lost in the details. And how much data can a person see in one go, anyway? "The human eye can only recognize a certain number of objects, and it is not millions," Pandre said. Adding motion to a data point -- making it jiggle -- adds another dimension to the data humans can detect. "But I don't think it is highly leveraged" as a tool, he said.

Humans are good at detecting change. One of Pandre's early big data projects was analyzing the Soviet sky, taking a photo of it every second and analyzing the pictures, square by square, for any change in the scene -- a shadow of a cloud, an American plane. The ability to detect change, however, may be on a spectrum.

The Broad Institute's Leach explained that for him, change is visceral: "I have Asperger's to some degree. I also have OCD," he said, using the acronym for obsessive compulsive disorder. "I feel uncomfortable in certain situations when I see certain things," he said: an eagle in a tree by the side of the highway that no one else noticed, for example. The sign in the back of the seminar room was driving him crazy because it was off-kilter. "Certain people can notice changes; something in them makes them sense change differently."

Are there geniuses of change detection? Should companies be looking for them?

What's the expertise most missing at companies hoping to crack "big data"?
