Gut instinct is out and data-driven decisions are in at smart companies, right? Not so fast. Even when relying on data to drive business decisions, organizations should proceed with caution and skepticism.


Take data products such as LinkedIn's People You May Know recommendation engine or Google's search engine, a daily habit for nearly everyone. "The more a user uses the product, the more data is generated, which can improve the product," Schutt said at the Harvard University Institute for Applied Computational Science's (IACS) annual symposium in Cambridge, Mass. Data products capture the behavior of users in the hope of predicting their behavior. That concept is nothing new for "classical statistics problems," she said, where models are often designed to understand causation or make predictions. The difference with data products is that they are built on human behavior, so data scientists must keep in mind that the products themselves can change the behavior of the people who use them.

"In the context of building data products … the models and algorithms that you use to predict also have the ability to cause," she said. Think of it this way: When a statistician builds a model to predict the weather, the model will never cause the weather to change, she said. But when a statistician builds a model to predict who a user knows or what information a user is looking for, how search results and recommendations are selected and ranked could influence what the user clicks on.
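The feedback loop Schutt describes can be illustrated with a deliberately simple toy model. The sketch below is an illustration only, not anything presented at the symposium: two items have identical underlying appeal, the "model" ranks them by past clicks, and users click the top slot more often than the second (the `position_bias` values are hypothetical). A one-click head start is enough for the ranking itself to produce a lasting gap.

```python
def simulate_feedback(rounds, position_bias=(0.6, 0.3),
                      appeal=(1.0, 1.0), initial_clicks=(1.0, 0.0)):
    """Toy model (hypothetical parameters): rank items by past clicks,
    then add the expected clicks each item earns in the slot it was shown."""
    clicks = list(initial_clicks)
    for _ in range(rounds):
        # The "model": rank items by historical clicks, highest first.
        order = sorted(range(len(clicks)), key=lambda i: clicks[i], reverse=True)
        # Expected clicks this round = item appeal x bias of its display slot.
        for slot, item in enumerate(order):
            clicks[item] += appeal[item] * position_bias[slot]
    return clicks


clicks = simulate_feedback(rounds=100)
share_a = clicks[0] / sum(clicks)
print(f"Item A share of clicks: {share_a:.2f}")  # equal appeal, unequal outcome
```

Because both items are equally appealing, any imbalance in the final click counts comes from the ranking, not from user preference, which is the sense in which a predictive model can also "cause."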

"You have to be aware of the impact you're having on products going out into society," Schutt said.

More data is better

Having more data beats having better models. "Not all of the time, but often," said Diane Lambert, research scientist at Mountain View, Calif.-based Google Inc. Take search queries. Users who search for Britney Spears and misspell her name still expect to see results for the Britney Spears. "If you have more data, you can start solving problems like that where people spell really badly."
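To make Lambert's point concrete, here is a minimal frequency-based spelling corrector in the style popularized by Peter Norvig. It is a generic textbook technique, not Google's system, and the tiny corpora are hypothetical. The model is identical in both runs; only the data changes, and with the larger corpus the misspelling "britny" can be corrected because "britney" has actually been observed.

```python
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """All strings one delete, transpose, replace or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in LETTERS]
    inserts = [l + c + r for l, r in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def correct(word, counts):
    """Return word if known, else the most frequent known word one edit away."""
    if word in counts:
        return word
    candidates = [w for w in edits1(word) if w in counts]
    if not candidates:
        return word  # Not enough data to suggest anything.
    return max(candidates, key=lambda w: counts[w])


small = Counter("spears news music".split())
large = Counter("britney spears britney concert spears music news".split())
print(correct("britny", small))  # unchanged: no nearby known word
print(correct("britny", large))  # corrected from observed data
```

The design choice worth noting is that the "model" here is almost trivial; its quality is driven almost entirely by how many real queries populate `counts`.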

And tapping into all of the data on the Web can produce sophisticated products. Just look at Google predictive text, which anticipates what the query is likely to be before the user finishes typing, as well as which advertisements best match that query, all at lightning speed. (Schutt's caveat about data products shaping behavior applies here, too.) More data often beats better models, Lambert said, but more data isn't everything.

"You still need to experiment," she said at the IACS symposium. "Without experiments, you can't answer questions like, 'Is this [user interface] change better or is it going to confuse people?'"
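One common way to answer "is this change better?" is an A/B test judged with a two-proportion z-test. The sketch below is a standard statistical technique, not a description of Google's internal tooling, and the click counts in the usage line are hypothetical.

```python
import math


def two_proportion_z_test(clicks_a, n_a, clicks_b, n_b):
    """Two-sided z-test for a difference between two click-through rates.

    Returns (z, p_value); positive z means variant B has the higher rate.
    """
    p_a = clicks_a / n_a
    p_b = clicks_b / n_b
    # Pooled rate under the null hypothesis that both variants perform equally.
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical: old interface 200 clicks / 10,000 users, new one 260 / 10,000.
z, p = two_proportion_z_test(200, 10_000, 260, 10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With these made-up numbers the z-score comes out near 2.8 and the p-value below 0.01, so the difference would be unlikely under pure chance. A test like this says whether a change moved a metric, not whether users found it confusing; that still depends on choosing the right metric.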

Google runs thousands of experiments at the same time, Lambert noted -- with our help. "If you've ever put a query into Google, you've been in an experiment."
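Running many experiments at once requires assigning each user to an arm consistently, and independently across experiments. A widely used approach is to hash the user ID together with the experiment name; the sketch below assumes that approach (the function and experiment names are hypothetical) and is not a claim about how Google actually does it.

```python
import hashlib


def assign_arm(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically bucket a user into 'treatment' or 'control'.

    Salting the hash with the experiment name makes assignments in
    different experiments effectively independent of each other.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Map the first 8 bytes of the digest to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "treatment" if bucket < treatment_share else "control"


users = [f"user-{i}" for i in range(10_000)]
in_a = [assign_arm(u, "new-results-layout") == "treatment" for u in users]
in_b = [assign_arm(u, "autocomplete-tweak") == "treatment" for u in users]
both = sum(a and b for a, b in zip(in_a, in_b)) / len(users)
print(f"Share in both treatments: {both:.3f}")  # roughly 0.25 if independent
```

Because assignment needs no stored state, the same user lands in the same arm on every query, and a single query can participate in many experiments simultaneously.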

Better than Google?

What's the next new thing in search? Cynthia Rudin is building a better search engine by "growing a list."

"The current generation of search engines tells you where to find information, but the next generation finds it for you," Rudin, associate professor of statistics at Cambridge, Mass.-based Massachusetts Institute of Technology, said at the IACS symposium.

Rudin's search engine starts with a "seed" and then aggregates information from expert online sources to find more items related to the seed. The algorithm competes with Google Sets, which launched about 10 years ago, and Boo!Wa!, an academic engine, and, according to Rudin, significantly outperforms both.
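The article does not describe Rudin's algorithm in detail, so the following is only a generic illustration of seed-based set expansion, not her method. It assumes the "sources" are lists already extracted from web pages (the data below is hypothetical) and grows the list by adding items that co-occur with enough current members.

```python
from collections import Counter


def grow_list(seed, source_lists, rounds=2, per_round=1, min_overlap=2):
    """Generic seed-based set expansion (illustrative, not Rudin's algorithm).

    Each round, score every non-member item by how strongly the source lists
    containing it overlap with the current list, then add the top scorers.
    """
    current = list(seed)
    for _ in range(rounds):
        members = set(current)
        scores = Counter()
        for lst in source_lists:
            items = set(lst)
            overlap = len(items & members)
            # Ignore sources that share too little with the list so far.
            if overlap >= min_overlap:
                for item in items - members:
                    scores[item] += overlap
        new_items = [item for item, _ in scores.most_common(per_round)]
        if not new_items:
            break
        current.extend(new_items)
    return current


sources = [
    ["First Night Boston", "Beantown Jazz Festival", "Chinatown Main Street Festival"],
    ["Beantown Jazz Festival", "Chinatown Main Street Festival", "Boston Wine Expo"],
    ["Parking in Boston", "Boston Red Sox"],
]
print(grow_list(["First Night Boston", "Beantown Jazz Festival"], sources))
```

The `min_overlap` threshold is what keeps unrelated pages, such as one about parking, from contaminating the list; choosing sources and scoring them well is where a real system would differ.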

In a search for annual Boston events, Rudin's model came up with a good-sized list that included First Night Boston, Beantown Jazz Festival, Chinatown Main Street Festival and so on. Boo!Wa! came back with a mix of events (Boston Wine Expo), general topics (Boston Red Sox) and junk results (parking in Boston). And Google Sets? Its list included the Boston Massacre … and zero annual events.

Tweet roundup from #datastorm14

The IACS symposium attracted the cream of the academic data expert crop and beyond, who naturally held forth on Twitter. Experts from the University of California at Berkeley, University of Washington, Google, IBM and Dropbox, to name a handful, weighed in on the new (newly anointed?) field of data science. Read all the tweets by searching #datastorm14 on Twitter.
