Startups Mine the Real-Time Web

Truviso and another startup, StreamBase, based in Lexington, MA, have created software to process real-time analytics data. Both companies were spun out of university research aimed at processing real-time data from sensor networks, such as those used to monitor environmental conditions. Richard Tibbetts, CTO of StreamBase, explains that financial markets make up about 80 percent of his company’s customers today. Web companies are just starting to adopt the technology.

“You’re going to see real-time Web mashups, where data is integrated from multiple sources,” Tibbetts says. Such a mashup could, for example, monitor second-to-second fluctuations in the price of airline tickets and automatically purchase one when it falls below a certain price.
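As a rough illustration of the kind of mashup Tibbetts describes, the sketch below merges price quotes from two sources into one time-ordered stream and reports the first quote that falls below a target price. The `Quote` type, the feed names, and the prices are all hypothetical, and the actual purchase step is left out; this is only one way such a rule might be expressed, not how StreamBase implements it.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple, Optional


class Quote(NamedTuple):
    """A hypothetical ticket price observation from one source."""
    timestamp: float
    source: str
    price: float


def merge_feeds(*feeds: Iterable[Quote]) -> Iterator[Quote]:
    """Combine several feeds, each already sorted by time, into one stream.

    heapq.merge compares the tuples field by field, so timestamp comes first.
    """
    return heapq.merge(*feeds)


def first_quote_below(quotes: Iterable[Quote], threshold: float) -> Optional[Quote]:
    """Return the first quote priced strictly below the threshold, or None."""
    for quote in quotes:
        if quote.price < threshold:
            return quote
    return None


# Made-up sample data standing in for two live feeds.
feed_a = [Quote(1.0, "site_a", 320.0), Quote(3.0, "site_a", 295.0)]
feed_b = [Quote(2.0, "site_b", 310.0), Quote(4.0, "site_b", 280.0)]

hit = first_quote_below(merge_feeds(feed_a, feed_b), threshold=300.0)
```

In a live system the feeds would be unbounded, and the rule would fire a purchase as soon as `hit` is found rather than returning it.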

Truviso recently launched a feature that lets users calculate the number of unique visitors to a website in real time. This has historically been a difficult problem because several steps must be performed for each visit to make sure the user is really distinct. Both StreamBase and Truviso rely on accessing conventional, structured databases. Lorica sees potential for real-time analysis of unstructured data, such as a set of numbers scattered through a paragraph of text rather than formatted in a chart.
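To make the unique-visitor problem concrete, here is a minimal sketch that keeps a running count as visits arrive. It assumes each visit already carries a stable visitor ID, which in practice is the hard part, and it stores every ID in memory. The class and method names are hypothetical, and this is not a description of Truviso's approach.

```python
from typing import Hashable, Set


class UniqueVisitorCounter:
    """Exact running count of distinct visitor IDs seen so far."""

    def __init__(self) -> None:
        self._seen: Set[Hashable] = set()

    def observe(self, visitor_id: Hashable) -> int:
        """Record one visit and return the current number of unique visitors."""
        self._seen.add(visitor_id)  # adding a repeat ID leaves the set unchanged
        return len(self._seen)


counter = UniqueVisitorCounter()
# Made-up visit stream; "alice" appears twice and is counted once.
running_counts = [counter.observe(v) for v in ["alice", "bob", "alice", "carol"]]
```

Because memory grows with the number of distinct visitors, a large site would likely need an approximate counting structure or a time window instead of a plain set.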

Software frameworks such as Hadoop and Google’s MapReduce, which process large amounts of Web data using large numbers of computers, are often used to analyze unstructured data. Recent research from Yahoo and the University of California, Berkeley, promises to make these frameworks work in real time, too.

Joseph Hellerstein, a UC Berkeley professor of computer science who was involved with this work, explains that the key was to find a way to make Hadoop and MapReduce faster and more interactive without compromising their ability to protect data.

Real-time applications, whether using traditional database technology or Hadoop, stand to become much more sophisticated. “When people say real-time Web today, they have a narrow view of it: consumer applications like Twitter, Facebook, and a little bit of search,” says StreamBase’s Tibbetts.