With the rapid growth of the web, making sense of web data poses grand challenges: big volume, high velocity, high variety, and unknown veracity. In the physical world, a sensor is a converter that measures a physical quantity and converts it into a signal that can be read by an observer or by an instrument (today, mostly an electronic one). This project creates an analogous virtual layer, the WebSensor layer, on top of the web.

WebSensor Platform

A WebSensor is a programmable, focused crawler that continuously discovers, extracts, and aggregates structured information about a topic. A WebSensor platform based on Windows PowerShell and the .NET Framework makes it easy for developers to create WebSensors that continuously extract information from the web and generate time-series stream data. End users can also easily create WebSensors for everyday tasks.
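The page does not show the platform's scripting interface, so the following is only a minimal Python sketch of the idea described above, not the platform's PowerShell/.NET API. It assumes a sensor is something that repeatedly calls an extraction function and appends each reading, with its timestamp, to a time series. The class `WebSensor` and its members are hypothetical; the extraction function is injected so that any fetching and parsing logic can be plugged in.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class WebSensor:
    """Hypothetical sketch: a sensor that repeatedly samples one value from the web."""
    name: str
    # Assumption: returns the current reading, or None when extraction fails.
    extract: Callable[[], Optional[float]]
    series: List[Tuple[float, float]] = field(default_factory=list)

    def sample(self, now: Optional[float] = None) -> bool:
        """Take one reading and append (timestamp, value) to the series."""
        value = self.extract()
        if value is None:
            return False  # skip failed readings instead of recording a gap value
        ts = time.time() if now is None else now
        self.series.append((ts, value))
        return True

    def run(self, interval_s: float, iterations: int) -> None:
        """Sample at a fixed interval; a real crawler would run indefinitely."""
        for i in range(iterations):
            self.sample()
            if i < iterations - 1:
                time.sleep(interval_s)
```

Skipping failed readings keeps the series limited to values that were actually observed; a downstream consumer can still detect gaps from the timestamps.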

The WebSensor platform has built-in capabilities for extracting and collecting time-sequenced data embedded in websites. These include:

Convenient wrapper generation on webpages with just a few clicks

Automatic wrapper adaptation to page layout changes

Easy to configure and run

Easy to extend using a simple scripting language

Easy to manage and retrieve the collected data
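The page does not describe how wrapper generation and adaptation work internally. As a rough illustration of the first two capabilities listed above, here is a minimal Python sketch of one common technique, swapped in as an assumption: when the user clicks a value, record the path from the page root to that node plus a simple fingerprint (its tag and `class` attribute); later, follow the path, and if it no longer leads to a matching node, search the page for the fingerprint instead. `Wrapper`, `learn_wrapper`, and `apply_wrapper` are hypothetical names, and the sketch assumes well-formed XHTML so that the standard-library `xml.etree.ElementTree` can parse it (real pages would need a lenient HTML parser).

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Wrapper:
    """Hypothetical wrapper learned from a single clicked example node."""
    path: List[Tuple[str, int]]  # (tag, index among same-tag siblings), root to target
    tag: str                     # tag of the clicked node
    css_class: Optional[str]     # fingerprint used to check and repair the path


def _parents(root: ET.Element) -> Dict[ET.Element, ET.Element]:
    # ElementTree has no parent pointers, so build a child -> parent map.
    return {child: parent for parent in root.iter() for child in parent}


def learn_wrapper(root: ET.Element, clicked: ET.Element) -> Wrapper:
    """Record the path from the root to the node the user clicked."""
    parents = _parents(root)
    path: List[Tuple[str, int]] = []
    node = clicked
    while node is not root:
        parent = parents[node]
        same_tag = [c for c in parent if c.tag == node.tag]
        index = next(i for i, c in enumerate(same_tag) if c is node)
        path.append((node.tag, index))
        node = parent
    path.reverse()
    return Wrapper(path, clicked.tag, clicked.get("class"))


def _follow(root: ET.Element, path: List[Tuple[str, int]]) -> Optional[ET.Element]:
    node = root
    for tag, index in path:
        same_tag = [c for c in node if c.tag == tag]
        if index >= len(same_tag):
            return None  # the layout changed and the recorded path is broken
        node = same_tag[index]
    return node


def apply_wrapper(root: ET.Element, w: Wrapper) -> Optional[str]:
    """Extract the target text, falling back to a fingerprint search if the path fails."""
    node = _follow(root, w.path)
    if node is None or node.tag != w.tag or node.get("class") != w.css_class:
        # Adaptation step: take the first element with the same tag and class.
        node = next((c for c in root.iter(w.tag) if c.get("class") == w.css_class), None)
    if node is None:
        return None
    return (node.text or "").strip()
```

A single `class` attribute is a weak fingerprint (it fails if the class is renamed or shared by several elements); a production wrapper would likely combine several signals, such as nearby label text.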

WebSensors can be connected to form a sensor network for more complex analysis tasks that involve multiple time series.
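How sensors are wired together is not specified here. One plausible building block, sketched below in Python as an assumption, is a downstream node that derives a new series from two upstream sensors. Because independent sensors rarely sample at identical instants, the sketch uses an "as-of" join: each reading of the first series is paired with the latest reading of the second series taken at or before it. The name `combine` and the `Series` alias are hypothetical.

```python
import bisect
from typing import Callable, List, Tuple

Series = List[Tuple[float, float]]  # (timestamp, value), sorted by timestamp


def combine(a: Series, b: Series, op: Callable[[float, float], float]) -> Series:
    """Hypothetical downstream node: combine each reading of `a` with the
    latest reading of `b` at or before the same timestamp."""
    b_times = [t for t, _ in b]
    out: Series = []
    for ts, va in a:
        i = bisect.bisect_right(b_times, ts)  # number of b readings with time <= ts
        if i > 0:  # skip readings of `a` that precede every reading of `b`
            out.append((ts, op(va, b[i - 1][1])))
    return out
```

For example, `combine(a, b, lambda x, y: x / y)` yields the ratio of two tracked quantities over time.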

Examples

Tracking the count of Bill Gates' followers on twitter.com

Tracking Bill Gates' follower count is easy: just click on the current follower count (8,903,947 in the snapshot below). The sensor then generates a time series and keeps it updated.

The original Bill Gates Twitter page

The time series output by the sensor that tracks Bill Gates' follower count
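A clicked value such as 8,903,947 arrives as display text, so a sensor has to convert it into a number before it can be added to a time series. The Python sketch below illustrates that step under stated assumptions: it accepts only plain digits, optionally grouped with commas, and rejects abbreviated forms such as "8.9M" rather than guessing. It also shows one simple derived view of the series, the change between consecutive readings. `parse_count` and `deltas` are hypothetical helpers, not part of the platform.

```python
import re
from typing import List, Tuple

# Plain digits, optionally grouped in threes with commas (e.g. "8,903,947").
_COUNT = re.compile(r"\d{1,3}(,\d{3})*|\d+")


def parse_count(text: str) -> int:
    """Convert a displayed count into an integer; raise on formats we cannot trust."""
    s = text.strip()
    if not _COUNT.fullmatch(s):
        raise ValueError(f"unrecognized count format: {text!r}")
    return int(s.replace(",", ""))


def deltas(series: List[Tuple[float, int]]) -> List[Tuple[float, int]]:
    """Change in value between consecutive (timestamp, value) readings."""
    return [(t2, v2 - v1) for (_, v1), (t2, v2) in zip(series, series[1:])]
```

Rejecting unrecognized formats outright means a layout change that switches to abbreviated counts shows up as a failed reading instead of a silently wrong value.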