Browse, Build and Share Real-time Streams with DataSift

A core feature of the real-time web is the continuously updating stream of information. These streams are commonly generated by social networks, and as the uptake of social networking continues the amount of information is only going to increase. This will keep creating opportunities for companies to build products and services that extract value from that vast amount of data. Some of the most common services built around these streams include trend and sentiment analysis, data storage, aggregation, sorting, search and filtering. DataSift is a service that lets users browse, build and share their own real-time streams using social media data drawn from a host of sources.

DataSift launched its Alpha service at TechCrunch Disrupt in September and describes it as a "real time social media filtering engine." The initial buzz around DataSift was generated when Twitter agreed to give them access to the Twitter firehose, but they now have access to a much wider range of data including Google Buzz, MySpace, SixApart, WordPress, Facebook and Digg. These sources of data within DataSift, sometimes called input services, are defined as Targets in the DataSift knowledge base.

Users can use Targets to create their own streams from within the My Streams section of the DataSift dashboard using a language called CSDL (Curation Stream Definition Language), originally named FSDL (Filtered Stream Definition Language). The Web editor used to define your streams is pretty simple, but it does provide some basic syntax highlighting and validates your syntax whenever you save. CSDL also provides access to augmentation Targets through services such as Salience, TweetMeme, Peer Index, Klout and InfoChimps, which allow streams to be augmented with third party data.
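
To make the idea of filtering on Targets a little more concrete, here is a minimal Python sketch of a stream definition as a list of rules. It is not CSDL and does not use the DataSift API: the target names (`twitter.text`, `klout.score`), the operators and the threshold are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Hypothetical operators for illustration; the real CSDL operator set may differ.
OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "contains": lambda field, value: str(value).lower() in str(field).lower(),
    "==": lambda field, value: field == value,
    ">=": lambda field, value: field >= value,
}


@dataclass(frozen=True)
class Rule:
    target: str    # a field of the incoming item, e.g. "twitter.text" (assumed name)
    operator: str  # a key of OPERATORS
    value: Any     # the value to compare the field against


def matches(item: Dict[str, Any], rules: List[Rule]) -> bool:
    """Return True when the item satisfies every rule (logical AND)."""
    for rule in rules:
        if rule.target not in item:
            return False
        if not OPERATORS[rule.operator](item[rule.target], rule.value):
            return False
    return True


# A stream that combines an input Target with an augmentation Target.
rules = [
    Rule("twitter.text", "contains", "datasift"),
    Rule("klout.score", ">=", 40),  # assumed augmentation field and threshold
]

item = {"twitter.text": "Trying out DataSift streams", "klout.score": 55}
other = {"twitter.text": "Lunch time", "klout.score": 80}
```

With these definitions `matches(item, rules)` is true and `matches(other, rules)` is false, since only the first item mentions the keyword and clears the score threshold.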

Update: FSDL has since been renamed CSDL (Curation Stream Definition Language).

Once a stream has been defined you can choose to build your feed. At present this takes up to 60 minutes, and the dashboard shows you the progress of the build along with a host of other features including a data preview, a live example of the data, a graph showing matched stream items (iterations per minute) and the history of the feed definition.

Update: Nick Halstead (CEO of DataSift) has provided a clarification about the stream preview, the live stream feature and also when the stream can be used via the API:

The stream ‘preview’ does not need to be built for anything to work, you can define + attach to API immediately (or just hit ‘live’ tab to see live results) – the preview was to allow stream owners to demonstrate what the stream would offer.

DataSift is encouraging its users to build feeds that are discoverable and accessible to other users, although it does offer a private feed option. To that end the stream page provides tagging, an area that encourages you to tell others about the stream on common social networks, and a comments area that encourages users to interact and return to the stream page. Recently added, most commented and top rated streams are also featured on the home section of the DataSift dashboard.

Once a stream has been built it can also be used in the definition of another user's stream, which can in turn be used in yet another stream, and so on. DataSift really have exposed a lot of functionality and capabilities within their user dashboard, and the documentation they provide is quite thorough and really helps a user get to grips with creating streams reasonably quickly.
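
The way one stream can feed the definition of another can be sketched in a few lines of Python. This is only a toy model of the concept, not how DataSift implements it: the `StreamRegistry` class, the stream names and the item fields (`text`, `score`, `lang`) are all hypothetical.

```python
from typing import Any, Callable, Dict

Item = Dict[str, Any]
Predicate = Callable[[Item], bool]


class StreamRegistry:
    """Toy registry in which a stream definition may refer to other streams."""

    def __init__(self) -> None:
        self._streams: Dict[str, Predicate] = {}

    def define(self, name: str, predicate: Predicate) -> None:
        self._streams[name] = predicate

    def stream(self, name: str) -> Predicate:
        """Return a predicate that looks up the named stream at match time."""
        def reference(item: Item) -> bool:
            return self._streams[name](item)
        return reference

    def matches(self, name: str, item: Item) -> bool:
        return self._streams[name](item)


registry = StreamRegistry()

# A base stream defined directly on the data.
registry.define("mentions", lambda item: "datasift" in item.get("text", "").lower())

# A second stream built on top of the first one.
mentions = registry.stream("mentions")
registry.define(
    "influential_mentions",
    lambda item: mentions(item) and item.get("score", 0) >= 40,
)

# A third stream built on top of the second, and so on.
influential = registry.stream("influential_mentions")
registry.define(
    "english_influential",
    lambda item: influential(item) and item.get("lang") == "en",
)
```

Because each reference is resolved when an item is matched, redefining `mentions` later would change what the two dependent streams match as well. Whether DataSift behaves this way is not something the article states; it is simply the design choice made in this sketch.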

All these rich features would be a waste unless there was a way of accessing the data and using it within an application. Unsurprisingly, the DataSift API delivers by providing three endpoints: paged access to filtered data, HTTP Streaming and RSS. It also recently introduced a WebSockets Streaming API.
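
As a rough idea of what consuming an HTTP Streaming endpoint involves on the client side, the Python sketch below reassembles items from a chunked response. It assumes the stream delivers one JSON object per line, with blank lines as keep-alives; that wire format is an assumption rather than something taken from the DataSift documentation, and `iter_stream_items` is a hypothetical helper that makes no network calls.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def iter_stream_items(chunks: Iterable[bytes]) -> Iterator[Dict[str, Any]]:
    """Yield JSON objects from newline-delimited data split across chunks."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Emit every complete line currently sitting in the buffer.
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            line = line.strip()
            if line:  # skip blank keep-alive lines
                yield json.loads(line.decode("utf-8"))
    # Handle a final item that was not terminated by a newline.
    tail = buffer.strip()
    if tail:
        yield json.loads(tail.decode("utf-8"))


# Simulated network chunks: items can be split at arbitrary byte offsets.
chunks = [
    b'{"id": 1, "text": "fir',
    b'st"}\n\n{"id": 2, ',
    b'"text": "second"}\n',
]
items = list(iter_stream_items(chunks))
```

In a real client the `chunks` iterable would come from the HTTP library's streaming response body; the buffering logic stays the same because item boundaries rarely line up with chunk boundaries.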

Although DataSift is still in Alpha, it is offering what seems to be an affordable and very impressive service which should excite any developer interested in real-time technologies and data. The company's access to a wide range of data sources, engaging and intuitive user dashboard and range of API endpoints should mean that most developers will have their technology needs met.

A good starting point to learn about DataSift is an interview by Robert Scoble with Nick Halstead, the CEO of DataSift (embedded above). The video is a little old but provides a good overview and an example of creating a stream. If you've any comments or questions about DataSift please leave a comment here. After that you should head over to http://datasift.net and register for the DataSift Alpha program.

Thanks for the writeup Phil, one slight inaccuracy (which is our fault), the stream 'preview' does not need to be built for anything to work, you can define + attach to API immediately (or just hit 'live' tab to see live results) - the preview was to allow stream owners to demonstrate what the stream would offer.

You can also attach to the API with a raw bit of FSDL and it will compile and return results immediately.

We obviously need some work to interface + documentation to make all of this much clearer.
