Announcing DataSift VEDO - Giving Structure To Social Data

Submitted by Richard Caudle on Tue, 12/10/2013 - 17:42

Today we announced the arrival of DataSift VEDO. In this post I’ll outline what this means to you as a developer or analyst.

DataSift VEDO gives you a robust solution to add structure to social data, solving one of the common challenges when working with unstructured ‘big data’. VEDO lets you define rules to classify data so that it fits your business model. The data delivered to your application needs less post-processing and is much easier to work with. The new features will save you time and give you a load more possibilities for your social data.

Data Is Meaningless Without Structure

When working with big data such as social content, one challenge you will always need to tackle is giving unstructured data meaningful structure. If you’re working with our platform currently, you will no doubt be extracting data to your server and running post-processing rules to organise the data to meet your needs.

Processing unstructured data is expensive and not much fun, but it’s where we excel. VEDO lets you offload processing onto our platform. You can now use CSDL (the same language you use for filtering) to add custom metadata labels and scores to data specifically for your use case.

Introducing Tagging And Scoring

VEDO introduces two new features that let you attach this metadata: tagging and scoring.

Tagging allows you to categorize interactions to match your business model. Any interaction that matches a tagging rule will be given the appropriate text label, serving as a boolean flag to indicate whether an interaction belongs to a category.

Scoring builds on tagging, allowing you to attach numerical values to interactions rather than just labels. You can build up a score over many rules, which lets you model subtle concepts such as priority, intention and weighting.
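To make the difference between the two concrete, here is a minimal local sketch in Python. It is not CSDL and not the DataSift API: the `TagRule`, `ScoreRule`, `contains` and `classify` names, the interaction shape and the example rules are all hypothetical, chosen only to show a tag acting as a yes/no label while a score accumulates across several matching rules.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical interaction shape for illustration: {"content": "..."}
Interaction = Dict[str, str]
Matcher = Callable[[Interaction], bool]


@dataclass
class TagRule:
    label: str        # text label attached when the rule matches
    matches: Matcher


@dataclass
class ScoreRule:
    name: str         # name of the score being built up
    delta: float      # amount added (or subtracted) when the rule matches
    matches: Matcher


def contains(word: str) -> Matcher:
    """Case-insensitive substring test on the interaction content."""
    needle = word.lower()
    return lambda interaction: needle in interaction.get("content", "").lower()


def classify(interaction: Interaction,
             tag_rules: List[TagRule],
             score_rules: List[ScoreRule]) -> Dict[str, object]:
    # A tag is either present or absent: it behaves like a boolean flag.
    tags = [rule.label for rule in tag_rules if rule.matches(interaction)]
    # A score is the sum of the deltas of every matching rule.
    scores: Dict[str, float] = {}
    for rule in score_rules:
        if rule.matches(interaction):
            scores[rule.name] = scores.get(rule.name, 0.0) + rule.delta
    return {"tags": tags, "scores": scores}


# Example rules (assumptions, not taken from any DataSift definition).
TAG_RULES = [
    TagRule("complaint", contains("refund")),
    TagRule("praise", contains("love")),
]
SCORE_RULES = [
    ScoreRule("priority", 10, contains("urgent")),
    ScoreRule("priority", 5, contains("refund")),
    ScoreRule("priority", -3, contains("thanks")),
]

result = classify({"content": "URGENT: I still need my refund"},
                  TAG_RULES, SCORE_RULES)
print(result)
```

In this sketch the interaction picks up the single tag "complaint", while two separate rules both contribute to its "priority" score. That is the kind of weighting a label alone cannot express.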

As you begin to use tagging and scoring more and more, you will want to organise your growing set of rules. To help, we have also introduced tag namespaces and reusable tag definitions. Tag namespaces allow you to define taxonomies of tags. You can group tags at any number of levels in namespaces and build deep schemas to fully reflect your model. Reusable tag definitions allow you to perfect your rules once and reuse them across any number of streams and projects.
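One way to picture a namespaced taxonomy is as a tree. The Python sketch below is only an illustration of that idea: the dotted path notation, the `nest_tags` helper and the sample tags are assumptions made for this example, not the format DataSift delivers.

```python
from typing import Any, Dict, Iterable, Tuple


def nest_tags(flat_tags: Iterable[Tuple[str, str]]) -> Dict[str, Any]:
    """Group (namespace_path, label) pairs into a nested dictionary.

    Assumes a path is never both a leaf and a parent of another path.
    """
    tree: Dict[str, Any] = {}
    for path, label in flat_tags:
        parts = path.split(".")
        node = tree
        # Walk (or create) one dictionary level per namespace segment.
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        # The final segment holds the list of labels applied there.
        node.setdefault(parts[-1], []).append(label)
    return tree


# Hypothetical tags attached to a single interaction.
tagged = [
    ("product.phone.brand", "Acme"),
    ("product.phone.feature", "battery"),
    ("product.phone.feature", "screen"),
    ("service.channel", "support"),
]

taxonomy = nest_tags(tagged)
print(taxonomy)
```

Each extra level in the path adds a level to the tree, so the structure can grow as deep as your business model needs.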

Definition Library

Tagging and scoring are powerful features, but at this point you might not have grasped exactly how they can help you. So, alongside the tagging features, we’ve also introduced a library of definitions to get you started. Some definitions you can use immediately in your streams (and benefit from our experience), and some serve as example definitions to show you what is now possible.

For example, we have definitions that help you score content for quality (such as how likely the content is to be a job advert) and make it easier to exclude spam. We also have an example definition that shows how you can use the new features to classify conversations for customer service teams, picking out rants, raves and enquiries.

There’s More...

Although tagging is the main theme of the new release, there is an awful lot more happening here at DataSift. Alongside the release of VEDO we’re giving you more power, more connectivity and a wider range of sources to play with.

For instance we’ve just introduced delivery destinations for MySQL and PostgreSQL. These new destinations allow you to map your filtered data directly to a tabular schema and have it pushed directly into your database.
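As a rough picture of what mapping to a tabular schema involves, the Python sketch below flattens a nested record into a row and inserts it into a table. Everything in it is an assumption for illustration: the `COLUMN_MAP` format, the field paths and the sample record are hypothetical, and an in-memory SQLite database stands in for MySQL or PostgreSQL so the example runs without a server. It does not show how the DataSift destinations are configured.

```python
import sqlite3
from typing import Any, Dict, Optional, Tuple

# Hypothetical mapping: table column -> dotted path in the delivered record.
COLUMN_MAP = {
    "id": "interaction.id",
    "author": "interaction.author.username",
    "content": "interaction.content",
}


def lookup(record: Dict[str, Any], dotted_path: str) -> Optional[Any]:
    """Follow a dotted path through nested dicts; None if any key is missing."""
    node: Any = record
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def to_row(record: Dict[str, Any]) -> Tuple[Any, ...]:
    """Build one table row in the column order defined by COLUMN_MAP."""
    return tuple(lookup(record, path) for path in COLUMN_MAP.values())


# SQLite is used here purely as a stand-in for a real database server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE interactions (id TEXT PRIMARY KEY, author TEXT, content TEXT)"
)

record = {
    "interaction": {
        "id": "abc123",
        "author": {"username": "sam"},
        "content": "Loving the new phone",
    }
}

conn.execute(
    "INSERT INTO interactions (id, author, content) VALUES (?, ?, ?)",
    to_row(record),
)
rows = conn.execute("SELECT id, author, content FROM interactions").fetchall()
print(rows)
```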

We’re also in the process of bringing many more sources onboard (you may have seen our recent announcements!), including many Asian social networks.

Look out for improvements to help you work with a wider variety of languages, updates to our developer tools and client libraries, and much, much more. I’ll cover all of these soon.

Watch this space

In summary, there’s far too much to cover in detail here. So watch this space: over the coming weeks I’ll cover every feature of the new release in depth, with worked examples and sample code so you can take advantage of all these new powers for yourself.