The Wolfram Data Drop Is Live!

Where should data from the Internet of Things go? We’ve got great technology in the Wolfram Language for interpreting, visualizing, analyzing, querying and otherwise doing interesting things with it. But the question is, how should the data from all those connected devices and everything else actually get to where good things can be done with it? Today we’re launching what I think is a great solution: the Wolfram Data Drop.

When I first started thinking about the Data Drop, I viewed it mainly as a convenience—a means to get data from here to there. But now that we’ve built the Data Drop, I’ve realized it’s much more than that. And in fact, it’s a major step in our continuing efforts to integrate computation and the real world.

So what is the Wolfram Data Drop? At a functional level, it’s a universal accumulator of data, set up to get—and organize—data coming from sensors, devices, programs, or for that matter, humans or anything else. And to store this data in the cloud in a way that makes it completely seamless to compute with.

Our goal is to make it incredibly straightforward to get data into the Wolfram Data Drop from anywhere. You can use things like a web API, email, Twitter, web form, Arduino, Raspberry Pi, etc. And we’re going to be progressively adding more and more ways to connect to other hardware and software data collection systems. But wherever the data comes from, the idea is that the Wolfram Data Drop stores it in a standardized way, in a “databin”, with a definite ID.

Here’s an example of how this works. On my desk right now I have this little device:

Every 30 seconds it gets data from the tiny sensors on the far right, and sends the data via wifi and a web API to a Wolfram Data Drop databin, whose unique ID happens to be “3pw3N73Q”. Like all databins, this databin has a homepage on the web: wolfr.am/3pw3N73Q.

The homepage is an administrative point of presence that lets you do things like download raw data. But what’s much more interesting is that the databin is fundamentally integrated right into the Wolfram Language. A core concept of the Wolfram Language is that it’s knowledge based—and has lots of knowledge about computation and about the world built in.

For example, the Wolfram Language knows in real time about stock prices and earthquakes and lots more. But now it can also know about things like environmental conditions on my desk—courtesy of the Wolfram Data Drop, and in this case, of the little device shown above.

Here’s how this works. There’s a symbolic object in the Wolfram Language that represents the databin:

And one can do operations on it. For instance, here are plots of the time series of data in the databin:

And here are histograms of the values:

And here’s the raw data presented as a dataset:

What’s really nice is that the databin—which could contain data from anywhere—is just part of the language. And we can compute with it just like we would compute with anything else.

So here for example are the minimum and maximum temperatures recorded at my desk:(for aficionados: MinMax is a new Wolfram Language function)

We can convert those to other units (% stands for the previous result):

Let’s pull out the pressure as a function of time. Here it is:

Of course, the Wolfram Knowledgebase has historical weather data. So in the Wolfram Language we can just ask it the pressure at my current location for the time period covered by the databin—and the result is encouragingly similar:

Here’s an important thing: notice that when we got data from the databin, it came with units attached. That’s an example of a crucial feature of the Wolfram Data Drop: it doesn’t just store raw data, it stores data that has real meaning attached to it, so it can be unambiguously understood wherever it’s going to be used.

The beauty of all this is that once data is in the Wolfram Data Drop, it becomes both universally interpretable and universally accessible, to the Wolfram Language and to any system that uses the language. So, for example, any public databin in the Wolfram Data Drop can immediately be accessed by Wolfram|Alpha, as well as by the various intelligent assistants that use Wolfram|Alpha. Tell Wolfram|Alpha the name of a databin, and it’ll automatically generate an analysis and a report about the data that’s in it:

Through WDF, the Wolfram Data Drop immediately handles more than 10,000 kinds of units and physical quantities. But the Data Drop isn’t limited to numbers or numerical quantities. You can put anything you want in it. And because the Wolfram Language is symbolic, it can handle it all in a unified way.

Somewhere in our Quality Assurance department there’s a camera on a Raspberry Pi watching two recently acquired corporate fish—and dumping an image every 10 minutes into a databin in the Wolfram Data Drop:

In the Wolfram Language, it’s easy to stack all the images up in a manipulable 3D “fish cube” image:

Or to process the images to get a heat map of where the fish spend time:

We can do all kinds of analysis in the Wolfram Language. But to me the most exciting thing here is how easy it is to get new real-world data into the language, through the Wolfram Data Drop.

Around our company, databins are rapidly proliferating. It’s so easy to create them, and to hook up existing monitoring systems to them. We’ve got databins now for server room HVAC, for weather sensors on the roof of our headquarters building, for breakroom refrigerators, for network ping data, and for the performance of the Data Drop itself. And there are new ones every day.

Lots of personal databins are being created, too. I myself have long been a personal data enthusiast. And in fact, I’ve been collecting personal analytics on myself for more than a quarter of a century. But I can already tell that March 2015 is going to show a historic shift. Because with the Data Drop, it’s become vastly easier to collect data, with the result that the number of streams I’m collecting is jumping up. I’ll be at least a 25-databin human soon… with more to come.

A really important thing is that because everything in the Wolfram Data Drop is stored in WDF, it’s all semantic and canonicalized, with the result that it’s immediately possible to compare or combine data from completely different databins—and do meaningful computations with it.

So long as you’re dealing with fairly modest amounts of data, the basic Wolfram Data Drop is set up to be completely free and open, so that anyone or any device can immediately drop data into it. Official users can enter much larger amounts of data—at a rate that we expect to be able to progressively increase.

Wolfram Data Drop databins can be either public or private. And they can either be open to add to, or require authentication. Anyone can get access to the Wolfram Data Drop in our main Wolfram Cloud. But organizations that get their own Wolfram Private Clouds will also soon be able to have their own private Data Drops, running inside their own infrastructure.

So what’s a typical workflow for using the Wolfram Data Drop? It depends on what you’re doing. And even with a single databin, it’s common in my experience to want more than one workflow.

It’s very convenient to be able to take any databin and immediately compute with it interactively in a Wolfram Language session, exploring the data in it, and building up a notebook about it

But in many cases one also wants something to be done automatically with a databin. For example, one can set up a scheduled task to create a report from the databin, say to email out. One can also have the report live on the web, hosted in the Wolfram Cloud, perhaps using CloudCDF to let anyone interactively explore the data. One can make it so that a new report is automatically generated whenever someone visits a page, or one can create a dashboard where the report is continuously regenerated.

It’s not limited to the web. Once a report is in the Wolfram Cloud, it immediately becomes accessible on standard mobile or wearable devices. And it’s also accessible on desktop systems.

You don’t have to make a report. Instead, you can just have a Wolfram Language program that watches a databin, then for example sends out alerts—or takes some other action—if whatever combination of conditions you specify occur.

You can make a databin public, so you’re effectively publishing data through it. Or you can make it private, and available only to the originator of the data—or to some third party that you designate. You can make an API that accesses data from a databin in raw or processed form, and you can call it not only from the web, but also from any programming language or system.

A single databin can have data coming only from one source—or one device—or it can have data from many sources, and act as an aggregation point. There’s always detailed metadata included with each piece of data, so one can tell where it comes from.

For several years, we’ve been quite involved with companies who make connected devices, particularly through our Connected Devices Project. And many times I’ve had a similar conversation: The company will tell me about some wonderful new device they’re making, that measures something very interesting. Then I’ll ask them what’s going to happen with data from the device. And more often than not, they’ll say they’re quite concerned about this, and that they don’t really want to have to hire a team to build out cloud infrastructure and dashboards and apps and so on for them.

Well, part of the reason we created the Wolfram Data Drop is to give such companies a better solution. They deal with getting the data—then they just drop it into the Data Drop, and it goes into our cloud (or their own private version of it), where it’s easy to analyze, visualize, query, and distribute through web pages, apps, APIs, or whatever.

It looks as if a lot of device companies are going to make use of the Wolfram Data Drop. They’ll get their data to it in different ways. Sometimes through web APIs. Sometimes by direct connection to a Wolfram Language system, say on a Raspberry Pi. Sometimes through Arduino or Electric Imp or other hardware platforms compatible with the Data Drop. Sometimes gatewayed through phones or other mobile devices. And sometimes from other clouds where they’re already aggregating data.

We’re not at this point working specifically on the “first yard” problem of getting data out of the device through wires or wifi or Bluetooth or whatever. But we’re setting things up so that with any reasonable solution to that, it’s easy to get the data into the Wolfram Data Drop.

There are different models for people to access data from connected devices. Developers or researchers can come directly to the Wolfram Cloud, through either cloud or desktop versions of the Wolfram Language. Consumer-oriented device companies can choose to set up their own private portals, powered by the Wolfram Cloud, or perhaps by their own Wolfram Private Cloud. Or they can access the Data Drop from a Wolfram mobile app, or their own mobile app. Or from a wearable app.

Sometimes a company may want to aggregate data from many devices—say for a monitoring net, or for a research study. And again their users may want to work directly with the Wolfram Language, or through a portal or app.

When I first thought about the Wolfram Data Drop, I assumed that most of the data dropped into it would come from automated devices. But now that we have the Data Drop, I’ve realized that it’s very useful for dealing with data of human origin too. It’s a great way to aggregate answers—say in a class or a crowdsourcing project—collect feedback, keep diary-type information, do lifelogging, and so on. Once one’s defined a data semantics signature for a databin, the Wolfram Data Drop can automatically generate a form to supply data, which can be deployed on the web or on mobile.

The form can ask for text, or for images, or whatever. And when it’s text, our natural language understanding system can take the input and automatically interpret it as WDF, so it’s immediately standardized.

Now that we’ve got the Wolfram Data Drop, I keep on finding more uses for it—and I can’t believe I lived so long without it. As throughout the Wolfram Language, it’s really a story of automation: the Wolfram Data Drop automates away lots of messiness that’s been associated with collecting and processing actual data from real-world sources.

And the result for me is that it’s suddenly realistic for anyone to collect and analyze all sorts of data themselves, without getting any special systems built. For example, last weekend, I ended up using the Wolfram Data Drop to aggregate performance data on our cloud. Normally this would be a complex and messy task that I wouldn’t even consider doing myself. But with the Data Drop, it took me only minutes to set up—and, as it happens, gave me some really interesting results.

I’m excited about all the things I’m going to be able to do with the Wolfram Data Drop, and I’m looking forward to seeing what other people do with it. Do try out the beta that we launched today, and give us feedback (going into a Data Drop databin of course). I’m hoping it won’t be long before lots of databins are woven into the infrastructure of the world: another step forward in our long-term mission of making the world computable…

Thanks for your comment. JSON support is coming soon. At the moment, Data Drop does not support file uploads of data because it is designed to accept data as individual entries of key/value pairs, however, you can find DatabinUpload here as a first step for adding entries in bulk.

Very interesting. What is the security like on this? Are you encrypting the data and if so, how, both in storage and in transit? How do you manage access?

With the potential for sensitive information to be stored here, and with the propensity of bad actors using such information (including other individuals, corporations, governments), it would seem that you would have a lot more about how this information is protected and controlled.

Interesting! What could this bring to the enterprises and how? I do miss that overall picture. Most data ends up in databases or in semi-structure data in Hadoop clusters. If we would adapt to this approach how can we analyse all this data in all different databins? Are Hadoop->Databins generators in the making? Or databins-> Databases (SAP HANA or Microsoft Analytics platform)? Would be something for a nice blog?

Thank you for you comment. The big picture of Data Drop is that it lets you store, analyze, and act on your data all with the same tools: the Wolfram Language and the Wolfram Cloud. Data Drop’s first focus is making it easier to get data into the Wolfram Language on a per-entry basis. From there you can pull data from one or more databins into your session for analysis as desired.

Are there plans to offer a private cloud version supporting this data drop priced at a level that is more affordable to small businesses? Starting at $25K is not going to work for the numerous smaller shops who would like to have an on-premise or privately hosted (by arbitrary providers) access to their IoT data.

Thanks for your comment, this early beta version of Data Drop is a service of the Wolfram Cloud, and requires access to the full Cloud. As the technology matures, we will seek to offer it in a more modular fashion so that customers have more options at different price ranges. We’d love to hear more about your specific needs; please contact us via https://datadrop.wolframcloud.com/contact.html

An alternative approach analogous to the Wolfram Data Drop might be the use of a graph database to store the data and to expose it via a SPARQL endpoint. Graph databases or triple stores, SPARQL, SPARQL Federated Query, and other standardized technologies from the W3C Data Activity (http://www.w3.org/2013/data/) seem to provide the ingredients for an interoperable back-end. Does the vision for Wolfram Data Drop include interoperability with SPARQL Federated Query? Will there be a SPARQL endpoint for the Wolfram Data Drop?