Data Explorer walkthrough – Parsing a Twitter list

Yesterday the SQL Server team announced the public availability of Data Explorer, a new data mashup tool, in Announcing the Labs release of Microsoft Codename “Data Explorer”. After seeing the first public demos of Data Explorer at PASS 2011 I published a blog post, Thoughts on Data Explorer, which must have caught someone’s attention because soon afterwards I was lucky enough to be invited onto an early preview of Data Explorer; hence I have spent the past few weeks familiarising myself with it. In this blog post I am going to demonstrate how one can use Data Explorer to consume and parse a Twitter list.

I have set up a list specifically for this demo and, fittingly, it is a list of tweeters that tweet about Data Explorer – you can view the list at http://twitter.com/#!/list/jamiet/data-explorer. Note that some of the screenshots in this blog post were taken prior to the public release, and the UI has been altered slightly since then; with that in mind, here we go.

I’ve digressed a little, let’s get back to our mashup. We’re going to use a function called Web.Contents() to consume the contents of the Twitter API call and pass the results into another function, Json.Document(), which parses the JSON document for us. The full formula is:
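
A sketch of that full formula, with the list’s slug and owner hardcoded into the URL (the exact syntax in the Labs release may differ slightly):

= Json.Document(Web.Contents("https://api.twitter.com/1/lists/statuses.json?slug=data-explorer&owner_screen_name=jamiet"))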

When you type in that formula and simply hit enter you’re probably going to be faced with this screen:

It’s asking you how you want to authenticate with the Twitter API. Calls to the https://api.twitter.com/1/lists/statuses.json resource don’t require authentication, so anonymous access is fine; just hit Continue. When you do, you will see something like this:

The icon

essentially indicates a dataset, so each record in these results is itself another dataset. We’ll come onto how we parse all of this further later on, but before we do we should clean up our existing formula so that we’re not hardcoding the values “data-explorer” and “jamiet”.

The Web.Contents() function allows us to specify named query parameters rather than including them in the full URL. Change the formula to:
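
Something along these lines, assuming an options record with a Query field (the syntax shown here follows what later shipped in the “M” formula language and may differ slightly from the Labs release):

= Json.Document(Web.Contents("https://api.twitter.com/1/lists/statuses.json", [Query = [slug = "data-explorer", owner_screen_name = "jamiet"]]))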

That will return the same result as before, but now we’ve broken out the query parameters {slug, owner_screen_name} into parameters of Web.Contents(). That’s kinda nice, but they’re still hardcoded; what we really want to do is turn the whole formula into a callable function. We do that by specifying a function signature and using the signature’s parameters in the formula, like so:
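
A sketch of the function version – the (slug, owner_screen_name) signature is what turns the formula into something callable (again, the exact Labs-release syntax may differ):

= (slug, owner_screen_name) => Json.Document(Web.Contents("https://api.twitter.com/1/lists/statuses.json", [Query = [slug = slug, owner_screen_name = owner_screen_name]]))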

Let’s give our new function a more meaningful name by right-clicking on the resource name which is currently set as “Custom1” and renaming it as “GetTwitterList”:

We have now defined a new function within our mashup called GetTwitterList(slug, owner_screen_name) that we can call as if it were a built-in function.

Let’s create a new resource as a formula that uses our new custom function and pass it some parameter values:

= GetTwitterList("data-explorer", "jamiet")

We still have the same results but now via a nice neat function that abstracts away the complexity of Json.Document( Web.Contents() ).

As stated earlier, each of the records is itself a dataset which, in this case, represents lots of information about a single tweet. We can go a long way towards parsing out that information using a function called IntoTable(), which takes a dataset and converts it into a table of values:
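
A sketch of the formula, calling IntoTable() on the output of our GetTwitterList() function from above (IntoTable is the Labs-era name; the exact call shape may differ):

= IntoTable(GetTwitterList("data-explorer", "jamiet"))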

Here is the result of applying IntoTable() to the results of GetTwitterList():

This is much more useful: we can now see lots of information about each tweet. Notice, however, that information about the user who wrote the tweet is wrapped up in yet another nested dataset called “user”.

Throughout, note how whatever data we are seeing, and whatever we do to that data via the graphical UI, is always reflected in the formula bar; in the screenshot immediately above, notice that we are selecting the “user” and “text” columns (the checkbox for “user” is off the screen but is checked).

We can now parse out the user’s screen_name using a different function – AddColumn(). AddColumn() takes an input and allows us to define a new column (in this case called “user_screen_name”) and specify an expression for that column based on the input. A picture is worth a thousand words, so:
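
A sketch of that formula. Here TableOfTweets is a placeholder for whatever the previous step is named in your mashup, and the each [user][screen_name] expression navigates into the nested “user” dataset to pick out its screen_name field (this record-navigation syntax follows the later “M” language and may differ slightly in the Labs release):

= AddColumn(TableOfTweets, "user_screen_name", each [user][screen_name])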

There we have our new column, user_screen_name, containing the name of the tweeter that tweeted the tweet. At this point let’s take a look at the raw JSON to see where this got parsed out from:
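
Schematically, the JSON returned by the API looks something like this (field values elided; an illustrative sketch, not an actual response):

[
  {
    "text": "...",
    "user": {
      "screen_name": "UserEd_",
      ...
    },
    ...
  },
  ...
]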

Notice that the screen_name, UserEd_, is embedded 3 levels deep within the hierarchical JSON document.

We’re almost there now. The final step is to use the function SelectColumns() to select the subset of columns that we are interested in:

= Table.SelectColumns(InsertedCustom,{"text", "user_screen_name"})

Which gives us our final result:

At this point hit the Save button:

OK, so we have a mashup that pulls some data out of Twitter, parses it and then… well… nothing! It doesn’t actually do anything with that data. We have to publish the mashup so that it can be consumed, and we do that by heading back to the home page (which is referred to as “My Workspace”) by clicking the My Workspace button near the top of the page:

Back in My Workspace you can select your newly created mashup (by clicking on it) and options Preview, Snapshot & Publish appear:

We’ll ignore Preview and Snapshot for now and hit the Publish button instead, at which point we are prompted for the name under which the mashup will be published:

Hitting Publish will do the necessary and make our data feed available at a public URI:

We have used Data Explorer’s JSON parsing and dataset navigation abilities to pull out the data that we are interested in and present it in a neat rectangular data structure that we are familiar with. Moreover we have done it without installing any software and we have made that data accessible via an open protocol; that’s pretty powerful and, in my wholly worthless opinion, very cool indeed.

"most importantly, you can access the output of the mashup via an OData feed"

I was implying that the important thing was that the data was accessible, not that it was accessible via OData per se. That being said, I have nothing against OData and hence I'm wondering what you meant by:

"Are you being sarcastic, or have you changed your mind about the technology?"