Turn Twitter Into Your Personal Assistant

The combination of social networking with public messaging, link posting, and subscriptions creates impressive synergies, but it also brings drawbacks: information overload, information loss, distraction, and redundant content.

by Benjamin Nowack

Feb 13, 2009


Microblogging service Twitter has become a disruptive everyday tool. It is increasingly replacing not only instant messaging clients, but also social bookmarking sites, interest tracking applications, support forums, email, and (to a certain extent) classical blogs.

A few simple conventions, together with RDF and SPARQL, can turn your Twitter feeds into rich information streams, which you can then use for a more productive microblogging experience.

The following sections explain how to:

Enhance microposts with machine-extractable data

Query the extracted data with SPARQL

Generate custom streams and reports to support your personal workflow

You can reproduce the examples in this article with the supplied source code that contains early components of a semantic microblogging system.

System Setup

Download the source code archive (smesher.zip) and copy its contents to your web server. Follow the
setup instructions in the readme.txt file; you will need Apache, PHP, and a MySQL database. The
project consists of five directories, with application-specific sub-directories in code/ and
themes/:

cache: should be write-enabled, used for CSS and JavaScript documents

code:

code/arc: the core RDF toolkit

code/trice: reusable framework components

code/smr: the project controller, custom scripts, and templates

config: database configuration and path dispatching rules

logs: should be write-enabled, used for system messages

themes: CSS and images

Step 1: Subscribe to Your Twitter Feeds

First, you need some input data to work with, such as the most recent posts mentioning your username or interesting
keywords (see Figure 1). Luckily, Twitter provides Atom feeds for all pages, and the demo system includes an Atom-to-RDF converter,
so you don't have to learn how the Twitter API works. You can directly import user timelines and search results
instead. Click Settings in the upper right navigation to open a simple Feeds form. For the sake of simplicity, you
only have to enter your username and a set of tags that are then used internally to generate corresponding feed URLs.
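
As a rough illustration of that last step, the sketch below derives a list of feed URLs from a username and a set of tags. The URL templates point at the placeholder host example.org and are assumptions made for this example; they are not Twitter's actual feed addresses, and the demo's own PHP code may work differently.

```python
from urllib.parse import quote

# Hypothetical URL templates on a placeholder host (not Twitter's real URLs).
USER_FEED = "http://example.org/timeline/{user}.atom"
SEARCH_FEED = "http://example.org/search.atom?q={query}"


def feed_urls(username, tags):
    """Return the user timeline feed, a mentions feed, and one feed per tag."""
    urls = [USER_FEED.format(user=quote(username, safe=""))]
    # Mentions are searched as "@username"; "@" is percent-encoded as %40.
    urls.append(SEARCH_FEED.format(query=quote("@" + username, safe="")))
    for tag in tags:
        tag = tag.strip().lstrip("#")
        if tag:  # skip empty entries
            # "#" must be percent-encoded (%23), or it would start a URL fragment.
            urls.append(SEARCH_FEED.format(query=quote("#" + tag, safe="")))
    return urls
```

Calling `feed_urls("bob", ["rdf", "#sparql"])` yields four URLs: one timeline feed, one mentions search, and one search per tag.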

When you are done, return to the main screen by clicking on the logo in the upper left corner. Instead of
cronjobs or background processes, the demo simply checks and periodically refreshes your
subscriptions when you access the start page. After a few seconds (you might have to reload the page to see the
changes), the first items should appear, as shown in Figure 2.
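
The refresh-on-access approach can be sketched as follows. This is a minimal Python illustration of the general idea, not the demo's PHP implementation; the five-minute interval, the `feeds` dictionary, and the `fetch` callback are all assumptions made for the example.

```python
import time

REFRESH_INTERVAL = 300  # seconds; an assumed value, not taken from the demo


def refresh_due(feeds, fetch, now=None, interval=REFRESH_INTERVAL):
    """Refresh every feed whose last check is at least `interval` seconds old.

    `feeds` maps a feed URL to the timestamp of its last check (0 = never).
    `fetch` is a caller-supplied function that imports a single feed.
    Returns the list of URLs that were refreshed during this call.
    """
    now = time.time() if now is None else now
    refreshed = []
    for url, last_checked in list(feeds.items()):
        if now - last_checked >= interval:
            fetch(url)
            feeds[url] = now  # remember when this feed was last checked
            refreshed.append(url)
    return refreshed
```

A page handler would call `refresh_due` on each request to the start page. The trade-off is that the visitor who triggers a refresh waits for it, which is why a reload may be needed before new items show up.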

Figure 1. Import settings: Based on the provided information, the demo application imports a selection
of microfeeds.

Figure 2. Initial timeline: So far, the microposts can (only) be filtered by author.

Step 2: Explore the Data

Figure 3. SPARQL API Example: The COUNT feature is not part of the current SPARQL specification yet, but a new W3C Working Group just launched to explore aggregate functions and similar extensions.

The individual items carry a number of structured elements, which you can use for formatting (for example,
displaying an image instead of a raw avatar URL) or basic filtering (for example, by author). Together with SPARQL's
REGEX filter function, you can already run some interesting queries against the API at
/sparql. For example, a single SPARQL query can return the names and Twitter accounts of people who mentioned "Berners-Lee" in their posts.
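
A query of this kind could look roughly like the sketch below, which builds the query text and the request URL for a SPARQL endpoint. The vocabulary terms (`sioc:has_creator`, `sioc:content`, `foaf:name`) and the example.org host are assumptions; the demo's actual RDF model may use different properties. Only the `query` request parameter comes from the SPARQL Protocol itself.

```python
from urllib.parse import urlencode

# Assumed vocabulary: SIOC for posts and accounts, FOAF for names.
QUERY_TEMPLATE = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX sioc: <http://rdfs.org/sioc/ns#>
SELECT DISTINCT ?name ?account WHERE {
  ?post sioc:has_creator ?account ;
        sioc:content ?text .
  ?account foaf:name ?name .
  FILTER REGEX(?text, "%s", "i")
}
"""


def mention_query(term):
    """Build a query for authors whose posts match `term` (case-insensitive)."""
    # Escape backslashes and double quotes for the SPARQL string literal.
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return QUERY_TEMPLATE % escaped


def request_url(endpoint, query):
    """Return a GET URL that passes the query in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = request_url("http://example.org/sparql", mention_query("Berners-Lee"))
```

Note that `term` is interpreted as a regular expression by the endpoint, so characters such as `.` or `(` keep their regex meaning unless you escape them.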

Not too spectacular data-wise, but the exciting thing here is the fact that a semantic API lets you retrieve exactly the
elements that you need (see Figure 3). Twitter's search feature can only return a list of posts, whereas SPARQL lets you
generate a list of persons, dates, or any other available attribute. This greatly simplifies data integration and
repurposing.

Step 3: Increase the Granularity

While the default structures are a handy starting point, the really interesting data is still hidden in the post's
body. People are addressed (leading @name) or mentioned (@name somewhere in the text), hashtags (#tag) and
links (http://...) are embedded, and quoted Tweets are marked up with a leading RT.

The demo system contains a PHP class (located at code/smr/SMR_RDFExtractor.php) that auto-extracts these elements from the otherwise opaque content and turns them into RDF triples. The converter is based on simple regular expressions and you can extend them with custom patterns (more on this later).
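
To give an idea of how such regex-based extraction can work, here is a minimal Python sketch. It is not a port of SMR_RDFExtractor.php: the patterns are simplified, and the predicate names (`ex:retweetOf`, `ex:addresses`, `ex:mentions`, `ex:tag`, `ex:link`) are made up for this example.

```python
import re

RETWEET = re.compile(r"^RT\s+@(\w+)")        # quoted tweet: leading "RT @name"
ADDRESS = re.compile(r"^@(\w+)")             # addressed person: leading "@name"
MENTION = re.compile(r"(?<![\w@])@(\w+)")    # any "@name" (but not e-mail addresses)
HASHTAG = re.compile(r"(?<![\w#])#(\w+)")    # "#tag"
LINK = re.compile(r"https?://[^\s<>\"']+")   # "http://..." up to the next space


def extract_triples(post_id, text):
    """Return (subject, predicate, object) tuples found in a post's body."""
    links = [url.rstrip(".,;:!?)") for url in LINK.findall(text)]
    # Remove links first so that URL fragments are not mistaken for hashtags.
    rest = LINK.sub(" ", text)
    names = MENTION.findall(rest)
    triples = []
    retweet = RETWEET.match(rest)
    address = ADDRESS.match(rest)
    if retweet:
        triples.append((post_id, "ex:retweetOf", retweet.group(1)))
        names = names[1:]  # the first @name is the quoted author
    elif address:
        triples.append((post_id, "ex:addresses", address.group(1)))
        names = names[1:]  # the first @name is the addressee
    triples += [(post_id, "ex:mentions", name) for name in names]
    triples += [(post_id, "ex:tag", tag) for tag in HASHTAG.findall(rest)]
    triples += [(post_id, "ex:link", url) for url in links]
    return triples
```

Supporting a custom convention then amounts to adding one more pattern and one more predicate.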

After the granular information is added to the RDF store, you can add corresponding filters (facets) to the main view.
The facets are defined in code/smr/options/SMR_Options_DefaultBox.php. You can add entries to
the getTabs method, and then write a matching method that specifies the SPARQL pattern with its RDF relation.
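
Conceptually, each facet pairs a tab label with a SPARQL graph pattern. The Python sketch below illustrates that pairing only; it does not reproduce the demo's PHP class, and the facet names, the `ex:` namespace, and the properties are hypothetical.

```python
PREFIX = "PREFIX ex: <http://example.org/ns#>"

# Hypothetical facet table: tab name -> graph pattern with a ?value slot.
FACETS = {
    "author": "?post ex:author ?value .",
    "tag": "?post ex:tag ?value .",
    "link": "?post ex:link ?value .",
    "retweet_of": "?post ex:retweetOf ?value .",
}


def literal(value):
    """Quote a Python string as a SPARQL string literal."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def facet_query(facet, value):
    """Build a query that selects all posts matching one facet value."""
    pattern = FACETS[facet].replace("?value", literal(value))
    return "%s\nSELECT DISTINCT ?post WHERE { %s }" % (PREFIX, pattern)
```

Under these assumptions, `facet_query("retweet_of", "yourname")` would select the posts that quote your tweets, and adding a new facet means adding one entry to the table.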

Figure 4. Filtered Stream: The advanced facets help you find out who re-tweeted any of your posts, find
posts that contain a certain link, or discover popular links in general.