Posts tagged with: sparql

I recently read an article about how negative assertions about something automatically get associated with the person who makes them. For example, if you say negative things about your competitor's products, people will subconsciously link these negative sentiments directly with you. A psychology thing. So, my recent rants about the RDF spec mania at the W3C have already led to an all-time low karma level in the RDF community, and I'm trying hard to keep away from discussions about RDFa 1.1 or RDF Next Steps etc. so as not to make things worse. (Believe it or not, not all Germans enjoy being negative ;)

Now, why another post on this topic? ARC2's development is currently on hold as my long-time investor/girlfriend pulled the plug on it and (rightly so) wants me to focus on my commercial products. With ARC spreading, the maintenance costs are rising, too. There are some options around paid support, sponsoring and donations that I'm pondering, but for now the mails in my inbox are piling up, and one particular question people keep asking is whether ARC is going to support the upcoming SPARQL 1.1, or whether I'm going to boycott it because I think the W3C specs are preventing the semantic web from gaining momentum. Short answer (both times): To a certain extent, yes.

Funnily enough, this isn't so much a question of whether developers want to implement SPARQL 1.1, but rather whether they actually can implement it in an efficient way. SPARQL 1.1 standardizes a couple of much-needed features that we had in ARC's proprietary SPARQL+ for a couple of years, things like aggregates and full CRUD, which I managed to implement in a fast-enough way for my client projects. But when it comes to all the other features in SPARQL 1.1, the suggestions coming out of the "RDF 2.0" initiative, and the general growth of the stack, I do wonder if the RDF community is about to overbake its technology layer cake.

Not that any particular spec is bad or useless, but it is becoming increasingly hard for implementors to keep up. Who can honestly justify the investment in the layer cake if it takes a year to digest it, another year to implement a reasonable portion of it, and then a new spec obsoletes the expensive work? The main traction the Semantic Web effort is seeing happens around Linked Data, which uses only a fraction of the stack, and interestingly in a way that is non-compliant with other W3C recommendations such as OWL, because the latter doesn't provide the needed means for actual symbol linking (or didn't explain them well enough).

A central problem could be a lack of targeting, and a failure to spell out the target audience of a particular spec. 37signals once said that good software is opinionated. The RDF community is doing the exact opposite and seems to desperately try to please everyone. The groups follow a throw-it-out-and-see-what-sticks approach. And every new spec is thrown on the stack, with none of them having a helpful description for orientation. No one is taking the time to reduce confusion, to properly explain who is meant to implement the spec, who is meant to use it, and how it relates to other ones. Sure, new specs raise the barrier to market entry and thus help the few early vendors to keep competition away. But if market growth gets delayed this way, the market may die, or at least an unnecessary number of startups will. (Siderean is one example; their products were amazing. Another one is Radar Networks, which suffered from management issues, but they might have survived if they had spent less money trying to implement an OWL engine for Twine.)

For the fun of it, here are some micro-summaries for RDF specs, how I as a web developer understand them:

RDF: "A schema-less key-value system that integrates with the web." (Wow!)

RSS 1.0: "Rich data streams." (This is the stuff the thought leaders then said would never be needed, and which now inefficiently has to be squeezed into Atom extensions. Idiots!)

OWL 1: "Dumbing down KR-style modeling and inference to the web coder level" (I really liked that approach, it attracted me to the SemWeb idea in the first place, even though I later discovered that RDF Schema is sufficient in many cases.)

GRDDL: "For HTML developers who are also XSLT nerds." (A failure, possibly because the target audience was too small, or because HTML creators didn't care for XML processing requirements. Or the chained processing of remote documents was simply too complex.)

OWL 2: "Made for the people who created it, and maybe AI students." (Never needed any of its features that I couldn't have more easily with simple SPARQL scripts. I think some people need and use it, though.)

RIF: "Even more features than OWL 2, and yet another syntax." Alternative summary (for a good ROFL): "Perfect for Facebook's Open Graph". (No use case here. Again, YMMV.)

SPARQL 1.1: "Getting on par with enterprise databases, at any cost." (A slap in the face of web developers. Too many features that can't be implemented in any reasonable time, in their entirety, or with user-satisfying performance. Profiles for feature subsets could still save it, though.)

Microdata: "RDF-in-HTML made easy for CMS developers and JavaScript coders" (Not sure if it'll succeed, but it works well for me.)

SKOS: "An interesting alternative to RDFS and OWL and a possible bridge to the Web 2.0 world." (Wish I had time to explore SKOS-centric app development, the potential could be huge.)

I still believe that the lower-end adoption issue could be solved by a set of smaller layer cakes, each baked for and marketed to a defined and well-understood target audience. If the W3C groups continue to add to the same cake, it's going to crumble apart sooner or later, and the higher layers are going to bury the foundations. Nobody is going to taste it at all then.

And to answer the ARC-related question in more detail, too: Next step is collecting enough funds to test and release a PHP 5.3 E_STRICT version (Thanks so much to all donors so far, we'll get there!). SPARQL 1.1 compatibility will come, but only for those parts that can be mapped to relational DB functionality. The REST API is on my list, too. Empty graphs, don't think so (which app would need them?). Sub-queries, most probably not. Federated queries, sure, as soon as someone figures out how to do production-ready remote JOINs ;-)

Update: This article has been called unfair and misleading, and I have to agree. I know that spec work is hard, that it's easy to complain from the sidelines, and that frustration is part of compromise-driven specifications. Wake-up calls have to be a little louder to be heard, though. Still, I apologize for the toe-stepping; it is not directed against any person in particular.

While data exchange between different semantic web sources is usually RDF-based (i.e. the data always maintain their semantics), there is one major exception: SPARQL SELECT queries. This developer-oriented operation returns tabular data (similar to record sets in SQL). Once the query result is separated from the query, the associated structural data is lost. You can't directly feed SELECT results back into a triple store, even though querying based on linked resources means that you have just created knowledge. It's a pity to show this generated information to human consumers only.
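To make the difference concrete, here are two queries over the same pattern (the `ex:competitor` property and namespace are made up for illustration). The SELECT result is just rows; the CONSTRUCT result is RDF that can go straight back into a store:

```sparql
PREFIX ex: <http://example.org/ns#>

# SELECT flattens the graph into a result table; once the rows
# leave the query, the relation between the two columns is gone
SELECT ?product ?competitor WHERE {
  ?product ex:competitor ?competitor .
}

# CONSTRUCT returns the matched relations as triples, so the
# result can be loaded back into a triple store without loss
CONSTRUCT { ?product ex:competitor ?competitor . }
WHERE     { ?product ex:competitor ?competitor . }
```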

One of the demos at my NYC talk was a dynamic wiki item that pulled in competitor information from Semantic CrunchBase and injected that into a page template as HTML. The existing RDF infrastructure does not let me cache the SELECT results locally as usable RDF. And a semantic web client or crawler that indexes the wiki page will not learn how the described resource (e.g. Twitter) is related to the remote, linked entities.

However, by simply adding a single RDFa hook to the wiki item template, the RDF relation (e.g. competitor) can be made available again to apps that process my site content. This is basically how Linked Data works. But here is the really nifty thing: My site can be a consumer of its own pages, too, recursively enriching its own data.
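Such a hook can be as small as a single rel attribute. A minimal sketch of what I mean (namespace, property name, and URLs are invented for illustration, not the actual wiki markup):

```html
<!-- wiki item template with one RDFa hook: the rel attribute
     re-attaches the competitor relation to the rendered HTML -->
<div xmlns:ex="http://example.org/ns#"
     about="http://example.org/companies/twitter">
  Competitors:
  <a rel="ex:competitor" href="http://example.org/companies/jaiku">Jaiku</a>
</div>
```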

I tweaked the wiki script, which now works like this: When the page is saved, a first operation updates the wiki markup in the page's graph (i.e. the not-yet-populated template). In a second step, the page URL is retrieved via HTTP. This returns HTML with RDFa-encoded remote data, which is then parsed by ARC and finally added to the same graph. We end up with a graph that contains not only the wiki markup, but also the RDFized information that was integrated from remote sites. After adding this graph to the RDF store, we can use a local query to generate the page and occasionally reset the graph to enable copy-by-reference. And all this without any custom API code.
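In heavily simplified PHP, the save hook boils down to a few lines; the ARC method names are from memory and the wiki_markup_to_triples() helper is hypothetical:

```php
<?php
// sketch of the two-step save operation described above
function save_wiki_page($page_url, $graph_uri, $markup, $store) {
  // step 1: replace the page's graph with the raw wiki markup triples
  $store->query('DELETE FROM <' . $graph_uri . '>');
  $store->insert(wiki_markup_to_triples($page_url, $markup), $graph_uri);

  // step 2: fetch the rendered page via HTTP; the returned HTML
  // contains the remote data, RDFa-encoded by the template
  $parser = ARC2::getRDFParser();
  $parser->parse($page_url);

  // step 3: add the extracted triples to the same graph, so the
  // store now holds both the markup and the RDFized remote data
  $store->insert($parser->getTriples(), $graph_uri);
}
```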

Gave a talk and a workshop in NYC about SemWeb technologies for PHP developers

I'm back from New York, where I was given the great opportunity to talk about two of my favorite topics: Semantic Web Development with PHP, and (not necessarily semantic) Software Development using RDF Technology. I was especially looking forward to the second one, as that perspective is not only easier to understand for people from a software engineering context, but also because it is still a much-neglected marketing "back-door": If RDF simplifies working with data in general (and it does), then we should not limit its use to semantic web apps. Broader data distribution and integration may naturally follow in a second or third step once people use the technology (so much for my contribution to Michael Hausenblas' list of RDF (Mal-)Best Practices ;)

The talk on Thursday at the NY Semantic Web Meetup was great fun. But the most impressive part of the event were the people there. A lot to learn from on this side of the pond. Not only very practical and professional, but also extremely positive and open. Almost felt like being invited to a family party.

The positive attitude was even true for the workshop, which I clearly could have made more effective. I didn't expect (but should have) that many people would come w/o a LAMP stack on their laptops, so we lost a lot of time setting up MAMP/LAMP/WAMP before we started hacking ARC, Trice, and SPARQL.

Marco brought up a number of illustrative use cases. He maintains an (unofficial, sorry, can't provide a pointer) RDF wrapper for any group on meetup.com, so the workshop participants could directly work with real data. We explored overlaps between different Meetup groups, the order in which people joined selected groups, inferred new triples from combined datasets via CONSTRUCT, and played with not-yet-standard SPARQL features like COUNT and LOAD.

And having done the workshop should finally give me the last kick to launch the Trice site now. The code is out, and it's apparently not too tricky to get started even though the documentation is still incomplete. Unfortunately, I have a strict "no more non-profits" directive, but I think Trice, despite being FOSS, will help me get some paid projects, so I'll squeeze an official launch in sometime soon-ish.

Below are the slides from the meetup. I added some screenshots, but they are probably still a bit boring without the actual demos (I think a video will be put up in a couple of days, though).

The Linked Data meme is spreading and we have strong indications that web developers who understand and know how to apply practical semantic web technologies will soon be in high demand. Not only in enterprise settings but increasingly for mainstream and agency-level projects where scripting languages like PHP are traditionally very popular.

I can't really afford travelling to promote the interesting possibilities around RDF and SPARQL for PHP coders, so I'm more than happy that Meetup master Marco Neumann invited me to come over to New York and give a talk at the Meetup on May 21st. Expect a fun mixture of "Getting started" hints, demos, and lessons learned. In order to make this trip possible, Marco is organizing a half-day workshop on May 22nd, where PHP developers will get a hands-on introduction to essential SemWeb technologies. I'm really looking forward to it (and big thanks to Marco).

So, if you are a PHP developer wondering about the possibilities of RDF, Linked Data & Co, come to the Meetup, and if you also want to get your hands dirty (or just help me pay the flight ticket ;) the workshop could be something for you, too. I'll arrive a few days earlier, by the way, in case you want to add another quaff:drankBeerWith triple to your FOAF file ;)

Running an R&D-heavy agency in the current economic climate is pretty tough, but there are also a couple of new opportunities for semantic solutions that help reduce costs and do things more efficiently. I'm finally starting to get project requests that include some form of compensation. Not much yet (all budgets seem to be very tight these days), but it's a start, and together with support from Susanne, I could now continue working on Paggr, semsol's Netvibes-like dashboard system for the growing web of Linked Data.

An article about Paggr will be in the next Nodalities Magazine, and the ESWC2009 technologies team is considering a custom system for attendees which is a great chance to maybe get other conference organizers interested. (I see much potential in a white-label offering, but a more mainstream-ish version for Web 2.0 data is still on my mind. Just have to focus on getting self-sustained first.)

Below is a short screencast that demonstrates a first version of the sparqlet (= semantic widget) builder. I've de-coupled sparqlet-serving from the dashboard system, so that I'll be able to open-source the infrastructure parts of Paggr more easily. Another change from the October prototype is the theme-ability of both dashboards and widget servers. Lots of sun, sky, and sea for ESWC ;-)

The article explains the technical side of the MBC09 talk; a code bundle is included. The latter still lacks the just-added posting functionality, but I'll try to make another release available once the code is a little more stable.

Last week I was at MBC09, a conference about all things microblogging. It was organized by Cem Basman, who not only managed to get very interesting speakers on stage, but also got the event covered on Germany's main TV news.

The conference started slowly, with slightly underwhelming sponsored keynotes, but then quickly turned into a great source of inspiration, thanks to barcamp-style tracks during the rest of the conference. I particularly enjoyed the session about Communote (a microblogging system for corporate use), a talk by Marco Kaiser about near-client XMPP at seesmic, and a panel about "Twitter and Journalism" (really entertaining panel speakers).

As usual, I pulled a near-all-nighter to hack on a funky demo, only to get on stage and have a projector that didn't like MacBooks. Luckily, I was co-speaking with Sebastian Kurt who had slides for using Twitter as an interface to remote apps (todo list management and similar things), so I didn't have to fill the whole 30 mins stuttering about demos that no one could see. Anyway, given the circumstances, the session didn't go too badly. Interestingly, the Zemanta and Calais APIs triggered most of the questions.

I've now uploaded my slides (and added some screenshots of the demo prototypes), in case you're interested in this stuff or wonder what I would have talked about, or if you didn't see the demos after the session.

This week, the first "Microblogging Conference Europe" will take place in Hamburg. I was lucky to get a late ticket (thanks to Dirk Olbertz, who won't be able to make it). The conference will have barcamp-style tracks, and (narrow-minded as I am) I started thinking about adding SemWeb power to microblogging.

The more I use Twitter and advanced clients like TweetDeck, the more I think that (slightly enhanced) microblogs could become great interfaces to the (personalized) Semantic Web. I'm already noticing that I don't use a feed reader or delicious to discover relevant content any more. I'm effectively saving time. But simultaneously it becomes obvious that Twitter can be a distracting productivity killer. So, here is the idea: Take all the good things from microblogging and add enough semantics to increase productivity again. And while at it, utilize this semantic microblog as a work/life/idea log.

A semantic microblog would simplify the creation of structured, machine-readable information, in part for personal use, and generally to let the computer take care of certain tasks or do things that I didn't think of yet.

I have only two days left to prepare a demo and a talk, so I better start developing. I'll keep the rest of this post short and log my progress on Twitter instead. The app will be called "smesher". I'm starting now (or rather tomorrow morning, have to leave in 15 mins).

Use cases

How much time did I spend doing support this month?

Who are my real contacts (evidence-driven, please, why do I have to manually add followees)?

Show me a complete history of activities related to project A

How much can I bill client B? (or even better: Generate an invoice for client B)

What was that great Tapas Bar we went to last summer again?

Where did I first meet C?

Bookmarks ranked by number of occurrences in other tweets

Show me all my blog posts about topic D

...

Microblogs: Strengths

Microblogs are web-based

Microblogs are very easy to use ("less is more")

Microblogs offer a great communication channel (asynchronous, but almost instant)

Related Work

James Tizard just released a first version of sμBlog, a system "to create 'multi-purpose' messages that, depending on content and context, would be understood, and acted upon, as notes to myself, messages to other people, blog posts and so on"

Knowee is a web address book that lets you integrate distributed social graph fragments. A new version is online at knowee.net.

Heh, this was planned as a one-week hack but somehow turned into a full re-write that took all of December. Yesterday, I finally managed to tame the semantic bot army and today I've added a basic RDF editor. A sponsored version is now online at knowee.net, a code bundle for self-hosting will be made available at knowee.org tomorrow.

What is Knowee?

Knowee started as a SWEO project. Given the insane number of online social networks we all joined, together with the increasing amount of machine-readable "social data" sources, we dreamed of a distributed address book, where the owner doesn't have to manually maintain contact data, but instead simply subscribes to remote sources. The address book could then update itself automatically. And (in full SemWeb spirit) you'd get access to your consolidated social graph for re-purposing. There are several open-source projects in this area, most notably NoseRub and DiSo. Knowee is aiming at interoperability with these solutions.

Ingredients

For a webby address book, we need to pick some data formats, vocabularies, data exchange mechanisms, and the general app infrastructure:

PHP + MySQL: Knowee is based on the ubiquitous LAMP stack. It tries to keep things simple: you don't need system-level access for third-party components or cron jobs.

FOAF, OpenSocial, microformats, Feeds: FOAF is the leading RDF vocabulary for social information. Feeds (RSS, Atom) are the lowest common denominator for exchanging non-static information. OpenSocial and microformats are more than just schemas, but the respective communities maintain very handy term sets, too. Knowee uses equivalent representations in RDF.

I'm still working on a solution for access control, the current Knowee version is limited to public data and simple, password-based access restrictions. OAuth is surely worth a look, although Knowee's use case is a little different and may be fine with just OpenID + sessions. Another option could be the impressive FOAF+SSL proposal, I'm not sure if they'll manage to provide a pure-PHP implementation for non-SSL-enabled hosts, though.

Features / Getting Started

This is a quick walk-through to introduce the current version.

Login / Signup

Log in with your (ideally non-XRDS) OpenID and pick a user name.

Account setup

Knowee only supports a few services so far. Adding new ones is not hard, though. You can enable the SG API to auto-discover additional accounts. Hit "Proceed" when you're done.

Profile setup

You can specify whether to make (parts of) your consolidated profile public or not. During the initial setup process, this screen will be almost empty, you can check back later when the semantic bots have done their job. Hit "Proceed".

Dashboard

The Dashboard shows your personal activity stream (later versions may include your contacts' activities, too), system information and a couple of shortcuts.

Contacts

The contact editor is still a work in progress. So far, you can filter the list, add new entries, and edit existing contacts. The RDF editor is still pretty basic (Changes will be saved to a separate RDF graph, but deleted/changed fields may re-appear after synchronization. This needs more work.) The editor is schema-based and supports the vocabularies mentioned above. You'll be able to create your own fields at some later stage.

It's already possible to import FOAF profiles. Knowee will try to consolidate imported contacts so that you can add data from multiple sources, but then edit the information via a single form. The bot processor is extensible (we'll be able to add additional consolidators at run-time), but it only looks at "owl:sameAs" at the moment.
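A consolidator of that kind essentially boils down to a single rule. Roughly sketched in ARC's SPARQL+ syntax (the target graph URI is made up), it could copy statements from an alias over to the canonical contact:

```sparql
# copy all statements made about an owl:sameAs alias over to the
# canonical contact resource (graph name is illustrative)
PREFIX owl: <http://www.w3.org/2002/07/owl#>
INSERT INTO <http://example.org/consolidated> {
  ?contact ?p ?o .
} WHERE {
  ?contact owl:sameAs ?alias .
  ?alias ?p ?o .
}
```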

Enabling the SPARQL API

In the "Settings" section you'll find a form that lets you activate a personal SPARQL API. You can enable/protect read and/or write operations. The SPARQL endpoint provides low-level access to all your data, allows you to explore your social graph, or lets you create backups of your activity stream.
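A backup of the activity stream, for example, is then just an ordinary query against the personal endpoint (the user URI and the ex: vocabulary terms below are purely illustrative, not Knowee's actual schema):

```sparql
# export the activity stream, newest items first
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX ex: <http://example.org/ns#>
SELECT ?activity ?date WHERE {
  <http://example.org/user/me> ex:activity ?activity .
  ?activity dc:date ?date .
}
ORDER BY DESC(?date)
```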

That's more or less it for this version. You can always reset or delete your account, and manually delete incorrectly monitored graphs. The knowee.net system is running on the GoGrid cloud, but I'm still tuning things to let the underlying RDF CMS make better use of the multi-server setup. If things go wrong, blame me, not them. Caching is not fully in place yet, and I've limited the installation to 100 accounts. Give it a try, I'd be happy about feedback.

I've been semi-silently working on something new. A combination of many semwebby things I came across and played with during the last 3 years or so:

semantic markup

smart data

an rdf clipboard

ajax

sparql sparql sparql

sparql + scripting

sparql + templates

sparql + widgets

lightweight, federated semweb services and bots

UIs for open data

semwikis

agile and collaborative web development

So, what happens when you put this all together? At least something interesting, and perhaps semsol's first commercial service. (Or product, this is all just LAMP stuff and can easily be run in an intranet or on a hosted server). Anyway, still some way to go. It's called paggr, the landing page is up, and today I created a first teaser/intro video.

I'll demo the beta (launch planned for November) at the upcoming ISWC during the poster session (my poster is about SPARQL+ and SPARQLScript, the two SPARQL extensions that paggr is based on). I may have early invites by then.

As a preparation for the hopefully busy fall and winter months, though, I'll be on vacation for the next two weeks. No Email, no Web, no Phone. Yay!

SPARQLBot, the weekend project we started at SemanticCamp London, is now finally online at a proper home, and with a more solid toolset. I've ported the essential commands from the old site, and the "Getting Started" manual should be online later today as well.

What is SPARQLBot?

SPARQLBot is a web-based service that reads and writes Semantic Web data based on simple, human-friendly commands received via IRC or the Web. The command base can be freely extended using a browser-based editor. SPARQLBot can process microformats, RSS, several RDF serializations, and results from parameterized SPARQL queries.

New Features

SPARQLBot was more or less rewritten from scratch. Compared to the earlier version, things have become much more powerful, but also simpler and more stable in many cases. The system can now:

SPARQLScript can be used for forward chaining, including string manipulations on the fly.

In order to keep data structures in Semantic CrunchBase close to the source API, I used a 1-to-1 mapping between CrunchBase JSON keys and RDF terms (with only a few exceptions). This was helpful for people who knew the JSON API, but it wasn't easy to interlink the converted information with existing SemWeb data such as FOAF, or the various LOD sources.

A first version ("LOD Linker") is installed on Semantic CB, with initially 9 rules (feel free to leave a comment here if you need some additional mappings). With SPARQLScript being a superset of SPARQL+, most inference scripts are not much more than a single INSERT + CONSTRUCT query (you can click on the form's "Show inference scripts" button to see the source code):
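For illustration, a DBpedia mapping rule of that kind could look roughly like the query below. The cb: terms, the foaf:page pattern, and the graph URI are my guesses for this sketch; the actual scripts are behind the "Show inference scripts" button:

```sparql
# link local companies to DBpedia resources that share
# the same Wikipedia page
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX cb:   <http://example.org/crunchbase/ns#>
INSERT INTO <http://example.org/inferred/lod-linker> {
  ?company owl:sameAs ?dbpedia_id .
} WHERE {
  ?company cb:wikipedia_url ?wiki_page .
  ?dbpedia_id foaf:page ?wiki_page .
}
```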

(I'm using a similar script to generate foaf:name triples by concatenating cb:first_name and cb:last_name.)

Inferred triples are added to a graph directly associated with the script. Apart from a destructive rule that removes all email addresses, the reasoning can easily be undone again by running a single DELETE query against the inferred graph.
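Undoing a rule is then a one-liner against the script's graph (again SPARQL+ syntax, with an illustrative graph URI):

```sparql
# drop every triple the inference script produced
DELETE FROM <http://example.org/inferred/lod-linker>
```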

I'm quite happy with the functionality so far. What's still missing is a way to rewrite bnodes; I don't think that's possible yet. But INSERT + CONSTRUCT will leave bnode IDs unchanged, so the inference scripts don't necessarily require URI-denoted resources.

Another cool aspect of SPARQLScript-based inferencing is the possibility to use a federated set of endpoints, each processing only a part of a rule. The initial DBPedia mapper above, for example, uses locally available wikipedia links. However, CrunchBase only provides very few of those. So I created a second script which can retrieve DBPedia identifiers for local company homepages, using a combination of local queries and remote ones against the DBPedia SPARQL endpoint (in small iterations and only for companies with at least one employee, but it works).

I've probably read Getting Real half a dozen times since the release of the free online version last year. The agile process seems to fit quite nicely with RDF-based tools (Semantic CrunchBase was the most recent proof of concept for me). I'm currently writing a DevX article about using RDF and SPARQL in combination with Getting Real and wondered about quantitative numbers for such an approach. As I usually don't record hours for personal projects, I had to create a new one: the sillily named "dooit", a to-do list manager.

dooit follows a lot of GR suggestions such as "UI first", not wasting too much time on a name, that less may be enough for 80% of the use cases, or that usage patterns may evolve as "just-as-good" replacements of features ("mm-dd" tags could for example enable calendar-like functionality).

I started the live experiment on Friday and finished the first iteration on Saturday. Below is a twitter log of the individual activities. I was using Trice as a Web framework, otherwise I would of course have spent much more time on generating forms and implementing AJAX handlers etc. So, the numbers only reflect the project-specific effort, but that's what I was interested in.

I think I can call it a success so far. One point about GR is staying focused, working from the UI to the code helps a lot here (as does live-logging, I guess ;). But I'm not done yet. Now that I have a first running version, I still have to see if my RDF-driven app can evolve, if the code is manageable and easy to change. I'm looking forward to finding that out, but my shiny new dooit list suggests to finish the DevX article first ;)

Semantic CrunchBase seems to be worth the time I'm putting into it. Thanks to TechCrunch's and CrunchBase's great move to open their data and encourage reuse (and writing about the apps that use their API), I've had the chance to do a couple of SemWeb demos and reach out to the audience that could benefit as much (or maybe even more) from RDF & Co. as the groups we already have on board: web app developers.

I also got an offer to write some related articles for DevX, and the CrunchBase team just published an interview where I (shamelessly) promote SemWeb development. I am already noticing an increased number of mails asking for RDF introductions, and people are even starting to just figure things out on their own, with friendly SPARQL paving the way.

This might be the right time for a SWEO II (with a focus on the "E") or a similar effort driven by the RDF community.