Parsing webmentions

September 15th, 2013

A web mention is basically the equivalent of a pingback. Let’s say I write something here on adactio.com, and that prompts you to write something in response on your own site. A web mention is a way for you to let me know that your response exists.

If you look in the head of any of my journal posts, you’ll see this link element:

<link rel="webmention" href="http://adactio.com/webmention.php" />

That’s my web mention endpoint: http://adactio.com/webmention.php …it’s kind of like a webhook: a URL that’s intended to be hit by machines rather than people. So when you publish your response to my post, you ping that URL with a POST request that sends two parameters:

target: the URL of my post and

source: the URL of your response.
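The ping itself is just a form-encoded POST. Here is a rough sketch in Python (not the PHP that runs adactio.com). The response URL is a hypothetical example, and the request is only built, not sent:

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_webmention_request(endpoint: str, source: str, target: str) -> Request:
    """Build (but do not send) a webmention POST request."""
    # The two parameters travel as application/x-www-form-urlencoded data.
    body = urlencode({"source": source, "target": target}).encode("utf-8")
    return Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


req = build_webmention_request(
    "http://adactio.com/webmention.php",
    "https://example.com/my-response",  # hypothetical response URL
    "http://adactio.com/journal/6495/",
)
# urllib.request.urlopen(req) would actually send it.
```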

Ideally your own CMS or blogging system would take care of doing the pinging, but until that’s more widely implemented, I’m providing a form at the end of each of my posts.

Either way, once you ping my web mention endpoint—discoverable through that link rel="webmention"—with those two parameters, I just need to confirm that your post does indeed contain a link to my post—by making a cURL request and parsing your source—and then I return a server response of 202 (Accepted).
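The verification step can be sketched with Python’s standard HTML parser in place of cURL and PHP. This is a simplification: fetching the source is left out, links are compared as exact strings (no relative-URL resolution or trailing-slash normalisation), and answering 400 when the link is missing is an assumption, not something stated above:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect every href found on <a> elements."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def source_links_to_target(source_html: str, target: str) -> bool:
    collector = LinkCollector()
    collector.feed(source_html)
    collector.close()
    return target in collector.hrefs


def status_for(source_html: str, target: str) -> int:
    # 202 Accepted when the link is really there; 400 otherwise (assumed).
    return 202 if source_links_to_target(source_html, target) else 400
```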

That’s as far as I got at Indie Web Camp but it was enough for me to start collecting responses to posts.

The next step is to do something with the responses. After all, I’ve already got the source of each response from those cURL requests.

Barnaby has written a nice straightforward microformats parser in PHP. I’m using that to check the cURLed source for any responses that have been marked up using h-entry. That’s one of the microformats 2 vocabularies—a much simpler way of writing structured content with microformats.
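To show the idea only (this is not Barnaby’s parser, nor a real microformats 2 parser), here is a tiny Python sketch that pulls the text of the first e-content inside an h-entry. It assumes reasonably well-formed markup, since stray or missing end tags would throw off its depth counting:

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not affect the depth count.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class HEntryContent(HTMLParser):
    """Collect the text of the first e-content found inside an h-entry."""

    def __init__(self):
        super().__init__()
        self.depth = 0          # currently open (non-void) elements
        self.entry_at = None    # depth at which the h-entry element opened
        self.content_at = None  # depth at which the e-content element opened
        self.found_entry = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        self.depth += 1
        classes = (dict(attrs).get("class") or "").split()
        if self.entry_at is None and "h-entry" in classes:
            self.entry_at = self.depth
            self.found_entry = True
        elif (self.entry_at is not None and self.content_at is None
              and not self.done and "e-content" in classes):
            self.content_at = self.depth

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if self.content_at == self.depth:
            self.content_at = None
            self.done = True
        if self.entry_at == self.depth:
            self.entry_at = None
        self.depth -= 1

    def handle_data(self, data):
        if self.content_at is not None:
            self.parts.append(data)


def extract_entry_content(html_doc: str):
    """Return the e-content text, or None if the page has no h-entry."""
    parser = HEntryContent()
    parser.feed(html_doc)
    parser.close()
    if not parser.found_entry:
        return None
    return " ".join("".join(parser.parts).split())
```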

So there you have it. Comments are now open on every journal post on adactio.com …the only catch is that you have to write the comment on your own site. And if you want the content of your post to appear here (instead of just a link) then update your blog post template to include a handful of h-entry classes.

Feel free to use this post as a test. Mark up your blog with h-entry, write a post that links to this URL, and enter the URL of your post in the form below.

Responses

"Parsing Webmentions" by @adactio: http://adactio.com/journal/6495/ A step-by-step explanation of how to receive #webmentions and incremental steps you can take afterwards like: * displaying links to posts that mention yours * displaying such posts in their entirety with attribution Well done Jeremy.

Jeremy has recently implemented Webmention on adactio.com, and posted an explanation of the small piece of code involved. I love the simplicity of Webmention, and I love the Indieweb idea of connecting our conversations in the simplest possible way whilst still publishing to our own sites, owning our data. I intend to implement it on this site as soon as I can: both to test it out, and to offer a way of commenting without all the hassle of actually managing comments (sort of).

In 2007 and 2011 I wrote a pair of articles where I tried to articulate my thoughts on what I saw as the dawning social web. The earliest article was about Google’s OpenSocial API, and the latter, about what I saw as Google’s new social network, built around profiles, rel=me and Buzz.

Neither came to pass. OpenSocial died a slow, painful death, while Buzz was kicked in the teeth, then taken out back and shot. But what they demonstrated was Google’s attempt at socialising the web itself. Of course, this dream died when Google+ was released as a Facebook clone and Google returned to only being an advertising company.

One of the latest things Jeremy Keith has been working on is webmentions.

Webmentions

Webmention allows conversations that would usually occur in comments to take place on your own website. Pinging back responses to the original article allows readers to follow the discussion of articles and comments, while the content itself continues to be hosted on the respondent’s website and owned by them, fulfilling the IndieWeb principles.

Google Buzz performed this kind of aggregation and connecting. It allowed conversations to happen across the web and be followed in one place.

Webmention picks up where Buzz left off and adds the ability to host the conversation itself to the mix.

@sandeepshetty @pfefferle @cweiske Something new to consider: Jeremy Keith added a webmention sending form to his journal entries to help people whose websites don’t support webmention already. Being able to test and use webmention through a human-visible, interactive form is a huge benefit of using HTTP form-encoded data.

We can make this an even stronger case by encouraging success and error responses to be full HTML documents with helpful copy.

This site does not have a way for readers to post comments under each article, and I don’t plan to implement a comment section. Instead, Parallel Transport now accepts webmentions, so you can write and publish responses on your own place on the web and link them back to the original article here. In fact, this very article appears as a response on Jeremy Keith’s post.

Comments & Feedback
Most comments are useless. They do not contribute much to the original content. They’re mostly one-line quips about liking or disliking the post. No context, no feedback, no thought. Having a comment form makes it very easy to post a random comment on some website. Combined with the anonymity it affords, that makes comment sections breeding grounds for trolls, unconstructive arguments, flaming, name-calling, and shouting matches. No wonder the bottom half of the internet is so despised.
Second, why this expectation that every blog must be a discussion forum? Public discussions can be had on social platforms like Facebook, Twitter, Google+.
If you really have some response to what I write, you can use email or catch me on one of the social sites mentioned above. Or better yet, write your own post about it on your own blog, site, journal, social-network-thingie!

You write on your site; I write on mine. That’s a response.

—John Gruber on I’ll Tell You What’s Fair

If your response adds something of value to the original content, I’ll post it here. I have transferred many such valued comments over from my old blog. Not all of them agreed with what I had to say, but they were considered responses, instead of just ‘I like it!’ or ‘This is shit!’.
Webmention
So suppose you have published a response on your own site. Webmention is a way to notify me of your response. You send the URL of your response along with the URL of my original article as a POST request to my web server. My server verifies that the response post exists and that it links to the original article.
Now, I can do something with your response. I could simply link to it below my article, or repost your entire response. I have chosen to take the middle ground and display a little snippet with a nice link back to your original post. If all goes well, you will have responded to my article with one of your own, published on your own site, and we still have a discussion that links back and forth.
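One way to produce such a snippet is to shorten the plain text of the response at a word boundary. A minimal Python sketch; the 280-character default is a hypothetical choice, not the limit this site uses:

```python
def make_snippet(text: str, limit: int = 280) -> str:
    """Shorten text to about `limit` characters without cutting a word in half."""
    text = " ".join(text.split())  # collapse runs of whitespace
    if len(text) <= limit:
        return text
    head = text[: limit + 1]
    # Drop the partial last word; fall back to a hard cut if there are no spaces.
    cut = head.rsplit(" ", 1)[0] if " " in head else text[:limit]
    return cut + "…"
```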
Since most web-publishing platforms don’t support sending mentions automatically, I have followed Jeremy Keith and put a small form at the bottom of every article that you can use to send me the link to your response post.
If you are interested in the details of how this works, take a look at the webmention spec, and the informative discussion at the IndieWeb. If, like me, you use and love Python, Panayotis Vryonis has written a good tool to handle webmentions. You can also see my own webmention code, which extends Vryonis’ webmentiontools to do a few more things. Feel free to make suggestions, test, extend and use it. And if you do, let me know. Or better yet, write about it and link it back here!

In a world before social media, a lot of online communities existed around blog comments. The particular community I was part of – web standards – was all built up around the personal websites of those involved.

As social media sites gained traction, those communities moved away from blog commenting systems. Instead of reacting to a post underneath the post, most people will now react with a URL someplace else. That might be a tweet, a Reddit post, a Facebook emission, basically anywhere that combines an audience with the ability to comment on a URL.

Oh man, the memories of dynamic text replacement and the lengths we went to just to get some non-standard text. https://t.co/f0whYW6hh1

Whether you think that’s a good thing or not isn’t really worth debating – it’s just the way it is now, things change, no big deal. However, something valuable that has been lost is the ability to see others’ reactions when viewing a post. Comments from others can add so much to a post, and that overview is lost when the comments exist elsewhere.

This is what webmentions do

Webmention is a W3C Recommendation that solves a big part of this. It describes a system for one site to notify another when it links to it. It’s similar in concept to Pingback for those who remember that, just with all the lessons learned from Pingback informing the design.

The flow goes something like this.

Frankie posts a blog entry.

Alex has thoughts in response, so also posts a blog entry linking to Frankie’s.

Alex’s publishing software finds the link and fetches Frankie’s post, finding the URL of Frankie’s Webmention endpoint in the document.

Alex’s software sends a notification to the endpoint.

Frankie’s software then fetches Alex’s post to verify that it really does link back, and then chooses how to display the reaction alongside Frankie’s post.
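The discovery step in that flow (Alex’s software finding Frankie’s endpoint) can be sketched in Python. This only looks at the HTML: it takes the first link or a element whose rel includes webmention and resolves the href against the page URL. The spec also allows an HTTP Link header, which this sketch skips, and the frankie.example URLs are made up:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class EndpointFinder(HTMLParser):
    """Remember the href of the first <link> or <a> with rel="webmention"."""

    def __init__(self):
        super().__init__()
        self.endpoint = None

    def handle_starttag(self, tag, attrs):
        if self.endpoint is not None or tag not in ("link", "a"):
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        href = attr.get("href")
        if "webmention" in rels and href is not None:
            self.endpoint = href


def discover_endpoint(page_url: str, html_doc: str):
    finder = EndpointFinder()
    finder.feed(html_doc)
    finder.close()
    if finder.endpoint is None:
        return None
    # Relative endpoints (and an empty href) resolve against the page URL;
    # any query string in the endpoint is kept.
    return urljoin(page_url, finder.endpoint)
```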

The end result is that by being notified of the external reaction, the publisher is able to aggregate those reactions and collect them together with the original content.

The reactions can be comments, but also likes or reposts, which is quite a nice touch. For the nuts and bolts of how that works, Jeremy explains it better than I could.
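For illustration, one common way to tell these apart is to check which h-entry property points at the target: like-of, repost-of or in-reply-to. This Python sketch assumes the source has already been parsed into microformats 2 JSON (type and properties keys); anything else is treated as a plain mention:

```python
def reaction_type(entry: dict, target: str) -> str:
    """Classify a parsed h-entry by the property that points at `target`."""
    props = entry.get("properties", {})
    for prop, kind in (("like-of", "like"), ("repost-of", "repost"),
                       ("in-reply-to", "reply")):
        for value in props.get(prop, []):
            if isinstance(value, dict):
                # Nested h-cite: the URL lives in its own "url" property.
                value = (value.get("properties", {}).get("url") or [None])[0]
            if value == target:
                return kind
    return "mention"
```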

Beyond blogs

Not two minutes ago was I talking about the reactions occurring in places other than blogs, so what about that, hotshot? It would be totally possible for services like Twitter and Facebook to implement Webmention themselves. In the meantime, there are services like Bridgy that can act as a proxy for you. They’ll monitor your social feed and then send corresponding webmentions as required. Nice, right?

Challenges

I’ve been implementing Webmention for the Perch Blog add-on, which has by and large been straightforward. For sending webmentions, I was able to make use of Aaron Parecki’s PHP client, but the process for receiving mentions is very much implementation-specific so you’re on your own when it comes to how to actually deal with an incoming mention.

Keeping it asynchronous

In order for your mention endpoint not to be a vector for a DoS attack, the spec highly recommends that you make processing of incoming mentions asynchronous. I believe this was a lesson learned from Pingback.

In practise that means doing as little work as possible when receiving the mention, just minimally validating it and adding it to a job queue. Then you’d have another worker pick up and process those jobs at a rate you control.

In Perch we have a central task scheduler, so that’s fine for this purpose. My job queue is a basic MySQL database table, and I have a scheduled task to pick up the next job and process it once a minute.
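A rough Python sketch of that pattern, with sqlite3 standing in for the MySQL table. The table and function names are hypothetical, and the scheduler that calls process_next once a minute is not shown:

```python
import sqlite3


def create_queue(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS webmention_jobs ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " source TEXT NOT NULL,"
        " target TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending')"
    )


def enqueue(conn, source, target):
    # Called from the endpoint: no fetching or parsing, just store the job.
    conn.execute("INSERT INTO webmention_jobs (source, target) VALUES (?, ?)",
                 (source, target))
    conn.commit()


def process_next(conn, handler):
    """Process the oldest pending job; returns False when the queue is empty."""
    row = conn.execute(
        "SELECT id, source, target FROM webmention_jobs"
        " WHERE status = 'pending' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return False
    job_id, source, target = row
    try:
        handler(source, target)  # fetch, verify, parse, store...
        status = "done"
    except Exception:
        status = "failed"
    conn.execute("UPDATE webmention_jobs SET status = ? WHERE id = ?",
                 (status, job_id))
    conn.commit()
    return True


conn = sqlite3.connect(":memory:")
create_queue(conn)
enqueue(conn, "https://alex.example/reply", "https://frankie.example/post")
processed = []
process_next(conn, lambda source, target: processed.append((source, target)))
```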

I work in publishing, dhaaaling

Another issue that popped up for me in Perch was that we didn’t have any sort of post published event I could hook into for sending webmentions out to any URLs we link to. Blog posts have a publish status (usually draft or published in 99% of cases) but they also have a publish date which is dynamically filtered to make posts visible when the date is reached.

If we sent our outgoing webmentions as soon as a post was marked as published, it still might not be visible on the site due to the date filter, causing the process to fail.

The solution was to go back to the task scheduler and again run a task to find newly published posts and fire off a publish event. This is an API event that any other add-on can listen for, so it opens up options for us to do things like auto-tweeting of blog posts in the future.
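The shape of such a task can be sketched in Python. The Post class, the announced flag and the listeners list are hypothetical stand-ins, not Perch’s actual API; the point is that a post only fires its event once it is both marked published and past its publish date:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, List


@dataclass
class Post:
    id: int
    status: str              # "draft" or "published"
    publish_date: datetime
    announced: bool = False  # hypothetical: has the publish event fired yet?


listeners: List[Callable[[Post], None]] = []


def fire_publish_events(posts: List[Post], now: datetime) -> List[int]:
    """Scheduled task: fire a publish event for posts that have just become visible."""
    fired = []
    for post in posts:
        if post.status == "published" and post.publish_date <= now and not post.announced:
            post.announced = True
            for listener in listeners:
                listener(post)
            fired.append(post.id)
    return fired


now = datetime(2024, 1, 2, tzinfo=timezone.utc)
posts = [
    Post(1, "published", datetime(2024, 1, 1, tzinfo=timezone.utc)),
    Post(2, "published", datetime(2024, 1, 9, tzinfo=timezone.utc)),  # future-dated
    Post(3, "draft", datetime(2024, 1, 1, tzinfo=timezone.utc)),
]
sent = []
listeners.append(lambda post: sent.append(post.id))  # e.g. a webmention sender
first_run = fire_publish_events(posts, now)
```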

Updating reactions

A massive improvement of webmentions over most commenting systems is the affordance in the spec for updating a reaction. If you change a post, your software will re-notify the URLs you link to, sending out more webmention notifications.

A naive implementation would then pull in duplicate content, so it’s important to understand this process and know how to deal with updating (or removing) a reaction when a duplicate notification comes along. For us, that meant also thinking carefully about the moderation logic to try to do the right thing around deciding which content should be re-moderated when it changes.
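One way to avoid duplicates is to treat the source and target pair as the key: update the stored reaction instead of inserting a new one, and send it back to moderation only when the content actually changed. A Python sketch with sqlite3; the schema and the re-moderation policy are assumptions:

```python
import sqlite3


def save_reaction(conn, source, target, content):
    """Insert a reaction, or update the existing one for this source/target pair."""
    row = conn.execute(
        "SELECT id, content FROM reactions WHERE source = ? AND target = ?",
        (source, target),
    ).fetchone()
    if row is None:
        conn.execute(
            "INSERT INTO reactions (source, target, content, status)"
            " VALUES (?, ?, ?, 'pending')",
            (source, target, content),
        )
    elif row[1] != content:
        # Changed content goes back into the moderation queue (assumed policy).
        conn.execute("UPDATE reactions SET content = ?, status = 'pending' WHERE id = ?",
                     (content, row[0]))
    conn.commit()


def remove_reaction(conn, source, target):
    # e.g. when the source now returns 410 Gone or no longer links to the target
    conn.execute("DELETE FROM reactions WHERE source = ? AND target = ?", (source, target))
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reactions (id INTEGER PRIMARY KEY, source TEXT, target TEXT,"
    " content TEXT, status TEXT, UNIQUE (source, target))"
)
SOURCE, TARGET = "https://alex.example/reply", "https://frankie.example/post"
save_reaction(conn, SOURCE, TARGET, "First thoughts")
conn.execute("UPDATE reactions SET status = 'approved'")  # a moderator approves it
save_reaction(conn, SOURCE, TARGET, "First thoughts")     # duplicate ping, no change
status_after_duplicate = conn.execute("SELECT status FROM reactions").fetchone()[0]
save_reaction(conn, SOURCE, TARGET, "Second thoughts")    # the post was edited
rows = conn.execute("SELECT content, status FROM reactions").fetchall()
```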

Finding the target

One interesting problem I hit in my endpoint code was trying to figure out which blog post was being reacted to when a mention was received. The mention includes a source URL (the thing linking to you) and a target URL (the URL on your site they link to) which in many cases should be enough.

For Perch, we don’t actually know what content you’re displaying on any given URL. It’s a completely flexible system where the CMS doesn’t try to impose a structure on your site – you build the pages you want and pull out the content you want onto those pages. From the URL alone, we can’t tell what content is being displayed.

This required going back to the spec and confirming two things:

The endpoint advertised with a post is scoped to that one URL. i.e. this is the endpoint that should be used for reacting to content on this page. If it’s another page, you should check that page for its endpoint.

If an endpoint URL has query string parameters, those must be preserved.

The combination of those two factors means that I can provide an endpoint URL that has the ID of the post built into it. When a mention comes in, I don’t need to look at the target but instead the endpoint URL itself.
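A Python sketch of that idea. The endpoint URL and the post parameter name are hypothetical; the page for post 42 would advertise the first URL, and the receiver reads the ID back from the URL it was called on:

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint location; "post" is an illustrative parameter name.
ENDPOINT = "https://example.com/webmention"


def endpoint_for_post(post_id: int) -> str:
    """URL to advertise in this post's <link rel="webmention" href="..."> element."""
    return f"{ENDPOINT}?{urlencode({'post': post_id})}"


def post_id_from_request(request_url: str):
    """Recover the post ID from the query string the sender had to preserve."""
    values = parse_qs(urlsplit(request_url).query).get("post")
    if not values or not values[0].isdigit():
        return None
    return int(values[0])
```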

It’s possible that Bridgy might not be compliant with the spec on this point, so it’s something I’m actively testing on this blog first.

Comments disabled

With that, after about fifteen years of having them enabled, I’ve disabled comments on this blog. I’m still displaying all the old comments, of course, but for the moment at least I’m only accepting reactions via webmentions.

The blog isn’t dead. It is just sleeping.

December 19, 2013

Jason Kottke, writing for Nieman Journalism Lab:

The design metaphor at the heart of the blog format is on the wane as well. In a piece at The Atlantic, Alexis Madrigal says that the reverse-chronological stream (a.k.a. The Stream, a.k.a. The River of News) is on its way out. Snapchat, with its ephemeral media, is an obvious non-stream app; Madrigal calls it “a passing fog.” Facebook’s News Feed is increasingly organized by importance, not chronology. Pinterest, Digg, and an increasing number of other sites use grid layouts to present information. Twitter is coming to resemble radio news as media outlets repost the same stories throughout the day, ICYMI (in case you missed it). Reddit orders stories by score. The design of BuzzFeed’s front page barely matters because most of their traffic comes in from elsewhere.

I suggest you read the entire post so that you can see how Kottke has reached the conclusion that the blog is dead. And of course, he’s right. The blog of today looks dead. But don’t bury it just yet because it may just be sleeping.

Me, in late-2011:

I believe the blog format is ready for disruption. Perhaps there doesn’t need to be “the next” WordPress, Tumblr, or Blogger for this to happen. Maybe all we really need is a few pioneers to spearhead an effort to change the way blogs are laid out on the screen.

That, of course, is only one small problem facing the blog. As I see it there is another, more important, problem to solve: a way to connect the blogosphere. A set of protocols or standards will need to come along to help connect all publishing platforms together. The incredibly useful features we find inside of networks like Twitter will need to find their way out onto the world wide web. This means bringing actions like following or subscribing, mentioning, citing, link previewing, etc. to the independent web and have them be completely separate from any single service.

By the way, “independent web” generally refers to the web at large regardless of how you choose to publish content on it. Whether you use Barley CMS, Tumblr, Squarespace, or your own hand-written content management system you’re publishing onto the web and not into a silo like Facebook where content is generally not shared outside of its walls.

Connecting the independent web together is what IndieWebCamp is aspiring to help facilitate. They believe people should own their own data and be allowed to publish content anywhere and have it distributed anywhere. The advantages to using Facebook should be brought out onto the web. There should be no real disadvantage to using one platform or another. In fact, there should be an advantage to using your own platform rather than those of a startup that could go out of business at any moment.

A good example of this in action is Web Mentions. Like Twitter’s @replies, a Web Mention allows one URL to notify another URL that it mentioned it. They are the 21st Century’s Pingback. Jeremy Keith has a good explanation of how to implement them.

Would having a better way to discover a blog’s content from any of its pages, as well as a well-supported set of web protocols, help bring the blog back from the dead? Maybe.

My motivations for receiving Webmentions are to participate in IndieWeb and also to reduce my third-party tracking. I’d been relying on webmention.io to get started quickly, but they’re also a third-party. By removing Disqus and replacing with self-hosted Webmentions, I have officially eliminated all 3rd party requests. Huzzah!

My implementation

The nuts and bolts are often specific to your existing tech stack, but mine is a pretty common one for frontend developers: statically generated site, served by Express. I thought about going with a Jekyll plugin like Lewis, but in the end I decided to add a PostgreSQL DB to manage the Webmentions themselves, and manage them separately from my static site generator. I may want to switch generators in the future, and it’s one less piece that has to be rebuilt at the time of migration.

Since I’m already using Express to serve my site, adding new endpoints is a snap. I added two routes: webmentions/get and webmentions/post. I suppose they could be combined into one but I’m a complete newbie to API design so maybe chime in below and tell me if my setup is completely daft 😁

Endpoint for Submissions

Perhaps in the future I can get more nuanced with my logic, but for now my server can determine the following outcomes when you POST to the endpoint:

202 — when the proper header (Content-Type: application/x-www-form-urlencoded) is sent and a target and source are present in the body of the request. Per the guidelines, it immediately returns the status code and does the actual processing of the submission asynchronously, instead of waiting until the URL fetch, HTML parsing, and DB writes are finished.

400 — when the request was well-formed, but there was a problem with the data. Most often it’s because the target URL (my website) wasn’t found within the HTML response of the source URL.

500 — obviously sometimes things just go wrong. If you’re using the form on my site, it will at least take responsibility for the error to avoid making visitors feel as if they’ve made a mistake.
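The synchronous part of those rules can be sketched as a small Python function (the real server is Express, so this is only a language-neutral illustration). MY_HOST is hypothetical, and rejecting self-mentions or off-site targets with 400 is an assumption. The "target not found in the source HTML" case needs the fetched page, so in a fully asynchronous setup it can only be detected later, and 500s come from the framework’s error handling:

```python
from urllib.parse import parse_qs, urlsplit

MY_HOST = "example.com"  # hypothetical: the only host this receiver accepts targets for


def initial_status(content_type: str, body: str) -> int:
    """Status to return immediately, before any fetching or parsing happens."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type != "application/x-www-form-urlencoded":
        return 400
    params = parse_qs(body)
    source = (params.get("source") or [""])[0]
    target = (params.get("target") or [""])[0]
    src, tgt = urlsplit(source), urlsplit(target)
    if src.scheme not in ("http", "https") or tgt.scheme not in ("http", "https"):
        return 400  # missing or non-HTTP(S) URL
    if source == target or tgt.hostname != MY_HOST:
        return 400  # self-mention, or a target that isn't on this site
    return 202  # queue the job; the source is fetched and checked later
```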

In the interest of shipping a first version and not introducing too much complexity or opportunities for XSS, I’m stripping HTML and only displaying plaintext versions of entries which link to my site. I relied on Glenn Jones’ microformat-node to parse URLs whose HTML contains h-entry. The library provides both structured and plaintext results so it made things safer while I get started. My CSP should handle many attacks, but better safe than sorry.
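As a Python stand-in for microformat-node’s plaintext output (not that library’s behaviour), the same precaution looks roughly like this: keep only text nodes, drop script and style contents, and escape the result again when writing it into the page:

```python
import html
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Keep text nodes, drop every tag and the contents of script/style."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skipping = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skipping += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skipping:
            self.skipping -= 1

    def handle_data(self, data):
        if not self.skipping:
            self.parts.append(data)


def to_safe_text(fragment: str) -> str:
    parser = TextOnly()
    parser.feed(fragment)
    parser.close()
    text = " ".join("".join(parser.parts).split())
    return html.escape(text)  # entities were decoded by the parser, so re-escape
```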

In the future I’d like to move to a richer format, preserving some markup or even allowing someone to choose what is displayed on my site (choose between: summary, trimmed e-content, or title).

To test my endpoint I used the ever-useful Postman, which let me quickly assemble, edit, and save various POST requests to help me test the robustness of my server-side code.

Listing my Webmentions

My GET endpoint simply returns an array of Webmentions for a given target. I use Jekyll to generate my site, and my include for webmentions contains the following data attribute: