…is a writer, lapsed academic, and web developer based in Montreal, Canada.


4 February 2012

The semantics of ebook widgets

Over the past few days I’ve had several interesting conversations on ebooks, interactivity, widgets, standardisation, and other issues that have cropped up as a result of Apple forking the ePub3 format.

Most of them involved people making very good points that have forced me to clarify my thoughts and reconsider some of my ideas.

One such conversation was the following email exchange with Grant Sutherland (posted with his permission).

We didn’t quite manage to convince each other, but I think we each made a good enough case for our respective approaches that the discussion should be useful and educational to others, no matter which side you take.

Both of our respective principles and approaches to developing more interactive ebooks are, I think, much healthier than the one-sided and opaque approach Apple has decided to take.

From Grant:

Hi Baldur,

I’m a writer, currently published by Macmillan.

I’ve been interested in the development of ebooks for some while, and have been enjoying your posts.

In relation to your recent exchange with Joseph Pearson, I agree with you both about the widgets being authored declaratively (with bindings).

However I strongly disagree that microformats are the appropriate solution for books (however useful they might be on the web). The ‘vocabularies’ extension in epub3 allows for any number of xml dialects to be dropped into the basic xhtml5. Why not use them? They’ve already been developed rigorously (e.g. ChemML, KML) and something like TEI seems to me ideally suited as a basis for interactivity/enhancement in a variety of non-fiction ebooks. Microformats are meant to evolve towards useful common standards; but in this case, the common standards already exist.

That’s a valid point and it’s one I debated a lot when I was an academic studying and authoring interactive projects. Generally speaking, only a minority thought it was a good idea to reuse these vocabularies because they simply weren’t designed for our purposes (creating interactive texts). I was a part of that minority in favour of reuse. I changed my mind as soon as I gained more practical experience in authoring interactive projects.

There’s only one, widely accepted, open format that suits our purposes: HTML+CSS+JS. There hasn’t been a need to create a new format for interactive hypertexts (which is what we are talking about, since text is a subset of hypertext) because we already have one. The problem arises when we remove JS from that equation; we need simple methods of getting back some of the interactive functionality we’ve lost. None of the other pre-existing formats suit our purposes because we’ve just given up on the only one that does.(^1)

So, short answer: Because it’s not reinventing the wheel. Not really.

Most of those XML formats, like TEI, are specialised and focus largely on meaning (that is, on preserving the fidelity of the information they carry). But when you are creating an interactive project you need to explicitly declare your intent: this is how this element is supposed to behave. Using existing XML formats for this purpose can get extremely complex and awkward because it’s not what they were designed to do. Relying on a reading system to infer a behaviour based on semantics is an extremely limited way to author interactive ebooks.

(TEI is also bloody complicated.)

Microformats is a term that Joseph Pearson brought up and I’m not quite sure I agree with it being applicable in this circumstance, because microformats generally describe semantics and meaning and not behaviour and intent.

A simple OBJECT tag, configured by linking to something like a json file using the PARAM tag, is much easier to author than any of these pre-existing XML formats, and has the added benefit of being both explicit in terms of the behaviour that’s desired, and flexible in terms of how the reading system wants to implement it. It’s also something that’s very easy to implement in almost any programming language. None of these apply to a complex XML vocabulary, even if it really did what we need it to do.
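A minimal sketch of the idea, in Python, might look like the following. It assumes a great deal: the MIME type, the data-widget attribute, the "config" param name, and the file names are all invented for illustration, and nothing here is part of ePub or of any shipping reading system. The point is only that a reading system can pull the declared behaviour and its JSON configuration out of the markup with very little code.

```python
import json
from html.parser import HTMLParser

# Hypothetical markup: the type, data-widget, and param names are invented.
MARKUP = """
<object type="application/x-ebook-widget" data-widget="gallery">
  <param name="config" value="gallery.json">
  <p>Fallback text for reading systems without widget support.</p>
</object>
"""

# Stand-in for the files packaged inside the ebook.
PACKAGE = {
    "gallery.json": '{"images": ["fig1.png", "fig2.png"], "captions": true}'
}


class WidgetParser(HTMLParser):
    """Collects every <object> together with its <param> name/value pairs."""

    def __init__(self):
        super().__init__()
        self.widgets = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "object":
            self._current = {"kind": attrs.get("data-widget"), "params": {}}
        elif tag == "param" and self._current is not None:
            self._current["params"][attrs.get("name")] = attrs.get("value")

    def handle_endtag(self, tag):
        if tag == "object" and self._current is not None:
            self.widgets.append(self._current)
            self._current = None


def load_widgets(markup, package):
    """Parse the markup and attach the JSON config each widget links to."""
    parser = WidgetParser()
    parser.feed(markup)
    parser.close()
    for widget in parser.widgets:
        name = widget["params"].get("config")
        # A missing config file simply yields an empty configuration.
        widget["config"] = json.loads(package[name]) if name in package else {}
    return parser.widgets


widgets = load_widgets(MARKUP, PACKAGE)
```

How the "gallery" is then drawn is left entirely to the reading system, and a system that ignores the object can still show the fallback paragraph.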

A small set of simple but flexible widgets (behavioural objects) is very much outside the remit of any of the preexisting formats.

The only exception I can think of is SVG+SMIL, which could cover a lot of bases, but even that is still quite complicated both to author and to implement (which is why support is so sparse and buggy, even among web browsers).

And even then SVG+SMIL doesn’t cover some of the basic behaviours we need.

I’d love to have it, though.

For authored interactivity we either need javascript (to implement and attach behaviour to semantic elements) or we need simple but configurable objects. Any other solution is too complex to author and implement, or simply doesn’t solve our problem.

For the record, I don’t think the iBooks widgets cover exactly our needs either. They are a starting point.

So, that’s my reasoning. I hope it makes some sort of sense :-)

best,
baldur

The footnote:

(^1): There are very good reasons for abandoning javascript in the context of ebooks. The biggest reason is security. An ebook is a much more persistent object than a web page. Turning an ebook into a fully-fledged javascript app platform opens readers up to exactly the same security issues as any other app platform: viruses, trojans, worms, etc. It’s understandable that many ebook platform vendors might want to avoid that. In many cases javascript support may simply present substantial and expensive problems that are very hard to solve.

Consider, for example, iBooks, the only epub platform that supports javascript: It is buggy as hell and updates regularly break working code. It breaks away from established web behaviours in subtle and not so subtle ways that are complex to tackle. And it crashes. A lot. Especially when you are on an iPad 1.

Addendum. I do miss proper javascript support in ebooks. It has the merit of years of development and testing and phenomenal mindshare. But, I think it’s clear at this point that cross-platform javascript is a non-starter in the ebook realm. If that’s the case, we need to start thinking about alternatives.

From Grant:

Hi Baldur,

Thanks for the thoughtful reply. I take your well-made points.

I think your key statement is this one:

For authored interactivity we either (a) need javascript to implement and attach behaviour to semantic elements or we need (b) simple but configurable objects. Any other solution is too complex to author and implement, or simply doesn’t solve our problem.

My own belief is that certain classes of texts are well-suited to solution (a), whereas others - and I would guess that among these are the type of interactive texts you’re working on - are more suited to solution (b).

As an example of a text of type (a), take any classic history. By impregnating the text at appropriate points with the various <bio>, <time> and <geo> related tags of the TEI, a work like ‘Decline and Fall…’ could be gently enhanced by the reading system with a sidebarred timeline, map and bio reference. The primary purpose of this type of enhancement (and, in my view, probably the only useful kind of enhancement possible in this type of work) is navigational. It is clearly not a hugely ambitious aim, but it has the inestimable advantage, over the many ‘snake-oil’ type ‘enhancements’ we’ve lately seen, of actually being useful. Because this is the kind of text I tend to read, my view is admittedly skewed toward this solution.

(In a way, the Kindle’s X-Ray facility is something like this, but prone to error because it’s using text-mining to make the semantic enhancements. And it’s ugly.)

As to texts of type (b), I think you are absolutely right.

There is obviously a massive problem here with the stability of this type of book. And I think the lessons to be learned about ‘maintainability’ will be learned from the past experiences and current best-practices of programmers.

The big lesson I see is that ‘unstable’ is the natural state in this world. I’m reminded of the old joke in biology: ‘We biologists have a special word for objects in a stable state - dead.’

Publishers want ‘stable’, but I don’t think they’re ever going to get it in this area. The best they can hope for is to manage the ‘instability’ of these books in a commercially viable manner.

My own guess is that books will come to exist on a (lumpy) spectrum of stability that looks something like this:

Plain-text, stored on clay tablets (the first, and still the best for longevity)

Plain-text semantically+programmatically enhanced, stored on computers. Behaviours (if any) determined by some combination of reading system and program embedded in text.

With regard to your remarks about the widgetization of behaviours, I agree that SVG+SMIL is the only sane long-term answer. In the meantime, here are a couple of links that might interest you (if you’re not aware of them already):

Simile uses json to populate a set of predetermined html/css/javascript widgets.
Lively-kernel is less straightforward, and more buggy. Its widgets are primarily customizable javascript objects. But if you look here:

you’ll see that they use a similar mechanism to that <object> PARAM technique that you’re suggesting (but in their case from one highly configurable javascript object to another, rather than from json to html).

Hope some of this is of some use to you.

Regards, Grant

And me:

I’d like to start off by saying that I generally agree with you: in an ideal world, semantic text combined with authored javascript would be the best solution for creating interactive texts. The only point I’d make is that in this circumstance the programmer becomes a co-author of the text (interactivity is an act of authorship), but those who embrace this and partner with a skilled programmer will end up creating remarkable things.
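As a rough illustration of the semantic approach, solution (a), here is a small Python sketch of how a reading system might collect people, dates and places from tagged text to feed a sidebar. Several things are assumptions: it uses the TEI element names persName, date and placeName rather than the shorthand in Grant's email, it leaves out the TEI namespace for brevity, and the sample sentence is invented rather than quoted from any book.

```python
import xml.etree.ElementTree as ET

# Invented sample passage using TEI-style elements (namespace omitted).
SAMPLE = """<p><persName>Commodus</persName> became sole emperor in
<date when="0180">180</date> and soon returned to
<placeName>Rome</placeName>.</p>"""


def build_index(xml_text):
    """Gather tagged people, dates and places for a navigational sidebar."""
    root = ET.fromstring(xml_text)
    index = {"people": [], "dates": [], "places": []}
    for el in root.iter():
        text = "".join(el.itertext()).strip()
        if el.tag == "persName":
            index["people"].append(text)
        elif el.tag == "date":
            # Keep the machine-readable value next to the printed one.
            index["dates"].append((el.get("when"), text))
        elif el.tag == "placeName":
            index["places"].append(text)
    return index


index = build_index(SAMPLE)
```

Everything the sidebar does with this index (a timeline, a map, a list of biographies) is inferred by the reading system; the text itself only says what things are, not how they should behave.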

There is a subset of interactive patterns that can’t be tied into semantics since the behaviour is the primary carrier of meaning (understanding is derived from the actions the reader takes). In this case you have to author a non-semantic widget into the text. Texts full of this behaviour lend themselves to solution (b), as you call it. They are also, in my opinion, generally only appropriate to non-fiction; like you I’m skeptical of the benefits of ‘enhancing’ narrative text.

Then there’s the issue of supporting true interactive narratives, creating a platform that can support hypertext and non-linear stories.

In either case, creating these texts hasn’t been that much of a problem because we have had a tool at hand: Javascript.

The problem is that javascript in ebooks is looking like a non-starter at the moment. The only major platform that implements it is iBooks and even there support is buggy enough to make development substantially more difficult than developing a scripted web site. I think it’s pretty much a certainty at this point, based on the discussions and debates I’ve witnessed and participated in, that javascript won’t be universally supported in ebooks like it is on the web.

Which, really, is the origin of my conundrum.

On the stability of these texts:

Very true, and it’s a problem I and other academics who were researching and teaching in this field ten years ago have had to tackle again and again, and, believe it or not, what we have now is much better than it was.

Most of the interactive works I studied and cited in my research ten years ago are unplayable now. Hypercard projects, Director-authored CD-ROMs, old flash files, and plain old executable programs: many of the big, influential texts are now hard to access, locked in a dead platform (Mac Classic). So, this has been a subject of debate for decades now. There was a lot of worrying and scaremongering about the issue at the time, but it ended up not being as much of an issue as people expected.

Of course, some works have been lost. It’s hard to find and play a Voyager Expanded Book today. But the solution was articulated by Mark Bernstein (of Eastgate Systems, publisher of hypertext, maker of tools, etc.): The only thing that preserves works is interest.

When there is interest, someone will make sure that the work is available and accessible (see, for example, the conversion and update of the “If Monks Had Macs” CD-ROM).

When there isn’t interest, no amount of open formats or standardisation will save the work from oblivion.

Of course, there are exceptions such as the rights situation with many of the Voyager Expanded Books, but that kind of legal bind won’t be solved by any amount of standardisation.

And, as I wrote, we’re in a much better place now than we were then. We have an open standard, ePub, which, even with books published on the Kindle, will remain an archival and authoring format.

Extending that format with a set of documented, standardised (even if just a de facto standard) objects will not threaten their archivability in any way, provided we do it in a sensible manner.

Which brings me to Apple’s new iBooks 2.0 format. That format runs the risk of instability and obsolescence. It’s undocumented and intentionally incompatible with the standard, which it extends in odd ways. It’s a very problematic approach to creating interactive texts.

With javascript ruled out, we need to establish a path forward in ebook interactivity that isn’t dominated by the proprietary and exclusionary path Apple is taking. The only way I see of doing that is describing a set of simple, but configurable, interactive objects that can be combined and used to create fairly rich works.

Both the Simile Widgets and Lively Kernel are similar to what I think we need, but not quite. (In either case, projects like these are ruled out by the fact that javascript is a non-starter on many ebook platforms.) The Simile Widgets are a little bit too specialised and complex. Lively Kernel is too tightly integrated with javascript both conceptually and in implementation.

I think the next step for me would be to write up a description of the forms of interactivity I think are needed for ebooks (there are a few basic actions that can be combined to create 90% of what authors of interactive works need) followed by a list of widgets, with suggested format extensions, that would implement those forms of interactivity.

I wouldn’t be surprised if, with a little bit of thought, those of us who want this can come up with conventions that are closer to the semantic ideal than the hodgepodge of opaque data that Apple is using.

I think, for example, that epub:type can be put to effective use. One approach would be to create a vocabulary of interactivity for epub:type which might give us the best of both worlds—if we assume that an interactive act has meaning in and of itself, that is.

The analogy would be ‘footnote’, which is the name of a print design feature that has attained an added meaning derived from its common use. If we define an epub:type vocabulary for commonly used interactive design features, especially those that have attained some added meaning from repeated practice, then I think we would have something that is sustainable, usable, and would satisfy the qualms of most.
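A sketch of how such a vocabulary might be consumed, again in Python and again under stated assumptions: the epub:type attribute and the value "footnote" exist in ePub3, but "x-reveal", "x-unknown" and the behaviour descriptions are hypothetical, made up only to show the dispatch.

```python
import xml.etree.ElementTree as ET

EPUB_NS = "http://www.idpf.org/2007/ops"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# "footnote" is an existing epub:type value; the x- values are hypothetical.
SAMPLE = f"""<section xmlns="{XHTML_NS}" xmlns:epub="{EPUB_NS}">
  <aside epub:type="footnote" id="n1">An existing vocabulary term.</aside>
  <div epub:type="x-reveal" id="q1">Hypothetical: shown when tapped.</div>
  <div epub:type="x-unknown" id="z1">A term this reader does not know.</div>
</section>"""

# What one imaginary reading system chooses to do for each known term.
BEHAVIOURS = {
    "footnote": "show in popup",
    "x-reveal": "hide until tapped",
}


def plan_behaviours(xhtml):
    """Map each element id to a behaviour, falling back to plain content."""
    root = ET.fromstring(xhtml)
    plan = {}
    for el in root.iter():
        value = el.get(f"{{{EPUB_NS}}}type")
        if value is None:
            continue  # untyped elements are ordinary content
        known = [BEHAVIOURS[t] for t in value.split() if t in BEHAVIOURS]
        plan[el.get("id")] = known[0] if known else "render as plain content"
    return plan


plan = plan_behaviours(SAMPLE)
```

The fallback matters for archivability: a reading system that doesn't recognise a term still has ordinary, readable markup to display.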

I’ve really enjoyed this exchange. :-) Would you mind if I posted this conversation on my website? I won’t do it if you’d rather I didn’t, but I think a lot of people would find it interesting to see an exchange that presented both sides equally like this.