Category: Development

I’m currently working on the Vaani project at Mozilla, and part of my work on that allows me to do some exploration around the topic of speech recognition and speech assistants. After looking at some of the commercial offerings available, I thought that if we were going to do some kind of add-on API, we’d be best off aping the Amazon Alexa skills JS API. Amazon Echo appears to be doing quite well and people have written a number of skills with their API. There isn’t really any alternative right now, but I actually happen to think their API is quite well thought out and concise, and maps well to the sort of data structures you need to do reliable speech recognition.

So skipping forward a bit, I decided to prototype with Node.js and some existing open source projects to implement an offline version of the Alexa skills JS API. Today it’s gotten to the point where it’s actually usable (for certain values of usable) and I’ve just spent the last 5 minutes asking it to tell me Knock-Knock jokes, so rather than waste any more time on that, I thought I’d write this about it instead. If you want to try it out, check out this repository and run npm install in the usual way. You’ll need pocketsphinx installed for that to succeed (install sphinxbase and pocketsphinx from github), and you’ll need espeak installed and some skills for it to do anything interesting, so check out the Alexa sample skills and sym-link the ‘samples‘ directory as a directory called ‘skills‘ in your ferris checkout directory. After that, just run the included example file with node and talk to it via your default recording device (hint: say ‘launch wise guy‘).

Hopefully someone else finds this useful – I’ll be using this as a base to prototype further voice experiments, and I’ll likely be extending the Alexa API further in non-standard ways. What was quite neat about all this was just how easy it all was. The Alexa API is extremely well documented, Node.js is also extremely well documented and just as easy to use, and there are tons of libraries (of varying quality…) to do what you need to do. The only real stumbling block was pocketsphinx’s lack of documentation (there’s no documentation at all for the Node bindings and the C API documentation is pretty sparse, to say the least), but thankfully other members of my team are much more familiar with this codebase than I am and I could lean on them for support.

I’m reasonably impressed with the state of lightweight open source voice recognition. This is easily good enough to be useful if you can limit the scope of what you need to recognise, and I find the Alexa API is a great way of doing that. I’d be interested to know how close the internal implementation is to how I’ve gone about it if anyone has that insider knowledge.

Wow, so it’s been over a year since I last blogged. Lots has happened in that time, but I suppose that’s a subject for another post. I’d like to write a bit about something I’ve been working on for the last week or so. You may have seen Google’s proposal for navigation transitions, and if not, I suggest reading the spec and watching the demonstration. This is something that I’ve thought about for a while previously, but never put into words. After reading Google’s proposal, I fear that it’s quite complex both to implement and to author, so this pushed me both to document my idea, and to implement a proof-of-concept.

I think Google’s proposal is based on Android’s Activity Transitions, and due to Android UI’s very different display model, I don’t think this maps well to the web. Just my opinion though, and I’d be interested in hearing peoples’ thoughts. What follows is my alternative proposal. If you like, you can just jump straight to a demo, or view the source. Note that the demo currently only works in Gecko-based browsers – this is mostly because I suck, but also because other browsers have slightly inscrutable behaviour when it comes to adding stylesheets to a document. This is likely fixable, patches are most welcome.

Navigation Transitions specification proposal

Abstract

An API will be suggested that will allow transitions to be performed between page navigations, requiring only CSS. It is intended for the API to be flexible enough to allow for animations on different pages to be performed in synchronisation, and for particular transition state to be selected on without it being necessary to interject with JavaScript.

Proposed API

Navigation transitions will be specified within a specialised stylesheet. These stylesheets will be included in the document as new link rel types. Transitions can be specified for entering and exiting the document. When the document is ready to transition, these stylesheets will be applied for the specified duration, after which they will stop applying.

Example syntax:

1

2

<link rel="transition-enter"duration="0.25s"href="URI" />

<link rel="transition-exit"duration="0.25s"href="URI" />

When navigating to a new page, the current page’s ‘transition-exit‘ stylesheet will be referenced, and the new page’s ‘transition-enter‘ stylesheet will be referenced.

When navigation is operating in a backwards direction, by the user pressing the back button in browser chrome, or when initiated from JavaScript via manipulation of the location or history objects, animations will be run in reverse. That is, the current page’s ‘transition-enter‘ stylesheet will be referenced, and animations will run in reverse, and the old page’s ‘transition-exit‘ stylesheet will be referenced, and those animations also run in reverse.

[Update]

Anne van Kesteren suggests that forcing this to be a separate stylesheet and putting the duration information in the tag is not desirable, and that it would be nicer to expose this as a media query, with the duration information available in an @-rule. Something like this:

1

2

3

4

5

6

7

@viewport {

navigate-away-duration:500ms;

}

@media (navigate-away) {

...

}

I think this would indeed be nicer, though I think the exact naming might need some work.

Transitioning

When a navigation is initiated, the old page will stay at its current position and the new page will be overlaid over the old page, but hidden. Once the new page has finished loading it will be unhidden, the old page’s ‘transition-exit‘ stylesheet will be applied and the new page’s ‘transition-enter’ stylesheet will be applied, for the specified durations of each stylesheet.

When navigating backwards, the CSS animations timeline will be reversed. This will have the effect of modifying the meaning of animation-direction like so:

1

2

3

4

5

6

Forwards | Backwards

--------------------------------------

normal | reverse

reverse | normal

alternate | alternate-reverse

alternate-reverse | alternate

and this will also alter the start time of the animation, depending on the declared total duration of the transition. For example, if a navigation stylesheet is declared to last 0.5s and an animation has a duration of 0.25s, when navigating backwards, that animation will effectively have an animation-delay of 0.25s and run in reverse. Similarly, if it already had an animation-delay of 0.1s, the animation-delay going backwards would become 0.15s, to reflect the time when the animation would have ended.

Layer ordering will also be reversed when navigating backwards, that is, the page being navigated from will appear on top of the page being navigated backwards to.

Signals

When a transition starts, a ‘navigation-transition-start‘ NavigationTransitionEvent will be fired on the destination page. When this event is fired, the document will have had the applicable stylesheet applied and it will be visible, but will not yet have been painted on the screen since the stylesheet was applied. When the navigation transition duration is met, a ‘navigation-transition-end‘ will be fired on the destination page. These signals can be used, amongst other things, to tidy up state and to initialise state. They can also be used to modify the DOM before the transition begins, allowing for customising the transition based on request data.

JavaScript execution could potentially cause a navigation transition to run indefinitely, it is left to the user agent’s general purpose JavaScript hang detection to mitigate this circumstance.

Considerations and limitations

Navigation transitions will not be applied if the new page does not finish loading within 1.5 seconds of its first paint. This can be mitigated by pre-loading documents, or by the use of service workers.

Stylesheet application duration will be timed from the first render after the stylesheets are applied. This should either synchronise exactly with CSS animation/transition timing, or it should be longer, but it should never be shorter.

Authors should be aware that using transitions will temporarily increase the memory footprint of their application during transitions. This can be mitigated by clear separation of UI and data, and/or by using JavaScript to manipulate the document and state when navigating to avoid keeping unused resources alive.

Navigation transitions will only be applied if both the navigating document has an exit transition and the target document has an enter transition. Similarly, when navigating backwards, the navigating document must have an enter transition and the target document must have an exit transition. Both documents must be on the same origin, or transitions will not apply. The exception to these rules is the first document load of the navigator. In this case, the enter transition will apply if all prior considerations are met.

Default transitions

It is possible for the user agent to specify default transitions, so that navigation within a particular origin will always include navigation transitions unless they are explicitly disabled by that origin. This can be done by specifying navigation transition stylesheets with no href attribute, or that have an empty href attribute.

Note that specifying default transitions in all situations may not be desirable due to the differing loading characteristics of pages on the web at large.

It is suggested that default transition stylesheets may be specified by extending the iframe element with custom ‘default-transition-enter‘ and ‘default-transition-exit‘ attributes.

I believe that this proposal is easier to understand and use for simpler transitions than Google’s, however it becomes harder to express animations where one element is transitioning to a new position/size in a new page, and it’s also impossible to interleave contents between the two pages (as the pages will always draw separately, in the predefined order). I don’t believe this last limitation is a big issue, however, and I don’t think the cognitive load required to craft such a transition is considerably higher. In fact, you can see it demonstrated by visiting this link in a Gecko-based browser (recommended viewing in responsive design mode Ctrl+Shift+m).

I would love to hear peoples’ thoughts on this. Am I actually just totally wrong, and Google’s proposal is superior? Are there huge limitations in this proposal that I’ve not considered? Are there security implications I’ve not considered? It’s highly likely that parts of all of these are true and I’d love to hear why. You can view the source for the examples in your browser’s developer tools, but if you’d like a way to check it out more easily and suggest changes, you can also view the git source repository.