Hyperaudio is a term coined by Henrik Moltke. After discovering we had similar interests we started working on the concept a while back, resulting in a few demos such as the Radiolab demo for WNYC. That was a couple of years ago now. Since then I have been working with my colleague Mark Panaghiston on various Hyperaudio demos and thinking about how we can create a suite of tools I imaginatively call the Hyperaudio Ecosystem. At the center of that ecosystem sits the Hyperaudio Pad.

What is Hyperaudio?

But hey, let’s back up a bit. What is this Hyperaudio of which I speak? Simply put, Hyperaudio aims to do for audio what hypertext did for text – that is to integrate audio fully into the web experience. I wrote more about that here.

The Hyperaudio Pad is a tool that enables people to build up or remix audio and video using the underlying timed-transcripts. At the moment we’re concentrating on transcribing the spoken word.

Over the last seven days or so the other Mark and I have been working fairly solidly on taking the Hyperaudio Pad to the next level. It’s great to work with Mark P for several reasons. Obviously as key author of jPlayer he knows pretty much all there is to know about web based media, but also, although we are both developers, we are in many ways diametrically opposed. Mark is keen on code quality and doing things properly and I just want to get things out there. Happily the compromise we arrive at when we work together is usually a good one. Mark is coming around to minimal-viable solutions and I have to admit that doing things right can save time in the long run.

In fact there is a world of difference between the earlier versions of the Hyperaudio Pad that I largely developed and the version we have now after a week of collaboration.

Putting our weird Renée and Renato relationship aside for a bit, let’s talk a little about what it is that the Hyperaudio Pad is actually meant to do.

The Power of the Remix

OK, so the key aims here are really to encourage the remixing of media in a new and refreshingly easy way and, in doing so, promote media literacy. On the subject of remixing and its importance in counter-culture I’d encourage you to check out RiP!: A Remix Manifesto and the excellent Everything is a Remix series.

Actually Brett Gaylor of Remix Manifesto fame is doing similar work by leading the effort building sequencing into Popcorn Maker. We’re approaching the same objective from completely different angles.

Although we use the Popcorn.js library at the heart of the Hyperaudio Pad, we use text to describe and represent media content. This may seem counterintuitive but actually transcripts can be a great way to navigate media content. Transcripts help break media out of its black box and we can scan and search through it relatively quickly. The Hyperaudio Pad borrows from the word-processor paradigm, allowing people to copy blocks of transcript and associated media and, further, allowing them to describe transitions and effects and adjust volumes using natural language that resembles editing directions in a script.

Media Literacy

As an example of some transitions and effects you can currently insert between clips:

Transcript snippets, combined with text described transitions and effects act as a kind of source code for the media.

Weak References

As cuts are nothing more than weak references to media files, start and end times and effects that are simply layered — in real-time — over the top, we can make additive remixes as easily as we can subtractive ones, that is to say, somebody could take a remix and enhance it by peeling back a specific snippet to reveal more, thereby adding to the piece.

In fact a key goal here is to allow remixing of remixes.
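As a rough illustration of the weak-reference idea, a remix can be held as a plain list of cuts, each one pointing at a media file with start and end times and a list of effects. The field names and file names below are hypothetical and are not the Hyperaudio Pad’s actual format.

```javascript
// A remix is only a list of cuts; no media is copied or re-encoded.
// All field names and file names here are illustrative assumptions.
const remix = [
  { media: "interview.webm", start: 12.0, end: 19.5, effects: ["fade-in"] },
  { media: "speech.webm", start: 40.0, end: 44.0, effects: [] },
];

// Total running time is the sum of the cut lengths.
function remixDuration(cuts) {
  return cuts.reduce((total, cut) => total + (cut.end - cut.start), 0);
}

// An additive remix: widen one cut to reveal more of the source.
// Returns a new list so the original remix is left untouched.
function peelBack(cuts, index, extraBefore, extraAfter) {
  return cuts.map((cut, i) =>
    i === index
      ? { ...cut, start: Math.max(0, cut.start - extraBefore), end: cut.end + extraAfter }
      : cut
  );
}

// Reveal two more seconds before and three more after the second cut.
const extended = peelBack(remix, 1, 2, 3);
console.log(remixDuration(remix), remixDuration(extended));
```

Because a remix of a remix is just another list of the same shape, peeling back a cut and cutting one down are equally cheap operations.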

So a bit more about the technology powering this. Effects are handled by Brian Chirls’ seriously cool Seriously.js – a comprehensive video compositing library. The great thing about Seriously.js is that it is a modular system. We can add effects (and load their code dynamically) as we go. I also look forward to using some of our experience gained with the Web Audio API to apply realtime audio effects. The wider vision is to make a system that will take raw media, transcribe it and allow you to do all the things you could do with a professional video editing package.

We’ve got plenty more to do, certainly some bugs to iron out, but we’re steadily approaching beta. A lot has been restructured behind the scenes, giving us a more stable platform to build on and more quickly enhance. As for the question I posed and answered on Twitter: what do you call something between alpha and beta? Alpha.

Going Forward

We still plan to add the following features in the near future:

- alter brightness, contrast, saturation etc
- control and fade volume
- add audio track that can be played simultaneously or sequentially
- add effects/transitions in paragraph etc
- allow editing right down to the word, delete words
- allow fine tuning of start and end points
- allow the playback and sharing of fullscreen video remixes with an option to see the underlying ‘source code’.

Back in the spring of 2012 we started working with the BBC on a project called Breaking Out, which was a short story that demonstrated the concept of Perceptive Media.

“Perceptive Media, takes narrative back to something more aligned to a storyteller and an audience around a campfire using internet technologies and sensibility to create something closer to a personal theatre experience in your living room.”

With this article we want to look at the technical aspects in greater detail than our previous blog post on the subject.

This Time It’s Personal

Making the content unique for each viewer was no easy task. First we needed to get some information about the viewer and then integrate it into the story.

We had these sources of information available to us:

- geo-location, with authorization
- social networks, if they were logged in
- today’s date

From the geo-location information, we were able to determine the local:

- city
- attractions
- restaurant
- bar
- weather

The social network information checked to see if the user was logged into the following:

- Facebook
- Twitter
- Gmail
- Digg

We also gathered information for a recent:

- news report podcast
- comedy movie title
- horror movie title

Where appropriate, the information was inserted into the storyline.
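The insertion step can be pictured as filling placeholders in a script line. This is only a sketch of the idea: the `{placeholder}` syntax, the key names and the fallback values are assumptions made for illustration, not how the Breaking Out script was actually marked up.

```javascript
// Replace {placeholders} in a script line with viewer-specific details,
// falling back to a neutral default when a detail could not be determined.
// The placeholder syntax and the key names are hypothetical.
function personalise(line, details, defaults) {
  return line.replace(/\{(\w+)\}/g, (match, key) => {
    if (details[key] !== undefined) return details[key];
    if (defaults[key] !== undefined) return defaults[key];
    return match; // leave unknown placeholders visible for debugging
  });
}

const defaults = { city: "town", weather: "grey", network: "your social network" };
const details = { city: "Manchester", weather: "rainy" }; // no social network detected

console.log(personalise("It's {weather} in {city}. Log out of {network}!", details, defaults));
```

The neutral defaults matter because the story stays linear: every line has to read sensibly even when a piece of information is missing.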

The storyline was written by Sarah Glenister and she explained that the process was difficult because usually a story arc depends on the content of the story. So if it is raining, the story may remark on the damp conditions or wet hair, but with Perceptive Media the story arcs had to be rather benign to accommodate a variety of situations. This was in part due to the story remaining linear, rather than becoming a tree structure as in an adventure game.

There’s not much Choice of Technology

Since this project had a modest budget, the dynamic voice generation was limited to one character, the Lift. This allowed us to use the speak.js library, where the artificial voice worked well with the non-human Lift character. The static lines spoken by the Lift were generated and stored in audio files so that they matched with the dynamic parts.

The production required that the audio assets had effects applied to them so that, for example, they sounded like they were happening in a hallway or in a lift environment. Some sounds would require timing alignment relative to other sounds, such as music getting louder when a door opens. To complicate matters, the timing of the whole story would also be adjustable, along with being able to control the volume of the various tracks through a control panel.

The Audio Data API is a low level API that gives access to the raw audio data as it plays and as such it allows you to do pretty much anything you want, but does not give you any of the tools to do it with. You are also limited by how much your CPU can crunch in the time between sample blocks.
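To put a number on that CPU limit, the time available per block is simply the block length divided by the sample rate. The block size and sample rate below are example figures, not values taken from the project.

```javascript
// Time available to process one block of audio before the next is due.
function blockBudgetMs(framesPerBlock, sampleRate) {
  return (framesPerBlock / sampleRate) * 1000;
}

// Example figures (assumed): 4096 frames at 44.1 kHz.
console.log(blockBudgetMs(4096, 44100).toFixed(1) + " ms per block");
```

Any effect written in raw JavaScript has to finish its work on every block inside that window, otherwise the audio will glitch.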

The audiolib.js library by Jussi Kalliokoski appeared to be a contender since it enabled a cross-browser solution, covering Firefox and Chrome, but it fell foul of the size of the data processing requirement and the fact that WAV files would be required as input. Initial investigations into the convolution effect indicated that it would take approximately 20 minutes of processing to apply, which is unreasonable. There were other issues too: changing the gains would require reprocessing the whole track, and changing the timings would require similar reprocessing of the faders.

The Web Audio API was the most straightforward solution. It had a large number of tools built into it, and the input could take mp3 data or be connected to an HTML5 Audio Element. The node based system allowed complicated audio networks to be implemented in a relatively simple manner. And finally, the API was real-time, in that the audio sent to a node had the effects applied to it on the fly.

Let’s give the Audio some Style

First we needed to develop a system to define the styles for each of the audio characteristics. With the following four style types we were able to define every situation required for the story:

- track / environment
- faders
- filters
- convolution effects

The track/environment styles were used to define the specific track styles to a given environment. Each track was not normalized with the others, so we needed to be able to control the gain of each track independently. We also needed to define the audio properties to apply, for example, when Harriet is speaking in the lift environment.
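A style table of this kind might look something like the sketch below. The track names, property names and values are all invented for illustration; the real definitions live in the project’s own code.

```javascript
// Hypothetical style table: settings per track, per environment.
// The names and values are invented for illustration.
const trackStyles = {
  harriet: {
    default: { gain: 1.0, reverb: 0.0 },
    lift: { gain: 0.8, reverb: 0.4 },
  },
  music: {
    default: { gain: 0.5, reverb: 0.0 },
  },
};

// Look up the style for a track in an environment, falling back to the
// track's default when no environment-specific entry exists.
function styleFor(track, environment) {
  const styles = trackStyles[track];
  if (!styles) throw new Error("Unknown track: " + track);
  return styles[environment] || styles.default;
}

console.log(styleFor("harriet", "lift"));
console.log(styleFor("music", "lift"));
```

Keeping the gain per track in one table is what lets tracks that were never normalized against each other be balanced independently.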

The Audio’s got Assets

Breaking Out has around 200 individual audio assets that are put together on the client to complete the production. Each asset needed to define its audio file, its text, its relationship to the rest of the assets and so on.

The Asset class is responsible for setting up each audio asset ready to be played. This involves preparing the asset for use with either the Web Audio API or the jPlayer fallback, loading in the audio data when prompted and populating the asset information, such as the duration. The asset can then be played when requested, bearing in mind that each asset expects to be played in the future. For example, an asset plays at time 120 seconds in the timeline, so if you tell it to play(0) it will play in 2 minutes, and telling it to play(120) will make it play now. Additionally, using the granular noteGrainOn Web Audio API command, the asset can start part way through.
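The arithmetic behind play(0) and play(120) can be sketched as a small pure function. This is not the project’s Asset class, just one way to express the same calculation; the function and property names are assumptions.

```javascript
// Work out how an asset should be scheduled for a given playhead position.
// startTime and duration are in seconds on the absolute timeline.
function schedule(asset, playhead) {
  const end = asset.startTime + asset.duration;
  if (playhead >= end) return null; // already finished, nothing to play
  if (playhead <= asset.startTime) {
    // Still in the future: wait, then play from the beginning.
    return { delay: asset.startTime - playhead, offset: 0, length: asset.duration };
  }
  // The playhead is inside the asset: start now, part way through.
  const offset = playhead - asset.startTime;
  return { delay: 0, offset, length: asset.duration - offset };
}

const asset = { startTime: 120, duration: 10 };
console.log(schedule(asset, 0));   // plays two minutes from now
console.log(schedule(asset, 120)); // plays immediately
console.log(schedule(asset, 125)); // plays immediately, half way through
```

With the Web Audio API the three values would then feed the scheduling call, with the delay added to the context’s current time: noteGrainOn took a start time, an offset and a duration, as does its modern replacement `start(when, offset, duration)`.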

Arranging it all into a Timeline

The majority of assets were collected together in timelines, where the code would position them relative to each other on the absolute timeline. Branches were used to add assets that occurred in parallel to other assets, where the branch asset timings were linked to the assets in the main timeline. For example, a piece of music playing between two points may start relative to one asset and stop relative to another asset.

The master timeline assets are defined in an array starting at line 143 in breaking.out.js

Branches were associated with master timeline assets by giving the master timeline asset an ID. The branch then used this ID as a reference point for its timings.

The branch assets are defined in a (much smaller) array starting at line 1542 in breaking.out.js
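To make the ID mechanism concrete, here is a minimal sketch of resolving a branch against master timeline assets whose absolute times are already known. The property names (`startAt`, `stopAt`, `edge`, `offset`) and the example values are assumptions and do not mirror the arrays in breaking.out.js.

```javascript
// Absolute times of master timeline assets, keyed by ID (example values).
const master = {
  doorOpens: { start: 12.0, end: 13.5 },
  liftArrives: { start: 40.0, end: 42.0 },
};

// A branch asset names the master assets it is anchored to.
// The property names are assumptions made for this sketch.
const branch = {
  src: "music.mp3",
  startAt: { id: "doorOpens", edge: "start", offset: 0.5 },
  stopAt: { id: "liftArrives", edge: "end", offset: -1.0 },
};

// Turn "0.5 seconds after doorOpens starts" into an absolute time.
function resolveAnchor(anchor, timeline) {
  const ref = timeline[anchor.id];
  if (!ref) throw new Error("No master asset with ID " + anchor.id);
  return ref[anchor.edge] + anchor.offset;
}

function resolveBranch(b, timeline) {
  const start = resolveAnchor(b.startAt, timeline);
  const stop = resolveAnchor(b.stopAt, timeline);
  return { src: b.src, start, duration: stop - start };
}

console.log(resolveBranch(branch, master));
```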

Relative Timings Please. Absolutely!

The Web Audio API has many strengths and, in our case, a weakness. For example, if we were making a game, then we could easily trigger a sound to play when you click a button to launch a rocket. In our case, we wanted to line up hundreds of audio pieces to play at the right time. This means that when you start the audio, everything must be in place and ready to queue up on the Web Audio API.

This explains the load bar at the start. All the assets must load in so that they are ready to be played and their duration is known, and in turn their relative relationships calculated into an absolute time. This absolute time is then used to queue the asset to play in the future.

The control panel complicated this part, due to 2 of its functions. The user may start playback from any point in the timeline. The user may change the time gap between each relative asset, which causes the absolute timings to be recalculated.
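The recalculation triggered by the gap control can be sketched as follows, assuming the simplest case of assets laid end to end with one shared gap. The real timeline had richer relationships; the function name and data shape here are illustrative only.

```javascript
// Lay assets end to end, separated by a user-adjustable gap (seconds).
// Durations are only known once every asset has loaded.
function layOut(assets, gap) {
  let cursor = 0;
  return assets.map((asset) => {
    const placed = { id: asset.id, start: cursor, end: cursor + asset.duration };
    cursor = placed.end + gap;
    return placed;
  });
}

const assets = [
  { id: "A", duration: 4 },
  { id: "B", duration: 2.5 },
  { id: "C", duration: 6 },
];

// Changing the gap means recalculating every absolute time.
console.log(layOut(assets, 0.5).map((a) => a.start));
console.log(layOut(assets, 1).map((a) => a.start));
```

Since every start time depends on everything before it, a change to the gap invalidates the whole schedule, which is why anything already queued has to be cancelled and queued again.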

Lost in the Audio Map

The audio node map for each asset was not as complicated as you might think.

Every asset has two gain nodes connected to its audio source. One gain node is connected to the output and the other gain node is connected to the convolver. If the asset used a filter, it was created inside the asset and two more gain nodes were then used to pass the filtered audio source to the output and to the convolver.

The output path had two depth layers, the foreground and the background. Each asset was connected to one of these output paths, which allowed a depth gain to be applied to the audio.

The convolution effects were also created as assets, but were treated differently from the rest of the assets. These effect assets created 2 gain nodes internally to accommodate the foreground and background layers.
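One possible reading of that routing is sketched below. It takes the audio context as a parameter and uses today’s method names (`createGain`, `createConvolver`), whereas the API at the time of the project used older names such as createGainNode. The function names and the exact wiring are assumptions rather than the project’s code.

```javascript
// Build the shared part of the graph once: two depth layers feeding the
// output, and one convolver per environment feeding both layers.
function buildShared(ctx, environments) {
  const layers = { foreground: ctx.createGain(), background: ctx.createGain() };
  layers.foreground.connect(ctx.destination);
  layers.background.connect(ctx.destination);

  const convolvers = {};
  for (const name of environments) {
    const convolver = ctx.createConvolver();
    // Two gain nodes per effect, one for each depth layer.
    for (const layer of Object.values(layers)) {
      const send = ctx.createGain();
      convolver.connect(send);
      send.connect(layer);
    }
    convolvers[name] = convolver;
  }
  return { layers, convolvers };
}

// Per asset: a dry gain to its depth layer and a wet gain to the
// shared convolver for its environment.
function connectAsset(ctx, source, shared, depth, environment) {
  const dry = ctx.createGain();
  const wet = ctx.createGain();
  source.connect(dry);
  source.connect(wet);
  dry.connect(shared.layers[depth]);
  wet.connect(shared.convolvers[environment]);
  return { dry, wet };
}
```

In a browser this would be called as `buildShared(new AudioContext(), ["lift", "lobby"])` followed by one `connectAsset` call per asset, so the number of convolvers stays fixed however many assets are connected.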

Asset Audio Map

A few points to note here.

There were only two convolution assets used for the whole project, one each for the lift and lobby environments. The BBC provided the Impulse Responses required for the convolver node.

During development, we found that attempting to use more than 14 convolver nodes started to use all the CPU. The problem was found because we initially put the convolver inside each asset, rather than having one per environment and routing the assets’ audio to them.

The output depth layer gain nodes have hundreds of assets connected to them.

The filters inside the assets are only created if required, and excessive use would require a different audio map solution to be implemented, where assets could share filters. As only 2 assets required filters, we implemented them inside the assets.

Conclusions

The Web Audio API satisfied the goals of the project very well, allowing the entire production to be assembled in the client browser. This enabled control over the track timing, volume and environment acoustics by the client. From an editing point of view, this allowed the default values to be chosen easily by the editor and have them apply seamlessly to the entire production, similar to when working in the studio.

One of the most complicated parts of the project was arranging the asset timelines into their absolute timings. We wanted the input system to be relative since that is a natural way to do things, “Play B after A”, rather than, “Play A at 15.2 seconds and B at 21.4 seconds.” However, once the numbers were crunched, the noteOn method would easily queue up the sounds in the future.

The main deficiency we found with the Web Audio API was that there were no events that we could use to know when, for example, a sound started playing. We believe this is in part due to it being known when that event would occur, since we did tell it to noteOn in 180 seconds time, but it would be nice to have an event occur when it started and maybe when its buffer emptied too. Since we wanted some artwork to display relative to the storyline, we had to use timeouts to generate these events. They did seem to work fine for the most part, but having hundreds of timeouts waiting to happen is generally not a good thing.
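The timeout workaround amounts to converting absolute cue times into delays from the current playhead. The sketch below shows the general shape, with a cancel function for when the listener jumps elsewhere in the timeline; the names are hypothetical and this is not the project’s implementation.

```javascript
// Turn absolute cue times (seconds) into timeout delays relative to the
// playhead. Cues that have already passed are dropped.
function cueDelays(cues, playhead) {
  return cues
    .filter((cue) => cue.time >= playhead)
    .map((cue) => ({
      name: cue.name,
      delayMs: Math.round((cue.time - playhead) * 1000),
    }));
}

// Schedule a callback per cue and return a function that cancels them all,
// for example when the listener jumps to another point in the timeline.
function scheduleCues(cues, playhead, onCue) {
  const timers = cueDelays(cues, playhead).map((cue) =>
    setTimeout(() => onCue(cue.name), cue.delayMs)
  );
  return () => timers.forEach((timer) => clearTimeout(timer));
}
```

Timeouts run on the page’s clock rather than the audio clock, so cues scheduled this way can drift slightly against the sound, which is part of why a native event would have been preferable.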

And finally, the geo-location information was somewhat limited. We had to make it specific to the UK simply because online services were either expensive or heavily biased towards sponsored companies. For example, ask for the local attractions and get back a bunch of fast food restaurants. In practice, you’d need to pay for a service such as this and this project did not have the budget.

Sometimes when we take on projects we don’t really know what we’re letting ourselves in for. To fully know, we’d spec things out to the nth degree and who wants to or has time for that? Well people like NASA do, but we don’t.

When we were given the opportunity to work on something called Perceptive Media all I saw was a colourful but amorphous form ahead and although it was explained to me (as well as it could be) I still didn’t have a real clue of what it would end up being and crucially what was involved in making it a reality.

But hey it’s the BBC right? You don’t often get to work with the BBC – you know that BBC you spent so much time watching and listening to when you were growing up. The same BBC who make Doctor Who ? – That BBC!

Time-lords aside (or not actually as the case may be), being a bit of an audio-geek I’ve always been fascinated by the BBC’s audio output, especially their Radiophonic Workshop and Delia Derbyshire’s work in particular, and this was an audio gig coming from the R&D department of the same organisation. How could we turn it down?

So I jumped at the chance without really knowing what lay in store, which incidentally is very unfair on my colleague who ended up doing the lion’s share of the work.

The brief went something like this : We want to create a web-audio based demo that will adjust its content to the listener, based on information we can ascertain about them. Oh and we want it to sound as natural as possible so that the listener may not suspect that the content is being tailored. I soon found out that this involved generating audio on-the-fly and applying convolution reverb and other effects so that audio sounded natural in various environments. What this all meant is that we needed to use an advanced audio API. Now I’ve had fun investigating advanced audio APIs before, they differ from the standard audio APIs by being designed to allow you not only to play audio files, but to generate and alter audio.

This is exciting because we are finally getting around to doing stuff with audio that we have been doing with text for years. It also crosses over with something I’m looking at called Hyperaudio.

Experimentation

So with this cross-over in mind, I felt I could manage to find the time to experiment and create some proof of concept demos and this led me to try out the following libraries:

The Speak.js demo pretty much demonstrates what it can do. You put text in and speech comes out. The library itself is ported from eSpeak using something called Emscripten and actually allows you to generate audio pretty much in real-time by constructing a data URI in WAV format.
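The data URI trick can be shown in isolation. The sketch below wraps 16-bit mono PCM samples in a standard WAV header and base64-encodes the result. It illustrates the general technique only and is not taken from Speak.js; the function name and the test tone are made up for the example.

```javascript
// Wrap 16-bit mono PCM samples in a WAV header and return a data URI.
function wavDataUri(samples, sampleRate) {
  const dataBytes = samples.length * 2;
  const bytes = new Uint8Array(44 + dataBytes);
  const view = new DataView(bytes.buffer);
  const writeTag = (offset, tag) => {
    for (let i = 0; i < tag.length; i++) view.setUint8(offset + i, tag.charCodeAt(i));
  };

  writeTag(0, "RIFF");
  view.setUint32(4, 36 + dataBytes, true); // file size minus the first 8 bytes
  writeTag(8, "WAVE");
  writeTag(12, "fmt ");
  view.setUint32(16, 16, true); // size of the fmt chunk
  view.setUint16(20, 1, true); // format 1 = uncompressed PCM
  view.setUint16(22, 1, true); // channels
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * 2, true); // bytes per second
  view.setUint16(32, 2, true); // bytes per frame
  view.setUint16(34, 16, true); // bits per sample
  writeTag(36, "data");
  view.setUint32(40, dataBytes, true);
  samples.forEach((s, i) => view.setInt16(44 + i * 2, s, true));

  // Base64-encode the bytes; btoa in browsers, Buffer as a fallback.
  let binary = "";
  bytes.forEach((b) => { binary += String.fromCharCode(b); });
  const base64 = typeof btoa === "function"
    ? btoa(binary)
    : Buffer.from(binary, "binary").toString("base64");
  return "data:audio/wav;base64," + base64;
}

// A tenth of a second of a 440 Hz tone as example input.
const tone = Array.from({ length: 800 }, (_, i) =>
  Math.round(12000 * Math.sin((2 * Math.PI * 440 * i) / 8000))
);
const uri = wavDataUri(tone, 8000);
console.log(uri.slice(0, 40) + "...");
```

In a browser the resulting string can be handed straight to an audio element, for example `new Audio(uri).play()`, with no server round trip.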

We wanted it to work well on all browsers that support some advanced audio API. Firefox’s Audio Data API adopts a more low-level approach but in theory at least, you should be able to do anything that Webkit’s Web Audio API does, but with raw JavaScript. AudioLib.js provides the libraries to do this and also abstracts the differences between the two APIs so that you can write one set of code for both.
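The first step in any such abstraction is working out which API is present. A minimal feature-detection sketch, assuming the vendor-prefixed names of that period (`webkitAudioContext` for the Web Audio API and `mozSetup` on audio elements for the Audio Data API), might look like this. It is not AudioLib.js code, and the return values are invented labels.

```javascript
// Report which advanced audio API a browser-like global object exposes.
// The property names are the vendor-prefixed ones from that period.
function detectAudioApi(win) {
  if (typeof win.AudioContext === "function" || typeof win.webkitAudioContext === "function") {
    return "web-audio";
  }
  const AudioElement = win.Audio;
  if (typeof AudioElement === "function" && typeof new AudioElement().mozSetup === "function") {
    return "audio-data";
  }
  return "none";
}
```

In a page this would be called as `detectAudioApi(window)`, with the result deciding which back end the shared code talks to, or whether to fall back to plain HTML5 audio.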

Personalisation

The data we use to personalise the broadcast comes from a number of sources. The main differentiator is the listener’s location, which we use via the geo-location API to determine local weather, radio streams and landmarks, which are then subtly inserted into the audio stream. The only restriction is that you must be in the UK to really encounter the differences – this is partly due to the fact that we use some BBC resources that are only available for the UK but also so that we can keep the data manageable for what, after all, is just a demo.

The second source of information about the listener came from a slightly more sinister place. For fun I’d been working with a good friend of mine Matteo Spinelli on a project called Underpants and while looking at the issue of browser traceability across websites we figured out how to determine which social networks a browser is logged into. Cross-over struck once more and we used this technique to personalise the part where our outdoor-challenged hero is urged to log out of her favourite social network and leave the apartment.

Advanced Web Audio

So what changes between the version of the broadcast that uses the Web Audio API and the fallback version that doesn’t? Well there are a number of factors, some of them quite subtle. Speak.js outputs a robot style voice which is fine for our use as the electronic voice of the lift. But we wanted to make sure it would fit in properly to the various environments in which it was set. To do this we created something called a convolution reverb. In short, this reverb allows us to apply the right sort of audio ambiance to a sound. So if a sound is coming from a lift we apply a lift type echo. We also apply the same ‘echo’ to the streamed radio broadcast that is played at a certain point in the broadcast.

The fact that we are using an advanced audio API also enables us to add various other effects to other pieces of audio. However we soon found that we needed to be sensible with our audio design, since convolution with the Web Audio API does take up a fair amount of CPU. Due to a development error, a unique convolution was at one point used for each sound, and this was found to start failing at around the 14th.

We also made use of audio filters. For example, the radio podcast has a high-pass filter applied to it which makes it sound ‘tinny’. Another example is when Harriet opens her apartment door at the start, where we apply both filters and faders. The Web Audio API uses a node based approach which means that you can feed the results of one effect into the next so we can apply filters, faders and convolutions to any audio source. To achieve all of this we made heavy use of Web Audio API’s AudioParam which allows nearly any attribute to be changed using handy linear transform effects – we used this to fade in and out, or cross-fade between filtered and unfiltered outputs.
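A cross-fade of the kind described can be expressed with two AudioParam ramps. The sketch below uses the standard `setValueAtTime` and `linearRampToValueAtTime` methods on each gain node’s `gain` parameter; the function name and argument order are assumptions, and this is not the project’s own fader code.

```javascript
// Cross-fade from one gain node to another using AudioParam automation.
// `from` and `to` are GainNodes; times are in seconds on the context clock.
function crossFade(from, to, startTime, duration) {
  // Pin the starting values so each ramp begins from a known point.
  from.gain.setValueAtTime(1, startTime);
  from.gain.linearRampToValueAtTime(0, startTime + duration);
  to.gain.setValueAtTime(0, startTime);
  to.gain.linearRampToValueAtTime(1, startTime + duration);
}
```

For example, `crossFade(unfilteredGain, filteredGain, ctx.currentTime, 2)` would move from the unfiltered to the filtered path over two seconds, with the browser doing the interpolation on the audio clock.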

So the Web Audio API version applies filters, faders and convolutions to the audio whereas the standard HTML5 audio versions do not. That’s not to say that given enough time we couldn’t have achieved the same effect using the Audio Data API included in Firefox. But since the new audio standard is slated to be largely based on the Web Audio API it was decided for the purposes of this demo to concentrate our energy in this area.

Once we’d got the core of the functionality working, we set to work on creating a control panel to allow us to tweak every single one of the volumes, filters, faders and convolutions.

We wanted to be able to demonstrate to editors of audio how we could tweak pace, reverb and sound-effects in real-time and although requiring a complete code re-factor we hope that the fact that the whole thing is pretty-much customisable makes this a powerful demo and will be useful to others as well as ourselves who are dabbling in this area.

Conclusions

I think this was definitely an interesting and worthwhile experiment. However as its aim was to be subtle, it purposely does not make immediately clear the potential of the technology. Technically what we are able to achieve turned out to be a kind of audio framework to allow the ability to create and tweak audio as it’s being played. This is useful to producers of audio to see how effects and timings can alter the experience and is especially useful for applications such as games where perhaps you want to give your sound-effects context. I also feel that these techniques could be used for applications such as dynamic story-telling. My daughter — all too often — asks me to tell her a story featuring robots, dinosaurs and goblins and all too often I fall back on the same old principles and formulas of children’s story-telling, the rule of three and so on. Post happy-ever-after she often wants me to add to the story, getting me to fill-in or clarify some of the details. It’s not a huge stretch to imagine that we could create dynamic storytelling applications for kids. A pinch of AI here, some personalisation there and a heavy dose of randomness might just be enough to keep them happy for a bit.

So there we have it, one small step closer to the old Star Trek computer (Did Doctor Who’s have a voice interface? I forget). We’ve already seen the application of voice input with software such as Siri. It shouldn’t be too much longer until audio interfaces start to become common-place, with so much current emphasis on the visual I think this could be quite refreshing.

Epilogue

Question. What’s harder to debug than an intermittent bug? Answer. An intermittent bug that only manifests itself when you deploy to the server. Crazy I know and totally unexpected to us and for this reason you may see issues when running on Firefox (but not Opera). Being supporters of both Mozilla and Firefox we were much dismayed by this bug and spent a significant amount of time trying to get to the bottom of it. Unfortunately due to its nature we were only able to put in a loose bug-report. If anybody wants to help us solve this issue please feel free to take a look, even if it’s just to download the application from GitHub and verify that it works locally for you.

Thanks then to Ian Forrester and Tony Churnside for the opportunity to work with them and their team at BBC R&D, also of course Sarah Glenister for the excellent script and Angie Chan for the great artwork, and Jussi Kalliokoski for helping us work with AudioLib.js. But most of all I want to thank my colleague Mark Panaghiston for working tirelessly behind the scenes not only on the significantly challenging audio aspects of the project but also for going above and beyond in integrating the visual aspects and even sourcing and setting up the hosting.

A few weeks ago I was asked to write a bio – explain a bit about myself in a few lines. I hate writing bios, all that faux third-personage – I never really know what to write.

The reason a bio was required was that I’d been offered a position as a Knight-Mozilla sponsored fellow with Al Jazeera – the happy end-result of a Knight-Mozilla news challenge that I have been actively enjoying over the last few months. It was agreed that I would work slightly less than full-time hours and on a largely remote basis and so I was glad to accept the fellowship knowing I could still dedicate time to my family and other interesting projects (not that I would classify my family as a project you understand). Working with Al Jazeera is of course an experience not to be missed. I couldn’t have imagined that my life would take such a turn a year ago or so, but I’m very glad that it did and I put it all down to my (broadly directionless) approach to working life.

I consider myself very lucky that because of my sedentary lifestyle I need very little money to get by. This has allowed me to follow my interests while working for our small ‘company’ Happyworm for the last 10 years. I consider this a very real success. We created a popular open source project and foster a fairly large community, and from this many opportunities have arisen. Recently for example, I was asked by the W3C to run an online audio/video course and this has been a fantastic experience, and now I have the opportunity to find out how the world of journalism works and hopefully contribute. All this would probably not have occurred if I hadn’t simply thrown caution to the wind and followed my interests.

There are two of us now working for Happyworm, we used to be three but our web designer (also my partner) decided that at least one of us should have a steady income and generously offered to take up full-time work at a local council. So it’s just the two of us working with 3rd party designers when we need them. I’m based just outside Florence and Mark P’s in the heart of Edinburgh, although we are very different, we have similar requirements and a similar history – Mark P left a well paid job as a CMOS camera chip designer to come and work on web stuff.

This finally brings me to the title of this post. When writing my bio I foolishly used a little known, possibly non-existent word, by describing Happyworm as a tiny altrepreneurial web agency.

“The Altrepreneur, like their colleague the Entrepreneur, runs one of the 3 Million Micro Businesses in operation in the UK today. However unlike the Entrepreneur, with a financial and career focus, the Altrepreneur is doing it for entirely different reasons.

It seems that 70% of those small businesses are being run because the owner/operator is focused on achieving a change in their life-style through running a small business, they are looking to increase their overall quality of life by putting in some up front hard graft.

This goes hand in hand with the growing movement around Authenticity (…). The idea that the source of much tension in our lives is the conflict between our true selves and the roles that we play. Getting in touch with your true self and letting go of that tension will lead to a very different kind of life.”

This article chimed very strongly with me – my objectives and the decisions I made to leave a well paid job, set up Happyworm, move to another country, be my own boss and follow my own interests where possible. We now have two children, who I am fortunate enough to see a lot of. Recently I decided to look after the 9 month old Anna in the mornings for 4 days of the working week and then work from 14:30 until midnight with a 3 hour or so (I don’t time it) break for family dinner and games. I grow vegetables, cook at least once a day and am involved in the local community centre – finally I feel like I am achieving that mythical work-life balance.

So to me the word altrepreneurial seemed a perfect concise way of explaining what I did and how I see Happyworm. Incidentally Happyworm turned 10 years old in October. You might think 10 years is pretty good going for a small company but the truth is we would never have lasted so long if we were in it for the money, we’ve had good spells but also our fair share of dry spells where we worked on open source, brewed our own beer and patched our own clothes (or at least I did). Turns out following our interests and making jPlayer has been much more of a success than we could have imagined, I think we’re approaching half a million downloads and perhaps the best measure – a community of around two and a half thousand.

Despite the money, like most people I’m not really happy working long hours on projects I’m not interested in (although through necessity I’ve done my fair share), and I don’t think I’m actually any good at something my heart isn’t in. Maybe I’ve been spoiled, but the most important thing for me is to enjoy my work and so life. The money is always a secondary consideration and that’s why, contrary to what you may see written in my bio, I work for a small altrepreneurial web agency – not entrepreneurial. Damn you auto-correct!

This notebook has been lying on my desk for the best part of a decade.

The web audio community are a vibrant bunch. No sooner had the standard <audio> API been established, than developers were clamouring for more. Just playing audio wasn’t enough, we wanted to analyse, react to and manipulate our audio. Happily, the browser makers obliged with first Mozilla, then Google producing enhanced web audio APIs for their browsers – the only problem was, they were two very different implementations. The Audio Data API implemented in Firefox exposed the data at a fairly low level, while Webkit’s Web Audio API provided a higher level abstraction providing a number of predefined functions. Luckily, it didn’t take long for the JavaScript community to react and start bridging the gap between the two, by writing libraries that provided a common API, libraries such as sink.js which smooths over low level differences. In turn, sink.js was used by ‘higher level’ libraries like audiolib.js – (a general purpose audio toolkit) and Audiolet (which provides a more musically intuitive API, with similar objectives to Webkit’s in-browser solution). There are many others, such as XAudioJS which sports a Flash® and base64 data url wav generation fallback, older projects like dynamic.js that just provides a Flash® fallback for the Audio Data API and DSP.js a Digital Signal Processing Library.

People really love messing about with audio.

Notice that the process of creating all this cool functionality didn’t come about from a W3C spec. Similarly, the Advanced Audio APIs were not the result of a W3C think-tank, but from two competing visions of what an advanced audio API should look like. Now it looks like the Web Audio API will be implemented in Safari as well as Chrome.

Once you create compelling functionality, developers will immediately start to use it. It may be experimental but developers will start to rely on it to make cool stuff. Cutting edge technology is seductive like that. I’m surer than sure that the Web Audio API has been well researched and has taken much inspiration from tried and tested APIs that exist outside our lovely browser based world (Apple’s Core Audio Frameworks, I believe), but I’m not convinced that you can really tell what web developers need or want until you give them something to play with.

Mozilla’s approach was to expose a very comprehensive low level API, which potentially allows JavaScript developers to create all the functionality of Webkit’s Web Audio API and then some. As a result we get libraries like JSMad cropping up. What does JSMad do? Significantly, it allows you to play MP3s in Firefox*. Is JavaScript fast enough? Apparently so. This was a ‘this changes everything’ moment for me and since then a similar approach has been taken by pdf.js and more recently Broadway.js which decodes H.264 on the fly.
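To illustrate why raw sample access is enough to rebuild higher level features, here is a small sketch in plain JavaScript. It assumes nothing beyond typed arrays: an oscillator, a gain stage and a mixer are each just a loop over a `Float32Array`. The function names (`sineWave`, `applyGain`, `mix`) are my own for illustration and do not come from either browser API or from any of the libraries mentioned.

```javascript
// Generate `length` samples of a sine wave at `frequency` Hz.
function sineWave(frequency, sampleRate, length) {
  const samples = new Float32Array(length);
  for (let i = 0; i < length; i++) {
    samples[i] = Math.sin((2 * Math.PI * frequency * i) / sampleRate);
  }
  return samples;
}

// A "gain node" is just a multiplication applied to every sample.
function applyGain(samples, gain) {
  const out = new Float32Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    out[i] = samples[i] * gain;
  }
  return out;
}

// Mixing two streams is sample-by-sample addition over the shorter length.
function mix(a, b) {
  const length = Math.min(a.length, b.length);
  const out = new Float32Array(length);
  for (let i = 0; i < length; i++) {
    out[i] = a[i] + b[i];
  }
  return out;
}

// Two quiet tones mixed together: 440 Hz and 660 Hz at 44.1 kHz.
const tone = mix(
  applyGain(sineWave(440, 44100, 1024), 0.4),
  applyGain(sineWave(660, 44100, 1024), 0.4)
);
```

In Firefox of that era a buffer like `tone` could be handed to the Audio Data API for playback; that step is left out here so the sketch stays independent of any browser. Decoders like JSMad are the same idea taken much further.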

*Neither Firefox nor Opera supports MP3 natively due to patent concerns.

I’m not saying Mozilla’s Audio Data API is perfect. There are issues with audio using the same thread as the UI and sync issues with multiple streams. However, this is being addressed in the MediaStreams Processing proposal and it’s worth taking a look at it, even if it’s just for an insight into what future implementations could look like.

I’m digressing. The point is, if browser makers expose the low level API, developers will quickly come in and start writing libraries on top of that API. As is often the case, the developer community will start making things that the browser makers had never even considered. It makes sense: there are many more web developers than browser developers. Sure, web developers will bridge the gaps and polyfill over the cracks, which, let’s face it, has been the only reasonable way of going forward with HTML5, but crucially they will also make new libraries that other developers can use – and all of this at very high rates of turnaround. Of course, the common-or-garden JavaScript developer has a series of enormous advantages over the browser API developer or the standards bodies that seek to define these APIs. I’m gonna name three here:

Strong community – Web developers have a huge, active and open community to draw from.

Lower barrier to entry – The barrier to participation once something is put on something like GitHub is virtually zero.

Room to manoeuvre – Nothing web developers write is ever set in stone. JavaScript represents a much more fluid abstraction than the less flexible native browser code.

OK, so bear with me here, and this is more of a question than a proposal – what if we separate concerns between browser makers and web developers when it comes to creating standards? Browser makers could concentrate on security, privacy, performance and exposing low level APIs in such a way that web developers can start to build libraries and APIs in the fluid, dynamic, iterative and extremely reactive manner that the web as a medium allows. Once these libraries reach an acceptable level of adoption, browser makers can get together and decide which of these features they want to adopt based on tried and tested use cases, and yes, make it a standard and build it into the browser. Wouldn’t we move forward more quickly that way? And as a bonus, no browser would be left behind as we’d be building the polyfills along the way.

In short, what I’m saying is that if the standards bodies put their energy into defining low level APIs, the high level APIs will look after themselves, or rather the community will look after them. After all, it seems that the W3C themselves want a more community-based approach to standards, and besides, we all know that bottom-up trumps top-down, right?

Outside my flat is an open space that the local council didn’t quite know what to do with. I’m sure they considered adding basketball hoops, concrete tables, a kids’ playground and all kinds of things. As it turned out, they created a decent flat surface and pretty much left it at that. The users of this space, mostly children, decided this was a perfect space for playing soccer and improvised, adding a hand-drawn goal and pitch markings. If the council really wanted to make something permanent, they could take inspiration from this and create real goals and solid pitch markings.

It’s probably too late to change the Webkit implementation of the Web Audio API significantly, but I would strongly urge the developers of it to include a more comprehensive low level API in future releases. What’s the worst that could happen?