Android Chrome does not allow applications to play HTML5 audio without an explicit action by the user

Issue description

Example URL:
http://iangilman.com/rdio/chrome-audio/
Steps to reproduce the problem:
Android Chrome restricts playback of HTML5 audio; you need to call .play() from within a click handler or such. This is an explicit feature, and has been discussed in http://code.google.com/p/chromium/issues/detail?id=138132 . The comment at the end of that bug says to file a new bug if adding new information, so that's what I'm doing here.
I work for http://rdio.com, and we'd like to provide mobile playback with our new JavaScript API. Due to our relationship with the music labels, we can't just pass out .MP3 URLs, but instead must handle playback ourselves. For this reason (and others) our JS API creates a non-visible rdio.com iframe on the API client's site, and we do our playback (among other things) within that iframe.
This structure works great on desktop, but with Android Chrome it's simply not possible for us to trigger playback from within the click handler. Here's the sequence:
1. User clicks
2. We postMessage into the iframe
3. Inside the iframe we make an Ajax call to get the MP3 URL
4. Inside the iframe we play the MP3
I've mocked this up in a simplified version for testing here:
http://iangilman.com/rdio/chrome-audio/
I realize our scenario may be a bit unusual, but I think it highlights one of the ways in which this playback restriction is a hindrance to users of the Android Chrome browser.
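For readers following along, the four-step sequence above can be sketched roughly as follows. This is an illustration only, not Rdio's actual code: `sendPlayCommand`, `handlePlayCommand`, `fetchTrackUrl`, the message shape, and the origin parameters are all hypothetical names chosen for this sketch.

```javascript
// Parent page (the API client's site): steps 1 and 2.
// Call this from inside a click handler, where the user gesture is active.
function sendPlayCommand(iframeWindow, iframeOrigin, trackId) {
  iframeWindow.postMessage({ command: "play", trackId: trackId }, iframeOrigin);
}

// Hidden iframe: steps 3 and 4.
// `fetchTrackUrl(trackId, callback)` stands in for the Ajax call that
// resolves a track to its MP3 URL (hypothetical helper).
function handlePlayCommand(event, parentOrigin, fetchTrackUrl, audio) {
  if (event.origin !== parentOrigin) return; // ignore messages from other pages
  if (!event.data || event.data.command !== "play") return;
  fetchTrackUrl(event.data.trackId, function (mp3Url) {
    // We are now in an Ajax callback, not in the click handler. Per this
    // report, Android Chrome refuses the play() call at this point.
    audio.src = mp3Url;
    audio.play();
  });
}
```

In a browser, the iframe would register the second function with something like `window.addEventListener("message", ...)` and pass in its own `<audio>` element.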
What is the expected behavior?
There should be some way to play audio in this scenario. I'm in favor of removing the restriction entirely, but even some way to keep track that the postMessage/Ajax chain started with a click would be better than nothing.
What went wrong?
See above.
Did this work before? No
Is it a problem with Flash or HTML5? HTML5
Does this work in other browsers? Yes Opera Mobile
Chrome version: 18.0.1025469 Channel: stable
OS Version: 4.2

Chrome for Android: You can disable the 'gesture required' restriction by going to 'about:flags' in Chrome. Find the flag 'disable gesture requirement for media playback' and click 'enable'. A box will appear at the bottom of the page saying 'Your changes will take effect the next time you relaunch Google Chrome', with a 'Relaunch Now' button. Click the button. You can now start videos via JavaScript.
For some reason this seems to be the best kept secret on the Internet. I searched for months before finding this. Good luck.

I understand, but it is a start. I find this issue particularly annoying on tablets that only have wifi connectivity. I kinda understand the rationale on phones where one is (unjustifiably) charged for connect time but not on tablets that are wifi connected. It seems to me that Chrome could check to see if the connection is via wifi and then allow autoplay.

@bartonph: Yes, I can do that for my own browser, but if I want to write a game in HTML5, for example, I cannot turn that on for my players. That means that I can't play sound effects on mobile without the user touching the screen.

Seems like if someone wanted to make a patch to remove this restriction entirely, the "gesture required" flag code might be a good starting point. I think making a patch would be a good way to move this conversation to the next level.

One solution for this and many other situations like it is an API that can ask the user to change a flag in a simple way, something like the "Remember password" bar.
"xyz.com needs X feature Approve* Deny (*May cause Y problems)"

From a pseudo-geek that builds solutions for people who are deaf or blind....
I appreciate all the effort you guys are making to fix this annoying problem. What Apple, Google, et al. have failed to realize is that they are requiring users to click a link that many people can NOT see. I also have a need to autoplay an mp3 in various venues, for reasons that don't only include entertainment. The work-arounds (VoiceOver and TalkBack) are inadequate, at best, and create other problems.
For what it's worth, my thought is to create a switch under Accessibility options to enable autoplay.

As mentioned previously the Web Audio API can already play without user gesture so this does not add any security/privacy issues that are not already possible with that. It serves only to hinder legitimate use cases e.g. playing music in games.

It feels as though this type of issue is well known territory. When a site (Google.com or a lesser known one) wants to access location, the webcam, and so on, a one-time permission notification is shown. The user grants or denies the permission, and the resulting functionality is as close as possible to what the user wants.
I don't feel that a Google.com exception is appropriate. Is there a precedent for Google-specific properties being exempt from feature crippling in Chrome, or is this the first instance of it? It's Google's browser, but this feels as though it is not keeping with Google or Chrome's principles as they have been stated in the past.

> Does google.com depend on the restriction being in place, so that it breaks with no restrictions? That sounds pretty odd...
We're going to run an A/B experiment with autoplay enabled and disabled. In the "autoplay disabled" branch of the experiment, we need that code in order to not break google.com, which relies upon autoplay working.

I see, there was already an exemption for google.com, so it wasn't depending on the default behavior but rather the exemption.
BTW, it should be possible to fix google.com to work around the restriction by priming audio elements on any user input, which presumably has to occur anyway before there's any sound to play.
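A minimal sketch of the "priming" idea mentioned above, assuming (as commenters in this thread report for Chrome at the time) that an element whose play() was once called inside an input handler can later be played from script. The function name `primeOnFirstInput` and the choice of events are assumptions made for illustration.

```javascript
// Prime media elements on the first user input so that later, script-driven
// play() calls are more likely to be allowed. `target` is any event target
// (e.g. document); `mediaElements` is an array of <audio>/<video> elements.
function primeOnFirstInput(target, mediaElements) {
  var events = ["touchend", "click", "keydown"];
  function prime() {
    // Only the first input matters, so unregister from every event.
    events.forEach(function (name) { target.removeEventListener(name, prime); });
    mediaElements.forEach(function (el) {
      var result = el.play(); // called synchronously inside the input handler
      // Newer browsers return a promise from play(); swallow the rejection
      // caused by the immediate pause() below.
      if (result && typeof result.catch === "function") {
        result.catch(function () {});
      }
      el.pause();
    });
  }
  events.forEach(function (name) { target.addEventListener(name, prime); });
}
```

In a page this might be called as `primeOnFirstInput(document, [musicElement, effectElement])` during startup.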

I am all for a permission dialog and have previously brought it up with the W3C for inclusion in the spec (https://www.w3.org/Bugs/Public/show_bug.cgi?id=25554). I realize games need to auto-play music, but there are also plenty of people with low data caps who can't afford media elements pre-loading data in the background. A one-time permission request seems like the most reasonable solution for everyone.
The "Images can already consume data in the background, so why bother?" argument (#5) doesn't seem valid to me, because one doesn't usually expect many, many high-definition images to be loaded during casual web browsing. High definition video, however, is now commonplace on the web.
I would hope that the permission dialog feature would also be applied to the desktop version of Chrome (even if it's behind a runtime flag) for those of us who are forced to use heavily-capped satellite or cellular home internet services.

Audio files are tiny (~1MB/min) compared to high definition video, so I don't find the low data cap argument convincing for audio. Having said that, I agree that a one-time permission per website, remembered by the browser (much like Flash deals with device permissions), would be reasonable.

> The "Images can already consume data in the background, so why bother?" argument (#5) doesn't seem valid to me, because one doesn't usually expect many, many high-definition images to be loaded during casual web browsing.
HTML5 games typically do this. They may involve a large download which is 90% images and 10% sound effects. The images can auto-download and display without interaction, but the sounds can't play without interaction.
If there is a permission dialog then will it ask for permission to download those images, even though they took 9x as much bandwidth? What about background AJAX requests to download large resources like meshes for WebGL games? Why has media been singled out here?

> I don't find the low data cap argument convincing for audio.
You're right. I was mostly referring to video.
> What about background AJAX requests to download large resources like meshes for WebGL games? Why has media been singled out here?
Obviously when one intends to play an HTML5 game, they can expect large amounts of data to be downloaded. I'm talking about casual web browsing on websites that unexpectedly pre-load HTML5 media for video/audio players.

The point that was mentioned in #20 is the main thing: There is a work-around using the Web Audio API that already allows/forces us (developers/websites) to download the whole file (all files) and play them whenever we want, without needing user interaction. This basically makes the limitation on <audio> elements not being able to call play() without user interaction simply an inconvenience for developers, and it provides no real added benefit/safety for users. And, as was pointed out, it may actually be worse, since the whole file needs to be downloaded first when using the Web Audio API. This method would likely lead developers to download all files in advance to make them easier to work with.
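The Web Audio work-around described above might look roughly like this. It is a sketch under assumptions, not a recommended implementation: the function name `playViaWebAudio` is made up, the AudioContext and XMLHttpRequest constructor are passed in as parameters, and whether playback actually starts without a gesture depends on the browser and version.

```javascript
// Download a whole file, decode it with the Web Audio API, and play it with
// no <audio> element involved. `ctx` is an AudioContext and `XHR` is the
// XMLHttpRequest constructor; both are parameters so the function can be
// exercised outside a browser.
function playViaWebAudio(ctx, XHR, url, onError) {
  var xhr = new XHR();
  xhr.open("GET", url, true);
  xhr.responseType = "arraybuffer"; // decodeAudioData needs the raw bytes
  xhr.onload = function () {
    // The entire file must be downloaded before decoding can begin.
    ctx.decodeAudioData(xhr.response, function (buffer) {
      var source = ctx.createBufferSource();
      source.buffer = buffer;
      source.connect(ctx.destination);
      source.start(0);
    }, onError);
  };
  xhr.onerror = onError;
  xhr.send();
}
```

In a browser this could be called as `playViaWebAudio(new AudioContext(), XMLHttpRequest, "music.mp3", console.error)`, where "music.mp3" is a placeholder URL.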

Just heard today from another client that needs auto play of mp3. Her dyslexia requires it for certain non-gaming applications. Users who are blind cannot be expected to find and activate an audio file. The one-time permission should be an Accessibility option, not presented when loading the first web page offering the audio. That strategy still insults the blind.
Think inclusion.

I'm porting a game-like app to the web with Emscripten. It relies on using videos as texture animations. The game logic decides when to start a texture animation. Works great with desktop Chrome, but I don't see any way to make it work in Chrome on Android.
If it's not possible to allow all websites to play videos without user interaction, maybe it could be allowed for web apps added to the start screen. They can already request special treatment like being run fullscreen because there is already user interaction involved in "installing" them.

This is an incredibly strange limitation to impose, given the Media Source Extensions spec and the Chromium developers' eagerness to implement it. With this API I can download an entire video via XMLHttpRequest and have it ready, but am unable to play it until the user performs a gesture on their device.
Furthermore, this limitation hasn't really stopped those most determined from autoplaying videos. I work for an online ad company and have recently been tasked with animating videos in a canvas element instead of a video element, purely because one of our competitors has done this and our customers are demanding it. Rather than preventing us from autoplaying videos, the Chromium and Safari developers have instead forced us to choose a less efficient video encoding method and actually increased the amount of data downloaded by users, as well as increased power usage from having JavaScript decode video frames instead of the native video decoders.

I'm completely astounded - I fought for weeks debugging this - ripping and refactoring my app just to find the problem - only to find out that it's an intentional violation of the spec as it's written.
Despite the fact that it's bad practice and very Microsoft-y of you to by-pass agreed upon specs - why in the world won't you at least give developers some warning in the console or something???

I doubt A/B testing will produce meaningful conclusions. You'd be controlling the very variable that you should be observing, which is whether or not end-users want autoplay on or off. And imo, even with valid results, the only thing a study like this should inform is what the default is to be for a user-configurable setting.
More generally, I think it's a bad idea to patronize one's users. Something is obviously wrong when the outcome of a policy is exactly the opposite of its supposed intent. Yet this is the predictable result of an approach that presumes people incapable of acting in their own (and their own customers') best interests.

Yes, WebAudio can work for small clips of a few seconds. For large music clips (1+ minutes), the decoding typically takes too long, especially on ARM phones and tablets.
Or, you can set the --disable-gesture-requirement-for-media-playback Chromium flag.
In Cordova Crosswalk, you can enable this flag by adding "xwalk --disable-gesture-requirement-for-media-playback" inside a "/platforms/android/assets/xwalk-command-line" file.

I agree this is a bug that should be fixed. If the user starts interacting with the gamepad on a given page, then it's probably OK to allow that page to start audio. Deciding what the appropriate behavior should be, considering both UX and security, is far from trivial though.

Recognizing gamepad actions as user gestures is not sufficient for in-game sounds which are not triggered by the user's character.
Per the original report, something that would make this a lot better would be keeping track of event chains having started with a user gesture. That would enable one sound to play after another, at least.

This is unbelievable. Whoever made the design decision to implement this limitation got it badly wrong.
First of all, there's no reason this should be applied when connected via wifi, it only makes sense when connected via mobile data.
Secondly, the "only if started by user action" paradigm is STUPID. It doesn't fulfil the purpose of protecting the user from playing unwanted audio in the first place, and the limitations are unacceptable, because it would require a user interaction for EVERY single audio that needs to be played (I have verified: even if the user triggers an audio by a click, any other subsequent audio that needs to play automatically cannot).
How is it possible that you don't see the OBVIOUS and only correct way to do this, if you really want to protect the user from undesired streaming of media??
Just issue a popup prompt the first time a page wants to play some media without user interaction. Ask for user confirmation and give the user the option to check "don't show this message again" (or "allow future media streamings from this page" or whatever). That's the ONLY acceptable way of limiting playing media, if it has to be limited at all.
This is ridiculous, never seen anything this stupid.

It's lowest-common-denominator thinking. I guess it was designed to protect against overage charges, or something like that. However, you can use the WebAudio API for small clips, as a workaround.
It's basically as simple as a switch in the browser engine. I wouldn't expect this to change for another few years ... unless HTML5 games become more mainstream, or a major competitor (e.g. Apple) changes it too.

- Compared to the precisely defined key words in RFCs (MUST, MUST NOT...), the W3C drafts are a mess: "User agents do not need to support autoplay, and it is suggested that user agents honor user preferences on the matter." vs "The autoplay attribute is a boolean attribute. When present, the user agent (as described in the algorithm described herein) will automatically begin playback of the media resource as soon as it can do so without stopping." Where exactly can developers read about browser- and platform-specific behaviors (like this bug report)?
-> When a web designer first encounters this, why don't you at least put a notice in the console log (even IE learned that)?
- Is the restriction in place to avoid surprising users with immediate audio playback?
-> If yes, why isn't it also in Chrome on desktop?
- Is the restriction in place to avoid excessive data usage?
-> Why does the restriction apply to a 10-inch Android tablet on Wi-Fi but not to a laptop (or Chromebook) also running Chrome and connected to the same Wi-Fi?
- HTML5 Audio supports loop, but it's not gapless, so it's not good enough for various cases. The Web Audio API fixes at least that.
-> But this restriction doesn't apply to the Web Audio API. Why not? It should be consistent.
- The Web Audio API isn't a general solution because it's not enough to autoplay in Safari on iOS.
-> Because of this, designers have to come up with a solution for starting playback from a click/tap context. Say I want to design a site such as YouTube which plays a video after clicking in a list of results (it does indeed work).
But why are you making it so hard? It's 2015 and you still force designers to check the user agent and optimize sites for multiple UAs simply because there's no property telling whether autoplay is supported.
Today it's still possible to create the desired behavior (at least for most cases), but it wastes so much time of many talented web designers all around the world.

#99: this functionality is already possible on iOS, albeit with a little doing. On the first touch event, you can call a .play() method and have it take, and _all_ subsequent .play()s will work. On Android Chrome, there is a 1-touch-1-play requirement for <audio>. This, in combination with Issue 424174, makes Android Chrome almost unusable for any kind of multimedia experience at the moment.

#103, can you elaborate? What you are saying seems to contradict Issue 350645 .
Are the Android limitations on this documented anywhere other than this issue and the chromium code?
More broadly, it seems as though Chrome has a whole bunch of (afaik) undocumented limitations and quirks. I know it's broader than this specific issue, but I would like to urge the Chrome team to please start documenting this stuff.

In crbug/350645, you are creating a new audio element after the first one has ended.
The user gesture requirement is waived for the 1st audio element but not for the 2nd, so the 2nd element won't play automatically.
Here is an example in which the 2nd video autoplays after the first one finishes, by changing the src attribute: http://videotestsuite.appspot.com/test?test=swap-ended
Android Chrome's autoplay behavior is similar to that of iOS Safari. So if a workaround works on Android, it will also work on iOS.
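The src-swapping approach from the comment above can be sketched as follows, assuming the behavior described there (the gesture waiver stays with the element, so reusing one element keeps playback going). `playSequence` is a hypothetical helper name, and the first call is assumed to happen inside a click or touch handler.

```javascript
// Play a list of URLs back to back on ONE media element by swapping its src
// when each track ends, instead of creating a new element per track.
function playSequence(audio, urls) {
  var index = 0;
  function next() {
    if (index >= urls.length) {
      audio.removeEventListener("ended", next); // playlist finished
      return;
    }
    audio.src = urls[index];
    index += 1;
    audio.play();
  }
  audio.addEventListener("ended", next);
  next(); // first track: call playSequence from inside a user gesture
}
```

For example, a click handler might call `playSequence(document.querySelector("audio"), ["one.mp3", "two.mp3"])`, where the file names are placeholders.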

#106, it looks like an issue with your audio file. The ended event is never fired, possibly some format issue with the mp3 file.
I tried http://www.w3schools.com/htmL/horse.mp3, and the ended event is fired.

#103: thank you for the clarification. In order to support multiple overlaid audio effects, our engine now creates 100 Audio elements with "empty" WAV data URIs, calls play() on all of them on first touch, then changes the srcs of this pool to blob URLs to cue in-game audio.
As a data point: this is an extremely cumbersome workaround, especially when Web Audio does not carry with it these restrictions, but instead a severe performance penalty.
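A rough sketch of the element-pool workaround described above. This is not the commenter's engine code: `createAudioPool`, `primeAll`, and the round-robin reuse policy are assumptions made for illustration, and the element factory is injected so the pool size and initial src are up to the caller.

```javascript
// Round-robin pool of reusable audio elements. `createElement` is a factory,
// for example function () { return new Audio(); } in a browser.
function createAudioPool(size, createElement) {
  var elements = [];
  for (var i = 0; i < size; i++) {
    elements.push(createElement());
  }
  var nextIndex = 0;

  // play() returns a promise in newer browsers; ignore rejections here.
  function safePlay(el) {
    var result = el.play();
    if (result && typeof result.catch === "function") {
      result.catch(function () {});
    }
  }

  return {
    // Call once from inside the first touch/click handler.
    primeAll: function () {
      elements.forEach(safePlay);
    },
    // Reuse the next element in the pool for a new sound.
    play: function (url) {
      var el = elements[nextIndex];
      nextIndex = (nextIndex + 1) % elements.length;
      el.src = url;
      safePlay(el);
      return el;
    }
  };
}
```

Round-robin reuse means a sound that is still playing gets cut off once the pool wraps around, which is one reason a pool for overlapping effects ends up large.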

#108, you can actually call play() on a media element with no src at all to prime it for later use. It's still very frustrating, but at least the browser won't have to decode a bunch of empty audio files.

Hi, I want to play audio right after my game is loaded. Here is a little demo (see the source code) http://www.ivank.net/veci/sound .
It works in Firefox for Android, but not in Chrome for Android.
I can play audio right after the page is loaded in Chrome for Android using the Web Audio API. But that requires decompressing the whole audio file into raw data; my 20-minute track will take 400 MB of RAM.
Isn't it just stupid to restrict HTML5 Audio, but not Web Audio API?
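The 400 MB figure above is consistent with a back-of-the-envelope estimate. Web Audio buffers store decoded samples as 32-bit floats; the 44.1 kHz sample rate and two channels used below are assumptions, since the comment does not state them.

```javascript
// Estimate the memory needed to hold fully decoded audio in a Web Audio
// AudioBuffer, which stores one 32-bit float per sample per channel.
function decodedSizeBytes(durationSeconds, sampleRate, channels) {
  var BYTES_PER_SAMPLE = 4; // 32-bit float
  return durationSeconds * sampleRate * channels * BYTES_PER_SAMPLE;
}

// 20 minutes of stereo audio at 44.1 kHz (assumed format):
var bytes = decodedSizeBytes(20 * 60, 44100, 2);
console.log(bytes);                          // 423360000
console.log((bytes / 1048576).toFixed(1));   // "403.7" (MiB)
```

By comparison, the ~1MB/min figure quoted earlier in the thread would put the compressed file at around 20 MB.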

I/chromium(11922): [INFO:CONSOLE(0)] "Failed to execute 'play' on 'HTMLMediaElement': API can only be initiated by a user gesture."
I'm getting this error from my new HTML5 web app for the blind running on Google Chrome on Android 5.01 (HTC One M8) when trying to play sounds that are automatically generated from live camera frames. Requiring a user gesture is not acceptable here (or at most once at program start, but certainly not every few seconds). Why does this restrictive API invocation policy exist? Enabling the "Disable gesture requirement for media playback" in chrome://flags is a workaround that fixes the problem, but is too much hassle for the average (blind) user of my web app. Moreover, the "save data" argument does not apply at all in my case, because I am playing a URI resource that is being generated on-the-fly in memory, with zero external communication.
BTW, Firefox on Android does not give this problem and works as intended (by me) out-of-the-box.
Thanks

+ojan and +kenji who think about how we handle these kinds of interventions generally.
Predictability program has set an OKR to gain traction on the top 50 starred bugs this quarter: either by closing them, stating what milestone the fix will ship, or setting a NextAction date so that we know when to check back in.
@Ojan, Kenji, Mounir - do we have consensus on what our answer is for this bug? If so, are we able to make progress in some direction this quarter?

+some other folks
We're actively thinking about and trying to improve this space. It's particularly difficult to find the best balance given the degree to which users don't like their phones making sound unexpectedly.
We don't yet have consensus on what our answer is. Hopefully we will this quarter. Setting NextAction to 2/20 to at least give an update.

Blockedon: 355658 696617
Cc: iclell...@chromium.org mustaq@chromium.org
Components: -Blink>Media>Video -Internals>Permissions>Model Blink>Input
Summary: Can't delegate the gesture for playing audio on Android to a cross-origin iframe (was: Android Chrome does not allow applications to play HTML5 audio without an explicit action by the user)

The overall restriction here is definitely not going away, sorry - users have made it clear they don't want iframes to play audio by default without the user tapping on that frame. But the specific use case here is reasonable - updating the summary.
We're doing (or have done) a number of things to help:
Web apps installed to the home screen will be able to play audio by default - issue 715049 .
When one frame does a postMessage into another frame, the gesture indicator flows with it - see issue 355658 . So if Rdio could wire up the code so that the play request was inside of an onmessage handler in the iframe, and the postMessage call happened as a direct result of a click, then it should just work. Does that help?
That's still quite brittle and overly restrictive though. In issue 696617 we're exploring relaxing user gestures significantly - there just had to be a click at SOME POINT in the past in the frame. But in that case we'll still need some way to explicitly propagate the "user activation" state down to the iframe (we wouldn't want all ads to get audio permission by default just because somebody tapped on the parent page). mustaq and/or iclelland are exploring ideas here. Do you guys have a separate bug to link to for that? Who should this bug be assigned to for tracking?
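One way to read the postMessage suggestion above is sketched here: the iframe resolves the MP3 URL ahead of time, so that play() can be called synchronously inside its message handler. This is an interpretation, not confirmed Chrome or Rdio behavior, and `createIframePlayer`, `setTrackUrl`, `handleMessage`, and the message shape are hypothetical names.

```javascript
// Runs inside the iframe. `audio` is the iframe's media element and
// `parentOrigin` is the origin of the embedding page.
function createIframePlayer(audio, parentOrigin) {
  var readyUrl = null;
  return {
    // Call when the Ajax lookup finishes, BEFORE the user clicks.
    setTrackUrl: function (url) {
      readyUrl = url;
    },
    // Install as the iframe's "message" event handler. Returns true if
    // play() was requested.
    handleMessage: function (event) {
      if (event.origin !== parentOrigin) return false;
      if (!event.data || event.data.command !== "play") return false;
      if (readyUrl === null) return false; // URL not resolved yet
      audio.src = readyUrl;
      audio.play(); // synchronous, directly inside the message handler
      return true;
    }
  };
}
```

The parent page would then call `iframe.contentWindow.postMessage({ command: "play" }, iframeOrigin)` directly from its click handler.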
In addition we are looking at ways for some

Your text got cut off there ^^.
Apps are designed to play music and audio without a click. Apps written in e.g. Cordova should not have the same restriction as web apps in a browser window. They should have the same range of functionality as native apps.

Owner: mlamouri@chromium.org
Status: WontFix (was: Available)
Summary: Android Chrome does not allow applications to play HTML5 audio without an explicit action by the user (was: Can't delegate the gesture for playing audio on Android to a cross-origin iframe)

Sorry for the cut-off - I think that was stale. The rest of the comment covers everything I'm aware of.
Regarding Cordova, that's a good point. Filed issue 728330 to track that case specifically.
Also filed issue 728334 to track the specific case this bug was originally opened for - explicit passing to an invisible subframe (though I think postMessage fixed many of those cases).
Other than all these improvements discussed here, there's no plan to eliminate the restriction entirely so calling this bug WontFix. We should track specific cases of possible relaxations in specific bugs.