I gave it a go and was able to get the Alexa equivalent of “Hello World” running in about 15 minutes. If you were racing, didn’t care to understand why anything worked, and didn’t want to change the sample application at all, 5 minutes isn’t really an unreasonable claim. Playing with it for an hour gave me a decent understanding of what Alexa can and probably can’t do. And being able to interact with a device already sitting on my desk, one I’d never directly programmed, within just a few minutes is actually pretty cool.

However, reports of the coming AI apocalypse have been greatly overstated.

The Alexa service certainly has cool voice recognition. Ambient room sounds are separated from human voices. Phonemes are identified and rolled up into words. Words are filtered based on context (maybe) and assembled into sentences. Alexa can then identify keywords to figure out which service to route the request to. All cool.

That’s about as far as the smarts go, however. The developer of a “skill” gets exactly zero help in interpreting the semantics of English sentences (and yes, Alexa understands only English). The skill developer has to specify the exact words a user might speak to the program. EVERY. SINGLE. WORD. There are no smarts whatsoever. Moreover, your skill isn’t passed any context about how the parsing of phonemes went. If Alexa decided, without any context, that the user said “Plato” instead of “Play-Doh”, your skill will either fail to understand, or you’ll have to guess ahead of time what Alexa might have misheard.
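To make that concrete, here’s a minimal sketch of what the skill developer’s side looks like: a Lambda-style handler in Python, assuming the Alexa Skills Kit’s JSON request/response shape. The intent name and utterances here are hypothetical, but the key point is visible in the structure: the handler receives only the resolved intent name, with no confidence scores, alternative parses, or phoneme-level context.

```python
# Sketch of an Alexa skill handler (hypothetical intent name).
# The interaction model must enumerate every utterance up front, e.g.:
#   PlayDohIntent  tell me about play doh
#   PlayDohIntent  what is play doh made of
# Alexa hands the handler only the matched intent -- no phoneme or
# confidence data -- so a "Plato"/"Play-Doh" mix-up is invisible here.

def handler(event, context):
    request = event["request"]
    if request["type"] == "IntentRequest":
        intent = request["intent"]["name"]
        if intent == "PlayDohIntent":
            return speak("Play-Doh is a modeling compound for kids.")
        # No fallback information: if Alexa heard "Plato", we never know.
        return speak("Sorry, I didn't understand that.")
    # LaunchRequest, SessionEndedRequest, etc. land here.
    return speak("Welcome!")

def speak(text):
    # Minimal Alexa response envelope.
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": True,
        },
    }
```

Everything the skill can react to has to be spelled out in that utterance list ahead of time; anything Alexa mishears simply arrives as the wrong intent, or no intent at all.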