Alexa and voice assistant technology dominate CES

If the buzz coming out of CES this week is any indication, Amazon’s Alexa and other artificial intelligence (AI) voice assistants like Siri and Google Assistant are primed for a big year in 2017. A large part of Alexa’s success has been Amazon’s strategy of opening up its software for other companies to build Alexa into their own products, as well as providing a developer kit for creating new Alexa Skills to extend its functionality.

According to Amazon, third-party developers have already created over 7,000 custom skills for Alexa, with new functionality ranging from operating your home’s smart devices to ordering dollhouses online with nothing more than a few simple voice commands. Just like smartphone apps, some skills will be gimmicky in nature, while others can change the way we live our day-to-day lives. For the latter, some new solutions will rely on continuously updated content to provide ongoing value and utility to users, and where there is content, there will be a need to manage it through a Content Management System (CMS).

Types of Voice Content

Before examining the features and functionality that a CMS would need to power this next generation of voice apps, we first need to understand the different types of content. In general, voice content falls into two varieties – channel agnostic and voice specific.

Channel Agnostic

Much of the content we consume today is the same whether it is read from a desktop browser or delivered via a smartphone app. The layout or presentation may vary according to the rendering device, but the underlying content is stored in the same way. News articles, sports scores and store locations are examples of content that could be delivered through Alexa without having to specifically modify it for the voice channel.

Voice Specific

Some content needs to be created (or modified) specifically for voice in order to be meaningful when read aloud. The Alexa Skills Kit supports Speech Synthesis Markup Language (SSML), which allows fine control over how a particular word or passage of text is delivered. For example, you may want a longer pause within the speech, or want a 10-digit number read back as a telephone number instead. In some instances, simple text-to-speech may be insufficient. With Alexa, entire audio clips can be embedded into a response, giving skills developers the ability to play anything from sound effects to actual voice recordings.
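To make this concrete, here is a minimal sketch (in Python, with the SSML as a string literal) of a response using the two features mentioned above: a deliberate pause via SSML's `break` tag and a 10-digit number read back as a phone number via `say-as`. The function name and the sample sentence are illustrative, not part of any Alexa SDK.

```python
def build_ssml(phone_number: str) -> str:
    """Build an SSML response demonstrating a pause and telephone-style
    number readout. <speak>, <break>, and <say-as> are standard SSML
    elements supported by the Alexa Skills Kit."""
    return (
        "<speak>"
        "Your appointment is confirmed."
        '<break time="1s"/>'  # insert a one-second pause in the speech
        "You can reach the office at "
        f'<say-as interpret-as="telephone">{phone_number}</say-as>.'
        "</speak>"
    )

print(build_ssml("5551234567"))
```

Without the `say-as` wrapper, Alexa would read the digits as one large number; with it, the digits are spoken the way a person would recite a phone number.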

CMS Requirements

Content Management Systems come in many different flavors, and even implementations built on the same platform can be configured in a variety of ways. Some naturally work better than others for managing voice content, but most modern CMSes can be adapted to this application. In order to meet the system architecture and content needs of a voice assistant application, the following features and functionalities should be considered:

Content APIs

The code for Alexa skills is hosted and run in the cloud. While it would be possible to update a skill’s code directly each time new content was available, a more sustainable approach would be for the skill to connect to an external CMS to fetch the latest content. In order for this to happen, a CMS would need to expose its content through an API in a format that can easily be parsed and interpreted by the skill. Over the past year, a handful of platforms known as headless CMSes (for example Contentful, Built.io and Prismic) have risen in popularity due to their API-first approach to content management. These platforms provide out-of-the-box functionality to access all content within the platform via RESTful APIs – perfect for the paradigm being discussed. For businesses using an established enterprise CMS, existing content can be exposed to voice apps through the development of content APIs. Developer-friendly platforms like Adobe Experience Manager (AEM) make it easy to add this type of functionality through the use of services and components. (For those interested in the details - a colleague presented a talk on this very topic at ICF Olson’s CIRCUIT conference a couple of years ago, check out the slides here.)
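The fetch-and-flatten pattern described above can be sketched in a few lines of Python. The endpoint URL and the JSON shape are assumptions for illustration – real headless CMSes like Contentful expose comparable RESTful content APIs, but each has its own URL structure and response schema.

```python
import json
from urllib.request import urlopen

# Hypothetical content API endpoint; a real headless CMS would have its own
# URL structure, authentication, and response schema.
CONTENT_API = "https://cms.example.com/api/articles/latest"

def fetch_latest_article(url: str = CONTENT_API) -> dict:
    """Fetch the newest content item from the CMS content API as JSON."""
    with urlopen(url) as resp:
        return json.loads(resp.read())

def to_speech_text(item: dict) -> str:
    """Flatten a structured content item into plain text Alexa can speak."""
    return f"{item['title']}. {item['summary']}"

# A skill handler would call fetch_latest_article() at request time; here we
# just show the transformation step on a sample payload.
sample = {"title": "CES 2017 Recap", "summary": "Voice assistants stole the show."}
print(to_speech_text(sample))
```

Because the skill only depends on the API contract, editors can publish new content in the CMS and the skill picks it up on the next request – no code deployment required.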

Content Centric Storage

CMSes looking to address multi-channel use cases (for example web + voice) should store their channel agnostic content separately from its presentation. To accomplish this, data should be organized and structured in a content model that can be consumed and interpreted by each channel’s rendering agent - in this case, Alexa using its text-to-speech capabilities. For existing page-centric CMS implementations, this may mean refactoring content currently embedded within page templates into content items or nodes.
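A minimal sketch of what this separation looks like in practice: the content item below carries no markup, and each channel applies its own rendering. The field names and renderer functions are illustrative, not any particular CMS’s schema.

```python
# One channel-agnostic content item, stored free of any presentation markup.
store_location = {
    "name": "Downtown Store",
    "address": "123 Main St",
    "hours": "9am to 9pm",
}

def render_web(item: dict) -> str:
    """Web channel: wrap the same fields in HTML for a page template."""
    return (f"<h2>{item['name']}</h2>"
            f"<p>{item['address']}</p>"
            f"<p>Open {item['hours']}</p>")

def render_voice(item: dict) -> str:
    """Voice channel: flatten the same fields into a speakable sentence."""
    return (f"{item['name']} is located at {item['address']} "
            f"and is open from {item['hours']}.")

print(render_web(store_location))
print(render_voice(store_location))
```

Because neither renderer owns the data, adding a third channel later (say, a chat bot) means writing one more renderer, not restructuring the content.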

Digital Asset Management and Encoding Workflows

For voice specific content like audio clips, a CMS should leverage its Digital Asset Management (DAM) functionality to manage and organize the audio assets. Custom encoding pipelines and workflows can be implemented to ensure uploaded files are automatically converted to the appropriate format. For example, Alexa specifically requires MP3 files encoded with the MPEG version 2 codec at a bit rate of 48 kbps.
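A workflow step like this might be sketched as a simple gate that checks an uploaded asset against Alexa’s requirements and flags it for transcoding when it doesn’t comply. The metadata dict here is a stand-in for whatever your DAM’s asset-inspection API actually returns, and the transcoder invocation in the comment is one common option, not a prescription.

```python
# A real pipeline might re-encode non-compliant assets with a transcoder
# such as ffmpeg, e.g.:
#   ffmpeg -i input.wav -codec:a libmp3lame -b:a 48k output.mp3

def needs_transcoding(asset: dict) -> bool:
    """Return True if the asset must be re-encoded before Alexa can play it,
    based on the MP3 / MPEG version 2 / 48 kbps requirements."""
    return not (
        asset.get("format") == "mp3"
        and asset.get("mpeg_version") == 2
        and asset.get("bitrate_kbps") == 48
    )

print(needs_transcoding({"format": "wav"}))  # True: WAV upload must be converted
print(needs_transcoding(
    {"format": "mp3", "mpeg_version": 2, "bitrate_kbps": 48}))  # False: already compliant
```

Wiring a check like this into the DAM’s upload workflow keeps non-compliant audio from ever reaching the skill.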

Assess Your Needs

So is your Content Management System ready for Alexa? As with any technology assessment, start with the needs of the user and of the business. What customer experience are you trying to achieve with Alexa, and how does it integrate with your existing customer touchpoints? What type of content will be delivered through voice? What CMS technology is currently in place to manage content, and how easily can it be adapted to meet the needs of voice? Do new CMS solutions need to be brought into the ecosystem to bring a robust, maintainable solution to market? Thinking through these questions will help in preparing for Alexa and the next wave of voice assistant apps.