APL Speech/Text Synchronization for Text Blocks

Note: This public beta release of Alexa Presentation Language (APL) includes the APL documentation, authoring tool, and APL beta forums. We may improve or change APL as we receive feedback and iterate on the feature.

Your skill response can associate speech with an APL Text component and issue a command that highlights lines of text as the speech audio plays, showing which lines are in focus within a block of text.

To use this feature, you must provide speech data as plain text or as marked-up text using Speech Synthesis Markup Language (SSML) expressions. Before an Alexa-enabled device can consume this data, it must be transformed into speech. To enable this transformation, use the ssmlToSpeech transformer to convert the text to speech, and the ssmlToText transformer to strip the SSML tags from an SSML expression. These transformers cannot be used with the SSML audio tag.
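For example, the data source in your skill response can declare both transformers in a transformers array. The following is a minimal sketch, assuming a property named catFactSsml that holds the SSML expression; the property and output names here anticipate the cat-fact example later in this topic and are illustrative:

```json
{
  "catFactData": {
    "type": "object",
    "properties": {
      "catFactSsml": "<speak>Not all cats like <emphasis level='strong'>catnip</emphasis>.</speak>"
    },
    "transformers": [
      {
        "inputPath": "catFactSsml",
        "outputName": "catFactSpeech",
        "transformer": "ssmlToSpeech"
      },
      {
        "inputPath": "catFactSsml",
        "outputName": "catFact",
        "transformer": "ssmlToText"
      }
    ]
  }
}
```

Because each transformer specifies an outputName, the transformed values are stored as siblings of catFactSsml rather than replacing it.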

ssmlToSpeech and ssmlToText transformers

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| transformer | enum: `ssmlToSpeech` \| `ssmlToText` | Yes | The type of transformation required. Initially, two transformers are available: `ssmlToSpeech` converts a data source value to a text-to-speech URL, and `ssmlToText` converts an SSML expression to plain text by stripping out any SSML tags. |
| inputPath | string | Yes | The path of the data source value to transform. |
| outputName | string | No | The name of the data source property where the transformed output is stored. This output property is always a sibling of the input property. If `outputName` isn't provided, the value at `inputPath` is replaced with the output of the transformer. |

The following sample APL document shows a version of a "Cat Facts" skill that associates speech with a Text component bound to a cat fact. The Text component is wrapped in a ScrollView component. This means the device will automatically scroll to the parts of the cat fact that aren't visible on screen as they are spoken.

Part of an APL document that shows a Text component that binds to speech
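The referenced snippet is not reproduced here, but such a fragment might look like the following sketch, assuming the transformed cat-fact data source and a Text component id of catFactText (both names are illustrative):

```json
{
  "type": "ScrollView",
  "item": {
    "type": "Text",
    "id": "catFactText",
    "speech": "${payload.catFactData.properties.catFactSpeech}",
    "text": "${payload.catFactData.properties.catFact}"
  }
}
```

The speech property binds the component to the text-to-speech URL produced by ssmlToSpeech, while text displays the plain-text output of ssmlToText.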

In this snippet, the transformed data source is sent to the device.

Transformed data source received by the device

```json
{
  "datasources": {
    "catFactData": {
      "type": "object",
      "properties": {
        "backgroundImage": "https://.../catfacts.png",
        "title": "Cat Fact #9",
        "logoUrl": "https://.../logo.png",
        "image": "https://.../catfact9.png",
        "catFactSsml": "<speak>Not all cats like <emphasis level='strong'>catnip</emphasis>.</speak>",
        "catFactSpeech": "https://tinyurl.amazon.com/aaaaaa/catfact.mp3",
        "catFact": "Not all cats like catnip."
      }
    }
  }
}
```

To read the cat fact, use the Alexa.Presentation.APL.ExecuteCommands directive with the SpeakItem command. The token supplied in the ExecuteCommands directive is required and must match the token that the skill provided in the RenderDocument directive used to render the APL document.

An Alexa.Presentation.APL.ExecuteCommands skill directive with a SpeakItem command
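A sketch of such a directive, assuming the document was rendered with a token of "catFactToken" and that the Text component id is "catFactText" (both values are illustrative):

```json
{
  "type": "Alexa.Presentation.APL.ExecuteCommands",
  "token": "catFactToken",
  "commands": [
    {
      "type": "SpeakItem",
      "componentId": "catFactText",
      "highlightMode": "line",
      "align": "center"
    }
  ]
}
```

With highlightMode set to line, the device highlights each line of the cat fact as the corresponding speech plays, scrolling as needed to keep the spoken line in view.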