Yandex SpeechKit Mobile SDK 3.12.2 for Android reference guide

Yandex SpeechKit is a multi-platform library for integrating speech functionality into your mobile apps with minimal effort. The ultimate goal of SpeechKit is to provide users with the entire range of Yandex speech technologies.

SpeechKit architecture

The SpeechKit library supports several mobile platforms using the same implementation of the basic logic. The differences between platforms are confined to the platform abstraction layer (audio recording, networking, etc.), API wrappers, and platform-specific components such as the GUI implementation. This approach simplifies development for multiple platforms and keeps functionality closely synchronized between them.

Mobile platforms differ in their culture and development practices. This affects such aspects as naming of classes and methods, object instantiation, error handling, and so on. We try to minimize these differences while also making sure that SpeechKit fits naturally into the ecosystem of each of the supported platforms.

Speech recognition

To create an OnlineRecognizer object, specify the settings it will use. The mandatory settings are the language of the recognized speech, the language model, and the listener that will receive messages about the recognition process. For the full list of settings, see the OnlineRecognizer.Builder class.

OnlineRecognizer requires a network connection. Because of this, the first start of the recognition process may take slightly longer. To avoid this delay, call the prepare() method in advance: it performs all the necessary preparation.

Note.

If the prepare() method wasn't called explicitly, it will run automatically on the first start.

The OnlineRecognizer object can be used for repeated speech recognition. If you need to stop the recognition process before it finishes, call cancel().
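As a sketch, the steps above could look like the following. The class and package names match the SpeechKit 3.x Android API; the RecognizerListener implementation is assumed to be supplied by your app, and exact callback signatures may vary between SDK versions.

```java
import ru.yandex.speechkit.Language;
import ru.yandex.speechkit.OnlineModel;
import ru.yandex.speechkit.OnlineRecognizer;
import ru.yandex.speechkit.RecognizerListener;

public class RecognitionExample {
    private OnlineRecognizer recognizer;

    // 'listener' is your implementation of RecognizerListener.
    void startRecognition(RecognizerListener listener) {
        recognizer = new OnlineRecognizer.Builder(
                Language.RUSSIAN,     // language of the recognized speech
                OnlineModel.QUERIES,  // language model (short query-like phrases)
                listener)             // receives recognition events
                .build();
        recognizer.prepare();         // optional: set up the network connection early
        recognizer.startRecording();  // begin capturing and recognizing audio
    }

    void abortRecognition() {
        recognizer.cancel();          // stop recognition before it finishes
    }
}
```

The same OnlineRecognizer instance can then be restarted for subsequent recognition sessions.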

Speech recognition + UI

You can also use the RecognizerActivity UI dialog to make it easier to integrate speech recognition into an app. It manages the entire recognition process, including the user interface for recognition and management of the OnlineRecognizer and PhraseSpotter objects. RecognizerActivity starts recognition immediately after opening. The dialog window closes automatically in the following cases:

The recognition result was received.

An error occurred.

The user closed or minimized the app.

The dialog handles screen rotation, app minimization, and any other events that may affect its appearance or the behavior of the OnlineRecognizer object.

You can get recognition results using the standard Android mechanism: implement the onActivityResult() callback in your activity class. It receives the activity result code and the data passed back by the finished activity. If the result code is RecognizerActivity.RESULT_OK, the data contains the recognition result.
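A minimal sketch of launching the dialog and reading back the result is shown below. The request code is an arbitrary app-defined constant; RecognizerActivity.EXTRA_RESULT and RESULT_ERROR are assumed here to be the SDK's result keys, so check them against your SDK version.

```java
import android.content.Intent;
import androidx.appcompat.app.AppCompatActivity;
import ru.yandex.speechkit.gui.RecognizerActivity;

public class HostActivity extends AppCompatActivity {
    private static final int REQUEST_CODE_RECOGNITION = 1; // arbitrary app-defined code

    void startRecognitionDialog() {
        // Opening the activity starts recognition immediately.
        Intent intent = new Intent(this, RecognizerActivity.class);
        startActivityForResult(intent, REQUEST_CODE_RECOGNITION);
    }

    @Override
    protected void onActivityResult(int requestCode, int resultCode, Intent data) {
        super.onActivityResult(requestCode, resultCode, data);
        if (requestCode == REQUEST_CODE_RECOGNITION) {
            if (resultCode == RecognizerActivity.RESULT_OK && data != null) {
                // Recognized text returned by the dialog.
                String result = data.getStringExtra(RecognizerActivity.EXTRA_RESULT);
                // ... use the recognized text
            } else if (resultCode == RecognizerActivity.RESULT_ERROR) {
                // ... inspect the error passed back by the dialog
            }
        }
    }
}
```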

Speech synthesis

To create an OnlineVocalizer object, specify the settings it will use. The mandatory settings are the language of the synthesized speech and the listener that will receive messages about the speech synthesis process. For the full list of settings, see the OnlineVocalizer.Builder class.

OnlineVocalizer requires a network connection. Because of this, the first start of the speech synthesis process may take slightly longer. To avoid this delay, call the prepare() method in advance: it performs all the necessary preparation.

Note.

If the prepare() method wasn't called explicitly, it will be executed automatically at the time of the first speech synthesis.

Synthesis of the passed text runs asynchronously: the results are delivered through the listener.

To get speech synthesis results and monitor changes in the state of the OnlineVocalizer object, implement the VocalizerListener interface. Main methods of the interface:

onPartialSynthesis — Notifies when partial synthesis results are received. Depending on the task, you can save them to a file or play them using the built-in player.

The OnlineVocalizer object can be reused for repeated speech synthesis. If you need to stop synthesis before it finishes, call the cancel() method.
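Putting this together, a synthesis session could be sketched as follows. The Vocalizer.TextSynthesizingMode value and the exact VocalizerListener callback signatures are assumptions based on the SpeechKit 3.x Android API and may differ in your SDK version.

```java
import ru.yandex.speechkit.Language;
import ru.yandex.speechkit.OnlineVocalizer;
import ru.yandex.speechkit.Vocalizer;
import ru.yandex.speechkit.VocalizerListener;

public class SynthesisExample {
    private OnlineVocalizer vocalizer;

    // 'listener' is your implementation of VocalizerListener
    // (e.g. onPartialSynthesis to collect or play the audio).
    void speak(VocalizerListener listener, String text) {
        vocalizer = new OnlineVocalizer.Builder(
                Language.RUSSIAN,  // language of the synthesized speech
                listener)          // receives synthesis events
                .build();
        vocalizer.prepare();       // optional: set up the network connection early
        // Asynchronous: the call returns immediately, audio arrives via the listener.
        vocalizer.synthesize(text, Vocalizer.TextSynthesizingMode.APPEND);
    }

    void abortSynthesis() {
        vocalizer.cancel();        // stop synthesis before it finishes
    }
}
```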

Voice activation

Voice activation is performed by the PhraseSpotter object, which detects a specific word or phrase in the incoming audio stream and can then trigger speech recognition. The activation phrase is set in the PhraseSpotter object.

To create a PhraseSpotter object, specify the settings it will use. The required settings are the path to the model for the PhraseSpotter object and the listener that will receive notifications about the voice activation process. For the full list of settings, see the PhraseSpotter.Builder class.

PhraseSpotter does not require a network connection, but loading the model may take some time. To avoid this delay, call the prepare() method in advance.

Note.

If the prepare() method wasn't called explicitly, it will run automatically on the first start.

The PhraseSpotter object starts asynchronously: events are delivered through the listener.

To get voice activation results and monitor changes in the state of the PhraseSpotter object, implement the PhraseSpotterListener interface.

After the specified phrase is detected, the PhraseSpotter object continues working. To stop it, call stop().
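A minimal sketch of the spotting lifecycle, assuming the PhraseSpotter.Builder(modelPath, listener) constructor of the SpeechKit 3.x Android API:

```java
import ru.yandex.speechkit.PhraseSpotter;
import ru.yandex.speechkit.PhraseSpotterListener;

public class ActivationExample {
    private PhraseSpotter spotter;

    // 'modelPath' points to the phrase-spotting model shipped with the app;
    // 'listener' is your implementation of PhraseSpotterListener.
    void startSpotting(String modelPath, PhraseSpotterListener listener) {
        spotter = new PhraseSpotter.Builder(modelPath, listener).build();
        spotter.prepare();  // optional: load the model ahead of time
        spotter.start();    // asynchronous: events arrive via the listener
    }

    void stopSpotting() {
        // The spotter keeps running after a detection until stop() is called.
        spotter.stop();
    }
}
```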

Need help?

If you experience problems with the SpeechKit Mobile SDK, try enabling logging using the setLogLevel method of the BaseSpeechKit class. This will provide additional information about what is happening with the system at the moment, and may help you answer any questions you might have.

SpeechKit.getInstance().setLogLevel(SpeechKit.LogLevel.LOG_DEBUG);

If the logs don't give you enough information, search the FAQ for an answer to your question or a description of a similar problem and solution.