How to use iOS 10 Speech Recognition API to convert Voice to Text

Speech recognition is the transcription of human speech or audio into text: the system recognizes the spoken language of the speaker and translates it into text form. It is also known as “Computer Speech Recognition” or “Automatic Speech Recognition” (ASR). In iOS 10, Apple introduced the Speech Recognition API, a new framework that allows apps to support continuous speech recognition from either live or prerecorded audio and transcribe it into text. Using the Speech framework, apps can tap into Apple’s speech recognition engine and extend this capability into their own services.

Why use Speech Recognition API

Prior to iOS 10, Apple allowed users to interact with the device through speech in only two ways: Siri (Apple’s voice-controlled personal assistant) and keyboard dictation, enabled by tapping the microphone button to the left of the space bar on the keyboard.

Of the two, keyboard dictation was the only way for developers to let users interact with an application by speech, through the default iOS keyboard. However, this feature has several limitations:

Only available through user interface elements that support TextKit

Limited to live audio

Supports only the system’s default keyboard language

Most importantly, it lacks additional information such as confidence intervals, timing, and alternate interpretations.

The Speech framework provides a more powerful way to integrate Apple’s speech recognition capabilities and delivers fast, accurate results in real time. Beyond the transcription of speech to text, it provides additional information about the results. Some of its benefits include:

Supports both pre-recorded audio and live-speech

Multiple interpretations of the speech

Confidence levels

Timing information

The entire process of translating speech into text is handled by Apple’s servers, which requires the device to have an active internet connection.
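As a sketch of these capabilities, the snippet below transcribes a prerecorded audio file and reads out the alternate interpretations, confidence levels, and timing information from the result. The function name and the bundled file name `recording.m4a` are assumptions for illustration:

```swift
import Speech

// Hypothetical helper; the bundled file "recording.m4a" is an assumption.
func transcribeRecording() {
    guard let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US")),
          recognizer.isAvailable,
          let url = Bundle.main.url(forResource: "recording", withExtension: "m4a") else {
        return
    }

    let request = SFSpeechURLRecognitionRequest(url: url)
    recognizer.recognitionTask(with: request) { result, error in
        guard let result = result else {
            print("Recognition failed: \(error?.localizedDescription ?? "unknown error")")
            return
        }
        if result.isFinal {
            // Best transcription plus alternate interpretations.
            print("Best: \(result.bestTranscription.formattedString)")
            for alternative in result.transcriptions.dropFirst() {
                print("Alternate: \(alternative.formattedString)")
            }
            // Per-segment confidence and timing information.
            for segment in result.bestTranscription.segments {
                print("\(segment.substring) – confidence \(segment.confidence), " +
                      "starts at \(segment.timestamp)s, lasts \(segment.duration)s")
            }
        }
    }
}
```

For live speech, the same result handling applies; the only difference is feeding microphone buffers into an SFSpeechAudioBufferRecognitionRequest instead of passing a file URL.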

Features of iOS Speech Recognition API

Uses the same technology as Siri and keyboard dictation

Highly accurate

Adapts to the user (individual preferences)

Supports over 50 languages and dialects

Protects user privacy

How to configure your app to support Speech Recognition

First and foremost, the developer has to make sure that speech recognition is available for a given language at the current time by adopting the SFSpeechRecognizerDelegate protocol. Because speech recognition requires user data to be sent to Apple’s servers and stored there, it is important to respect the user’s privacy and to obtain explicit permission from the user.

The app must request the user’s permission to access both the device microphone and speech recognition. Provide a string for the NSSpeechRecognitionUsageDescription key in the app’s Info.plist that explains to the user why the app uses speech recognition. Also include a usage description string for the NSMicrophoneUsageDescription key to access the device microphone.
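For example, the corresponding Info.plist entries might look like the fragment below (the description strings are placeholders; write your own in the app’s voice):

```xml
<key>NSSpeechRecognitionUsageDescription</key>
<string>Your speech is sent to Apple's servers so the app can transcribe what you say into text.</string>
<key>NSMicrophoneUsageDescription</key>
<string>The microphone is used to capture your voice for speech recognition.</string>
```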

Note that failure to provide these required keys will result in the system terminating the app. When the app uses speech recognition for the first time, the aforementioned string is shown to the user as an alert. If the user grants permission, the app is ready to process recognition requests.
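Putting the configuration steps together, a minimal sketch might look like the following. The view controller name and the print-based handling are assumptions; a real app would update its UI instead:

```swift
import Speech
import UIKit

// A minimal sketch: the class name and UI handling are assumptions.
class DictationViewController: UIViewController, SFSpeechRecognizerDelegate {
    private let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))

    override func viewDidLoad() {
        super.viewDidLoad()
        recognizer?.delegate = self

        // Triggers the permission alert described above on first use.
        SFSpeechRecognizer.requestAuthorization { status in
            DispatchQueue.main.async {
                switch status {
                case .authorized:
                    print("Speech recognition authorized")
                case .denied, .restricted, .notDetermined:
                    print("Speech recognition not available: \(status)")
                @unknown default:
                    break
                }
            }
        }
    }

    // Called when availability changes, e.g. when the network connection drops.
    func speechRecognizer(_ speechRecognizer: SFSpeechRecognizer,
                          availabilityDidChange available: Bool) {
        print("Speech recognition available: \(available)")
    }
}
```

Requesting authorization on the main queue’s behalf via DispatchQueue.main.async matters because the completion handler is not guaranteed to run on the main thread, and any UI updates must happen there.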