Introduction and Installation

Introduction

OpenEars® is a shared-source iOS framework for iPhone voice recognition and speech synthesis (TTS). It lets you easily implement local, offline speech recognition in English and six other languages, and English text-to-speech (synthesized speech). OpenEars works on the iPhone, iPod and iPad and uses the open source CMU Sphinx project. OpenEars is free to use in an iPhone, iPad or iPod app. It is the most popular offline framework for speech recognition and speech synthesis on iOS and has been featured in development books such as O'Reilly's Basic Sensors in iOS by Alasdair Allan and Cocos2d for iPhone 1 Game Development Cookbook by Nathan Burba among many other places.

The OpenEars Platform is also a complete development platform for creating your speech recognition and text-to-speech apps, including both the free OpenEars SDK documented on this page and a diverse set of plugins that can be added to OpenEars in order to extend and refine its default features: you can read more about the OpenEars platform here. This page is all about the free and shared-source OpenEars SDK, so please read on to learn more about it.

Highly-accurate large-vocabulary recognition (that is, trying to recognize any word the user speaks out of many thousands of known words) is not yet a reality for local in-app processing on a small handheld device given the hardware limitations of the platform; even Siri does its large-vocabulary recognition on the server side. However, Pocketsphinx (the open source voice recognition engine that OpenEars uses) is capable of local recognition of vocabularies with hundreds or even thousands of words depending on the environment and other factors, and performs very well with medium-sized language models (vocabularies). The best part is that it uses no network connectivity because all processing occurs locally on the device.

The current version of OpenEars is 2.507. Download OpenEars or read its changelog. If you are upgrading to OpenEars 2.x from a 1.x version, it is necessary to follow the upgrade guide once in order to successfully upgrade. If you are upgrading from OpenEars 2.0x to OpenEars 2.5x, it is very easy but there are brief instructions in the upgrade guide that will give you a smooth transition.

Features of OpenEars

OpenEars can:

Perform speech recognition in English and in six other languages found on the languages download page: Chinese, German, French, Spanish, Italian, and Dutch.

Perform text-to-speech (synthesized speech) in English and with the NeatSpeech plugin, can also perform text-to-speech in Spanish

Listen continuously for speech on a background thread, while suspending or resuming speech processing on demand, all while using less than 2% CPU on average on current devices (decoding speech, text-to-speech, updating the UI and other intermittent functions use more CPU),

Change the pitch, speed and variance of any text-to-speech voice,

Know whether headphones are plugged in and continue voice recognition during text-to-speech only when they are plugged in,

Support bluetooth audio devices (experimental),

Dispatch information to any part of your app about the results of speech recognition and speech synthesis, or changes in the state of the audio session (such as an incoming phone call or headphones being plugged in),

Deliver level metering for both speech input and speech output so you can design visual feedback for both states.

Support JSGF grammars with an easy-to-use human-readable grammar specification language, only from Politepix,

Be installed in a Cocoa-standard fashion using an easy-peasy already-compiled framework.

In addition to its various new features and faster recognition/text-to-speech responsiveness, OpenEars now has improved recognition accuracy.

OpenEars is free to use in an App Store app.

Warning

Before using OpenEars, please note it has to use a different audio driver on the Simulator that is less accurate, so it is always necessary to evaluate accuracy on a real device. Please don't submit support requests for accuracy issues with the Simulator.

Installation

Create your own app, and then add the iOS frameworks AudioToolbox and AVFoundation to it.

Inside your downloaded distribution there is a folder called "Framework". Drag the "Framework" folder into your app project in Xcode.

OK, now that you've finished laying the groundwork, you have to...wait, that's everything. You're ready to start using OpenEars. Give the sample app a spin to try out the features and then visit the Politepix interactive tutorial generator for a customized tutorial showing you exactly what code to add to your app for all of the different functionality of OpenEars.

Basic concepts

There are a few basic concepts to understand about voice recognition and OpenEars that will make it easiest to create an app.

Local or offline speech recognition versus server-based or online speech recognition: most speech recognition on the iPhone, iPod and iPad is done by streaming the speech audio to servers. OpenEars works by doing the recognition inside the device, entirely offline without using the network. This saves bandwidth and results in faster response, but since a server is much more powerful than a phone it means that we have to work with much smaller vocabularies to get accurate recognition.

Language Models. The language model is the vocabulary that you want OpenEars to understand, in a format that its speech recognition engine can understand. The smaller and better-adapted to your users' real usage cases the language model is, the better the accuracy. A good language model for OEPocketsphinxController has fewer than 1000 words. You define the words that your app uses; it will not know about vocabulary other than the vocabulary that you define.

The parts of OpenEars. OpenEars has a simple, flexible and very powerful architecture.

OEAcousticModel Class Reference

Detailed Description

Convenience class for accessing the acoustic model bundles. All this does is allow you to reference your chosen model by including this header in your class and then letting you call [OEAcousticModel pathToModel:@"AcousticModelEnglish"] or [OEAcousticModel pathToModel:@"AcousticModelSpanish"] (or similar, replacing the model name with that of the model you are using, minus its ".bundle" suffix) in any of the methods which ask for a path to an acoustic model.

Method Documentation

+ (NSString *)pathToModel:(NSString *)acousticModelBundleName

Swift 3:

path(toModel: String!)

Reference the path to any acoustic model bundle you've dragged into your project (such as AcousticModelSpanish.bundle or AcousticModelEnglish.bundle) by calling this class method like [OEAcousticModel pathToModel:@"AcousticModelEnglish"] after importing this class.

OEContinuousModel Class Reference

OEEventsObserver Class Reference

Detailed Description

OEEventsObserver provides a large set of delegate methods that allow you to receive information about the events in OpenEars from anywhere in your app. You can create as many OEEventsObservers as you need and receive information using them simultaneously. All of the documentation for the use of OEEventsObserver is found in the section OEEventsObserverDelegate.

OEFliteController Class Reference

Detailed Description

The class that controls speech synthesis (TTS) in OpenEars.

Usage examples

Preparing to use the class:

To use OEFliteController, you need to have at least one Flite voice added to your project. When you added the "Framework" folder of OpenEars to your app, you already imported a voice called Slt, so these instructions will use the Slt voice.

What to add to your header:

Add the following lines to your header (the .h file). Under the imports at the very top:

#import <Slt/Slt.h>
#import <OpenEars/OEFliteController.h>

Add these class properties to the other properties of your view controller or object:
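The property declarations themselves are not reproduced here; a minimal sketch, assuming the property names fliteController and slt (which these examples use throughout), might look like this:

```objectivec
// Assumed property names; adapt to your own class.
@property (strong, nonatomic) OEFliteController *fliteController;
@property (strong, nonatomic) Slt *slt;
```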

Add the following to your implementation (the .m file):
Before you want to use TTS speech in your app, instantiate an OEFliteController and a voice as follows (perhaps in your view controller's viewDidLoad method):
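A sketch of that initialization, assuming the fliteController and slt property names used in these examples:

```objectivec
// Perhaps in viewDidLoad:
self.fliteController = [[OEFliteController alloc] init];
self.slt = [[Slt alloc] init];
```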

After having initialized your OEFliteController, add the following message in a method where you want to call speech:

[self.fliteController say:@"A short statement" withVoice:self.slt];

Warning

There can only be one OEFliteController instance in your app at any given moment. If TTS speech is initiated during a live OEPocketsphinxController listening loop and the speaker is the audio output, listening will be suspended (so the TTS speech isn't recognized) and then resumed on TTS speech completion. If you have already suspended listening manually, you will need to suspend it again when OEFliteController is done speaking.

Method Documentation

- (void)say:(NSString *)statement withVoice:(OEFliteVoice *)voiceToUse

Swift 3:

say(statement: String!, with: OEFliteVoice!)

This takes an NSString which is the word or phrase you want to say, and the OEFliteVoice to use to say the phrase. Usage Example:
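For example, using the assumed fliteController and slt properties from the usage instructions above:

```objectivec
[self.fliteController say:@"A short statement" withVoice:self.slt];
```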

OELanguageModelGenerator Class Reference

Detailed Description

Usage examples

What to add to your implementation:

In offline speech recognition, you define the vocabulary that you want your app to be able to recognize. This is called a language model or grammar (you can read more about these options in the OELanguageModelGenerator documentation). A good vocabulary size for an offline speech recognition app on the iPhone, iPod or iPad is between 10 and 500 words.
Add the following to your implementation (the .m file):
Under the @interface keyword at the top:

In the method where you want to create your language model (for instance your viewDidLoad method), add the following method call (replacing the placeholders like "WORD" and "A PHRASE" with actual words and phrases you want to be able to recognize):
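A sketch of such a call, assuming a local OELanguageModelGenerator named lmGenerator and placeholder vocabulary:

```objectivec
// lmGenerator, words, and name are assumed names; replace the placeholder
// vocabulary and the output file name with your own.
OELanguageModelGenerator *lmGenerator = [[OELanguageModelGenerator alloc] init];
NSArray *words = @[@"WORD", @"STATEMENT", @"OTHER WORD", @"A PHRASE"];
NSString *name = @"NameIWantForMyLanguageModelFiles";

NSError *error = [lmGenerator generateLanguageModelFromArray:words
                                              withFilesNamed:name
                                      forAcousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelEnglish"]];
if (error) {
    NSLog(@"Dynamic language generator reported error %@", [error description]);
}
```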

Generate a probabilistic language model from an array of NSStrings which are the words and phrases you want OEPocketsphinxController or OEPocketsphinxController+RapidEars to understand, using your chosen acoustic model.

Putting a phrase in as a string makes it somewhat more probable that the phrase will be recognized as a phrase when spoken. If you only ever want certain phrases or word sequences to be recognized at the exclusion of other combinations, use - (NSError *) generateGrammarFromDictionary:(NSDictionary *)grammarDictionary withFilesNamed:(NSString *)fileName forAcousticModelAtPath:(NSString *)acousticModelPath below instead, to create a rules-based grammar rather than a probabilistic language model.

fileName is the way you want the output files to be named, for instance if you enter "MyDynamicLanguageModel" you will receive files output to your Caches directory titled MyDynamicLanguageModel.dic, MyDynamicLanguageModel.arpa, and MyDynamicLanguageModel.DMP. Please give your language models unique names within your session if you want to switch between them, so there is no danger of the engine getting confused between new and old models and dictionaries at the time of switching.

If your input text has numbers such as '1970' or '3' you should spell them out ("Nineteen-seventy", or alternately "One Thousand Nine Hundred Seventy", or "Three") in a contextually-appropriate way before submitting them to get the most accurate results. This can't be done automatically for you yet, and at the moment numbers will trigger the fallback technique, which will only take a best guess at the intention, with no alternate pronunciations, and give sub-optimal recognition results where the guess is incorrect.

Additionally, if there are ambiguous symbols in your text such as '$' or '+' they will be removed from the text, as it is not possible to reliably detect the context or intention for these symbols or whether they are even intended to be transcribed at all. Therefore if you intend for them to be spoken or synthesized in your app interface, please replace them with spelled-out forms of the same symbol, e.g. "dollars" or "dollar" for '$' and "plus" or "and" for '+', and for all other similar types of symbols found in your text.

If you are feeding in arbitrary text and experiencing unexpected results in terms of what is recognized or accuracy rates, please investigate your text for symbols and numbers which are (unavoidably) being transformed by OELanguageModelGenerator and transcribe them yourself for best results. Alphabetical characters and apostrophes and hyphens which appear in a word, as well as sentence ending symbols and clause-separating symbols, will remain intact.

OELanguageModelGenerator no longer has any case preference when inputting text, so you don't have to be concerned about whether your input is capitalized or not; you only have to pay attention in your own app implementation that phrases you are trying to detect are matchable against the case you actually used to create your model using this class.

If this method is successful it will return nil, and you can use the methods pathToSuccessfullyGeneratedDictionaryWithRequestedName: and pathToSuccessfullyGeneratedLanguageModelWithRequestedName: or pathToSuccessfullyGeneratedGrammarWithRequestedName: to get the paths to your newly-generated language models, grammars and dictionaries for use with OEPocketsphinxController. If it is not successful, it will return an NSError which you can examine for debugging purposes.

If generateLanguageModelFromArray:withFilesNamed:forAcousticModelAtPath: does not return an error, you can use this method to receive the full path to your generated phonetic dictionary for use with OEPocketsphinxController. Swift 3: pathToSuccessfullyGeneratedDictionary(withRequestedName: String!)

If generateLanguageModelFromArray:withFilesNamed:forAcousticModelAtPath: does not return an error, you can use this method to receive the full path to your generated language model for use with OEPocketsphinxController. Swift 3: pathToSuccessfullyGeneratedLanguageModel(withRequestedName: String!)

- (NSString *)pathToSuccessfullyGeneratedGrammarWithRequestedName:(NSString *)name

If generateLanguageModelFromArray:withFilesNamed:forAcousticModelAtPath: does not return an error, you can use this method to receive the full path to your generated grammar for use with OEPocketsphinxController. Swift 3: pathToSuccessfullyGeneratedGrammar(withRequestedName: String!)

Dynamically generate a JSGF grammar using OpenEars' natural language system for defining a speech recognition ruleset. This will recognize exact phrases instead of probabilistically recognizing word combinations in any sequence.

The NSDictionary you submit to the argument generateGrammarFromDictionary: is a key-value pair consisting of an NSArray of words stored in NSStrings indicating the vocabulary to be listened for, and an NSString key which is one of the following #defines from GrammarDefinitions.h, indicating the rule for the vocabulary in the NSArray:

Breaking them down one at a time for their specific meaning in defining a rule:

ThisWillBeSaidOnce // This indicates that the word or words in the array must be said (in sequence, in the case of multiple words), one time.
ThisCanBeSaidOnce // This indicates that the word or words in the array can be said (in sequence, in the case of multiple words), one time, but can also be omitted as a whole from the utterance.
ThisWillBeSaidWithOptionalRepetitions // This indicates that the word or words in the array must be said (in sequence, in the case of multiple words), one time or more.
ThisCanBeSaidWithOptionalRepetitions // This indicates that the word or words in the array can be said (in sequence, in the case of multiple words), one time or more, but can also be omitted as a whole from the utterance.
OneOfTheseWillBeSaidOnce // This indicates that exactly one selection from the words in the array must be said one time.
OneOfTheseCanBeSaidOnce // This indicates that exactly one selection from the words in the array can be said one time, but that all of the words can also be omitted from the utterance.
OneOfTheseWillBeSaidWithOptionalRepetitions // This indicates that exactly one selection from the words in the array must be said, one time or more.
OneOfTheseCanBeSaidWithOptionalRepetitions // This indicates that exactly one selection from the words in the array can be said, one time or more, but that all of the words can also be omitted from the utterance.

Since an NSString in these NSArrays can also be a phrase, references to words above should also be understood to apply to complete phrases when they are contained in a single NSString.

A key-value pair can also have NSDictionaries in the NSArray instead of NSStrings, or a mix of NSStrings and NSDictionaries, meaning that you can nest rules in other rules.

Here is an example of a complex rule which can be submitted to the generateGrammarFromDictionary: argument followed by an explanation of what it means:

Breaking it down step by step to explain exactly what the contents mean:

@{
ThisWillBeSaidOnce : @[ // This means that a valid utterance for this ruleset will obey all of the following rules in sequence in a single complete utterance:
@{ OneOfTheseCanBeSaidOnce : @[@"HELLO COMPUTER", @"GREETINGS ROBOT"]}, // At the beginning of the utterance there is an optional statement. The optional statement can be either "HELLO COMPUTER" or "GREETINGS ROBOT" or it can be omitted.
@{ OneOfTheseWillBeSaidOnce : @[@"DO THE FOLLOWING", @"INSTRUCTION"]}, // Next, an utterance will have exactly one of the following required statements: "DO THE FOLLOWING" or "INSTRUCTION".
@{ OneOfTheseWillBeSaidOnce : @[@"GO", @"MOVE"]}, // Next, an utterance will have exactly one of the following required statements: "GO" or "MOVE"
@{ThisWillBeSaidWithOptionalRepetitions : @[ // Next, an utterance will have a minimum of one statement of the following nested instructions, but can also accept multiple valid versions of the nested instructions:
@{ OneOfTheseWillBeSaidOnce : @[@"10", @"20",@"30"]}, // Exactly one utterance of either the number "10", "20" or "30",
@{ OneOfTheseWillBeSaidOnce : @[@"LEFT", @"RIGHT", @"FORWARD"]} // Followed by exactly one utterance of either the word "LEFT", "RIGHT", or "FORWARD".
]},
@{ OneOfTheseWillBeSaidOnce : @[@"EXECUTE", @"DO IT"]}, // Next, an utterance must contain either the word "EXECUTE" or the phrase "DO IT",
@{ ThisCanBeSaidOnce : @[@"THANK YOU"]} // Finally, there can be an optional single statement of the phrase "THANK YOU" at the end.
]
};

So as examples, here are some sentences that this ruleset will report as hypotheses from user utterances:

"HELLO COMPUTER DO THE FOLLOWING GO 20 LEFT 30 RIGHT 10 FORWARD EXECUTE THANK YOU"
"GREETINGS ROBOT DO THE FOLLOWING MOVE 10 FORWARD DO IT"
"INSTRUCTION 20 LEFT 20 LEFT 20 LEFT 20 LEFT EXECUTE"

But it will not report hypotheses for sentences such as the following which are not allowed by the rules:

"HELLO COMPUTER HELLO COMPUTER"
"MOVE 10"
"GO RIGHT"

Since you as the developer are the designer of the ruleset, you can extract the behavioral triggers from your app from hypotheses which observe your rules.

The words and phrases in languageModelArray must be written with capital letters exclusively, for instance "word" must appear in the array as "WORD".

The last two arguments of the method work identically to the equivalent language model method. The withFilesNamed: argument takes an NSString which is the naming you would like for the files output by this method. Please give your grammars unique names within your session if you want to switch between them, so there is no danger of the engine getting confused between new and old grammars and dictionaries at the time of switching. The argument acousticModelPath takes the path to the relevant acoustic model.
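Putting this together, a call using a rule dictionary like the one shown above might look like the following (lmGenerator and grammarDictionary are assumed local names):

```objectivec
// grammarDictionary is assumed to hold a ruleset NSDictionary like the example above.
NSError *error = [lmGenerator generateGrammarFromDictionary:grammarDictionary
                                             withFilesNamed:@"MyGrammarName"
                                     forAcousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelEnglish"]];
```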

This method returns an NSError, which will either contain an error code, or noErr with an attached userInfo NSDictionary containing the paths to your newly-generated grammar (a .gram file) and corresponding phonetic dictionary (a .dic file). Remember that when you are passing .gram files to the OEPocketsphinxController listening method, its languageModelIsJSGF: argument must be set to TRUE.

Generate a language model from a text file containing words and phrases you want OEPocketsphinxController to understand, using your chosen acoustic model. The file should be formatted with every word or contiguous phrase on its own line with a line break afterwards. Putting a phrase in on its own line makes it somewhat more probable that the phrase will be recognized as a phrase when spoken.

Give the correct full path to the text file as a string. fileName is the way you want the output files to be named, for instance if you enter "MyDynamicLanguageModel" you will receive files output to your Caches directory titled MyDynamicLanguageModel.dic, MyDynamicLanguageModel.arpa, and MyDynamicLanguageModel.DMP.

If this method is successful it will return nil, and you can use the methods pathToSuccessfullyGeneratedDictionaryWithRequestedName: and pathToSuccessfullyGeneratedLanguageModelWithRequestedName: to get the paths to your newly-generated language model and phonetic dictionary for use with OEPocketsphinxController. If it is not successful, it will return an NSError which you can examine for debugging purposes.

Property Documentation

OELogging Class Reference

Detailed Description

A singleton which turns logging on or off for the entire framework. The type of logging is related to overall framework functionality such as the audio session and timing operations. Please turn OELogging on for any issue you encounter. It will probably show the problem, but if not you can show the log on the forum and get help.

OEPocketsphinxController Class Reference

Detailed Description

The class that controls local speech recognition in OpenEars.

Usage examples

What to add to your header:

To use OEPocketsphinxController, the class which performs speech recognition, you need a language model and a phonetic dictionary for it. These files define which words OEPocketsphinxController is capable of recognizing. We just created them above by using OELanguageModelGenerator. You also need an acoustic model. OpenEars ships with an English and a Spanish acoustic model.

First, add the following to your implementation (the .m file): Under the @implementation keyword at the top:

Start the speech recognition engine up. You provide the full paths to a language model and a dictionary file which are created using OELanguageModelGenerator and the acoustic model you want to use, for instance [OEAcousticModel pathToModel:@"AcousticModelEnglish"] or in Swift 3 OEAcousticModel.path(toModel: "AcousticModelEnglish").
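A sketch of starting the engine, assuming lmPath and dicPath were obtained from the OELanguageModelGenerator path methods:

```objectivec
// Activate before first use in a session (see setActive documentation).
[[OEPocketsphinxController sharedInstance] setActive:TRUE error:nil];

[[OEPocketsphinxController sharedInstance] startListeningWithLanguageModelAtPath:lmPath
                                                                 dictionaryAtPath:dicPath
                                                              acousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelEnglish"]
                                                              languageModelIsJSGF:FALSE];
```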

Resume listening for speech after suspendRecognition has been called. Swift 3: resumeRecognition

- (void)changeLanguageModelToFile:(NSString *)languageModelPathAsString withDictionary:(NSString *)dictionaryPathAsString

Change from one language model to another. This lets you change which words you are listening for depending on the context in your app. If you have already started the recognition loop and you want to switch to a different language model, you can use this and the model will be changed at the earliest opportunity. Will not have any effect unless recognition is already in progress. It isn't possible to change acoustic models in the middle of an already-started listening loop, just language model and dictionary. Swift 3: changeLanguageModel(toFile: String!, withDictionary: String!)
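For example (newLmPath and newDicPath are assumed paths obtained from OELanguageModelGenerator):

```objectivec
[[OEPocketsphinxController sharedInstance] changeLanguageModelToFile:newLmPath
                                                      withDictionary:newDicPath];
```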

- (void)runRecognitionOnWavFileAtPath:(NSString *)wavPath usingLanguageModelAtPath:(NSString *)languageModelPath dictionaryAtPath:(NSString *)dictionaryPath acousticModelAtPath:(NSString *)acousticModelPath languageModelIsJSGF:(BOOL)languageModelIsJSGF

You can use this to run recognition on an already-recorded WAV file for testing. The WAV file has to be 16-bit and 16000 samples per second. Swift 3: runRecognitionOnWavFile(atPath: String!, usingLanguageModelAtPath: String!, dictionaryAtPath: String!, acousticModelAtPath: String!, languageModelIsJSGF: Bool)

- (void) requestMicPermission

You can use this to request mic permission in advance of running speech recognition. Swift 3: requestMicPermission

setActive:error: needs to be called with the value TRUE before setting properties of OEPocketsphinxController for the first time in a session, and again before using OEPocketsphinxController in case it has been called with the value FALSE. Swift 3: setActive(active: Bool) (enclose in try/catch)

Property Documentation

- (Float32) pocketsphinxInputLevel

(readonly, atomic, assign)

Gives the volume of the incoming speech. This is a UI hook. Do not read it on the main thread, or it will block.

- (BOOL) micPermissionIsGranted

(readonly, atomic, assign)

Returns whether your app has record permission. This is expected to be used after the user has at some point been prompted with requestMicPermission and the result has come back in the permission results OEEventsObserver delegate methods. If this is used before that point, accuracy of results is not guaranteed. If the user has either granted or denied permission in the past, this will return a boolean indicating the permission state.

- (float) secondsOfSilenceToDetect

(readwrite, nonatomic, assign)

This is how long OEPocketsphinxController should wait after speech ends before attempting to recognize the speech. This defaults to 0.7 seconds.

- (BOOL) returnNbest

(readwrite, nonatomic, assign)

Advanced: set this to TRUE to receive n-best results.

- (int) nBestNumber

(readwrite, nonatomic, assign)

Advanced: the number of n-best results to return. This is a maximum number to return; if there are null hypotheses, fewer than this number will be returned.

- (BOOL) verbosePocketSphinx

(readwrite, nonatomic, assign)

Turn on extended logging for speech recognition processes. In order to get assistance with a speech recognition issue in the forums, it is necessary to turn this on and show the output.

- (BOOL) returnNullHypotheses

(readwrite, nonatomic, assign)

By default, OEPocketsphinxController won't return a hypothesis if for some reason the hypothesis is null (this can happen if the perceived sound was just noise). If you need even empty hypotheses to be returned, you can set this to TRUE before starting OEPocketsphinxController.

- (BOOL) isSuspended

(readwrite, nonatomic, assign)

Check if the listening loop is suspended

- (BOOL) isListening

(readwrite, nonatomic, assign)

Check if the listening loop is in progress

- (BOOL) legacy3rdPassMode

(readwrite, nonatomic, assign)

Set this to true if you encounter unusually slow-to-return searches with Rejecto

- (BOOL) removingNoise

(readwrite, nonatomic, assign)

Try not to decode probable noise as speech (this can result in more noise robustness, but it can also result in omitted segments – defaults to YES, override to set to NO)

- (BOOL) removingSilence

(readwrite, nonatomic, assign)

Try not to decode probable silence as speech (this can result in more accuracy, but it can also result in omitted segments – defaults to YES, override to set to NO)

- (float) vadThreshold

(readwrite, nonatomic, assign)

Speech/silence threshold setting. You may not need to change this; however, if you are experiencing quiet background noises triggering speech recognition, you can raise this to a value between 2.5 and 3.5 for the English acoustic model, and between 3.0 and 4.5 for the Spanish acoustic model. If you are experiencing too many words being ignored, you can reduce this. The maximum value is 5.0 and the minimum is 0.5. For the English model, values less than 1.5 or more than 3.5 are likely to lead to poor results. For the Spanish model, higher values can be used. Please test any changes here carefully to see what effect they have on your user experience.
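For instance, to raise the threshold slightly for the English model (the value here is illustrative only):

```objectivec
// Set before starting listening; 3.2 is an example value within the
// recommended English-model range, not a documented default.
[OEPocketsphinxController sharedInstance].vadThreshold = 3.2;
```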

- (BOOL) disableBluetooth

(readwrite, nonatomic, assign)

Optionally disable bluetooth support for a listening session in case you never want bluetooth to be an audio route. Only set TRUE if you are sure you want this; defaults to FALSE (meaning that the default audio session supports bluetooth as a route unless you use this to declare otherwise).

- (BOOL) disableMixing

(readwrite, nonatomic, assign)

Optionally disable audio session mixing. Only set TRUE if you are sure you want this; defaults to FALSE (meaning that the default audio session mode is with mixing enabled unless you use this to declare otherwise).

- (BOOL) disableSessionResetsWhileStopped

(readwrite, nonatomic, assign)

Optionally disable resets of the audio session when listening is not in progress. Set TRUE if you are experiencing undesired results from automatic resets of the audio session while listening is not in progress.

- (BOOL) disablePreferredSampleRate

(readwrite, nonatomic, assign)

Optionally disable preferred hardware sample rate. This should be left alone other than in the specific cases that you want to play back higher sample rate material while OpenEars has the audio session or you have discovered it results in better 3rd-party recording device support (e.g. a bluetooth device). Otherwise, it can slightly reduce accuracy so it should be left alone.

- (BOOL) disablePreferredBufferSize

(readwrite, nonatomic, assign)

Optionally disable preferred buffer size. Only set this if recommended to when seeking support for issues related to unusual hardware – it has no general upsides and can reduce performance.

- (BOOL) disablePreferredChannelNumber

(readwrite, nonatomic, assign)

Optionally disable preferred channels numbers. Only set this if recommended to when seeking support for issues related to unusual hardware – it has no general upsides and can reduce recognition quality

- (NSString*) audioMode

(readwrite, nonatomic, copy)

Set audio modes for the audio session manager to use. This can be set to the following:

@"Default" to use AVAudioSessionModeDefault
@"VoiceChat" to use AVAudioSessionModeVoiceChat
@"VideoRecording" to use AVAudioSessionModeVideoRecording
@"Measurement" to use AVAudioSessionModeMeasurement

If you don't set it to anything, "Default" will automatically be used.

- (NSString*) pathToTestFile

(readwrite, nonatomic, copy)

By setting pathToTestFile to point to a recorded audio file you can run the main Pocketsphinx listening loop (not runRecognitionOnWavFileAtPath but the main loop invoked by using startListeningWithLanguageModelAtPath:) over a pre-recorded audio file instead of using it with live input.

In contrast with using the method runRecognitionOnWavFileAtPath to receive a single recognition from a file, with this approach the audio file will have its buffers injected directly into the audio driver circular buffer for maximum fidelity to the goal of testing the entire codebase that is in use when doing a live recognition, including the whole driver and the listening loop including all of its features. This is for creating tests for yourself and for sharing automatically replicable issue reports with Politepix.

To use this, make an audio recording on the same device (i.e., if you are testing OEPocketsphinxController on an iPhone 5 with the internal microphone, make a recording on an iPhone 5 with the internal microphone, for instance using Apple's Voice Memos app) and then convert the resulting file to a 16-bit, 16000-sample-rate, mono WAV file. You can do this with the output of Apple's Voice Memos app by taking the .m4a file that Voice Memos outputs and running it through this command in Terminal.app:
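The command itself is not reproduced above; on macOS, Apple's afconvert tool can typically perform this conversion (the file names here are examples):

```bash
# Convert an .m4a memo to a 16-bit, 16000 Hz, mono WAV file.
afconvert -f WAVE -d LEI16@16000 -c 1 Memo.m4a Memo.wav
```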

Then add the WAV file to your app, and right before sending the call to startListeningWithLanguageModelAtPath:, set this property pathToTestFile to the path to your audio file in your app as an NSString (e.g. [[NSBundle mainBundle] pathForResource:@"Memo" ofType:@"wav"]).

Note: when you record the audio file you will be using to test with, give it a second of quiet lead-in before speech so there is time for the engine to fully start before listening begins. If you have any difficulty getting this to work, remember to turn on OELogging to get error output, which will probably explain what is not working.

SmartCMN is disabled during testing so that the test gets the same results when run for different people and for different devices. Please keep in mind that there are some settings in Pocketsphinx which may prevent a deterministic outcome from a recognition, meaning that you should expect a similar score over multiple runs of a test but you may not always see the identical score. There are examples of asynchronous testing using this tool in this project in the test target.

- (BOOL) useSmartCMNWithTestFiles

(readwrite, nonatomic, assign)

If you are doing testing, you can toggle SmartCMN on or off (it defaults to off and should usually be left off since using it can lead to nondeterministic results on the first runs with new devices).

<OEEventsObserverDelegate> Protocol Reference

abstract

Detailed Description

OEEventsObserver provides a large set of delegate methods that allow you to receive information about the events in OpenEars from anywhere in your app. You can create as many OEEventsObservers as you need and receive information using them simultaneously.

Usage examples

What to add to your header:

OEEventsObserver is the class which keeps you continuously updated about the status of your listening session, among other things, via delegate callbacks.
Add the following lines to your header (the .h file). Under the imports at the very top:

#import <OpenEars/OEEventsObserver.h>

At the @interface declaration, adopt the OEEventsObserverDelegate protocol.
An example of this for a view controller called ViewController would look like this:
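A sketch of that declaration, assuming a UIViewController subclass:

```objectivec
#import <UIKit/UIKit.h>
#import <OpenEars/OEEventsObserver.h>

@interface ViewController : UIViewController <OEEventsObserverDelegate>
@end
```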

Add the following to your implementation (the .m file):
Before you call a method of either OEFliteController or OEPocketsphinxController (perhaps in viewDidLoad), instantiate OEEventsObserver and set its delegate as follows:
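A sketch of this, assuming an openEarsEventsObserver property on your class:

```objectivec
// Assumed property: @property (strong, nonatomic) OEEventsObserver *openEarsEventsObserver;
self.openEarsEventsObserver = [[OEEventsObserver alloc] init];
[self.openEarsEventsObserver setDelegate:self];
```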

The user prompt to get mic permissions, or a check of the mic permissions, has completed with a TRUE or a FALSE result (will only be returned on iOS7 or later). Swift 3: micPermissionCheckCompleted(_ result: Bool)

- (void) fliteDidStartSpeaking

optional

Flite started speaking. You probably don't have to do anything about this. Swift 3: fliteDidStartSpeaking

NeatSpeech is a plugin for OpenEars™ that lets it do fast, high-quality offline speech synthesis which is compatible with iOS6.1, and even lets you edit the pronunciations of words! Try out the NeatSpeech demo free of charge.

OpenEars® is a registered trademark of Politepix. AllHours® is a registered trademark of Politepix.