OpenEars™ – iPhone Voice Recognition and Text-To-Speech

OpenEars™: free speech recognition and speech synthesis for the iPhone

OpenEars™ makes it simple for you to add offline speech recognition and synthesized speech/TTS to your iPhone app quickly and easily. It lets everyone get the great results of using advanced speech UI concepts like statistical language models and finite state grammars in their app, but with no more effort than creating an NSArray or NSDictionary.

It doesn't use the network and there are no hidden costs or accounts to set up. If you have more specific app requirements, the OpenEars™ Plugin Platform lets you drag and drop advanced functionality into your app when you're ready. OpenEars™ 2 is out now.

Introduction and Installation

Introduction

OpenEars™ is a shared-source iOS framework for iPhone voice recognition and speech synthesis (TTS). It lets you easily implement round-trip English and Spanish speech recognition and English text-to-speech on the iPhone, iPod and iPad, using the open source CMU Pocketsphinx, CMU Flite, and CMUCLMTK libraries, and it is free to use in an iPhone, iPad or iPod app (Spanish text-to-speech is possible on the OpenEars™ Platform but requires the NeatSpeech plugin, since there isn't a Spanish voice for Flite). It is the most popular offline framework for speech recognition and speech synthesis on iOS and has been featured in development books such as O'Reilly's Basic Sensors in iOS by Alasdair Allan and the Cocos2d for iPhone 1 Game Development Cookbook by Nathan Burba.

The OpenEars™ Platform is also a complete development platform for creating speech recognition and text-to-speech apps, including both the free OpenEars™ SDK documented on this page and a diverse set of plugins that can be added to OpenEars™ in order to extend and refine its default features: you can read more about the OpenEars™ platform here. This page is all about the free and shared-source OpenEars™ SDK, so please read on to learn more about it.

Highly-accurate large-vocabulary recognition (that is, trying to recognize any word the user speaks out of many thousands of known words) is not yet a reality for local in-app processing on a small handheld device given the hardware limitations of the platform; even Siri does its large-vocabulary recognition on the server side. However, Pocketsphinx (the open source voice recognition engine that OpenEars™ uses) is capable of local recognition of vocabularies with hundreds of words depending on the environment and other factors, and performs very well with command-and-control language models in English and Spanish. The best part is that it uses no network connectivity because all processing occurs locally on the device.

The current version of OpenEars™ is 2.041. Download OpenEars or read its changelog. If you are upgrading to OpenEars™ 2.x from a 1.x version, it is necessary to follow the upgrade guide once in order to successfully upgrade.

Features of OpenEars™

OpenEars™ can:

Perform speech recognition in English and in Spanish

Perform text-to-speech in English and with the NeatSpeech plugin, can also perform text-to-speech in Spanish

Listen continuously for speech on a background thread, while suspending or resuming speech processing on demand, all while using less than 2% CPU on average on current devices (decoding speech, text-to-speech, updating the UI and other intermittent functions use more CPU),

Use any of 9 voices for speech, including male and female voices with a range of speed/quality levels, and switch between them on the fly,

Change the pitch, speed and variance of any text-to-speech voice,

Know whether headphones are plugged in and continue voice recognition during text-to-speech only when they are plugged in,

Support bluetooth audio devices (experimental),

Dispatch information to any part of your app about the results of speech recognition and speech, or changes in the state of the audio session (such as an incoming phone call or headphones being plugged in),

Deliver level metering for both speech input and speech output so you can design visual feedback for both states.

Support JSGF grammars with an easy-to-use human-readable grammar specification language, only from Politepix,

Be installed in a Cocoa-standard fashion using an easy-peasy already-compiled framework.

In addition to its various new features and faster recognition/text-to-speech responsiveness, OpenEars™ now has improved recognition accuracy.

OpenEars™ is free to use in an iPhone or iPad app.

Warning

Before using OpenEars™, please note it has to use a different audio driver on the Simulator that is less accurate, so it is always necessary to evaluate accuracy on a real device. Please don't submit support requests for accuracy issues with the Simulator.

Installation

Create your own app, and then add the iOS frameworks AudioToolbox and AVFoundation to it.

Inside your downloaded distribution there is a folder called "Framework". Drag the "Framework" folder into your app project in Xcode.

OK, now that you've finished laying the groundwork, you have to...wait, that's everything. You're ready to start using OpenEars™. Give the sample app a spin to try out the features (the sample app uses ARC so you'll need a recent Xcode version) and then visit the Politepix interactive tutorial generator for a customized tutorial showing you exactly what code to add to your app for all of the different functionality of OpenEars™.

Basic concepts

There are a few basic concepts to understand about voice recognition and OpenEars™ that will make it easiest to create an app.

Local or offline speech recognition versus server-based or online speech recognition: most speech recognition on the iPhone, iPod and iPad is done by streaming the speech audio to servers. OpenEars™ works by doing the recognition inside the device, entirely offline without using the network. This saves bandwidth and results in faster response, but since a server is much more powerful than a phone it means that we have to work with much smaller vocabularies to get accurate recognition.

Language Models. The language model is the vocabulary that you want OpenEars™ to understand, in a format that its speech recognition engine can understand. The smaller and better-adapted to your users' real usage cases the language model is, the better the accuracy. A good language model for PocketsphinxController has fewer than 500 words. You define the words that your app uses - it will not know about vocabulary other than the vocabulary that you define.

The parts of OpenEars™. OpenEars™ has a simple, flexible and very powerful architecture.

OEAcousticModel Class Reference

Detailed Description

Convenience class for accessing the acoustic model bundles. All this does is allow you to reference your chosen model by including this header in your class and then letting you call [OEAcousticModel pathToModel:@"AcousticModelEnglish"] or [OEAcousticModel pathToModel:@"AcousticModelSpanish"] in any of the methods which ask for a path to an acoustic model.

Method Documentation

+ (NSString *) pathToModel:(NSString *)acousticModelBundleName

Reference the path to any acoustic model bundle you've dragged into your project (such as AcousticModelSpanish.bundle or AcousticModelEnglish.bundle) by calling this class method like [OEAcousticModel pathToModel:@"AcousticModelEnglish"] after importing this class.

OEContinuousModel Class Reference

OEEventsObserver Class Reference

Detailed Description

OEEventsObserver provides a large set of delegate methods that allow you to receive information about the events in OpenEars™ from anywhere in your app. You can create as many OEEventsObservers as you need and receive information using them simultaneously. All of the documentation for the use of OEEventsObserver is found in the section OEEventsObserverDelegate.

OEFliteController Class Reference

Detailed Description

The class that controls speech synthesis (TTS) in OpenEars™.

Usage examples

Preparing to use the class:

To use OEFliteController, you need to have at least one Flite voice added to your project. When you added the "Framework" folder of OpenEars™ to your app, you already imported a voice called Slt, so these instructions will use the Slt voice. You can get eight more free voices in OpenEarsExtras, available at https://bitbucket.org/Politepix/openearsextras

What to add to your header:

Add the following lines to your header (the .h file). Under the imports at the very top:

#import <Slt/Slt.h>
#import <OpenEars/OEFliteController.h>

Add these class properties to the other properties of your view controller or object:
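For example (a sketch; assuming a view controller, with illustrative property names):

```objectivec
// Strong references so the TTS controller and the voice stay alive
// for the lifetime of this object.
@property (strong, nonatomic) OEFliteController *fliteController;
@property (strong, nonatomic) Slt *slt;
```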

Add the following to your implementation (the .m file):
Before you want to use TTS speech in your app, instantiate an OEFliteController and a voice as follows (perhaps in your view controller's viewDidLoad method):
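A minimal sketch of this step, assuming properties named fliteController and slt (the names are illustrative):

```objectivec
// In viewDidLoad (or wherever you set up your object):
self.fliteController = [[OEFliteController alloc] init];
self.slt = [[Slt alloc] init];

// Then, whenever you want synthesized speech:
[self.fliteController say:@"Welcome to OpenEars." withVoice:self.slt];
```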

There are a total of nine FliteVoices available for use with OpenEars. The Slt voice is the most popular one and it ships with OpenEars. The other eight voices can be downloaded as part of the OpenEarsExtras package available at the URL http://bitbucket.org/Politepix/openearsextras. To use them, just drag the desired downloaded voice's framework into your app, import its header at the top of your calling class (e.g. #import <Slt/Slt.h> or #import <Rms/Rms.h>), instantiate it as you would any other object, and then pass the instantiated voice to OEFliteController's say:withVoice: method.

Property Documentation

- (Float32) fliteOutputLevel

read-only, atomic, assign

A read-only attribute that tells you the volume level of synthesized speech in progress. This is a UI hook. You can't read it on the main thread.

- (float) duration_stretch

read/write, nonatomic, assign

duration_stretch changes the speed of the voice. It is on a scale of 0.0-2.0 where 1.0 is the default.

- (float) target_mean

read/write, nonatomic, assign

target_mean changes the pitch of the voice. It is on a scale of 0.0-2.0 where 1.0 is the default.

- (float) target_stddev

read/write, nonatomic, assign

target_stddev changes the variance of the voice. It is on a scale of 0.0-2.0 where 1.0 is the default.

- (BOOL) userCanInterruptSpeech

read/write, nonatomic, assign

Set userCanInterruptSpeech to TRUE in order to let new incoming human speech cut off synthesized speech in progress.

OELanguageModelGenerator Class Reference

Detailed Description

Usage examples

What to add to your implementation:

In offline speech recognition, you define the vocabulary that you want your app to be able to recognize. This is called a language model or grammar (you can read more about these options in the OELanguageModelGenerator documentation). A good vocabulary size for an offline speech recognition app on the iPhone, iPod or iPad is between 10 and 500 words.
Add the following to your implementation (the .m file):
Under the @interface keyword at the top:

In the method where you want to create your language model (for instance your viewDidLoad method), add the following method call (replacing the placeholders like "WORD" and "A PHRASE" with actual words and phrases you want to be able to recognize):
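As a sketch (the variable names and the placeholder vocabulary are illustrative):

```objectivec
#import <OpenEars/OELanguageModelGenerator.h>
#import <OpenEars/OEAcousticModel.h>

// In the method where you create your language model:
OELanguageModelGenerator *lmGenerator = [[OELanguageModelGenerator alloc] init];

NSArray *words = @[@"WORD", @"STATEMENT", @"OTHER WORD", @"A PHRASE"];
NSString *name = @"NameIWantForMyLanguageModelFiles";

NSError *err = [lmGenerator generateLanguageModelFromArray:words
                                            withFilesNamed:name
                                    forAcousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelEnglish"]];
if(err != nil) {
    NSLog(@"Language model generation error: %@", [err localizedDescription]);
}
```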

In OpenEars 1.x it was a requirement to enter your words and phrases in all capital letters, since the model was generated against a dictionary in which the entries were capitalized (words that weren't capitalized would not match the dictionary and would not get the widest variety of pronunciations). In OpenEars 2.x, OELanguageModelGenerator no longer has a case preference, so this is no longer required.

Method Documentation

- (NSError *) generateLanguageModelFromArray:(NSArray *)languageModelArray withFilesNamed:(NSString *)fileName forAcousticModelAtPath:(NSString *)acousticModelPath

Generate a probabilistic language model from an array of NSStrings which are the words and phrases you want OEPocketsphinxController or OEPocketsphinxController+RapidEars to understand, using your chosen acoustic model.

Putting a phrase in as a string makes it somewhat more probable that the phrase will be recognized as a phrase when spoken. If you only ever want certain phrases or word sequences to be recognized, at the exclusion of other combinations, use - (NSError *) generateGrammarFromDictionary:(NSDictionary *)grammarDictionary withFilesNamed:(NSString *)fileName forAcousticModelAtPath:(NSString *)acousticModelPath below instead, to create a rules-based grammar rather than a probabilistic language model.

fileName is the way you want the output files to be named, for instance if you enter "MyDynamicLanguageModel" you will receive files output to your Caches directory titled MyDynamicLanguageModel.dic, MyDynamicLanguageModel.arpa, and MyDynamicLanguageModel.DMP.

If your input text has numbers such as '1970' or '3' you should spell them out ("Nineteen-seventy", or alternately "One Thousand Nine Hundred Seventy", or "Three") in a contextually-appropriate way before submitting them, to get the most accurate results. This can't be done automatically for you yet, and at the moment numbers will trigger the fallback technique, which will only take a best guess at the intention, with no alternate pronunciations, and give sub-optimal recognition results where the guess is incorrect.

Additionally, if there are ambiguous symbols in your text such as '$' or '+' they will be removed from the text, as it is not possible to reliably detect the context or intention for these symbols or whether they are even intended to be transcribed at all. Therefore if you intend for them to be spoken or synthesized in your app interface, please replace them with spelled-out forms of the same symbol, e.g. "dollars" or "dollar" for '$' and "plus" or "and" for '+', and for all other similar types of symbols found in your text.

If you are feeding in arbitrary text and experiencing unexpected results in terms of what is recognized or accuracy rates, please investigate your text for symbols and numbers which are (unavoidably) being transformed by OELanguageModelGenerator and transcribe them yourself for best results. Alphabetical characters and apostrophes and hyphens which appear in a word, as well as sentence ending symbols and clause-separating symbols, will remain intact.

OELanguageModelGenerator no longer has any case preference when inputting text, so you don't have to be concerned about whether your input is capitalized or not; you only have to pay attention in your own app implementation that phrases you are trying to detect are matchable against the case you actually used to create your model using this class.

If this method is successful it will return nil. If it returns nil, you can use the methods pathToSuccessfullyGeneratedDictionaryWithRequestedName: and pathToSuccessfullyGeneratedLanguageModelWithRequestedName: or pathToSuccessfullyGeneratedGrammarWithRequestedName: to get your paths to your newly-generated language models and grammars and dictionaries for use with OEPocketsphinxController. If it doesn't return nil, it will return an error which you can check for debugging purposes.
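As a sketch of the success path (the names are illustrative and assume the generation call shown in the usage example above):

```objectivec
if(err == nil) { // err is the NSError returned by generateLanguageModelFromArray:...
    NSString *lmPath  = [lmGenerator pathToSuccessfullyGeneratedLanguageModelWithRequestedName:@"NameIWantForMyLanguageModelFiles"];
    NSString *dicPath = [lmGenerator pathToSuccessfullyGeneratedDictionaryWithRequestedName:@"NameIWantForMyLanguageModelFiles"];
    // Hand lmPath and dicPath to OEPocketsphinxController when you start listening.
} else {
    NSLog(@"Error: %@", [err localizedDescription]);
}
```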

- (NSString *) pathToSuccessfullyGeneratedDictionaryWithRequestedName:(NSString *)name

If generateLanguageModelFromArray:withFilesNamed:forAcousticModelAtPath: does not return an error, you can use this method to receive the full path to your generated phonetic dictionary for use with OEPocketsphinxController.

- (NSString *) pathToSuccessfullyGeneratedLanguageModelWithRequestedName:(NSString *)name

If generateLanguageModelFromArray:withFilesNamed:forAcousticModelAtPath: does not return an error, you can use this method to receive the full path to your generated language model for use with OEPocketsphinxController.

- (NSString *) pathToSuccessfullyGeneratedGrammarWithRequestedName:(NSString *)name

If generateLanguageModelFromArray:withFilesNamed:forAcousticModelAtPath: does not return an error, you can use this method to receive the full path to your generated grammar for use with OEPocketsphinxController.

- (NSError *) generateGrammarFromDictionary:(NSDictionary *)grammarDictionary withFilesNamed:(NSString *)fileName forAcousticModelAtPath:(NSString *)acousticModelPath

Dynamically generate a JSGF grammar using OpenEars' natural language system for defining a speech recognition ruleset. This will recognize exact phrases instead of probabilistically recognizing word combinations in any sequence.

The NSDictionary you submit to the argument generateGrammarFromDictionary: is a key-value pair consisting of an NSArray of words stored in NSStrings indicating the vocabulary to be listened for, and an NSString key which is one of the following #defines from GrammarDefinitions.h, indicating the rule for the vocabulary in the NSArray:

Breaking them down one at a time for their specific meaning in defining a rule:

ThisWillBeSaidOnce // This indicates that the word or words in the array must be said (in sequence, in the case of multiple words), one time.
ThisCanBeSaidOnce // This indicates that the word or words in the array can be said (in sequence, in the case of multiple words), one time, but can also be omitted as a whole from the utterance.
ThisWillBeSaidWithOptionalRepetitions // This indicates that the word or words in the array must be said (in sequence, in the case of multiple words), one time or more.
ThisCanBeSaidWithOptionalRepetitions // This indicates that the word or words in the array can be said (in sequence, in the case of multiple words), one time or more, but can also be omitted as a whole from the utterance.
OneOfTheseWillBeSaidOnce // This indicates that exactly one selection from the words in the array must be said one time.
OneOfTheseCanBeSaidOnce // This indicates that exactly one selection from the words in the array can be said one time, but that all of the words can also be omitted from the utterance.
OneOfTheseWillBeSaidWithOptionalRepetitions // This indicates that exactly one selection from the words in the array must be said, one time or more.
OneOfTheseCanBeSaidWithOptionalRepetitions // This indicates that exactly one selection from the words in the array can be said, one time or more, but that all of the words can also be omitted from the utterance.

Since an NSString in these NSArrays can also be a phrase, references to words above should also be understood to apply to complete phrases when they are contained in a single NSString.

A key-value pair can also have NSDictionaries in the NSArray instead of NSStrings, or a mix of NSStrings and NSDictionaries, meaning that you can nest rules in other rules.

Here is an example of a complex rule which can be submitted to the generateGrammarFromDictionary: argument, broken down step by step to explain exactly what the contents mean:

@{
ThisWillBeSaidOnce : @[ // This means that a valid utterance for this ruleset will obey all of the following rules in sequence in a single complete utterance:
@{ OneOfTheseCanBeSaidOnce : @[@"HELLO COMPUTER", @"GREETINGS ROBOT"]}, // At the beginning of the utterance there is an optional statement. The optional statement can be either "HELLO COMPUTER" or "GREETINGS ROBOT" or it can be omitted.
@{ OneOfTheseWillBeSaidOnce : @[@"DO THE FOLLOWING", @"INSTRUCTION"]}, // Next, an utterance will have exactly one of the following required statements: "DO THE FOLLOWING" or "INSTRUCTION".
@{ OneOfTheseWillBeSaidOnce : @[@"GO", @"MOVE"]}, // Next, an utterance will have exactly one of the following required statements: "GO" or "MOVE"
@{ThisWillBeSaidWithOptionalRepetitions : @[ // Next, an utterance will have a minimum of one statement of the following nested instructions, but can also accept multiple valid versions of the nested instructions:
@{ OneOfTheseWillBeSaidOnce : @[@"10", @"20",@"30"]}, // Exactly one utterance of either the number "10", "20" or "30",
@{ OneOfTheseWillBeSaidOnce : @[@"LEFT", @"RIGHT", @"FORWARD"]} // Followed by exactly one utterance of either the word "LEFT", "RIGHT", or "FORWARD".
]},
@{ OneOfTheseWillBeSaidOnce : @[@"EXECUTE", @"DO IT"]}, // Next, an utterance must contain either the word "EXECUTE" or the phrase "DO IT",
@{ ThisCanBeSaidOnce : @[@"THANK YOU"]} // and there can be an optional single statement of the phrase "THANK YOU" at the end.
]
};

So as examples, here are some sentences that this ruleset will report as hypotheses from user utterances:

"HELLO COMPUTER DO THE FOLLOWING GO 20 LEFT 30 RIGHT 10 FORWARD EXECUTE THANK YOU"
"GREETINGS ROBOT DO THE FOLLOWING MOVE 10 FORWARD DO IT"
"INSTRUCTION 20 LEFT 20 LEFT 20 LEFT 20 LEFT EXECUTE"

But it will not report hypotheses for sentences such as the following which are not allowed by the rules:

"HELLO COMPUTER HELLO COMPUTER"
"MOVE 10"
"GO RIGHT"

Since you as the developer are the designer of the ruleset, you can extract the behavioral triggers for your app from hypotheses which observe your rules.

In OpenEars 1.x, the words and phrases in the grammar had to be written with capital letters exclusively, for instance "word" had to appear as "WORD"; in OpenEars 2.x there is no longer a case preference.

The last two arguments of the method work identically to the equivalent language model method. The withFilesNamed: argument takes an NSString which is the naming you would like for the files output by this method. The argument acousticModelPath takes the path to the relevant acoustic model.

This method returns an NSError. On failure, it contains an error code; on success, it returns noErr with an attached userInfo NSDictionary containing the paths to your newly-generated grammar (a .gram file) and corresponding phonetic dictionary (a .dic file). Remember that when you pass a .gram file to OEPocketsphinxController, you need to indicate that it is a JSGF grammar rather than a probabilistic language model via the languageModelIsJSGF: argument.
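A sketch of starting recognition with a generated grammar (the paths and names are illustrative):

```objectivec
// After generateGrammarFromDictionary:withFilesNamed:forAcousticModelAtPath: succeeds:
NSString *gramPath = [lmGenerator pathToSuccessfullyGeneratedGrammarWithRequestedName:@"MyGrammar"];
NSString *dicPath  = [lmGenerator pathToSuccessfullyGeneratedDictionaryWithRequestedName:@"MyGrammar"];

[[OEPocketsphinxController sharedInstance] startListeningWithLanguageModelAtPath:gramPath
                                                                 dictionaryAtPath:dicPath
                                                              acousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelEnglish"]
                                                              languageModelIsJSGF:TRUE]; // TRUE because a .gram grammar is being passed
```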

- (NSError *) generateLanguageModelFromTextFile:(NSString *)pathToTextFile withFilesNamed:(NSString *)fileName forAcousticModelAtPath:(NSString *)acousticModelPath

Generate a language model from a text file containing words and phrases you want OEPocketsphinxController to understand, using your chosen acoustic model. The file should be formatted with every word or contiguous phrase on its own line, with a line break afterwards. Putting a phrase on its own line makes it somewhat more probable that the phrase will be recognized as a phrase when spoken.

Give the correct full path to the text file as a string. fileName is the way you want the output files to be named, for instance if you enter "MyDynamicLanguageModel" you will receive files output to your Caches directory titled MyDynamicLanguageModel.dic, MyDynamicLanguageModel.arpa, and MyDynamicLanguageModel.DMP.

If this method is successful it will return nil. If it returns nil, you can use the methods pathToSuccessfullyGeneratedDictionaryWithRequestedName: and pathToSuccessfullyGeneratedLanguageModelWithRequestedName: to get your paths to your newly-generated language models and grammars and dictionaries for use with OEPocketsphinxController. If it doesn't return nil, it will return an error which you can check for debugging purposes.

Property Documentation

- (BOOL) verboseLanguageModelGenerator

read/write, nonatomic, assign

Set this to TRUE to get verbose output.

- (BOOL) useFallbackMethod

read/write, nonatomic, assign

Advanced: if you are using your own acoustic model, or a custom dictionary contained within an acoustic model, and these don't use the same phonemes as the English or Spanish acoustic models, you will need to set useFallbackMethod to FALSE so that no attempt is made to use the English or Spanish fallback method for finding pronunciations of words which don't appear in the custom acoustic model's phonetic dictionary.

OELogging Class Reference

Detailed Description

A singleton which turns logging on or off for the entire framework. The type of logging is related to overall framework functionality such as the audio session and timing operations. Please turn OELogging on for any issue you encounter. It will probably show the problem, but if not you can show the log on the forum and get help.
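Turning it on is a single call, made before your other OpenEars calls:

```objectivec
#import <OpenEars/OELogging.h>

// Early in your app session, e.g. at the top of viewDidLoad:
[OELogging startOpenEarsLogging];
```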

OEPocketsphinxController Class Reference

Detailed Description

The class that controls local speech recognition in OpenEars.

Usage examples

What to add to your header:

To use OEPocketsphinxController, the class which performs speech recognition, you need a language model and a phonetic dictionary for it. These files define which words OEPocketsphinxController is capable of recognizing. We just created them above by using OELanguageModelGenerator. You also need an acoustic model. OpenEars ships with an English and a Spanish acoustic model.

First, add the following to your implementation (the .m file): Under the @implementation keyword at the top:

Start the speech recognition engine up. You provide the full paths to a language model and a dictionary file which are created using OELanguageModelGenerator, and the acoustic model you want to use (for instance [OEAcousticModel pathToModel:@"AcousticModelEnglish"]).
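As a sketch, assuming lmPath and dicPath hold the paths returned by OELanguageModelGenerator:

```objectivec
// Activate the shared controller, then start the listening loop:
[[OEPocketsphinxController sharedInstance] setActive:TRUE error:nil];
[[OEPocketsphinxController sharedInstance] startListeningWithLanguageModelAtPath:lmPath
                                                                 dictionaryAtPath:dicPath
                                                              acousticModelAtPath:[OEAcousticModel pathToModel:@"AcousticModelEnglish"]
                                                              languageModelIsJSGF:FALSE]; // FALSE for a probabilistic language model
```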

- (NSError *) stopListening

Shut down the engine. You must do this before releasing a parent view controller that contains OEPocketsphinxController.

- (void) suspendRecognition

Keep the engine going but stop listening to speech until resumeRecognition is called. Takes effect instantly.

- (void) resumeRecognition

Resume listening for speech after suspendRecognition has been called.

- (void) changeLanguageModelToFile:(NSString *)languageModelPathAsString withDictionary:(NSString *)dictionaryPathAsString

Change from one language model to another. This lets you change which words you are listening for depending on the context in your app. If you have already started the recognition loop and you want to switch to a different language model, you can use this and the model will be changed at the earliest opportunity. Will not have any effect unless recognition is already in progress. It isn't possible to change acoustic models in the middle of an already-started listening loop, just language model and dictionary.
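For example (the path variables are illustrative):

```objectivec
// Switch vocabularies mid-session once listening is already in progress:
[[OEPocketsphinxController sharedInstance] changeLanguageModelToFile:newLmPath
                                                      withDictionary:newDicPath];
```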

- (void) runRecognitionOnWavFileAtPath:(NSString *)wavPath usingLanguageModelAtPath:(NSString *)languageModelPath dictionaryAtPath:(NSString *)dictionaryPath acousticModelAtPath:(NSString *)acousticModelPath languageModelIsJSGF:(BOOL)languageModelIsJSGF

You can use this to run recognition on an already-recorded WAV file for testing. The WAV file has to be 16-bit, mono, with 16000 samples per second.

- (void) requestMicPermission

You can use this to request mic permission in advance of running speech recognition.

Property Documentation

- (Float32) pocketsphinxInputLevel

Gives the volume of the incoming speech. This is a UI hook. Don't read it on the main thread, or it will block.

- (BOOL) micPermissionIsGranted

read-only, atomic, assign

Returns whether your app has record permission. This is expected to be used after the user has at some point been prompted with requestMicPermission and the result has come back in the permission results OEEventsObserver delegate methods. If this is used before that point, accuracy of results is not guaranteed. If the user has either granted or denied permission in the past, this will return a boolean indicating the permission state.

- (float) secondsOfSilenceToDetect

read/write, nonatomic, assign

This is how long OEPocketsphinxController should wait after speech ends to attempt to recognize speech. This defaults to .7 seconds.

- (BOOL) returnNbest

read/write, nonatomic, assign

Advanced: set this to TRUE to receive n-best results.

- (int) nBestNumber

read/write, nonatomic, assign

Advanced: the number of n-best results to return. This is a maximum – if there are null hypotheses, fewer results than this number will be returned.

- (BOOL) verbosePocketSphinx

read/write, nonatomic, assign

Turn on extended logging for speech recognition processes. In order to get assistance with a speech recognition issue in the forums, it is necessary to turn this on and show the output.

- (BOOL) returnNullHypotheses

read/write, nonatomic, assign

By default, OEPocketsphinxController won't return a hypothesis if for some reason the hypothesis is null (this can happen if the perceived sound was just noise). If you need even empty hypotheses to be returned, you can set this to TRUE before starting OEPocketsphinxController.

- (BOOL) isSuspended

read/write, nonatomic, assign

Check if the listening loop is suspended.

- (BOOL) isListening

read/write, nonatomic, assign

Check if the listening loop is in progress.

- (BOOL) removingNoise

read/write, nonatomic, assign

Try not to decode probable noise as speech (this can result in more noise robustness, but it can also result in omitted segments – defaults to YES, override to set to NO)

- (BOOL) removingSilence

read/write, nonatomic, assign

Try not to decode probable silence as speech (this can result in more accuracy, but it can also result in omitted segments – defaults to YES, override to set to NO)

- (float) vadThreshold

read/write, nonatomic, assign

Speech/silence threshold setting. You may not need to change this; however, if you are experiencing quiet background noises triggering speech recognition, you can raise it to a value between 2.5 and 3.5 for the English acoustic model, and between 3.0 and 4.5 for the Spanish acoustic model. If too many words are being ignored, you can reduce it. The maximum value is 5.0 and the minimum is .5. For the English model, values less than 1.5 or more than 3.5 are likely to lead to poor results. For the Spanish model, higher values can be used. Please test any changes here carefully to see what effect they have on your user experience.

- (BOOL) disableBluetooth

read/write, nonatomic, assign

Optionally disable bluetooth support for a listening session in case you never want bluetooth to be an audio route. Only set TRUE if you are sure you want this; defaults to FALSE (meaning that the default audio session supports bluetooth as a route unless you use this to declare otherwise).

- (BOOL) disableMixing

read/write, nonatomic, assign

Optionally disable audio session mixing. Only set TRUE if you are sure you want this; defaults to FALSE (meaning that the default audio session mode is with mixing enabled unless you use this to declare otherwise).

- (NSString*) audioMode

read/write, nonatomic, copy

Set audio modes for the audio session manager to use. This can be set to the following:

@"Default" to use AVAudioSessionModeDefault
@"VoiceChat" to use AVAudioSessionModeVoiceChat
@"VideoRecording" to use AVAudioSessionModeVideoRecording
@"Measurement" to use AVAudioSessionModeMeasurement

If you don't set it to anything, "Default" will automatically be used.

- (NSString*) pathToTestFile

read/write, nonatomic, copy

By setting pathToTestFile to point to a recorded audio file you can run the main Pocketsphinx listening loop (not runRecognitionOnWavFileAtPath but the main loop invoked by using startListeningWithLanguageModelAtPath:) over a pre-recorded audio file instead of using it with live input.

In contrast with using the method runRecognitionOnWavFileAtPath to receive a single recognition from a file, with this approach the audio file will have its buffers injected directly into the audio driver circular buffer for maximum fidelity to the goal of testing the entire codebase that is in use when doing a live recognition, including the whole driver and the listening loop including all of its features. This is for creating tests for yourself and for sharing automatically replicable issue reports with Politepix.

To use this, make an audio recording on the same device (i.e., if you are testing OEPocketsphinxController on an iPhone 5 with the internal microphone, make a recording on an iPhone 5 with the internal microphone, for instance using Apple's Voice Memos app) and then convert the resulting file to a 16-bit, 16000 sample rate, mono WAV file. You can do this with the output of Apple's Voice Memos app by taking the .m4a file that Voice Memos outputs and running it through this command in Terminal.app:
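One way to do the conversion, assuming macOS's built-in afconvert tool and a recording named Memo.m4a (the filenames are illustrative):

```shell
# 16-bit little-endian PCM, 16000 Hz, mono, WAV container:
afconvert -f WAVE -d LEI16@16000 -c 1 Memo.m4a Memo.wav
```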

Then add the WAV file to your app, and right before sending the call to startListeningWithLanguageModelAtPath, set this property pathToTestFile to the path to your audio file in your app as an NSString (e.g. [[NSBundle mainBundle] pathForResource:@"Memo" ofType:@"wav"]).

Note: when you record the audio file you will be using to test with, give it a second of quiet lead-in before speech so there is time for the engine to fully start before listening begins. If you have any difficulty getting this to work, remember to turn on OELogging to get error output, which will probably explain what is not working.

SmartCMN is disabled during testing so that the test gets the same results when run for different people and for different devices. Please keep in mind that there are some settings in Pocketsphinx which may prevent a deterministic outcome from a recognition, meaning that you should expect a similar score over multiple runs of a test but you may not always see the identical score. There are examples of asynchronous testing using this tool in this project in the test target.

- (BOOL) useSmartCMNWithTestFiles

read/write, nonatomic, assign

If you are doing testing, you can toggle SmartCMN on or off (it defaults to off and should usually be left off since using it can lead to nondeterministic results on the first runs with new devices).

<OEEventsObserverDelegate> Protocol Reference


Detailed Description

OEEventsObserver provides a large set of delegate methods that allow you to receive information about the events in OpenEars from anywhere in your app. You can create as many OEEventsObservers as you need and receive information using them simultaneously.

Usage examples

What to add to your header:

OEEventsObserver is the class which keeps you continuously updated about the status of your listening session, among other things, via delegate callbacks.
Add the following lines to your header (the .h file). Under the imports at the very top:

#import <OpenEars/OEEventsObserver.h>

At the @interface declaration, add the OEEventsObserverDelegate inheritance.
An example of this for a view controller called ViewController would look like this:
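A sketch of the declaration (assuming a UIViewController subclass named ViewController):

```objectivec
#import <UIKit/UIKit.h>
#import <OpenEars/OEEventsObserver.h>

@interface ViewController : UIViewController <OEEventsObserverDelegate>
@end
```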

Add the following to your implementation (the .m file):
Before you call a method of either OEFliteController or OEPocketsphinxController (perhaps in viewDidLoad), instantiate OEEventsObserver and set its delegate as follows:
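A sketch of this step, with one example delegate callback (the property name is illustrative):

```objectivec
// A strong property so the observer isn't deallocated while you need callbacks:
@property (strong, nonatomic) OEEventsObserver *openEarsEventsObserver;

// Before calling OEFliteController or OEPocketsphinxController methods:
self.openEarsEventsObserver = [[OEEventsObserver alloc] init];
[self.openEarsEventsObserver setDelegate:self];

// One of the delegate methods you can then implement:
- (void) pocketsphinxDidReceiveHypothesis:(NSString *)hypothesis
                         recognitionScore:(NSString *)recognitionScore
                              utteranceID:(NSString *)utteranceID {
    NSLog(@"Heard: \"%@\" with score %@", hypothesis, recognitionScore);
}
```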

NeatSpeech is a plugin for OpenEars™ that lets it do fast, high-quality offline speech synthesis which is compatible with iOS6.1, and even lets you edit the pronunciations of words! Try out the NeatSpeech demo free of charge.
