Java Speech Development Kit: A Tutorial

The authors show how to get started developing voice-activated interfaces using the Speech for Java Development Kit.

Results

As mentioned, result objects are created in response to the
recognition performed by the SDK. They generate recognition result
events (ResultEvents) that can be intercepted by ResultListeners.
Following the standard Java event model, the application can reach
the object that generated an event through the getSource method,
which ResultEvent inherits from the java.util.EventObject class. In
this context, getSource returns a Result object or one of its
descendants.
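The getSource pattern can be sketched with the standard library alone. The classes below (DemoResult, DemoResultEvent, DemoResultListener) are hypothetical stand-ins and are not the real javax.speech.recognition types. They only mirror the one property described above: the event descends from java.util.EventObject, so the listener can cast the value returned by getSource back to the result.

```java
import java.util.EventObject;

// Stand-in for a recognition result; NOT the real javax.speech.recognition.Result.
class DemoResult {
    private final String text;
    DemoResult(String text) { this.text = text; }
    String getText() { return text; }
}

// Stand-in for ResultEvent: it descends from java.util.EventObject,
// so getSource() is inherited rather than redefined.
class DemoResultEvent extends EventObject {
    DemoResultEvent(DemoResult source) { super(source); }
}

// Stand-in for a ResultListener with a single callback.
interface DemoResultListener {
    void resultAccepted(DemoResultEvent e);
}

class GetSourceSketch {
    // Recovers the result object from the event, as a listener would.
    static String textOf(DemoResultEvent e) {
        DemoResult result = (DemoResult) e.getSource(); // getSource() returns Object
        return result.getText();
    }

    public static void main(String[] args) {
        DemoResultListener listener = e -> System.out.println("Heard: " + textOf(e));
        listener.resultAccepted(new DemoResultEvent(new DemoResult("open file")));
    }
}
```

In a real application the cast target would be Result (or FinalResult and its descendants) instead of the stand-in class.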

The possible result object states are:

FINALIZED: recognition of the audio item is complete, and the
result is in one of two substates:

ACCEPTED: the audio item was understood and an
association with one of the active grammars was determined.

REJECTED: the audio item was understood, but the
recognizer considers it likely that a mistake was made. That is,
the recognizer was able to associate a string, or token, with the
sound it heard, but it did not have enough information to be sure
of the recognition because of poor sound quality, bad pronunciation
or even hardware problems. The application must treat these results
carefully.

UNFINALIZED: the audio item was understood and is still
being processed, but it has not yet been possible to determine an
association with one of the active grammars.

Accept or Reject: rejection of a result indicates that the
recognizer is not confident it has accurately recognized what the
user said. Rejection can be controlled through the
RecognizerProperties interface with the setConfidenceLevel method.
Raising the confidence level requires the recognizer to be more
certain before it accepts a result, so more results are likely to
be rejected.
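The effect of raising the confidence level can be pictured with a toy model. This is only a sketch: the scores are invented, the scale from 0.0 to 1.0 is an assumption, and the simple "score must reach the threshold" rule is hypothetical, because how a real recognizer scores a result is engine-specific. In a real application only the call to setConfidenceLevel is made; the comparison happens inside the recognizer.

```java
class ConfidenceSketch {
    // Hypothetical rule: accept when the engine's score reaches the threshold.
    static boolean accepted(float score, float confidenceLevel) {
        return score >= confidenceLevel;
    }

    // Counts how many of the given scores would be rejected at a threshold.
    static int countRejected(float[] scores, float confidenceLevel) {
        int rejected = 0;
        for (float s : scores) {
            if (!accepted(s, confidenceLevel)) {
                rejected++;
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        float[] scores = {0.25f, 0.5f, 0.75f, 1.0f}; // invented scores
        System.out.println(countRejected(scores, 0.5f));   // prints 1
        System.out.println(countRejected(scores, 0.875f)); // prints 3
    }
}
```

With the same four utterances, the stricter threshold rejects three results instead of one, which is the trade-off the application has to tune.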

Below is an illustration of the recognition cycle and some of
the fired events:

By looking at the figure, we can establish the relationship
between the result object states and the events (ResultEvent) that
the listeners (ResultListeners) are able to intercept.

A RESULT_CREATED event signals that a new result object
has been created. A new result starts in the UNFINALIZED state.

UNFINALIZED state: RESULT_UPDATED events indicate a
change in finalized and/or unfinalized tokens; a GRAMMAR_FINALIZED
event indicates that the grammar matched by this result has been
identified.

The RESULT_ACCEPTED event finalizes a result by
indicating a change in state from UNFINALIZED to ACCEPTED.

The RESULT_REJECTED event finalizes a result by
indicating a change in state from UNFINALIZED to REJECTED.

In the finalized states (ACCEPTED and REJECTED),
the AUDIO_RELEASED and TRAINING_INFO_RELEASED events also may be
issued.
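The relationship between states and events listed above can be summarized as a small state machine. The sketch below is a self-contained model, not the SDK itself: the enum constants borrow the state and event names from the text, while the class ResultLifecycleSketch, its methods, and the choice to throw an exception on an out-of-order event are assumptions made for illustration.

```java
class ResultLifecycleSketch {
    enum State { UNFINALIZED, ACCEPTED, REJECTED }

    enum Event {
        RESULT_CREATED, RESULT_UPDATED, GRAMMAR_FINALIZED,
        RESULT_ACCEPTED, RESULT_REJECTED,
        AUDIO_RELEASED, TRAINING_INFO_RELEASED
    }

    // Returns the state after an event; current is null before RESULT_CREATED.
    static State next(State current, Event event) {
        if (event == Event.RESULT_CREATED) {
            if (current != null) {
                throw new IllegalStateException("result already created");
            }
            return State.UNFINALIZED;
        }
        if (current == null) {
            throw new IllegalStateException(event + " before RESULT_CREATED");
        }
        boolean finalized = current != State.UNFINALIZED;
        switch (event) {
            case RESULT_UPDATED:
            case GRAMMAR_FINALIZED:
                requireState(!finalized, event, current);
                return State.UNFINALIZED; // tokens or grammar change, state does not
            case RESULT_ACCEPTED:
                requireState(!finalized, event, current);
                return State.ACCEPTED;
            case RESULT_REJECTED:
                requireState(!finalized, event, current);
                return State.REJECTED;
            case AUDIO_RELEASED:
            case TRAINING_INFO_RELEASED:
                requireState(finalized, event, current);
                return current; // only issued once the result is finalized
            default:
                throw new IllegalArgumentException("unexpected event " + event);
        }
    }

    // Assumption of this sketch: an out-of-order event is treated as an error.
    private static void requireState(boolean ok, Event event, State current) {
        if (!ok) {
            throw new IllegalStateException(event + " not allowed in " + current);
        }
    }

    // Replays a sequence of events starting from "no result yet".
    static State run(Event... events) {
        State state = null;
        for (Event e : events) {
            state = next(state, e);
        }
        return state;
    }

    public static void main(String[] args) {
        System.out.println(run(Event.RESULT_CREATED, Event.RESULT_UPDATED,
                Event.GRAMMAR_FINALIZED, Event.RESULT_ACCEPTED, Event.AUDIO_RELEASED));
    }
}
```

Running main replays one complete cycle and prints ACCEPTED.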

The result objects are:

Result: the most primitive form of a created
result. This form is used while the recognition cycle is still in
progress, before all the information relating to a given audio
input is available. Its state may be FINALIZED or UNFINALIZED.

FinalResult: created when all the information
relating to a given audio input is available. It is a complete
result, able to supply all the data produced by a complete
recognition. Its substates are ACCEPTED or REJECTED (it is always
FINALIZED).

The information available in a result is determined by the
type of grammar with which it is associated. Therefore, to complete
the model, the FinalResult interface is extended by two other
interfaces, which the recognizer implements only for the matching
kind of finalized result:

FinalRuleResult: the final result of an audio input
associated with a RuleGrammar, with the substates ACCEPTED or
REJECTED (FINALIZED).

FinalDictationResult: the final result of an audio
input associated with a DictationGrammar, with the substates
ACCEPTED or REJECTED (FINALIZED).

This scheme is useful in the following situations:

It gives access to results before they are
finalized. For that we use plain Result objects.

We can have both types of grammar associated with
the same recognizer. At certain moments, we might need information
that is specific neither to a rule grammar nor to a dictation
grammar but that belongs to any finalized result. For that we can
use the FinalResult interface without having to test the nature of
the result.

With two types of finalized result, we can obtain
additional data specific to the type of grammar matched, which
gives applications more control.
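The three situations can be sketched with a simplified model of the hierarchy. The nested interfaces below reuse the names from the text, but they are local stand-ins with invented methods (tokens, accepted, ruleName, wordCount); the real javax.speech.recognition interfaces expose a different and richer set of methods.

```java
class ResultHierarchySketch {
    // Simplified stand-ins; the method names are assumptions of this sketch.
    interface Result { String tokens(); }
    interface FinalResult extends Result { boolean accepted(); }
    interface FinalRuleResult extends FinalResult { String ruleName(); }
    interface FinalDictationResult extends FinalResult { int wordCount(); }

    static Result pending(String text) {
        return () -> text;
    }

    static FinalRuleResult ruleResult(String text, boolean ok, String rule) {
        return new FinalRuleResult() {
            public String tokens() { return text; }
            public boolean accepted() { return ok; }
            public String ruleName() { return rule; }
        };
    }

    static FinalDictationResult dictationResult(String text, boolean ok) {
        return new FinalDictationResult() {
            public String tokens() { return text; }
            public boolean accepted() { return ok; }
            public int wordCount() { return text.trim().split("\\s+").length; }
        };
    }

    static String describe(Result r) {
        // Situation 1: a plain Result is usable before finalization.
        if (!(r instanceof FinalResult)) {
            return "unfinalized: " + r.tokens();
        }
        // Situation 2: FinalResult data needs no test of the grammar type.
        FinalResult f = (FinalResult) r;
        String status = f.accepted() ? "accepted" : "rejected";
        // Situation 3: grammar-specific data through the two sub-interfaces.
        if (f instanceof FinalRuleResult) {
            return status + ", rule <" + ((FinalRuleResult) f).ruleName() + ">: " + f.tokens();
        }
        if (f instanceof FinalDictationResult) {
            int n = ((FinalDictationResult) f).wordCount();
            return status + ", dictation of " + n + " words: " + f.tokens();
        }
        return status + ": " + f.tokens();
    }

    public static void main(String[] args) {
        System.out.println(describe(pending("open the")));
        System.out.println(describe(ruleResult("open file", true, "command")));
        System.out.println(describe(dictationResult("dear sir or madam", false)));
    }
}
```

A listener written this way only narrows the type as far as it needs to, which keeps the handling of unfinalized results, generic finalized results and grammar-specific results separate.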

