Abstract

The W3C Voice Browser working group aims to develop
specifications to enable access to the Web using spoken
interaction. This document is part of a set of requirements
studies for voice browsers, and provides details of the
requirements for natural language processing.

Status of this document

This document describes the requirements for natural language
processing for voice browsers, as a precursor to starting work on
specifications. Related requirement drafts are linked from the introduction. The
requirements are being released as working drafts but are not
intended to become proposed recommendations.

This specification is a Working Draft of the Voice Browser working
group for review by W3C members and other interested parties. This is
the first public version of this document. It is a draft document and
may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use W3C Working Drafts as reference
material or to cite them as other than "work in progress".

Publication as a Working Draft does not imply endorsement by
the W3C membership, nor by members of the Voice Browser working
group.

Define specifications for natural language processing
components, based on the feedback received.

0.1 Scope

This document specifies requirements that define the
capabilities of any component of a voice browser system which
performs natural language interpretation, that is, the task of
determining and representing the content of a natural language
input from a user. Interpretation components include both
stand-alone natural language understanding (NLU) components which
receive text string results from a speech recognizer or keyboard
as well as speech recognizers that incorporate natural language
understanding functionality by returning interpretations
rather than, or in addition to, text strings.

0.2 Interaction with Other Groups

The activities of the Natural Language Requirements
Subgroup will be coordinated with the activities of the Grammar
Representation Subgroup, the Synthesis Markup Subgroup, and the
Dialog Subgroup.

General Requirements

The NLU system should be able to:

Return a message stating that it cannot interpret an input at
all. (must specify)

Return partial information if it is unable to completely
process an input. (must specify)

If partial information is returned, indicate how much of the
input was left unanalyzed. (nice to specify)

Be extensible in the sense that it should be possible to add
new types of utterances to the NLU specification. Specifically,
the system should be able to incorporate modular subdialogs.
(must specify)

Return a score reflecting its confidence in the overall
interpretation. The exact format of confidence scores remains to
be determined. It could be a rough scale or it could be
probabilities, for example. (should specify)

Return a score for each attribute, reflecting its confidence
in the interpretation of that attribute. (should specify)

Return multiple analyses (n-best). (nice to specify)
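
As an illustration only, the general requirements above could be captured in a result structure like the following Python sketch. The class and field names (SlotValue, Interpretation, NLUResult, unanalyzed) and the 0.0 to 1.0 confidence scale are assumptions made for this example; the draft leaves the format of confidence scores open.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SlotValue:
    """One attribute of an interpretation, with its own confidence score."""
    value: object
    confidence: float  # assumed 0.0-1.0 scale; the draft does not fix a format


@dataclass
class Interpretation:
    """A single analysis of the user's input."""
    slots: Dict[str, SlotValue] = field(default_factory=dict)
    confidence: float = 0.0  # overall confidence in this analysis
    unanalyzed: List[str] = field(default_factory=list)  # words left uninterpreted

    @property
    def is_partial(self) -> bool:
        return bool(self.unanalyzed)


@dataclass
class NLUResult:
    """Everything returned for one input: zero or more analyses (n-best)."""
    input_text: str
    analyses: List[Interpretation] = field(default_factory=list)

    @property
    def uninterpretable(self) -> bool:
        # An empty list stands for "cannot interpret this input at all".
        return not self.analyses

    def best(self) -> Optional[Interpretation]:
        return max(self.analyses, key=lambda a: a.confidence, default=None)

    def unanalyzed_fraction(self) -> float:
        """Share of input words the best analysis left unanalyzed."""
        words = self.input_text.split()
        top = self.best()
        if top is None or not words:
            return 1.0
        return len(top.unanalyzed) / len(words)


# Hypothetical result for a partially understood input.
result = NLUResult(
    "I want five lines please now",
    [
        Interpretation({"lines": SlotValue(5, 0.9)}, 0.8, ["please", "now"]),
        Interpretation({"lines": SlotValue(9, 0.3)}, 0.2),
    ],
)
```

Representing "cannot interpret" as an empty list of analyses is one possible design; an explicit status field would serve equally well.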

Input Requirements

A standalone (i.e., not integrated with a speech recognizer)
NLU system should be able to:

Accept N-best ASR output with or without acoustic scores for
either the whole utterance, each word in an utterance or both.
(must specify)

Accept coordinated simultaneous multi-modal input. For
example, the NLU system should be able to represent or
interpret a representation of the context so that
anaphoric expressions in the user's utterances which refer to
items in the context can be interpreted. The context can include
the speech context, including the system's utterances, as well as
the external context (e.g. I'll take one of these
(click)). (nice to specify)
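
The N-best input requirement above might be sketched as follows. This is a hypothetical structure, not a proposed format: the names Word, Hypothesis and rank are invented for the example, and the sketch assumes that a higher acoustic score means a better match. Scores are optional at both the word and the utterance level, mirroring "with or without acoustic scores".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    text: str
    acoustic_score: Optional[float] = None  # None when no per-word score is supplied


@dataclass
class Hypothesis:
    words: List[Word]
    acoustic_score: Optional[float] = None  # None when no whole-utterance score is supplied

    def text(self) -> str:
        return " ".join(w.text for w in self.words)


def rank(nbest: List[Hypothesis]) -> List[Hypothesis]:
    """Sort by utterance score (assumed higher is better).

    If any hypothesis lacks a score, the recognizer's own order is kept.
    """
    if any(h.acoustic_score is None for h in nbest):
        return list(nbest)
    return sorted(nbest, key=lambda h: h.acoustic_score, reverse=True)


# Hypothetical recognizer output for "I want five lines".
nbest = [
    Hypothesis([Word("I"), Word("want"), Word("nine"), Word("lines")], 0.4),
    Hypothesis([Word("I"), Word("want"), Word("five"), Word("lines")], 0.6),
]
unscored = [Hypothesis([Word("yes")]), Hypothesis([Word("guess")])]
```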

Task-specific information

These requirements are intended to ensure that the natural
language component is capable of representing results of
processing task-specific utterances.

An NLU system should be able to:

Represent task information:

Represent values for slots in a task model: I want five
lines. (must specify)

Support hierarchical attributes in task model: e.g. a
slot can itself be a frame. (must specify)

Represent interpretations of sentences with anaphora and
ellipsis. I want two hamburgers, one with ketchup and one
without. (must specify)

Represent deictic utterances, which require reference to the
non-linguistic context for their interpretation. I want this.
(must specify)
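
To make the idea of hierarchical attributes concrete, here is a minimal sketch in which a slot value may itself be a frame or a list of frames. It encodes one possible reading of "I want two hamburgers, one with ketchup and one without". The slot names (item, quantity, instances, condiments) and the get_path helper are assumptions for illustration and are not part of any task model defined in this document.

```python
from typing import Any, Dict

Frame = Dict[str, Any]  # a slot value may be a scalar, a Frame, or a list of Frames

# Hypothetical frame; the ellipsis "one without" is resolved to ketchup=False.
order: Frame = {
    "item": "hamburger",
    "quantity": 2,
    "instances": [
        {"condiments": {"ketchup": True}},
        {"condiments": {"ketchup": False}},
    ],
}


def get_path(frame: Frame, path: str) -> Any:
    """Follow a dotted path through nested frames and lists.

    Returns None when any step of the path is missing.
    """
    node: Any = frame
    for key in path.split("."):
        if isinstance(node, list):
            in_range = key.isdigit() and int(key) < len(node)
            node = node[int(key)] if in_range else None
        elif isinstance(node, dict):
            node = node.get(key)
        else:
            return None
        if node is None:
            return None
    return node
```

For example, get_path(order, "instances.1.condiments.ketchup") looks up the ketchup slot of the second hamburger.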

Represent meta-task information (all nice to specify)

Represent a request for a definition: What does 'access
code' mean?

Represent a request for the status of a filled slot: How
many lines did I ask for?

Represent information about a slot: How many lines am I
allowed to ask for?

Represent a request for the possible fillers of a slot:
What are my choices? Can I schedule a call on Sunday?

Represent a request for the status of all slots. What
have I ordered so far?

Represent questions about possible, desirable, necessary and
conditional situations. Can you pay my electric bill? Should
I order the chicken? Do I have to get a drink with the special?
If I stay over Saturday night will I get a lower fare?

Represent requests for explanation of a system response.
Why?

Represent a request for the amount of task remaining. What
else do you need to know? Am I almost finished?

Generic Information about the Communication Process

An NLU system should be able to represent meta-dialog
information having to do with the communication process. (all nice
to specify except as noted)

Represent utterances about the dialog: I want to revisit
my previous answer. That's what I just said.

User isn't actually asking to have the bill paid, just whether
the system can do it.

Capability_query:

Action: transfer

Destination_account: electric bill

24: System: Yes, I can transfer money to General Utilities

25: User: OK, pay it

Action: transfer

Destination_account: electric bill

26: System: How much money should I transfer to General
Utilities?

System solicits a value for the amount parameter from the
user

27: User: transfer $80 from my joint checking account

User responds with the value for the requested parameter, and
volunteers a value for another parameter, the source.

Operation: transfer

Source_account: joint checking

Amount:

Value: 80

Currency: USD

28: System: You want to transfer $80 from your joint checking
account to General Utilities?

System asks user to confirm entire transaction

29: User: Make it so

The user watches Captain Picard on Star Trek and copies his
command phrasing.

Yes

30: System: Do you want me to do anything else?

System prompts user for another transaction

31: User: Yes, transfer $800 from the IRS to my savings
account

User tries to do the impossible.

Operation: transfer

Source_account: IRS

Destination_account: savings

32: System: I am not allowed to transfer money from the IRS
account

System enforces business rules about source accounts.

33: User: What can I do to the IRS?

While the user may be joking, the system treats this as a help
request

Capability_query: action

actor: user

theme: IRS

If the request is not representable within the task
model, it may be useful for the system to fall back to a less
fully analyzed representation of the utterance, rather than
simply to fail to understand it.

34: System: You may pay your taxes by transferring money to
the IRS

System explains what operations the user can perform with the
IRS account
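
Turns 25 through 28 of the dialog above can be sketched as frames that are merged turn by turn until the transaction is complete enough to confirm. This is a hypothetical illustration: the merge and missing_slots helpers and the REQUIRED list are assumptions, and the sketch uses the single key "Action" throughout, although the dialog annotations use both "Action" and "Operation".

```python
from copy import deepcopy
from typing import Any, Dict, List

Frame = Dict[str, Any]

# Assumed set of slots a transfer needs before it can be confirmed.
REQUIRED = ("Action", "Source_account", "Destination_account", "Amount")


def merge(state: Frame, update: Frame) -> Frame:
    """Return a new frame with `update` folded into `state`.

    Nested frames are merged recursively; other values are overwritten.
    """
    merged = deepcopy(state)
    for slot, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(slot), dict):
            merged[slot] = merge(merged[slot], value)
        else:
            merged[slot] = deepcopy(value)
    return merged


def missing_slots(frame: Frame) -> List[str]:
    """List the required slots that have not been filled yet."""
    return [slot for slot in REQUIRED if slot not in frame]


# Turn 25: "OK, pay it"
turn_25: Frame = {"Action": "transfer", "Destination_account": "electric bill"}

# Turn 27: "transfer $80 from my joint checking account"
turn_27: Frame = {
    "Action": "transfer",
    "Source_account": "joint checking",
    "Amount": {"Value": 80, "Currency": "USD"},
}

# State the system could read back for confirmation at turn 28.
state = merge(turn_25, turn_27)
```

After turn 25 the source account and the amount are still open; after merging turn 27 nothing is missing, which is the point at which the system asks the user to confirm the entire transaction.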