Meeting with TAG for lunch on Thursday of F2F. Topic will be HST. Who
would like to attend?
- Raman
- Scott
- Jim Barnett
- Brad
- Paolo
- Bodell
- Dave Raggett
- Claus
- Tim
- Brian
- Alex Lee
- RJ Auburn
Jim: Propose we arrange lunch in the room
TV: How much do people know about this meeting?
Jim: Right now, nothing. Hoping Jim Barnett will provide an agenda.
TV: Is that appropriate for this meeting or do we want to start at a
higher level?
Jim: Would like Jim to put together a 10-minute presentation.
JimB: I'll just give the high-level view at the white-board.
TV: One of the ways to show the value is to demonstrate we're slicing
out a commonality and show we're putting a formalism to it.
Scott: At the face to face we want to cover all the topics on the
deferred list. May wish to change the timing in the agenda.
Jim: The purpose of the agenda is just to let people know what the
topics will be.
Scott: Do we want a planning session in the VoiceXML section or do it
all in the planning day?
Jim: I don't have a preference.
Scott: I propose we take the last hour of that day to do the V3 planning
for what concrete proposals we need.
Scott: Today we have a number of CRs to discuss: audio controls,
grammar issues, recording CRs. Still have a few telephony ones to
cover, but RJ can't make it today. We'll finalize the discussion on
those at the F2F. Starting with the first... audio controls. In Jim's
email there is a link to the email I sent listing the audio control
CRs. There are about 6 CRs.
CR 64 - Ability to indicate the beginning and end of the audio
resource. Specify a start time and a stop time. Similar requirements
in SMIL and the daisy framework.
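The SMIL parallel can be made concrete. SMIL 2.0 media elements take clipBegin/clipEnd attributes; a VoiceXML analogue for CR 64 would be new markup, so the second snippet is purely hypothetical (the starttime/stoptime attribute names are invented for illustration):

```xml
<!-- SMIL 2.0: play only seconds 5 through 10 of the resource -->
<audio src="intro.wav" clipBegin="5s" clipEnd="10s"/>

<!-- Hypothetical VoiceXML analogue for CR 64; attribute names invented -->
<audio src="intro.wav" starttime="5s" stoptime="10s"/>
```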
CR 154 - Asking for sticky volume and speed for TTS. Specify default TTS
values for speed and volume that remain in force for the document or the
application. May be reasonable as a session-level feature. Need to
figure out how it interacts with SSML.
Dan: every time we discuss this we ask if it should be applied to all
audio or to the TTS.
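For context, SSML 1.0 already lets an author set rate and volume, but only scoped to a prosody element's content; CR 154 asks for a sticky, document- or application-wide default. A minimal SSML 1.0 sketch of the existing, non-sticky behavior:

```xml
<?xml version="1.0"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US">
  <prosody rate="slow" volume="loud">
    These settings apply only inside this element; CR 154 asks for a
    way to make them the default for the whole document or application.
  </prosody>
  This text reverts to the platform defaults.
</speak>
```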
CR ??? - Jump forward/back.
JimB: Does this require fully asynchronous events?
Mike: If you have the <mark/> tag you can jump back and forth.
Dave: Challenges are what are the features and then how do they bind to
the UI?
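Mike's point rests on the <mark/> support added in VoiceXML 2.1, which reports the last mark reached when input interrupts a prompt; a sketch, assuming 2.1 behavior:

```xml
<prompt>
  <mark name="msg1"/> First message: meeting moved to Thursday.
  <mark name="msg2"/> Second message: lunch with the TAG.
</prompt>
<!-- In VoiceXML 2.1, after barge-in the application can read
     application.lastresult$.markname and .marktime to learn where
     playback was interrupted, and requeue audio from that point. -->
```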
CR 605 - General request for providing more client-side audio control.
Where is the offset of the mark? The scope of what is being controlled
may need to extend beyond the audio file. Points out that some browsers
already have extensions.
CR 608 - Control the speed for audio through client-side controls.
CR 613 - Specific proposal for pause/resume forward/backward volume...
has a model in the proposal.
Scott: Proposal -- accept those as the basis for requirements. Since we
have a specific proposal, take that as a straw-man and begin to work out
how we would want it to work.
Brad: Do any of the CRs cover audio layering? Mixing audio in parallel.
Scott: No, hasn't made it into a CR.
Dan: SMIL can do that right?
Scott: Yes
Dan: Just mention that as I don't know if we've really evaluated how
SMIL might work on this.
Scott: Should we solicit proposals for the CRs?
Brad: I think so.
Jim: Are we soliciting from the WG or the public?
Scott: WG
Paolo: (ACTION) I will work on submitting these CRs.
Scott: Thanks Paolo
Scott: There are some issues that arise in this area. What is the scope
of your seek, audio, prompts, entire queue? Most proprietary
implementations seem to be at the prompt level. Most app developers
seem to be looking at the prompt queue.
Emily: Thinking about the prompt queue aligns well with what we've done
with <mark/> in 2.1.
Scott: The other issue is how you do the media control itself. There
are generally three -- synchronous event model (stop the prompt queue,
apply an action); second model is to move toward a more asynchronous
event model (events could be passed up without stopping and manipulate
audio directly); third model is to use VoiceXML to define mappings
between key bindings and audio and pass to runtime processor,
lower-level subsystem manages all interactions until some unhandled
interaction happens.
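The third model could be authored roughly as follows; this is entirely hypothetical markup (no such elements exist in VoiceXML 2.x), sketched only to make the key-binding idea concrete:

```xml
<!-- Hypothetical: declare DTMF-to-action bindings once, then hand the
     prompt queue to a lower-level subsystem that services them without
     returning control to the VoiceXML interpreter -->
<mediacontrol>
  <binding dtmf="1" action="pause"/>
  <binding dtmf="2" action="resume"/>
  <binding dtmf="7" action="seek" offset="-5s"/>
  <binding dtmf="9" action="seek" offset="+5s"/>
</mediacontrol>
```

An unbound key (say, 0 for the operator) would be the "unhandled interaction" that hands control back to the interpreter.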
JimB: The third, runtime controls, might be far more efficient.
Scott: Should make authoring much simpler. Don't need to manage queue.
Mike: I think async events is a bad idea. 1 or 3 seems much better. 3
would be hard because SSML interpreters must become asynchronous.
Emily: Decision depends on the actual behavior. Pause/resume may have
better latency. Which you choose depends on the function.
Mike: Even with pause/resume pushing it up shouldn't be latent.
Brad: Do we have enough use cases to help evaluate these?
Scott: yes, that was my thought on the next steps.
JimA(?): Voicemail has many of the use cases.
Scott: My proposal is we activate audio control as a CR area for
VoiceXML 3.0 and begin to flesh out requirements and use cases for this
area in greater detail.
(General consent)
Grammar CRs
Paolo: You can review my email where we go CR by CR.
CR 6 - More fine-grained control. Want mixed-initiative to fill a
form. Separate function is to do navigational <goto/>. Suggesting a
separation between the two.
Paolo: I can see the problem because there is no way to turn on or off
the grammars. Question is do we have use cases?
Dan: One notion that has come up in the past was around finer
selectivity of grammars in general. May want to make this an instance
of that.
Mike: May want a cond= attribute on grammar.
Dan: Would you envision this on every grammar?
Mike: I would envision it on every grammar.
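Mike's suggestion, sketched as hypothetical markup: cond= does not exist on <grammar/> in VoiceXML 2.x, but it would presumably mirror the cond= attribute on executable content:

```xml
<!-- Hypothetical cond= attribute: each grammar is active only while
     its expression evaluates to true -->
<field name="destination">
  <grammar src="cities.grxml"/>
  <grammar src="navigation.grxml" cond="allowNavigation"/>
</field>
```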
Brad: There is a bigger issue about separation between navigation and
information collection being overloaded in a certain form.
Scott: Based on this discussion I would propose we activate this. We
can look at cond= and some of the higher-level navigation issues.
Paolo: Conclusion is to activate and discuss.
CR 13 - utterance vs. semantic slot confidence
Paolo: couldn't find any language that covered this in VoiceXML 2.0
Scott: I was sure we put language in this.
Paolo: I don't see the confidence on the interpretation. EMMA can
annotate everything, but in the current VoiceXML 2.0 description I can't
see how you do it. I think this would be interesting to consider in
VoiceXML 3.0 to allow better searching of results and mixed-initiative.
Mike: My memory is similar to Scott's.
Scott: Still looking for it. Dan, didn't you work on this?
Dan: Yes, looks like my CR; trying to understand it. I believe the CR
was processed to meet the basic requirements; would suggest looking at
this CR in the broader context of refactoring the processing of reco
results.
Scott: You would propose waiting for more specific requirements?
Dan: Yes, we would look at this if we were reopening confidence and
other capabilities.
Paolo: Aligning with EMMA might be the right thing to take on in V3.
Addressing EMMA bindings will address this CR.
(general discussion around EMMA bindings)
Scott: CR 119 covers that right?
Dan: Propose closing this; may reopen if some request for more results
information is made.
CR 15 - Multiple semantic interpretation results
Paolo: We have nbest, but not multiple semantic interpretations per result.
Scott: We had this by allowing multiple nbest results with the same
utterance string but different semantic interpretations.
Dan: From my perspective this addresses it.
Paolo: If no one needs this intermediate level, we do not need to reopen
this.
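The workaround Scott describes can be seen in the VoiceXML 2.0 application.lastresult$ array, where two n-best entries may share an utterance string but differ in interpretation (the values below are illustrative):

```xml
<!-- Illustrative n-best contents:
     application.lastresult$[0].utterance      = "boston"
     application.lastresult$[0].interpretation = "BOS" (the airport)
     application.lastresult$[1].utterance      = "boston"
     application.lastresult$[1].interpretation = "Boston, MA" (the city) -->
<if cond="application.lastresult$.length &gt; 1">
  <goto next="#disambiguate"/>
</if>
```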
Dan: Process issue is that there is no Nuance rep; should we email?
Scott: As a process we send change to status notifications to the
representative of that CR.
Dan: I can give you a name for a Nuance rep.
CR 110 -- expr attribute on <grammar/>
Paolo: Addressed by 2.1
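For reference, the 2.1 mechanism is (to my recollection) the srcexpr attribute, which lets the grammar URI be computed at runtime; userLang here is assumed to be an application variable set earlier:

```xml
<!-- VoiceXML 2.1: grammar URI evaluated at fetch time -->
<grammar type="application/srgs+xml"
         srcexpr="'grammars/' + userLang + '/menu.grxml'"/>
```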
CR 111 -- Support a <value/> inside an inline grammar.
Paolo: Don't believe this is addressed.
Mike: <value/> in <grammar/> would be supportable as a preprocessing
step, similar to SSML.
Mike: Has issues in caching of grammars.
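Mike's preprocessing idea, sketched as hypothetical markup (VoiceXML 2.x does not allow <value/> inside <grammar/>): the runtime value would be substituted into the inline SRGS grammar before compilation, analogous to <value/> in prompts.

```xml
<!-- Hypothetical: expand <value/> before the grammar is compiled -->
<grammar root="confirm" type="application/srgs+xml" version="1.0">
  <rule id="confirm">
    <one-of>
      <item>yes</item>
      <item><value expr="userName"/></item>
    </one-of>
  </rule>
</grammar>
```

Mike's caching concern follows directly: the grammar text now varies per request, so it cannot be cached as a static resource.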
Dan: I think this one is worth discussing; not appropriate to reject
outright. Could defer for the SRGS discussion; should be done in the
VoiceXML arena.
Scott: we need to decide if this is useful for app development.
Scott: Propose we continue these discussions at the F2F.