Slides

Testing / Evaluation
Eric Morley
June 1, 2010
Papers
• File, P., Todman, J., 2002. Evaluation of the coherence of
computer-aided conversations. Augmentative and
Alternative Communication 18 (December), 228-241.
• Todman, J., Rzepecka, H., 2003. Effect of pre-utterance
pause length on perceptions of communicative
competence in AAC-aided social conversations.
Augmentative and Alternative Communication 19 (4), 222-234.
• Ball, L. L., Beukelman, D. R., Pattee, G. L., 2004. Acceptance
of augmentative and alternative communication
technology by persons with amyotrophic lateral sclerosis.
Augmentative and Alternative Communication 20 (2), 113-122.
File and Todman 2002
• Voice output communication aids (VOCAs) allow
storage of whole utterances, but are not designed
for users to rely on these for open-ended
conversations
– Impossible to predict which utterances will be needed
– Difficult to locate potential follow-up phrases
• May be possible to use “imperfect expressions” in
social conversation and still have a positive
experience
TALK
• Computer system (Talk Aid Using Pre-Loaded Knowledge)
• Uses pre-stored messages
• For people who understand language but can’t talk
(anymore)
• User writes phrases on their own time
– Phrases stored in person, time and aspect locations
• Ex: My father was a shop manager (me/past/who)
• Quick fires: “ah yes”, “too bad”, etc.
– Can be set to say equivalent phrase to one selected
• Comments: useful in many contexts (ex. “what about
you?”)
• Allows word by word entry
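The person/time/aspect storage described above can be pictured as a lookup keyed on those three locations. This is only a minimal sketch: `PhraseStore` and its methods are hypothetical names for illustration, not TALK's actual implementation.

```python
from collections import defaultdict


class PhraseStore:
    """Hypothetical store of pre-written phrases, keyed by (person, time, aspect)."""

    def __init__(self):
        # Each (person, time, aspect) location holds a list of phrases.
        self._phrases = defaultdict(list)

    def add(self, person, time, aspect, phrase):
        """File a phrase the user wrote ahead of time under one location."""
        self._phrases[(person, time, aspect)].append(phrase)

    def lookup(self, person, time, aspect):
        """Return a copy of the phrases at a location (empty list if none)."""
        return list(self._phrases.get((person, time, aspect), []))


store = PhraseStore()
# Example from the slide: stored under me / past / who.
store.add("me", "past", "who", "My father was a shop manager")
```

During a conversation the user would navigate to a location such as me/past/who and pick from the phrases filed there, instead of composing word by word.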
Old TALK
• Evaluations have been getting more realistic since
inception (started with cocktail party type
situation)
• Rates of up to 40-60wpm (vs 2-15wpm for word
by word)
– Improvements in interaction quality
• Some differences from normal conversation in
simulated use with unimpaired user
– VOCA user was less likely to follow up with narrative
Current Study
• Conversations can be on any topic
• Genuine users interacting with new partners
• 19/68 conversations had the same partner, which allowed assessment of
coherence of conversations with a familiar partner
• Analyzed 68 old transcripts of conversations with lag-sequential
analysis
• Participants were
– 1 TALK user (~40y.o w/cerebral palsy w/dysarthria)
– 56 undergrad psych students
• 50 students were invited to talk
• 6 were invited to have a “getting to know you” conversation
• People: TALK user (CP); repeat partner (RP); new partners (NP1 and
NP2) from different sets of partners (not repeated)
Hand Labeling
• Speech acts coded by category
– Questions; answer; observation; agreement;
disagreement; repetition; interjection; directive;
narrative
– 2,345 speech acts w/RP; 2,802 acts w/NP1; 4,268
w/NP2
• ~90% agreement on labeling, based on
re-labeling 15% of the acts
– Used kappa statistic to confirm agreement of
labelers
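The kappa statistic corrects raw agreement for the agreement two labelers would reach by chance. The slides do not say which variant was used, so the sketch below assumes Cohen's kappa for two labelers; the function name and sample labels are illustrative only.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two labelers who coded the same speech acts."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of acts given the same category.
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each labeler's category proportions.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both labelers used one identical category throughout
    return (observed - expected) / (1 - expected)


first = ["question", "answer", "question", "answer"]
second = ["question", "answer", "answer", "answer"]
kappa = cohens_kappa(first, second)
```

Raw agreement in this toy example is 75%, but kappa is lower because half of that agreement is expected by chance.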
Lag Sequential Analysis
• Statistical analysis of a sequence of terms (here speech acts
in a conversation)
– Finds statistically significant pairs of acts which occur at
particular distances (i.e. lags)
– Lag n means acts a and b with n-1 intervening acts
• Crosslag: speech act a (intervening acts) speech act b
• Autolag: speech act a (intervening acts) speech act a
• Looked at both types together and at crosslag alone
– Pooled data across conversation sets (RP, NPi)
• Use z-scores: difference between observed and expected
lag probabilities
• Only counted long sequences as statistically significant if
subsequences were also significant
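A minimal sketch of the lag z-score idea: for each pair (a, b) at a given lag, compare the observed probability of b following a with the probability expected if acts were independent. The slides do not give the exact formula, so this assumes the Allison and Liker (1982) z-score, one common choice; the function name and toy data are illustrative.

```python
from collections import Counter
from math import sqrt


def lag_z_scores(acts, lag=1):
    """z-score for every observed pair (a, b) where b occurs `lag` acts after a."""
    n_pairs = len(acts) - lag
    if lag < 1 or n_pairs <= 0:
        raise ValueError("sequence too short for this lag")
    givens, targets = acts[:-lag], acts[lag:]
    pair_counts = Counter(zip(givens, targets))
    given_counts, target_counts = Counter(givens), Counter(targets)
    scores = {}
    for (a, b), observed in pair_counts.items():
        p_given = given_counts[a] / n_pairs    # P(a) as the earlier act
        p_target = target_counts[b] / n_pairs  # P(b) as the later act (expected)
        p_cond = observed / given_counts[a]    # P(b at this lag | a) (observed)
        variance = p_target * (1 - p_target) * (1 - p_given) / (n_pairs * p_given)
        if variance > 0:
            scores[(a, b)] = (p_cond - p_target) / sqrt(variance)
    return scores


# Toy sequence: strict question/answer alternation.
acts = ["question", "answer"] * 10
lag1 = lag_z_scores(acts, lag=1)  # crosslag: question -> answer
lag2 = lag_z_scores(acts, lag=2)  # autolag: question -> (1 act) -> question
```

Pairs with a z-score above a chosen threshold (1.96 is a conventional two-sided 5% cutoff) would be treated as significant at that lag; keys where a equals b are autolags, the rest crosslags.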
Results
• Most speech acts were questions, answers, observations or
agreements
• VOCA user made 42% of acts (41% of questions and
observations, 55% of answers, 10% of agreements)
– In unaided conversations one would expect many more
observations than questions and answers, and some more
agreements
• 31 sequences identified
– 19 had ≥3 speech acts
– 13 sequences occurred in all 3 sets, 7 of them
with 3 speech acts
• Question and answer sequences were common
– Facilitates turn-taking
Initiations
• Observation
– Often followed by questions, sometimes agreement
• CP doesn’t interject as much as RP
• Agreement
– Used for turn taking in unaided conversation
– Questions used for this in aided conversation
• Other
– CP repetition and NP/RP narrative followed by question
Discussion
• Only the speaking partners reliably followed answers with observations or
narrative
• Aided partner did not use narrative (possibly because of lack of
practice)
– Maybe training would help this
• More questions in aided conversation
– More likely that the VOCA user has an appropriate general question
than a specific narrative or observation
– Gives VOCA user control over topic
• Pre-stored utterances seem to be re-usable between new and
repeat conversation partners
• Should include “quick fire” interjection support for conversation
with RP
– Maybe the space this takes up could be used for something else
in NP mode?
Todman and Rzepecka 2003
• Several types of VOCAs
– Word by word
• Need to generate text at some point, even with pre-stored
messages. Pauses are so long that speaking rate goes to 2-15wpm
• Utterances are extremely short, partner dominates conversation,
“folk walk away”
– Whole utterance approach (WUA)
• Based on the idea that content of conversations is “frequently
approximate rather than precise”
• Should result in faster communication rates when precision is not
critical
– Is this the case?
– Does quality of conversation degrade?
– If yes to both of these, does this result in more positive perceptions of
VOCA users’ communicative competence and interactions with them?
More on WUA
• TALK system
– For free-ranging social interactions
– Uses lots of small talk
– 40wpm w/o training, 50-80 with
• Frametalker
– Designed for transactional conversations
(restaurant, bank, etc)
– 45 wpm, rated as having a high degree of
naturalness
Quality of Conversation (WUA)
• Todman, Elder and Alm (1995)
– Speaking person used TALK to converse
– Parts of these conversations were reenacted with
speakers, and pauses were removed
– Compared with non-aided conversations
– Aided found to be of higher quality
• Likely because pre-composed messages will be more
coherent
– Would this be the case to the other listener, or only for people
eavesdropping?
Conversation Rate and Perceived
Competence
• Variation in conversation rate (CR) can be
approximated by looking at pre-utterance pause length
• A pause of even a few seconds can cause problems
– User perceived as unintelligent
– Poor quality of social interactions
– Frustration
– Abandonment of VOCA
• Previous studies have found
– Positive correlation between conversational rate and
satisfaction
– Negative correlation between pre-utterance pause length
and satisfaction
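To see why pre-utterance pause length works as a stand-in for conversation rate, consider the effective rate of a single utterance once the pause is counted. This is a back-of-the-envelope sketch: the 150 wpm synthesizer output rate and the 5-word utterance are assumptions for illustration, not figures from the papers.

```python
def effective_wpm(words, pause_s, speech_wpm=150.0):
    """Words per minute for one utterance, counting the pause before it.

    speech_wpm is an assumed synthesizer output rate (hypothetical value).
    """
    speaking_s = words / speech_wpm * 60.0  # time spent actually speaking
    return words / (pause_s + speaking_s) * 60.0


# A 5-word utterance after a 2 s pause versus after a 10 s pause.
short_pause = effective_wpm(5, pause_s=2.0)
long_pause = effective_wpm(5, pause_s=10.0)
```

Under these assumptions the pause dominates: the same utterance drops from 75 wpm to 25 wpm when the pause grows from 2 s to 10 s.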
Previous Experiments
• Newman (1982)
– Pre-utterance pauses of 4-7s led to worse interactions when compared with utterances w/o
pauses
– If the pauses were a result of doing something else (sculpting), this effect disappeared
• Does this apply to VOCA users?
• Ratcliff, et al (2002)
– Effects of pauses and speaking rate on naturalness of synthetic speech
– Increased speaking rate perceived as more natural, pauses didn’t seem to do much on their
own (only had an effect since they changed the speaking rate)
• Bedrosian (2002)
– VOCA users ask for a book, 1 with a mostly irrelevant message (after 4), other after 90sec with
a highly relevant message
• 2nd approach preferred
– Had VOCA users give too much/too little information quickly, or relevant info after a delay
• Tradeoff: short delay led to improved affective/behavioral ratings, wrong amount of info
led to lower rating of cognitive component
Current Experiment
• How does pre-utterance pause time affect the perception of social conversation?
• Does the amount of experience a VOCA user has with WUA have an effect on this?
• 3 VOCA users with cerebral palsy
– Used TalkBoards, had varying experience with this
• Partners had 20 min introductory conversation
– 2 of 3 got sick, so 5 partners, none with VOCA experience
– Possible effects of having different partners
– Told there were no restrictions on topic of conversation, but other would use VOCA
• Interactions recorded, 5 min chunk extracted (after small talk)
– Pre-utterance pauses replaced with pauses of set lengths (2-10s)
• Pauses didn’t seem to be identical in length, so those are means
– Also used original interaction
• 28 raters
– All psych students
– Used Likert scale and recorded conversations as baseline
– Told they would first listen to a conversation with some natural speech changed to synthetic
– Then they would listen to “getting to know you” conversation involving VOCA user
Current Experiment (cont’d)
• 7 point scale to evaluate 4 areas
– Linguistic, operational, social, strategic
• Each rater heard each conversation once, with one
pause variation per conversation
• Blocked raters, found high level of agreement
among raters
• Effect of pause length found to be significant
Results, Discussion
• Shorter pause time is preferred
– Linear trend for 2-16sec pause
• Possible that pause time became salient because raters
listened to conversations with multiple pause times
• VOCA user had significant effect
– May be something other than experience causing this
– Smaller effect than pause time, no interaction
• Social nature of conversation
– Perhaps perceived nature of VOCA user had effect
• Pauses might not be legitimized b/c no shared activity
• WUA is important because of causal relationship between
pause time and partner/observer preferences
Ball, et al. 2004
• Amyotrophic Lateral Sclerosis (ALS)
– Neuromuscular disease which results in weakness, atrophy and paralysis
– 80% eventually require AAC
– ≥25% of these did not accept AAC
• Little is known about the 80% of ALS patients who need and use AAC
– High vs. low tech
– Stage of ALS at adoption
– “Attitudes toward technology”
• Mathy et al. (2000) gives preliminary information
– High tech: detailed needs and wants; written communication; stories
– Low tech: immediate needs and wants; conversation
• Gutmann (1999) found gender differences
– Women prefer low-tech strategies and VOCAs more than men
– Men prefer high-tech writing systems more often than women
• Gutmann and Gryfe (1996) found that early and frequent intervention and early
introduction of AAC are critical for acceptance
• Using an AAC can allow someone with ALS to continue working
• Focus on high-tech AAC
– Low-tech options haven’t changed very much, and have been examined
– High-tech options are changing rapidly and becoming more accessible
• Is there a pattern to adoption of high-tech AAC?
• Why do people use/reject AAC?
Group
• 50 ALS patients monitored over 4 years
• 22 females, 28 males
• 17 bulbar, 22 spinal, 11 mixed diagnosis
• Ages 36-78 (μ=60.16 y.o.)
• All spoke English primarily
• 2 had cognitive deficits
• Seen for AAC assessment when their speech
began to change
– Those wanting only written communication were not
included in the study
• Wide variety of educational levels, social status
Procedure
• AAC assessment when intelligibility ≤90% or
speaking rate ≤100 wpm (tested quarterly)
• Patients presented with various devices
– Tried them during presentation
– Evaluator made recommendations
– Could bring home favored device for 1 week trial
• AAC intervention recommended
• AAC acceptance, use, rejection and
discontinuance were monitored until their
death (4-181mo., μ=43.8mo, SD=37.54mo)
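The referral rule on this slide amounts to a simple threshold check at each quarterly test. A minimal sketch, reading the speaking-rate threshold as 100 words per minute; the function name and sample measurements are hypothetical.

```python
def needs_aac_assessment(intelligibility_pct, speaking_rate_wpm):
    """True once either measure falls to or below its threshold."""
    return intelligibility_pct <= 90 or speaking_rate_wpm <= 100


# Hypothetical quarterly measurements: (intelligibility %, speaking rate in wpm).
quarterly = [(98, 160), (95, 130), (93, 98)]
# Index of the first quarterly test that triggers an assessment, if any.
first_referral = next(
    (i for i, (intel, rate) in enumerate(quarterly)
     if needs_aac_assessment(intel, rate)),
    None,
)
```

Either condition alone is enough: in the sample data intelligibility is still above 90% at the third test, but the speaking rate has dropped below 100 wpm.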
Results
• Acceptance: 90% immediate, 6% delayed
– Came from all social classes
– No gender differences
• Immediate Acceptance
– In interviews, participants listed communication,
participation and employment as reasons for
acceptance
– All used AAC as primary means of communication
Delayed Acceptance
• Ages 30-39, 70-79
• Delay of 6-24 mo.
• Preferred multifunctional devices
• Delayed in part because of family members
– Believed that they could understand the participants well
enough to meet their needs
– Thought they were providing adequate care w/o AAC
• 2 thought that AAC questioned the quality of their care
• One physician advised a family to accept dysarthria
rather than turn to technology
• Three individuals were in some form of denial
Rejection and Discontinuance
• Rejected by the two participants with
cognitive limitation
• No discontinuance
– High-tech AAC often abandoned at end-of-life
Discussion
• Saw wider adoption than before (1996)
• AAC seems to be more widely accepted in society
• US began funding AAC devices in 2000
• Recommendations
– Providing appropriate information regarding the
speech-language characteristics of ALS
– Regular contact/monitoring
– Sustaining awareness of AAC/intervention
opportunities
• Doctors must be aware of these options and be able to
explain them