For each paper, the authors note the theories (if any) that were cited as guiding the development of the intervention. Most common were the Theory of Planned Behavior, the Health Belief Model, Self-Determination Theory, Social Cognitive Theory, Self-Regulation Theory, the Transtheoretical Model, and cognitive-behavioral theory. The authors find that these theories are somewhat lacking in their ability to guide the use of the affordances of mobile technology. In particular, theories based on linear progression tend not to take full advantage of mobile devices’ abilities for context awareness (through sensing and prompted input), increased variation in timing and content of interventions, and real-time awareness of outcomes.

While the health behavior theories may not be up to the task on their own, there are theories from a variety of disciplines that can guide, and are guiding, intervention designers. The authors suggest looking to control theory to address some of the shortcomings they identify, and offer a couple of interesting examples. The HCI and social computing community, which studies interactive systems and technology-mediated social interaction, also has quite a bit to offer. Though HCI often borrows or adds to theories from other disciplines — including from the health community — it has developed quite a bit of expertise about how to implement these theories in systems, and it has developed its own applicable frameworks, such as the Reader to Leader Framework.

The way the authors found the literature they reviewed — a search of Medline — highlights a bit of a communication gap between these disciplines. No papers from HCI conferences or journals were included in this review, despite work such as UbiFit, MAHI, and Fish’n’Steps being excellent examples of the type of studies the authors reviewed. The authors are not alone in excluding HCI papers from review articles on this topic (e.g., [1], [2]). I agree, to a large extent, with Klasnja et al. that deep understanding of users’ experiences with behavior change strategies, as implemented in a system, is one of the primary contributions HCI can make to this health behavior change work. Stopping at such findings, however, may be insufficient for the results to have an impact outside of the HCI community, much as eco-feedback work in HCI has not always reached beyond the HCI community.

We should also keep asking what might be on a research agenda that combines the strengths of HCI and health behavior change communities (or at least, I should, since I have a dissertation to write), with an eye toward advancing both theory and practice. Some thoughts:

What are appropriate ways to elicit social support or social pressure from others on social network sites or other technology-mediated spaces?

When does anonymity support people trying to make healthy behavior changes, such as by helping them feel comfortable sharing, and when is it a barrier to making those changes, such as by reducing accountability?

Are some types of nudges that work once, or in non-discretionary use, problematic over the long term, for example because people start avoiding the system and its aversive feedback?

How can we most effectively use leaderboards to support behavior change and maintenance? With data from many people, there are many more options for how to construct leaderboards than in smaller groups. For a given behavior or set of people, which comparison direction(s) and dimensions of similarity are most effective?
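To make the design space concrete, here is a minimal sketch of one way a similarity-based leaderboard might be constructed. The similarity measure, field names, and comparison directions are illustrative assumptions of mine, not drawn from any particular system or paper:

```python
import math

def similarity(a, b, dims):
    # Inverse-distance similarity over the chosen dimensions of
    # similarity (e.g. age, baseline activity level).
    return 1.0 / (1.0 + math.dist([a[d] for d in dims], [b[d] for d in dims]))

def build_leaderboard(user, population, dims, direction="upward", k=5):
    """Pick the k people most similar to the user along `dims`.
    'upward' keeps only people scoring ahead of the user (upward
    comparison), 'downward' only those behind, anything else keeps both."""
    if direction == "upward":
        pool = [p for p in population if p["score"] > user["score"]]
    elif direction == "downward":
        pool = [p for p in population if p["score"] < user["score"]]
    else:
        pool = list(population)
    pool.sort(key=lambda p: similarity(user, p, dims), reverse=True)
    # Show the user among their k most similar comparators, ranked by score.
    return sorted(pool[:k] + [user], key=lambda p: p["score"], reverse=True)
```

With population-scale data, each of these parameters (which dimensions, which direction, what k) becomes a design choice that can be tuned per behavior or per person.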

As features used to nudge or support decision making become more widely adopted, do some features remain effective or become like infrastructure while others become less effective?

What are the privacy implications of integrating sensing, storage, and sharing of data? Is HIPAA policy even up to this task?

When do behavior nudges become coercive? What are good practices for the ethics of these interventions? While many counseling strategies promote autonomy and intrinsic motivation as both a matter of efficacy and ethics, does this balance shift as devices that allow constant and long-term monitoring (and thus control) become available? For example, some people can already save money on their insurance by agreeing to wear a pedometer and meeting certain step goals.

These are by no means completely open questions. Researchers and practitioners in both communities have been tackling many of them and making progress. It is also by no means a complete list.

I agree with a lot of Nick’s thoughts. Having validated reading behavior is useful – though it’s also interesting to get the difference between what topics people read and what topics people want others to know they read. As Nick points out, it might be a way for people to communicate to others that they are an expert on a topic – or at least an informed reader, as I suspect that experts may have other channels for following the topics about which they care most.

Though BunchBall sort of looks down on the quantified self aspect, I do think it’s useful to give people feedback on what they are reading (sort of like last.fm) for news topics, rather than on what they think they read, though badges probably aren’t quite as data-rich as I’d want. At Michigan, we’ll shortly be trying a similar experiment as part of the BALANCE project, to assess whether feedback on past reading behavior affects the balance of political articles that subjects read.

If people do care about earning the badges, either to learn about their reading behavior or to share with others as a sign of their expertise or interests, then they’ll probably read more of their news through Google news – so that it is tracked in more detail. Thus, a win for Google, who gets the pageviews and data.

Google, why do you want me to earn a Kindle badge?

When I first visited, I was encouraged to earn a Kindle badge. I couldn’t figure this out. Yeah, it’s an interesting product, but I don’t want to read a lot of news about it, and a review of my Google News history showed that I never had read about it through the site. So why, of all the >500 badges that Google could suggest to me (many for topics I read lots about), is it suggesting Kindle and only Kindle? It left me wondering whether it was a random recommendation, whether whatever method Google used to suggest a badge was a poor fit for me, or whether it was a sponsored badge intended to get me to read more about Kindles (speaking of potential wins for Google…).

Whatever the case, this highlights a way that badges could push reading behavior – assuming that people want to earn, or want to avoid earning, badges. This can run both ways. Maybe someone is motivated by gadget badges and so reads more about Kindles; maybe someone doesn’t think of themselves as interested in celebrities or media and is thus pushed to read fewer articles about those topics than they were before. I’m not saying this is bad, per se, as feedback is an important part of self-regulation, but if badges matter to people, the simple design choice of which badges to offer (and promote) will be influential, just as the selection and presentation of articles are.

Last week, Sunlight Labs released Inbox Influence, a set of browser extensions (Chrome, Firefox) and bookmarklets that annotate senders and entities in the body of emails with who has contributed to them and to whom they have contributed.

I really like the idea of using browser plugins to annotate information people encounter in their regular online interactions. This is something we’re doing on a variety of projects here, including AffectCheck, BALANCE, and Rumors. I think that tools that combine personal data, in situ, with more depth can teach people more about with whom and with what they are interacting, and this just-in-time presentation of information is an excellent opportunity to persuade and possibly to prompt reflection. Technically, it’s also a pretty nice implementation.

There are some reasons why this tool may not be so great, however. With Daniel Avrahami, Sunny Consolvo, James Fogarty, Batya Friedman, and Ian Smith, I recently published a paper about people’s attitudes toward the online availability of US public records, including campaign contribution records such as the ones on which Inbox Influence draws. Many respondents to our survey (mailed, no compensation, likely biased toward people who care more about this issue) expressed discomfort with these records being so easily accessible, and less than half (as of 2008) even knew that campaign contribution records were available online before they received the survey. Nearly half said that they wanted some sort of change, and a third said that this availability would alter their future behavior, i.e., they’d contribute less (take this with a grain of salt, since it is about hypothetical future behavior).

Unless awareness and attitudes have changed quite a bit from 2008, tools such as Inbox Influence create privacy violations. The data is being used and presented in ways that people did not anticipate at the time when they made the decision to donate, and at least some people are “horrified” or at least uncomfortable with this information being so easily accessible. Perhaps we just need to do better at educating potential donors about in what ways campaign contribution data may be used (and anticipate future mashups), though it is also possible that tools like this do not need to be made, or could benefit from being a bit more nuanced in when and about whom they load information.

Speaking personally, I’m not sure how I feel. On the one hand, I think that campaign contributions and other actions should be open to scrutiny and should have consequences. If you take the money you earn from your business and donate it to support Prop 8, I want the opportunity to boycott your business. If you support a politician who wants to eviscerate the NSF, I might want to engage you in conversation about that. On the other hand, I don’t like the idea that my campaign contribution history (anything above the reporting limit) might be loaded automatically when I email a professional colleague or a student. That’s just not relevant—or even appropriate—to the context. And there are some friendships among politically diverse individuals that may survive, in part, because those differences are not always made salient. So it also seems like Inbox Influence, or tools that let you load, with a click, your Facebook friends’ contribution history, could sometimes cause harm.

Okay, the third and final (?) set of CHI Highlights, consisting of brief notes on some other papers and panels that caught my attention. My notes here will, overall, be briefer than in the other posts.

I possibly should have linked to this one in my post about social networks for health, as my work in that area is why this paper caught my attention. Through a qualitative study, the authors explore how people manage their privacy and disclosures on social network sites.

People tend to apply their own expectations about what they’d like posted about themselves to what they post about others, but sometimes negotiate and ask others to take posts down, and this can lead to either new implicit or explicit rules about what gets posted in the future. They also sometimes stay out of conversations when they know that they are not as close to the original poster as the other participants (even if they have the same “status” on the social network site). Even offline behavior is affected: people might make sure that embarrassing photos can’t be taken so that they cannot be posted.

To regulate boundaries, some people use different services targeted at different audiences. While many participants believed that it would be useful to create friend lists within a service and to target updates to those lists, many had not done so (quite similar to my findings with health sharing: people say they want the feature and that it is useful, but just aren’t using it. I’d love to see Facebook data on what percent of people are actively using lists.) People also commonly worded posts so that those “in the know” would get more information than others, even if all saw the same post.

Once aversive content had been posted, however, it was sometimes better for participants to try to repurpose it to be funny or a joke, rather than to delete it. Deletions say “this was important,” while adding smilies can decrease its impact and say “oh, that wasn’t serious.”

Many short-duration user studies rely on self-report data of satisfaction with an interface or tool, even though we know that self-report data is often quite problematic. To measure the relative utility of design alternatives, the authors place them on Mechanical Turk and measure how many tasks people complete on each alternative under differing pay conditions. A design that gets more work for the same or less pay implies more utility. Because of things like the small pay effect and its ability to crowd out intrinsic rewards, I’m curious about whether this approach will work better for systems meant for work rather than for fun, as well as just how far it can go – but I really do like the direction of measuring what people actually do rather than just what they say.
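The pay-versus-work logic above can be sketched as a toy computation. The function, the scoring proxy (tasks completed per unit of pay), and the data shapes are my own illustration, not the authors’ actual analysis:

```python
def implied_utility_ranking(results):
    """results: {design_name: [(pay_cents, tasks_completed), ...]}.
    A crude proxy for revealed utility: tasks completed per cent of pay,
    averaged across pay conditions. A design that attracts more work for
    the same or less pay ranks higher."""
    score = {}
    for design, observations in results.items():
        score[design] = sum(tasks / pay for pay, tasks in observations) / len(observations)
    # Rank designs from highest to lowest implied utility.
    return sorted(score, key=score.get, reverse=True)
```

For example, if design A attracts 20 completed tasks at 5¢ and 30 at 10¢ while design B attracts 10 and 15, A ranks first: workers did more for the same pay.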

I’m wondering more and more if there’s an appropriate social distance for location trackers: with people who are already very close, it can feel smothering, while with people who are too distant, it can feel creepy. Thinking about preferences for Latitude, I wouldn’t want my family or socially distant acquaintances on there, but I do want friends who I don’t see often enough on there.

Fairly reliable classifier for emotions, including confidence, hesitance, nervousness, relaxation, sadness, and tiredness, based on analysis of typing rhythms on a standard keyboard. One thing I like about this paper is it opens up a variety of systems ideas, ranging from fairly simple to quite sophisticated. I’m also curious if this can be extended to touch screens, which seems like a much more difficult environment.
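As a rough illustration of the kind of input such a classifier might start from: dwell times (key press to release) and flight times (release to next press) are standard keystroke-dynamics features. The function and event format below are my own sketch, not the paper’s implementation:

```python
def keystroke_features(events):
    """events: list of (key, down_time, up_time) tuples in seconds, in
    typing order. Returns simple aggregate features -- mean and spread of
    dwell and flight times -- that a rhythm-based classifier could use."""
    dwells = [up - down for _, down, up in events]
    flights = [events[i + 1][1] - events[i][2] for i in range(len(events) - 1)]

    def stats(xs):
        mean = sum(xs) / len(xs)
        std = (sum((x - mean) ** 2 for x in xs) / len(xs)) ** 0.5
        return mean, std

    dwell_mean, dwell_std = stats(dwells)
    flight_mean, flight_std = stats(flights)
    return {"dwell_mean": dwell_mean, "dwell_std": dwell_std,
            "flight_mean": flight_mean, "flight_std": flight_std}
```

On touch screens, timestamps of this sort are still available, but touch position and pressure would likely matter too, which is part of what makes that extension harder.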

In a Mechanical Turk-based experiment, showing people a picture that induced positive affect increased the quality of ideas generated — measured by originality and creativity — in a creativity task. Negative priming reduced idea quality compared to positive or neutral priming. I’m very curious to see if this result is sustainable over time, with the same image or with different images, or in group settings (particularly considering the next paper in this list!).

We’ve seen a lot of research on priming in interfaces lately, most often in lab or mturk based studies. I think it’ll start to get very interesting when we start testing to see if that also works in long-term field deployments or when people are using a system at their own discretion for their own needs, something that has been harder to do in many past studies of priming.

I didn’t make it to these next few presentations, but having previously seen Steven talk about this work, it’s definitely worth a pointer. The titles nicely capture the main points.

Discussion about the balance between reproducing other studies in different contexts or prior to making “incremental” advances vs. a focus on novelty and innovation. Nice summary here. I think the panelists and audience were generally leaning toward increasing the use of replication + extension in the training of HCI PhD students. I think this would be beneficial, in that it can encourage students to learn how to write papers that are reproducible, introduce basic research methods by doing, and may often lead to some surprising and interesting results. There was some discussion of whether there should be a repli.chi track alongside an alt.chi track. I’m a lot less enthusiastic about that – if there’s a research contribution, the main technical program should probably be sufficient, and if not, why is it there? I do understand that there’s an argument to be made that it’s worth doing as an incentive, but I don’t think that is a sufficient reason. Less addressed by the panel was that a lot of HCI research isn’t of a style that lends itself to replication, though Dan Russell pointed out that some studies must also be taken on faith, since we don’t all have our own Google or LHC.

An alt.chi entry into the debate about perceived issues with underrepresentation of systems work in CHI submissions and with how CHI reviewers treat systems work. As someone who doesn’t do “real” systems work — the systems I build are usually intended as research probes rather than contributions in themselves — I’ve been reluctant to say much on this issue for fear that I would talk more than I know. That said, I can’t completely resist. While I agree that there are often issues with how systems work is presented and reviewed, I’m not completely sympathetic to the argument in this paper.

Part of my skepticism is that I’ve yet to be shown an example of a good systems paper that was rejected. This is not to say that these do not exist; the authors of the paper are speaking from experience and do great work. The majority of systems rejections I have seen are from reviewing, and the decisions have mostly seemed reasonable. Most common are papers that make a modest (or even very nice) systems contribution, tack on a poorly executed evaluation, and then make claims based on the evaluation that it just doesn’t support. I believe at least one rejection would have been accepted had the authors just left out the evaluation altogether, and I think a bad evaluation and unsupported claims should doom a paper unless they are excised (which may be possible with the new CSCW review process).

I was a little bit frustrated because Michael’s presentation seemed to gloss over the authors’ responsibilities to explain the merits of their work to the broader audience of the conference and to discuss biases introduced by snowball samples. The last point is better addressed in the paper, but I still feel that the paper deemphasizes authors’ responsibility in favor of reviewers’ responsibility.

The format for this presentation was also too constrained to have a particularly good discussion (something that was unfortunately true in most sessions with the new CHI time limits). The longer discussion about systems research in the CSCW and CHI communities that followed one of the CSCW Horizons sessions this year was more constructive and more balanced, perhaps because the discussion was anchored at least partially on the systems that had just been presented.

Note on how HCI can improve the way we conduct our work, particularly the view that there are problems and technical solutions to solve them. The authors argue that it may be better to think of these as conditions and interventions. Some arguments they make for practice are: value the implication not to design technology (i.e., that in some situations computing technology may be inappropriate), explicate unpursued avenues (explain alternative interventions and why they were not pursued), consider technological extravention (are there times when technology should be removed?), go beyond negative results (why and in what context did the system fail, and what does that failure mean?), and do not stop building – just be more reflective on why that building is occurring.

Two lab experiments on whether it is possible to foster more thoughtful commenting and participation by participants in online discussion forums by priming thoughtful norms. The first tested the effects of the behavior of other participants in the forum. The dependent variables were comment length, time taken to write the comments, and number of issue-relevant thoughts. Not surprisingly, being exposed to other thoughtful comments led people to make more thoughtful comments themselves. One of the audience members asked whether this would break down with just one negative or less thoughtful comment (much as merely one piece of litter seems to break down antilittering norms).

The second study tested effects of visual, textual, and interaction design features on the same dependent variables. The manipulations included a more subdued vs. more playful visual design, differing CAPTCHAs (words positively correlated with thoughtfulness in the thoughtful condition and words negatively correlated with thoughtfulness in the unthoughtful condition), and different labels for the comment box. The design intended to provoke thoughtfulness did correspond to more thoughtful comments, suggesting that it is possible, at least in the lab, to design sites to prompt more thoughtful comments. For this second study in particular, I’m curious if these measures only work in the short term or if they would work in the long term and about the effects of each of the specific design features.

This paper actually appeared in a health session, but I found that it spoke much more to the issues my colleagues and I are confronting in the BALANCE project. The authors begin with the observation that most recommender systems are intended to produce content that their users will like, but that this can be problematic. In the health and wellness domain, people sometimes need to hear information that might disagree with their perspective or currently held beliefs, and so it can be valuable to recommend disagreeable information. In this Mechanical Turk-based study, subjects were equally likely to follow preference-consistent and preference-inconsistent recommendations. Following preference-inconsistent recommendations did reduce confirmation bias, but people were happier to see preference-consistent recommendations. This raises an important question: subjects may have followed the recommendation the first time, but now that they know this system gives recommendations they might not like, will they follow the recommendations less often in the future, or switch to another system altogether?

I really like the work Travis is doing with Reflect and ConsiderIt (which powers the Living Voters Guide) to promote more thoughtful listening and discussion online, so I was happy to see this WiP and am looking forward to seeing more!

One critique: despite ample selective exposure research, I’m not quite comfortable with this paper’s assumption that political preference maps so neatly to political information preference, partly because I think this may be an interesting research question: do people who lean slightly one way or the other prefer information sources that may be more biased than they are? (or something along those lines)

In addition to these papers, Ethan Zuckerman’s closing plenary, Desperately Seeking Serendipity, touched on the topics of serendipity and homophily extensively. Zuckerman starts by suggesting the reason that people like to move to cities – even at times when cities were really quite horrible places – is, yes, for more opportunities and choices, but also “to encounter the people you couldn’t encounter in your rural, disconnected lifestyle… to become a cosmopolitan, a citizen of the world.” He goes on, “if you wanted to encounter a set of ideas that were radically different than your own, your best bet in an era before telecommunications was to move to a city.” There are reasons to question this idea of cities as a “serendipity engine,” though: even people in urban environments have extremely predictable routines and just don’t go all that many places. Encounters with diverse others may not be as common as idealized.

He then shifts gears to discuss what people encounter online. He walks through the argument that the idea of a Freshman Fishwrap or Daily Me is possibly quite harmful as it allows people to filter to the news that they want. Adding in social filters or getting news through our friends can make this worse. While Sunstein is concerned about this leading to polarization within the US, Zuckerman is more concerned that it leads people to see only news about where they are and less news about other places or from outside perspectives. This trend might lead people to miss important stories.

I tend to agree with the argument that surfacing coincidences or manufacturing serendipity is an incredibly powerful capability of current technology. Many of the approaches that the design community has taken to achieve this are probably not the kind of serendipity Zuckerman is looking for. I love Dopplr’s notifications that friends are also in town, but the time I spend with them or being shown around by them is time that I’m less likely to have a chance encounter with someone local or a traveler from elsewhere. The ability to filter reviews by friends may make for more accurate recommendations, but I’m also less likely to end up somewhere a bit different. Even serendipity has been repurposed to support homophily.

Now, it might be that the definition of serendipity that some of the design community has been using isn’t quite right. As Zuckerman notes, serendipity usually means “happy accident” now – it’s become a synonym for coincidence – and that the sagacity part of the definition has been lost. Zuckerman returns to the city metaphor, arguing for a pedestrian-level view. Rather than building tools for only efficiency and convenience, build tools and spaces that maximize the chances to interact and mix. Don’t make filters hidden. Make favorites of other communities visible, not just the user’s friends. Zuckerman elegantly compares this last feature to the traces in a city: one does not see traces left just by one’s friends, no, but traces left by other users of the space, and this gives people a chance to wander from the path they were already on. One might also overlay a game on a city, to encourage people to explore more deeply or venture to new areas.

While I like these ideas, I’m a little concerned that they will lead to somewhat superficial exposure to the other. People see different others on YouTube, Flickr, or in the news, and yes, some stop and reflect, others leave comments that make fun of them, and many others just move on to the next one. A location-based game might get people to go to new places, but are they thinking about what it is like to be there, or are they thinking about the points they are earning? This superficiality is something I worry about in my own work to expose people to more diverse political news – they may see it, but are they really considering the perspective or gawking at the other side’s insanity? Serendipity may be necessary, but I question whether it is sufficient. We also need empathy: technology that can help people listen and see others’ perspectives and situations. Maybe empathy is part of the lost idea of sagacity that Zuckerman discusses — a sort of emotional sagacity — but whatever it is, I need to better know how to design for it.

For SI and UM students who really engage with this discussion and the interweaving of cities, technology, and flows, I strongly, strongly recommend ARCH 531 (Networked Cities).

I want to take a few minutes to highlight a few papers from CHI 2011, spread across a couple of posts. There was lots of good work at this conference. This post will focus on papers in the persuasive technology and social software for health and wellness space, which is the aspect of my work that I was thinking about most during this conference.

Fit4life is a hypothetical system that monitors users’ behavior using a variety of tactics in the Persuasive Systems Design model. After describing the system (in such a way that, as someone in the room commented, it made the audience “look horrified”), the authors transition to a reflection on persuasive technology research and design, and how such a design can “spiral out of control.” As someone working in this space, I found the authors hit on some of the aspects that leave me a bit unsettled: persuasion vs. coercion, individual good vs. societal good, whether people choose their own viewpoints or are pushed to adopt those of the system designers, measurement and control vs. personal experiences and responsibility, and increased sensing and monitoring vs. privacy and surveillance and the potential to eliminate boundaries between front stage and back stage spaces. The authors also discuss how persuasive systems with very strong coaching features can reduce the opportunity for mindfulness and for their users to reflect on their own situation: people can simply follow the suggestions rather than weigh the input and decide among the options.

This is a nice paper and a good starting point for lots of discussions. I’m a bit frustrated that it was presented in a session concurrent with the session on persuasive technology for health. As such, it probably did not (immediately) reach the audience that would have led to the most interesting discussion about the paper. In many ways, it argued for a “think about what it is like to live with” rather than “pitch” approach to thinking about systems. I agree with a good bit of the potential tensions the authors highlight, but I think they are a bit harder on the persuasive tech community than appropriate: in general, persuasive tech folks are aware we are building systems intended to change behavior and that this is fraught with ethical considerations, while people outside of the community often do not think of their systems as persuasive or coercive, even when they are (again, I mean this in a Nudge, choice-environments sense). On the other hand, one presentation at Persuasive last year did begin with the statement “for the sake of this paper, set aside ethical concerns” (paraphrased), so clearly there is still room for improvement.

Based on interviews with nineteen individuals, the authors present an overview of approaches for how to involve peers in technology for weight management. These approaches fall into passive involvement (norms and comparisons) and five types of active involvement (table 1 in the paper): obstructive (“don’t do it”), inductive (“you should do it”), proactive (“do it with me”), supportive (“I’ll do it too”), and cooperative (“let’s do it together”). The last category includes competition, though there was some disagreement during the Q&A about whether that is the right alignment. The authors also find gender- and role-based differences in perceived usefulness of peer-based interventions, such as differences in attitudes about competition.

A nice paper that evaluates different persuasive approaches for workplace snack selection. These include:

default choice: a robot showing all snack choices with equal convenience or the healthy one more visibly, or a website that showed all snack choices (in random order) or that paginated them, with healthy choices shown on the first page.

planning: asking people to order a snack for tomorrow rather than select at the time of consumption.

information strategy: showing calorie counts for each snack.

As one would expect, the default choice strategy was highly effective in increasing the number of people who chose the healthy snack (apples) rather than the unhealthy snack (cookies). The planning strategy was effective among people who had a healthy snacking lifestyle, while those who snacked unhealthily continued to choose cookies. Interestingly, the information strategy had no effect on unhealthy snackers and actually led healthy snackers to choose cookies more than they otherwise would have. The authors speculate that this is either because the healthy snackers overestimate the caloric value of cookies in the absence of information (and thus avoid them more), or because considering the healthy apple was sufficiently fulfilling even if they ultimately chose the cookie.

Some questions the study leaves open: would people behave the same if they had to pay for the snacks? What would happen in a longer-term deployment? What would have happened if the cookies were made the default, particularly for otherwise healthy snackers?

Interviews with 20 Wii Fit users reveal side effects of its use: some stop using it because it did not work, while others stop because they go on to other, preferred fitness activities (abandonment as success); a tension between whether the Fit is viewed as a game or an exercise tool (people rarely view it as both); and negative emotional impacts (particular frustration when the system misinterpreted data, such as weight gains). One suggestion the authors propose is that behavior change systems might start with activities that better resemble games but gradually transition users to activities with fewer game-like elements, and eventually wean users off of the system altogether. In practice, I’m not sure how this would work, but I like this direction because it gets at one of my main critiques of gamification: take away the game and its incentives (which may distract from the real benefits of changing one’s behavior) and the behavior reverts quite quickly.

Lab experiment evaluating the effects of using multiple sources of advice (single expert or consensus of similar others) at the same time, disclosing that advice is intended to persuade, and allowing users to select their source of advice. (This is framed more generally as about persuasive systems, but I think the framing is too broad: it’s really a study about advice.) Results: people are more likely to follow advice when they choose the source, people are less likely to follow advice when they are told that it is intended to persuade, and when shown expert advice and consensus advice from similar others, subjects were less likely to follow the advice than when they were only shown expert advice — regardless of whether the expert and consensus advice concurred with each other. This last finding is surprising to me and to the authors, who suggest that it may be a consequence of the higher cognitive load of processing multiple sources of advice; I’d love to see further work on this.

Aggregation of a literature review, interviews with sleep experts, a survey of 230 individuals, and interviews with 16 potential users to learn about opportunities and challenges for designing sleep technologies. The work leads to a design framework that considers the goal of the individual using the system, the system’s features, the source of the information supporting the design choices made, the technology used, the stakeholders involved, and the input mechanism. During the presentation, I found myself thinking a lot about two things: (1) the value of design frameworks and how to construct a useful one (I’m unsure of both) and (2) how this stacks up against Julie’s recent blog post that is somewhat more down on the opportunities of tech for health.

The authors argue that evaluating behavior change systems based solely on whether they changed the behavior is not sufficient, and often infeasible. Instead, they argue, HCI should focus on whether systems or features effectively implement or support particular strategies, such as self-monitoring or conditioning, which can be measured in shorter-term evaluations.

I agree with much of this. I think that the more useful HCI contributions in this area speak to which particular mechanisms or features worked, why and how they worked, and in what context one might expect them to work. Contributions that throw the kitchen sink of features at a problem and do not get into the details of how people reacted to the specific features and what the features accomplished may tell us that technology can help with a condition, but they do not, in general, do a lot to inform the designers of other systems. I also agree that shorter-term evaluations are often able to show that a particular feature is or is not working as intended, though longer-term evaluations are appropriate for understanding whether it continues to work. I am also reminded of the gap between the HCI community and the sustainability community pointed out by Froehlich, Findlater, and Landay at CHI last year, and fear that deemphasizing efficacy studies and RCTs will limit the ability of the HCI community to speak to the health community. Someone is going to have to do the efficacy studies, and the HCI community may have to carry some of this weight in order for our work to be taken seriously elsewhere. Research can make a contribution without showing health improvements, but if we ignore the importance of efficacy studies, we imperil the relevance of our work to other communities.

Four-month deployment of a system for monitoring medication taking and phone use in the homes of two older adults. The participants sought out anomalies in the recorded data; when they found them, they generally trusted the system and focused on explaining why the anomaly might have happened, turning first to their memory of the event and then to going over their routines or other records such as calendars and diaries. I am curious whether this trust would extend to a purchased product rather than one provided by the researchers (if so, this could be hazardous in an unreliable system); I could see arguments for it going either way.

The authors found that these systems can help older adults remain aware of their functional abilities and better adapt to changes in those abilities. Similar to what researchers have recommended for fitness journals or sensors, the authors suggest that people be able to annotate or explain discrepancies in their data and to view the data jointly. They also suggest highlighting anomalies and showing them alongside other available contextual information about that date or time.

I generally agree with Sunny Consolvo: feedback and consequences in persuasive systems should generally range from neutral to positive, and I have been reluctant (colleagues might even say “obstinate”) about including negative feedback in GoalPost or Steps. Julie Kientz’s work, however, finds that people with certain personality traits think they would respond well to negative feedback. This work in progress tests negative (“aversive”) feedback — Facebook posts about songs and statements that the participant was using lots of energy — in a pilot with five participants. The participants seemed to respond okay to the posts — which are, in my opinion, pretty mild and not all that negative — and often commented on them. The authors interpret this as aversive feedback not leading to disengagement, but I think that’s too strong a claim to make on this data: participants, though unpaid, had been recruited to the study and likely felt some obligation to follow through to its end in a way that they would not for a commercially or publicly available system, and, with that feeling, may have commented out of a need to publicly explain or justify their usage as shown in the posts. The last point isn’t particularly problematic, as such reflection may be useful. Still, this WiP and the existence of tools like Blackmail Yourself (which *really* hits at the shame element) do suggest that more work is needed on the efficacy of public, aversive feedback.

In my work, I’ve heard a lot of concern about posting health-related status updates and about seeing similar status updates from others, but I haven’t taken a detailed look at the status updates that people are currently making, which this WiP makes a start on for physical activity posts on Twitter. By analyzing the results of queries for “weight lifting”, “Pilates”, and “elliptical”, the authors find posts that show evidence of exercise, plans for exercise, attitudes about exercise, requests for help, and advertisements. As the authors note, the limited search terms probably lead to a lot of selection bias, and I’d like to see more information about posts coming from automated sources (e.g., FitBit), as well as how people reply to the different genres of fitness tweets.

Fun yet concerning alt.chi work on pushing people to smile in order to increase positive mood. With features such as requiring a smile to open the refrigerator, positive feedback (lights, music) in exchange for smiles, automatic sharing of photos of facial expressions with friends or family members, and automatic posting of whether or not someone is smiling enough, this paper hits many of the points about which the Fit4life authors raise concerns.

There was a lot of interesting work — I came home with 41 papers in my “to read” folder — so I’m sure that I’m missing some great work in the above list. If I’m missing something you think I should be reading, let me know!

One of the other things to come out of my visit to Malcolm’s class is an awareness of a certain difference in styles between School of Information, HCI/user-centered-design project presentations and architecture project presentations. Basically, teams of SI students or mostly-SI students presented projects as a bit of a “pitch” — this is why our idea is great and should be pursued — while teams with students from architecture tended to present project ideas a bit less positively, including at least one presentation of an idea as leading to a very dystopian world. One of the other visitors, an architect, reflected at the end on how strange it felt to have had three hours of mostly back-to-back pitches rather than discussions about what it would be like to “live with” a system.

This prompted some reactions from the SI folks, some of which got posted to Twitter, e.g., “shouldn’t a concept have a compelling use- or who would use it, and why?” and “there was an arch guy who was shocked by the idea of considering users.” I can understand these reactions, but as someone with less skin in the game (no project to pitch), I think I had a more moderate reaction. After reflecting a bit, I want to write down some thoughts about this difference and start a discussion.

The difference that I saw in the presentations was that the “pitches” showed use cases with no downsides, or only technical obstacles. The “live with” presentations showed a vision that was more rounded and showed pros as well as some very serious cons, particularly for people who would be affected by but may not choose to use the system. In comparison to the “live with” presentations, the pitches seemed a bit naïve or even dishonest, while the “live with” presentations felt incomplete: given such obvious problems, why not change the idea?

So, where does this difference come from? Of course architects consider the people who will be affected by their creations — and not just the “users” — so it’s not that. And of course things should have a compelling use. Something that has a compelling use for some people, though, may still create a less than pleasant experience for others whom it affects. This is particularly true for architecture projects — everyone in a neighborhood has to live with a building, not just its occupants — so I can see how that would lead to a certain style of presentation or discussion of a proposal. This is not, however, unique to buildings; groupware and social software certainly affect people who may not opt in, and some persuasive technology is designed specifically to influence people who do not opt in. So maybe it would be good for some HCI presentations to take a bit more of a humble tone that acknowledges potential downsides.

On the other hand, it’s also often fairly easy to prototype and even do large-scale test deployments of software (i.e., try living with) in a way that simply isn’t possible with large buildings or urban development projects. These prototypes and field tests often let designers learn many of the unintended consequences. (Of course, you only learn about the group you test the app with.)

This assurance of early feedback on software products, as well as the ability to iterate rapidly after deployment to correct problems or take advantage of newfound opportunities, makes many software presentations more about why something is worth starting this process of building, releasing, and refining, rather than discussions about building and living with a fairly immutable and durable creation — and I think that motivates a lot of the difference in styles. I’m not completely sure that software designers can continue with this attitude as software becomes more social and hooking up system A to system B can lead to information disclosures with long-lasting effects.

On Monday, I had the pleasure of visiting Malcolm McCullough’s Architecture 531 – Networked Cities for final presentations. Many of the students in the class are from SI, where we talk a lot about incentive-centered design, choice architecture, and persuasive technology, which seems to have resulted in many of the projects having a persuasive technology angle. As projects were pitched as “extracting behavior” or “compelling” people to do things, it was interesting to watch the discomfort in the reactions from students and faculty who don’t frame problems in this way.1

Thinking about this afterwards brought me back to a series of conversations at Persuasive this past summer. A prominent persuasive technology researcher said something along the lines of “I’m really only focusing on people who already want to change their behavior.” This caused a lot of discussion, with major themes being: Is this a cop-out, shouldn’t we be worried about the people who aren’t trying? Is this just a neat way of skirting the ethical issues of persuasive (read: “manipulative”) technology?

I’m starting to think that there may be an important distinction that may help address these questions, one between technology that pushes people to do something without them knowing it and technology that supports people in achieving a behavior change they desire. The first category might be persuasive technology, and for now, I’ll call the second category mindful technology.

Persuasive Technology

I’ll call systems that push people who interact with them to behave in certain ways, without those people choosing the behavior change as an explicit goal, Persuasive Technology. This is a big category, and I believe that most systems are persuasive systems in that their design and defaults will favor certain behaviors over others (this is a Nudge inspired argument: whether or not it is the designer’s intent, any environment in which people make choices is inherently persuasive).

Mindful Technology

For now, I’ll call technology that helps people reflect on their behavior, whether or not people have goals and whether or not the system is aware of those goals, mindful technology. I’d put apps like Last.fm and Dopplr in this category, as well as a lot of tools that might be more commonly classified as persuasive technology, such as UbiFit, LoseIt, and other trackers. While designers of persuasive technology are steering users toward a goal that the designers have in mind, the designers of mindful technology give users the ability to better know their own behavior, supporting reflection and/or self-regulation in pursuit of goals that the users have chosen for themselves.

Others working in the broad persuasive tech space have also been struggling with the issue of persuasion versus support for behaviors an individual chooses, and I’m far from the first to start thinking of this work as being more about mindfulness. Mindfulness is, however, a somewhat loaded term with its own meaning, and that may or may not be helpful. If I were to go with the tradition of “support systems” naming, I might call applications in this category “reflection support systems,” “goal support systems,” or “self-regulation support systems.”

Where I try to do my work

I don’t quite think that this is the right distinction yet, but it’s a start, and I think these are two different types of problems (that may happen to share many characteristics) with different sets of ethical considerations.

Even though my thinking is still a bit rough, I’m finding this idea useful in thinking through some of the current projects in our lab. For example, among the team members on AffectCheck, a tool to help people see the emotional content of their tweets, we’ve been having a healthy debate about how prescriptive the system should be. Some team members prefer something more prescriptive — guiding people to tweet more positively, for example, or tweeting in ways that are likely to increase their follower and reply counts — while I lean toward something more reflective — some information about the tweet currently being authored, how the user’s tweets have changed over time, or how they stack up against the user’s followers’ tweets or the rest of Twitter. While even comparisons with friends or others offer evidence of a norm and can be incredibly persuasive, the latter design still seems to be more about mindfulness than about persuasion.

This is also more of a spectrum than a dichotomy, and, as I said above, all systems, by nature of being a designed, constrained environment, will have persuasive elements. (Sorry, there’s no way of dodging the related ethical issues!) For example, users of Steps, our Facebook application to promote walking (and other activity that registers on a pedometer), have opted in to the app to maintain or increase their current activity level. They can set their own daily goals, but the app’s goal recommender will push them toward the fairly widely accepted recommendation of 10,000 steps per day. Other tools such as Adidas’s MiCoach or Nike+ have both tracking and coaching features. Even if people are opting into specific goals, the mere fact of a limited menu of available coaching programs is a bit persuasive, as it constrains people’s choices.

Overall, my preference when designing is to focus on helping people reflect on their behavior, set their own goals, and track progress toward them, rather than to nudge people toward goals that I have in mind. This is partly because I’m a data junkie, and I love systems that help me learn more about what my behavior is without telling me what it should be. It is also partly because I don’t trust myself to persuade people toward the right goal at all times. Systems have a long history of handling exceptions quite poorly. I don’t want to build the system that makes someone feel bad or publicly shames them for using hotter water or a second rinse after a kid throws up in bed, or that takes someone to task for driving more after an injury.

I also often eschew gamification (for many reasons), and to the extent that my apps show rankings or leaderboards, I like to leave it to the viewer to decide whether it is good to be at the top of the leaderboard or the bottom. To see how too much gamification can interfere with people working toward their own goals, consider the leaderboards on TripIt and similar sites. One person may want to have the fewest trips or miles, because they are trying to reduce their environmental impact or because they are trying to spend more time at home with family and friends, while another may be trying to maximize their trips. Designs that simply reveal data can support both goals, while designs that use terms like “winning” or that award trophies or badges to the person with the most trips start to shout: this is what you should do.

Thoughts?

What do you think? Useful distinction? Cluttering of terms? Have I missed an existing, better framework for thinking about this?

1Some of the discomfort was related to some of the projects’ use of punishment (a “worst wasters” leaderboard or similar). This would be a good time to repeat Sunny Consolvo’s guideline that feedback in persuasive technology range from neutral to positive (Consolvo 2009) — especially, in my opinion, in discretionary use situations, because otherwise people will probably just opt out.

For those interested in the software that drives the SIDisplay, SI master’s student Morgan Keys has been working to make a generalized and improved version available. You can find it, under the name “@display” at this GitHub repository.

SIDisplay is a Twitter-based public display described in a CSCW paper with Paul Resnick and Emily Rosengren. We built it for the School of Information community, where it replaced a number of previous displays, including a Thank You Board (which we compare it to in the paper), a photo collage (based on the context, content & community collage), and a version of the plasma poster network. Unlike many other Twitter-based displays, SI Display and @display do not follow a hashtag, but instead follow @-replies to the display’s Twitter account. It also includes private tweets, so long as the Twitter user has given the display’s Twitter account permission to follow them.

When preparing our Persuasive 2010 paper on Three Good Things, we ended up cutting a section on using word clouds to support reflection. The section wasn’t central to the paper, but it highlights one of the design challenges we encountered, so I want to share it and take advantage of any feedback.

Our Three Good Things application (3GT) is based on a positive psychology exercise that encourages people to record three good things that happen to them, as well as the reasons why they happened. By focusing on the positive, rather than dwelling on the negative, it is believed that people can train themselves to be happier.

Example 3GT tag clouds

When moving the application onto a computer (and out of written diaries), I wanted to find a way to leverage a computer’s ability to analyze a user’s previous good things and reasons to help them identify trends. If people are more aware of what makes them happy, or why these things happen, they might make decisions that cause these good things to happen more often. In 3GT, I made a simple attempt to support this trend detection by generating word clouds from a participant’s good things and reasons, using simple stop-wording and lowercasing, with no stemming.
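As a rough illustration — this is not the actual 3GT code, and the stop list here is made up for the example — the word-weighting step might look something like:

```python
from collections import Counter
import re

# Hypothetical stop list; the one actually used in 3GT is not specified here.
STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is",
             "it", "was", "for", "my", "i", "that", "with", "on", "this"}

def cloud_weights(entries):
    """Word-cloud weights: lowercase, drop stopwords, no stemming
    (so, e.g., 'morning' and 'mornings' remain separate words)."""
    counts = Counter()
    for text in entries:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in STOPWORDS:
                counts[word] += 1
    return counts

weights = cloud_weights(["Had coffee with a friend this morning",
                         "A good morning run"])
# 'morning' occurs twice, so it would be rendered largest in the cloud.
```

The simplicity is the point: with no stemming or phrase detection, the cloud surfaces only exact-word repetition, which is part of why participants saw uninformative words like “great” dominate.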

Limited success for Word Clouds

When we interviewed 3GT users, we expected to find that the participants believed the word clouds helped them notice and reinforce trends in their good things. Results here were mixed. Only one participant we interviewed described how the combination of listing reasons and seeing them summarized in the word clouds had helped her own reflection:

“You’ve got tags that show up, like tag clouds on the side, and it kind of pulls out the themes… as I was putting the reasoning behind why certain [good] things would happen, I started to see another aspect of a particular individual in my life. And so I found it very fascinating that I had pulled out that information… it’s made me more receptive to that person, and to that relationship.”

A second participant liked the word cloud but was not completely convinced of its utility:

I like having the word cloud. I noticed that the biggest thing in my reason words is “cat”. (Laughs). And the top good words isn’t quite as helpful, because I’ve written a lot of things like ‘great’ and ‘enjoying’ – evidently I’ve written these things a lot of times. So it’s not quite as helpful. But it’s got ‘cat’ pretty good there, and ‘morning’, and I’m not sure if that’s because I’ve had a lot of good mornings, or I tend to write about things in the morning.

Another participant who had examined the word cloud noticed that “people” was the largest tag in his good things cloud and “liked that… [his] happiness comes from interaction with people,” but that he did not think that this realization had any influence over his behavior outside of the application.

One participant reported looking at the word clouds shortly after beginning to post. The words selected did not feel representative of the good things or reasons he had posted, and feeling that they were “useless,” he stopped looking at them. He did say that he could imagine it “maybe” being useful as the words evolved over time, and later in the interview revisited one of the items in the word cloud: “you know the fact that it says ‘I’m’ as the biggest word is probably good – it shows that I’m giving myself some credit for these good things happening, and that’s good,” but this level of reflection was prompted by the interview, not day-to-day use of 3GT.

Another participant did not understand that word size in the word cloud was determined by frequency of usage and was even more negative:

It was like you had taken random words that I’ve typed, and some of them have gotten bigger. But I couldn’t see any reason why some of them would be bigger than the other ones. I couldn’t see a pattern to it. It was sort of weird… Some of the words are odd words… And then under the Reason words, it’s like they’ve put together some random words that make no sense.

Word clouds did sometimes help in ways that we had not anticipated. Though participants did not find that they helped them identify trends that would influence future decisions, looking at the word cloud from her good things helped at least one participant’s mood.

I remember ‘dissertation’ was a big thing, because for a while I was really gunning on my dissertation, and it was going so well, the proposal was going well with a first draft and everything. So that was really cool, to be able to document that and see… I can see how that would be really useful for when I get into a funk about not being able to be as productive as I was during that time… I like the ‘good’ words. They make me feel, I feel very good about them.

More work?

The importance of supporting reflection has been discussed in the original work on Three Good Things, as well as in other work that has shown how systems that support effective self-reflection can improve users’ ability to adopt positive behaviors as well as increase their feelings of self-efficacy. While some users found benefit in word clouds to assist reflection, a larger portion did not notice them or found them unhelpful. More explanation should be provided about how word clouds are generated to avoid confusion. They should also perhaps not be shown until a participant has entered a sufficient amount of data. To help participants better notice trends, improved stop-wording might be used, as well as detecting n-grams (e.g. “didn’t smoke” versus “smoke”) and grouping of similar terms (e.g., combining “bread” and “pork” into “food”). Alternatively, a different kind of reflection exercise might be more effective, one where participants are asked to review their three good things posts and write a longer summary of the trends they have noticed.
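To sketch one of these ideas — again hypothetical, not code from 3GT — merging negation words with the word that follows would keep a phrase like “didn’t smoke” from inflating the count for “smoke”:

```python
from collections import Counter
import re

# Hypothetical negation list; a real implementation would need a fuller one.
NEGATIONS = {"didn't", "don't", "no", "not", "never"}

def tokens_with_negation(text):
    """Tokenize, merging each negation word with the word that follows,
    so "didn't smoke" is counted as a unit instead of boosting "smoke"."""
    words = re.findall(r"[a-z']+", text.lower())
    merged, i = [], 0
    while i < len(words):
        if words[i] in NEGATIONS and i + 1 < len(words):
            merged.append(words[i] + " " + words[i + 1])
            i += 2
        else:
            merged.append(words[i])
            i += 1
    return merged

counts = Counter(tokens_with_negation("Didn't smoke today, didn't smoke yesterday"))
# "didn't smoke" is counted twice; "smoke" alone never appears.
```

Grouping similar terms (“bread” and “pork” into “food”) would need something more, such as a lexicon like WordNet, but the same counting pipeline would apply.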