Abstract

Searching collaboratively for places of interest is a common activity that frequently occurs on individual mobile phones, or on large tourist-information displays in public places such as visitor centers or train stations. We
created a public display system for collaborative travel planning, as well as a mobile app that can augment the display. We tested them against third-party mobile apps in a simulated travel-search task to understand how the unique features of mobile phones and large displays might be leveraged together to improve collaborative travel planning experience.

Abstract

We leverage a popularity measure in social media as a distant label for extractive summarization of online conversations. In social media, users can vote, share, or bookmark a post they prefer. The number of these actions is regarded as a measure of popularity. However, popularity is not solely determined by content of a post, e.g., a text or an image in a post, but is highly contaminated by its contexts, e.g., timing, and authority. We propose a disjunctive model, which computes the contribution of content and context separately. For evaluation, we build a dataset where the informativeness of a comment is annotated. We evaluate the results with ranking metrics, and show that our model outperforms the baseline model, which directly uses popularity as a measure of informativeness.

Abstract

Prewriting is the process of generating and organizing ideas before drafting a document. Although often overlooked by novice writers and writing tool developers, prewriting is a critical process that improves the quality of a final document. To better understand current prewriting practices, we first conducted interviews with writing learners and experts. Based on the learners’ needs and experts’ recommendations, we then designed and developed InkPlanner, a novel pen and touch visualization tool that allows writers to utilize visual diagramming for ideation during prewriting. InkPlanner further allows writers to sort their ideas into a logical and sequential narrative by using a novel widget— NarrativeLine. Using a NarrativeLine, InkPlanner can automatically generate a document outline to guide later drafting exercises. Inkplanner is powered by machine-generated semantic and structural suggestions that are curated from various texts. To qualitatively review the tool and understand how writers use InkPlanner for prewriting, two writing experts were interviewed and a user study was conducted with university students. The results demonstrated that InkPlanner encouraged writers to generate more diverse ideas and also enabled them to think more strategically about how to organize their ideas for later drafting.

Abstract

With the tremendous progress in sensing and IoT infrastructure, it is foreseeable that IoT systems will soon be available for commercial markets, such as in people's homes. In this paper, we present a deployment study using sensors attached to household objects to capture the resourcefulness of three individuals. The concept of resourcefulness
highlights the ability of humans to repurpose objects spontaneously for a different use case than was initially intended. It is a crucial element for human health and wellbeing, which is of great interest for various aspects of HCI and design research. Traditionally, resourcefulness is captured through ethnographic practice. Ethnography can only provide sparse and often short duration observations of human experience, often relying on participants being aware of and remembering behaviours or thoughts they need to report on. Our hypothesis is that resourcefulness can also be captured through continuously monitoring objects being used in everyday life. We developed a system that can record object movement continuously and deployed them in homes of three elderly people for over two weeks. We explored the use of probabilistic topic models to analyze the collected data and identify common patterns.

Abstract

Despite reflection being identified as a key component of behavior change, most existing tools do not explicitly design for it, carrying an implicit assumption that providing access to self-tracking data is enough to trigger reflection. In this work we design a system for reflection around physical activity. Through a set of workshops, we generated a corpus of 275 reflective questions. We then combine these questions into a set of 25 reflective mini-dialogues. We deliver our mini-dialogues through MMS. 33 active users of fitness trackers used our system in a 2-week field deployment. Results suggest that the mini-dialogues were successful in triggering reflection and that this reflection led to increases in motivation, empowerment, and adoption of new behaviors. Encouragingly, 16 participants elected to use the system for two additional weeks without compensation. We present implications for the design of technology-supported dialog system for reflection.

Abstract

Continuous monitoring with unobtrusive wearable social sensors is becoming a popular method to assess individual affect states and team effectiveness in human research. A large number of applications have demonstrated the effectiveness of applying wearable sensing in corporate settings; for example, in short periodic social events or in a university campus. However, little is known of how we can automatically detect individual affect and group cohesion for long duration missions. Predicting negative affect states and low cohesiveness is vital for team missions. Knowing team members’ negative states allows timely interventions to enhance their effectiveness. This work investigates whether sensing social interactions and individual behaviors with wearable sensors can provide insights into assessing individual affect states and group cohesion. We analyzed wearable sensor data from a team of six crew members who were deployed on a four-month simulation of a space exploration mission at a remote location. Our work proposes to recognize team members’ affect states and group cohesion as a binary classification problem using novel behavior features that represent dyadic interaction and individual activities. Our method aggregates features from individual members into group levels to predict team cohesion. Our results show that the behavior features extracted from the wearable social sensors provide useful information in assessing personal affect and team cohesion. Group task cohesion can be predicted with a high performance of over 0.8 AUC. Our work demonstrates that we can extract social interactions from sensor data to predict group cohesion in longitudinal missions. We found that quantifying behavior patterns including dyadic interactions and face-to-face communications are important in assessing team process.

Abstract

Accurate localization is a fundamental requirement for a variety of applications, ranging from industrial robot operations to location-powered applications on mobile devices. A key technical challenge in achieving this goal is providing a clean and reliable estimation of location from a variety of low-cost, uncalibrated sesnors. Many current techniques rely on Particle Filter (PF) based algorithms. They have proven successful at effectively fusing various sensors inputs to create meaningful location predictions. In this paper we build upon this large corpous of work. Like prior work, our technique fuses Received Signal Strength Indicator (RSSI) measurements from Bluetooth Low Energy (BLE) beacons with map information. A key contribution of our work is a new sensor model for BLE beacons that does not require the mapping from RSSI to distance. We further contribute a novel method of utilizing map information during the initialization of the system and during the resampling phase when new particles are generated. Using our proposed sensor model and map prior information the performance of the overall localization is improved by 1.20 m on comparing the 75th percentile of the cumulative distribution with traditional localization techniques.

Abstract

In this paper, we develop a system for the lowcost indoor localization and tracking problem using radio signal strength indicator, Inertial Measurement Unit (IMU), and magnetometer sensors. We develop a novel and simplified probabilistic IMU motion model as the proposal distribution of the sequential Monte-Carlo technique to track the robot trajectory. Our algorithm can globally localize and track a robot with a priori unknown location, given an informative prior map of the Bluetooth Low Energy (BLE) beacons. Also, we formulate the problem as an optimization problem that serves as the Backend of the algorithm mentioned above (Front-end). Thus, by simultaneously solving for the robot trajectory and the map of BLE beacons, we recover a continuous and smooth trajectory of the robot, corrected locations of the BLE beacons, and the time varying IMU bias. The evaluations achieved using hardware show that through the proposed closed-loop system the localization performance can be improved; furthermore, the system becomes robust to the error in the map of beacons by feeding back the optimized map to the Front-end.

Abstract

In this chapter we discuss the use of external sources of data in designing conversational dialogues. We focus on applications in behavior change around physical activity involving dialogues that help users better understand their self-tracking data and motivate healthy behaviors. We start by introducing the areas of behavior change and personal informatics and discussing the importance of self-tracking data in these areas. We then introduce the role of reflective dialogue-based counseling systems in this domain, discuss specific value that self-tracking data can bring, and how it can be used in creating the dialogues. The core of the chapter focuses on six practical examples of design of dialogues involving self-tracking data that we either tested in our research or propose as future directions based on our experiences. We end the chapter by discussing how the design principles for involving external data in conversations can be applied to broader domains. Our goal for this chapter is to share our experiences, outline design principles, highlight several design opportunities in external data-driven computer-based conversations, and encourage the reader to explore creative ways of involving external sources of data in shaping dialogues-based interactions.

Abstract

We introduce a system to automatically manage photocopies made from copyrighted printed materials. The system monitors photocopiers to detect the copying of pages from copyrighted publications. Such activity is tallied for billing purposes. Access rights to the materials can be checked to prevent printing. Digital images of the copied pages are checked against a database of copyrighted pages. To preserve the privacy of the copying of non-copyright materials, only digital fingerprints are submitted to the image matching service. A problem with such systems is creation of the database of copyright pages. To facilitate this, our system maintains statistics of clusters of similar unknown page images along with copy sequence. Once such a cluster has grown to a sufficient size, a human inspector can determine whether those page sequences are copyrighted. The system has been tested with 100,000s of pages from conference proceedings and with millions of randomly generated pages. Retrieval accuracy has been around 99% even with copies of copies or double-page copies.

Abstract

Historically, people have interacted with companies and institutions through telephone-based dialogue systems and paper-based forms. Now, these interactions are rapidly moving to web- and phone-based chat systems. While converting traditional telephone dialogues to chat is relatively straightforward, converting forms to conversational interfaces can be challenging. In this work, we introduce methods and interfaces to enable the conversion of PDF and web-based documents that solicit user input into chat-based dialogues. Document data is first extracted to associate fields and their textual descriptions using meta-data and lightweight visual analysis. The field labels, their spatial layout, and associated text are further analyzed to group related fields into natural conversational units. These correspond to questions presented to users in chat interfaces to solicit information needed to complete the original documents and downstream processes they support. This user supplied data can be inserted into the source documents and/or in downstream databases. User studies of our tool show that it streamlines form-to-chat conversion and produces conversational dialogues of at least the same quality as a purely manual approach.

Abstract

SlideDiff is a system that automatically creates an animated rendering of textual and media differences between two versions of a slide. While previous work focuses either on textual or image data, SlideDiff integrates text and media changes, as well as their interactions, e.g. adding an image forces nearby text boxes to shrink. Provided with two versions of a slide (not the full history of edits), SlideDiff detects the textual and image differences, and then animates the changes by mimicking what a user would have done, such as moving the cursor, typing text, resizing image boxes, adding images. This editing metaphor is well known to most users, helping them better understand what has changed, and fosters a sense of connection between remote workers, making them feel as if we edited together. After detection of text and image differences, the animations are rendered in HTML and CSS, including mouse cursor motion, text and image box selection and resizing, text deletion and insertion with its cursor. We discuss strategies for animating changes, in particular the importance of starting with large changes and finishing with smaller edits, and provide evidence of the utility of SlideDiff in a workplace setting.

Abstract

Exploring coordinated relationships (e.g., shared relationships between two sets of entities) is an important analytics task in a variety of real-world applications, such as discovering similarly behaved genes in bioinformatics, detecting malware collusions in cyber security, and identifying products bundles in marketing analysis. Coordinated relationships can be formalized as biclusters. In order to support visual exploration of biclusters, bipartite graphs based visualizations have been proposed, and edge bundling is used to show biclusters. However, it suffers from edge crossings due to possible overlaps of biclusters, and lacks in-depth understanding of its impact on user exploring biclusters in bipartite graphs. To address these, we propose a novel bicluster-based seriation technique that can reduce edge crossings in bipartite graphs drawing and conducted a user experiment to study the effect of edge bundling and this proposed technique on visualizing biclusters in bipartite graphs. We found that they both had impact on reducing entity visits for users exploring biclusters, and edge bundles helped them find more justified answers. Moreover, we identified four key trade-offs that inform the design of future bicluster visualizations. The study results suggest that edge bundling is critical for exploring biclusters in bipartite graphs, which helps to reduce low-level perceptual problems and support high-level inferences.

Abstract

Devices with embedded sensors are permeating the computing landscape, allowing the collection and analysis of rich data about individuals, smart spaces, and their interactions. This class of de- vices enables a useful array of home automation and connected workplace functionality to individuals within instrumented spaces. Unfortunately, the increasing pervasiveness of sensors can lead to perceptions of privacy loss by their occupants. Given that many instrumented spaces exist as platforms outside of a user’s control—e.g., IoT sensors in the home that rely on cloud infrastructure or connected workplaces managed by one’s employer—enforcing access controls via a trusted reference monitor may do little to assuage individuals’ privacy concerns. This calls for novel enforcement mechanisms for controlling access to sensed data.
In this paper, we investigate the interplay between sensor fidelity and individual comfort, with the goal of understanding the design space for effective, yet palatable, sensors for the workplace. In the context of a common space contextualization task, we survey and interview individuals about their comfort with three common sensing modalities: video, audio, and passive infrared. This allows us to explore the extent to which discomfort with sensor platforms is a function of detected states or sensed data. Our findings uncover interesting interplays between content, context, fidelity, history, and privacy. This, in turn, leads to design recommendations regarding how to increase comfort with sensing technologies by revisiting the mechanisms by which user preferences and policies are enforced in situations where the infrastructure itself is not trusted.

Abstract

Massive Open Online Course (MOOC) platforms have scaled online education to unprecedented enrollments, but remain limited by their rigid, predetermined curricula. Increasingly, professionals consume this content to augment or update specific skills rather than complete degree or certification programs. To better address the needs of this emergent user population, we describe a visual recommender system called MOOCex. The system recommends lecture videos {\em across} multiple courses and content platforms to provide a choice of perspectives on topics. The recommendation engine considers both video content and sequential inter-topic relationships mined from course syllabi. Furthermore, it allows for interactive visual exploration of the semantic space of recommendations within a learner's current context.

Abstract

An enormous amount of conversation occurs online every day, including on chat platforms where multiple conversations may take place concurrently.
Interleaved conversations lead to difficulties in not only following discussions but also retrieving relevant information from simultaneous messages.
Conversation disentanglement aims to separate overlapping messages into detached conversations.
In this paper, we propose to leverage representation learning for conversation disentanglement. A Siamese Hierarchical Convolutional Neural Network (SHCNN), which integrates local and more global representations of a message, is first presented to estimate the conversation-level similarity between closely posted messages. With the estimated similarity scores, our algorithm for Conversation Identification by SImilarity Ranking (CISIR) then derives conversations based on high-confidence message pairs and pairwise redundancy.
Experiments were conducted with four publicly available datasets of conversations from Reddit and IRC channels. The experimental results show that our approach significantly outperforms comparative baselines in both pairwise similarity estimation and conversation disentanglement.

Abstract

Conversational agents stand to play an important role in supporting behavior change and well-being in many domains. With users able to interact with conversational agents through both text and voice, understanding how designing for these channels supports behavior change is important. To begin answering this question, we designed a conversational agent for the workplace that supports workers’ activity journaling and self-learning through reflection. Our agent, named Robota, combines chat-based communication as a Slack Bot and voice interaction through a personal device using a custom Amazon Alexa
Skill. Through a 3-week controlled deployment, we examine how voice-based and chat-based interaction affect workers’ reflection and support self-learning. We demonstrate that, while many current technical limitations exist, adding dedicated mobile voice interaction separate from the already busy chat modality may further enable users to step back and reflect on their work. We conclude with discussion of the implications of our findings to design of workplace self-tracking systems specifically and to behavior-change systems in general.

Abstract

Convolutional Neural Networks (CNN) have successfully been utilized for localization using a single monocular image [1]. Most of the work to date has either focused on reducing the dimensionality of data for better learning of parameters during training or on developing different variations of CNN models to improve pose estimation. Many of the best performing works solely consider the content in a single image, while the context from historical images is ignored. In this paper, we propose a combined CNN-LSTM which is capable of incorporating contextual information from historical images to better estimate the current pose. Experimental results achieved using a dataset collected in an indoor office space improved the overall system results to 0.8 m & 2.5° at the third quartile of the cumulative distribution as compared with 1.5 m & 3.0° achieved by PoseNet [1]. Furthermore, we demonstrate how the temporal information exploited by the CNN-LSTM model assists in localizing the robot in situations where image content does not have sufficient features.

Abstract

In this paper, we propose a novel solution to optimize the deployment of (RF) beacons for the purpose of indoor localization. We propose a system that optimizes both the number of beacons and their placement in a given environment. We propose a novel cost-function, called CovBSM, that allows to simultaneously optimize the 3-coverage while maximizing the beacon spreading. Using this cost function, we propose a framework that maximize both the number of beacons and their placement in a given environment. The proposed solution accounts for the indoor infrastructure and its influence on the (RF) signal propagation by embedding a realistic simulator into the optimization process.

Abstract

Massive Open Online Course (MOOC) platforms have scaled online education to unprecedented enrollments, but remain limited by their rigid, predetermined curricula. This paper presents MOOCex, a technique that can offer a more flexible learning experience for MOOCs. MOOCex can recommend lecture videos across different courses with multiple perspectives, and considers both the video content and also sequential inter-topic relationships mined from course syllabi. MOOCex is also equipped with interactive visualization allowing learners to explore the semantic space of recommendations within their current learning context. The results of comparisons to traditional methods, including content-based recommendation and ranked list representation, indicate the effectiveness of MOOCex. Further, feedback from MOOC learners and instructors suggests that MOOCex enhances both MOOC-based learning and teaching.

Abstract

Understanding team communication and collaboration patterns is critical for improving work efficiency in organizations. This paper presents an interactive visualization system, T-Cal, that supports the analysis of conversation data from modern team messaging platforms (e.g., Slack). T-Cal employs a user-familiar visual interface, a calendar, to enable seamless multi-scale browsing of data from different perspectives. T-Cal also incorporates a number of analytical techniques for disentangling interleaving conversations, extracting keywords, and estimating sentiment. The design of T-Cal is based on an iterative user-centered design process including field studies, requirements gathering, initial prototypes demonstration, and evaluation with domain users. The resulting two case studies indicate the effectiveness and usefulness of T-Cal in real-world applications, including student group chats during a MOOC and daily conversations within an industry research lab.

Abstract

This paper describes the development of a multi-sensory clubbing experience which was deployed during two a two-day event within the context of the Amsterdam Dance Event in October 2016 in Amsterdam. We present how the entire experience was developed end-to-end and deployed at the event through the collaboration of several project partners from industries such as art and design, music, food, technology and research. Central to the system are smart textiles, namely wristbands equipped with Bluetooth LE sensors which were used to sense people attending the dance event. We describe the components of the system, the development process, collaboration between the involved entities and the event itself. To conclude the paper, we highlight insights gained from conducting a real world research deployment across many collaborators and stakeholders.

Abstract

Effective communication of activities and progress in the workplace is crucial for the success of many modern organizations. In this paper, we extend current research on workplace communication and uncover opportunities for technology to support effective work activity reporting. We report on three studies: With a survey of 68 knowledge workers followed by 14 in-depth interviews, we investigated the perceived benefits of different types of progress reports and an array of challenges at three stages: Collection, Composition, and Delivery. We show an important interplay between written and face-to-face reporting, and highlight the importance of tailoring a report to its audience. We then present results from an analysis of 722 reports composed by 361 U.S.-based knowledge workers, looking at the influence of the audience on a report’s language. We conclude by discussing opportunities for future technologies to assist both employees and managers in collecting, interpreting, and reporting progress in the workplace.

Abstract

Activity recognition is a core component of many intelligent and context-aware systems. In this paper, we present a solution for discreetly and unobtrusively recognizing common work activities above a work surface without using cameras. We demonstrate our approach, which utilizes an RF-radar sensor mounted under the work surface, in two work domains; recognizing work activities at a convenience-store counter (useful for post-hoc analytics) and recognizing common office deskwork activities (useful for real-time applications). We classify seven clerk activities with 94.9% accuracy using data collected in a lab environment, and recognize six common deskwork activities collected in real offices with 95.3% accuracy. We show that using multiple projections of RF signal leads to improved recognition accuracy. Finally, we show how smartwatches worn by users can be used to attribute an activity, recognized with the RF sensor, to a particular user in multi-user scenarios. We believe our solution can mitigate some of users’ privacy concerns associated with cameras and is useful for a wide range of intelligent systems.

Abstract

This paper examines content-based recommendation in domains exhibiting sequential topical structure. An example is educational video, including Massive Open Online Courses (MOOCs) in which knowledge builds within and across courses. Conventional content-based or collaborative filtering recommendation methods do not exploit courses' sequential nature. We describe a system for video recommendation that combines topic-based video representation with sequential pattern mining of inter-topic relationships. Unsupervised topic modeling provides a scalable and domain-independent representation. We mine inter-topic relationships from manually constructed syllabi that instructors provide to guide students through their courses. This approach also allows the inclusion of multi-video sequences among the recommendation results.
Integrating the resulting sequential information with content-level similarity provides relevant as well as diversified recommendations. Quantitative evaluation indicates that the proposed system, \textit{SeqSense}, recommends fewer redundant videos than baseline methods, and instead emphasizes results consistent with mined topic transitions.

Abstract

Traditional summarization initiatives have been focused on specific types of documents such as articles, reviews, videos, image feeds, or tweets, a practice which may result in pigeonholing the summarization task in the surrounding of modern, content-rich multimedia collections. Consequently, much of the research to date has revolved around mostly toy problems in narrow domains and working on single-source media types. We argue that summarization and story generation systems need to refocus the problem space in order to meet the information needs in the age of user-generated content in different formats and languages. Here we create a framework for flexible multimedia storytelling. Narratives, stories, and summaries carry a set of challenges in big data and dynamic multi-source media that give rise to new research in spatial-temporal representation, viewpoint generation, and explanation.

Abstract

Tutorials are one of the most fundamental means of conveying knowledge. Ideally when the task involves physical or digital objects, tutorials not only describe each step with text or via audio narration but show it as well using photos or animation. In most cases, online tutorial authors capture media from handheld mobile devices to compose these documents, but increasingly they use wearable devices as well. In this work, we explore the full life-cycle of online tutorial creation and viewing using head-mounted capture and displays. We developed a media-capture tool for Google Glass that requires minimal attention to the capture device and instead allows the author to focus on creating the tutorial's content rather than its capture. The capture tool is coupled with web-based authoring tools for creating annotatable videos and multimedia documents. In a study comparing standalone (camera on tripod) versus wearable capture (Google Glass) as well as two types of multimedia representation for authoring tutorials (video-based or document-based), we show that tutorial authors have a preference for wearable capture devices, especially when recording activities involving larger objects in non-desktop environments. Authors preferred document-based multimedia tutorials because they are more straightforward to compose and the step-based structure translates more directly to explaining a procedure. In addition, we explored using head-mounted displays (Google Glass) for accessing tutorials in comparison to lightweight computing devices such as tablets. Our study included tutorials recorded with the same capture methods as in our access study. We found that although authors preferred head-mounted capture, tutorial consumers preferred video recorded by a camera on tripod that provides a more stable image of the workspace. Head-mounted displays are good for glanceable information, however video demands more attention and our participants made more errors using Glass than when using a tablet, which was easier to ignore. Our findings point out several design implications for online tutorial authoring and access methods.

Abstract

Advances in small and low power electronics have created new opportunities for the Internet of Things (IoT), leading to an explosion of physical objects being connected to the Internet. However, there still lacks an indoor localization solution that can answer the needs of various location-based IoT applications with desired simplicity, robustness, accuracy, and responsiveness. We introduce Foglight, a visible light enabled indoor localization system for IoT devices that relies on unique spatial encoding produced when mechanical mirrors inside a projector are flipped based on gray-coded binary images. Foglight employs simple off-the-shelf light sensors that can be easily coupled with existing IoT devices - such as thermometers, gas meters, or light switches - making their location discoverable. Our sensor unit is computation efficient; it can perform high-accuracy localization with minimum signal processing overhead, allowing any low-power IoT device on which it rests to be able to locate itself. Additionally, results from our evaluation reveal that Foglight can locate a target device with an average accuracy of 1.7 millimeters and average refresh rate of 84 Hz with minimal latency, 31.46 milliseconds on WiFi and 23.2 milliseconds on serial communication. Two example applications are developed to demonstrate possible scenarios as proof of concept. We also discuss limitations, how they could be overcome, and propose next steps.

Abstract

We present a system for capturing ink strokes written with ordinary pen and paper using a fast camera with a frame rate comparable to a stylus digitizer. From the video frames, ink strokes are extracted and used as input to an online handwriting recognition engine. A key component in our system is a pen up/down detection model for detecting the contact of the pen-tip with the paper in the video frames. The proposed model consists of feature representation with convolutional neural networks and classification with a recurrent neural network. We also use a high speed tracker with kernelized correlation filters to track the pen-tip. For training and evaluation, we collected labeled video data of users writing English and Japanese phrases from public datasets, and we report on character accuracy scores for different frame rates in the two languages.

Abstract

Video telehealth is growing to allow more clinicians to see patients from afar. As a result, clinicians, typically trained for in-person visits, must learn to communicate both health information and non-verbal affective signals to patients through a digital medium. We introduce a system called ReflectLive that senses and provides real-time feedback about non-verbal communication behaviors to clinicians so they can improve their communication behaviors. A user evaluation with 10 clinicians showed that the real-time feedback helped clinicians maintain better eye contact with patients and was not overly distracting. Clinicians reported being more aware of their non-verbal communication behaviors and reacted positively to summaries of their conversational metrics, motivating them to want to improve. Using ReflectLive as a probe, we also discuss the benefits and concerns around automatically quantifying the “soft skills” and complexities of clinician-patient communication, the controllability of behaviors, and the design considerations for how to present real-time and summative feedback to clinicians.

Abstract

Humans are complex and their behaviors follow complex multimodal patterns, however to solve many social computing problems one often looks at complexity in large-scale yet single point data sources or methodologies. While single data/single method techniques, fueled by large scale data, enjoyed some success, it is not without fault. Often with one type of data and method, all the other aspects of human behavior are overlooked, discarded, or, worse, misrepresented. We identify this as two succinct problems. First, social computing problems that cannot be solved using a single data source and need intelligence from multiple modals and, second, social behavior that cannot be fully understood using only one form of methodology. Throughout this talk, we discuss these problems and their implications, illustrate examples, and propose new directives to properly approach in the social computing research in today’s age.

Abstract

Discovering and analyzing biclusters, i.e., two sets of related entities with close relationships, is a critical task in many real-world applications, such as exploring entity co-occurrences in intelligence analysis, and studying gene expression in bio-informatics. While the output of biclustering techniques can offer some initial low-level insights, visual approaches are required on top of that due to the algorithmic output complexity.This paper proposes a visualization technique, called BiDots, that allows analysts to interactively explore biclusters over multiple domains. BiDots overcomes several limitations of existing bicluster visualizations by encoding biclusters in a more compact and cluster-driven manner. A set of handy interactions is incorporated to support flexible analysis of biclustering results. More importantly, BiDots addresses the cases of weighted biclusters, which has been underexploited in the literature. The design of BiDots is grounded by a set of analytical tasks derived from previous work. We demonstrate its usefulness and effectiveness for exploring computed biclusters with an investigative document analysis task, in which suspicious people and activities are identified from a text corpus.

Abstract

During asynchronous collaborative analysis, handoff of partial findings is challenging because externalizations produced by analysts may not adequately communicate their investigative process. To address this challenge, we developed techniques to automatically capture and help encode tacit aspects of the investigative process based on an analyst’s interactions, and streamline explicit authoring of handoff annotations. We designed our techniques to mediate awareness of analysis coverage, support explicit communication of progress and uncertainty with annotation, and implicit communication through playback of investigation histories. To evaluate our techniques, we developed an interactive visual analysis system, KTGraph, that supports an asynchronous investigative document analysis task. We conducted a two-phase user study to characterize a set of handoff strategies and to compare investigative performance with and without our techniques. The results suggest that our techniques promote the use of more effective handoff strategies, help increase an awareness of prior investigative process and insights, as well as improve final investigative outcomes.

Abstract

Whether and how does the structure of family trees differ by ancestral traits over generations? This is a fundamental question regarding the structural heterogeneity of family trees for the multi-generational transmission research. However, previous work mostly focuses on parent-child scenarios due to the lack of proper tools to handle the complexity of extending the research to multi-generational processes. Through an iterative design study with social scientists and historians, we develop TreeEvo that assists users to generate and test empirical hypotheses for multi-generational research. TreeEvo summarizes and organizes family trees by structural features in a dynamic manner based on a traditional Sankey diagram. A pixel-based technique is further proposed to compactly encode trees with complex structures in each Sankey Node. Detailed information of trees is accessible through a space-efficient visualization with semantic zooming. Moreover, TreeEvo embeds Multinomial Logit Model (MLM) to examine statistical associations between tree structure and ancestral traits. We demonstrate the effectiveness and usefulness of TreeEvo through an in-depth case-study with domain experts using a real-world dataset (containing 54,128 family trees of 126,196 individuals).

Abstract

For tourists, interactions with digital public displays often depend on specific technologies that users may not be familiar with (QR codes, NFC, Bluetooth); may not have access to because of networking issues (SMS), may lack a required app (QR codes), or device technology (NFC); may not want to use because of time constraints (WiFi, Bluetooth); or may not want to use because they are worried about sharing their data with a third-party service (text, WiFi). In this demonstration, we introduce ItineraryScanner, a system that allows users to
seamlessly share content with a public travel kiosk system.

Abstract

Video summarization and video captioning are considered two separate tasks in existing studies. For longer videos, automatically identifying the important parts of video content and annotating them with captions will enable a richer and more concise condensation of the video. We propose a general neural network architecture that jointly considers two supervisory signals (i.e., an image-based video summary and text-based video captions) in the training phase and generates both a video summary and corresponding captions for a given video in the test phase. Our main idea is that the summary signals can help a video captioning model learn to focus on important frames. On the other hand, caption signals can help a video summarization model to learn better semantic representations. Jointly modeling both the video summarization and the video captioning tasks offers a novel end-to-end solution that generates a captioned video summary enabling users to index and navigate through the highlights in a video. Moreover, our experiments show the joint model can achieve better performance than state-of- the-art approaches in both individual tasks.

Abstract

In this paper, we describe DocHandles, a novel system that allows users to link to specific document parts in their chat applications. As users type a message, they can invoke the tool by referring to a specific part of a document, e.g., “@fig1 needs revision”. By combining text parsing and document layout analysis, DocHandles can find and present all the figures “1” inside previously shared documents, allowing users to explicitly link to the relevant “document handle”. Documents become first-class citizens inside the conversation stream where users can seamlessly integrate documents in their text-centric messaging application.

Abstract

It is increasingly possible to use cameras and sensors to detect and analyze human appearance for the purposes of personalizing user experiences. Such systems are already deployed in some public places to personalize advertisements and recommend items. However, since these technologies are not yet widespread, we do not have a good sense of the perceived benefits and drawbacks of public display systems that use face detection as an input for personalized recommendations. We conducted a user study with a system that inferred a user’s gender and age from a facial detection and analysis algorithm and used this to present recommendations in two scenarios (finding stores to visit in a mall and finding a pair of sunglasses to buy). This work provides an initial step towards understanding user reactions to a new and emerging form of implicit recommendation based on physical appearance.

Abstract

The availability of mobile access has shifted social media use. With that phenomenon, what users shared on social media and where they visited is naturally an excellent resource to learn their visiting behavior. Knowing visit behaviors would help market survey and customer relationship management, e.g., sending customers coupons of the businesses that they visit frequently. Most prior studies leverage meta-data e.g., check- in locations to profile visiting behavior but neglect important information from user-contributed content, e.g., images. This work addresses a novel use of image content for predicting the user visit behavior, i.e., the frequent and regular business venue categories that the content owner would visit. To collect training data, we propose a strategy to use geo-metadata associated with images for deriving the labels of an image owner’s visit behavior. Moreover, we model a user’s sequential images by using an end-to-end learning framework to reduce the optimization loss. That helps improve the prediction accuracy against the baseline as demonstrated in our experiments. The prediction is completely based on image content that is more available in social media than geo-metadata, and thus allows coverage in profiling a wider set of users.

Abstract

Video conferencing is widely used to help deliver
educational presentations, such as lectures or informational
webinars, to a distributed audience. While individuals in a
dyadic conversation may be able to use webcam streams to assess the engagement level of their interlocutor with some ease, as the size of the audience in a video conference setting increases, it becomes increasingly difficult to interpret how engaged the overall group may be. In this work, we use a mixed-methods approach to understand how presenters and attendees of online presentations use available cues to perceive and interpret audience behavior (such as how engaged the group is). Our results suggest that while webcams are seen as useful by presenters to increase audience visibility and encourage attention, audience members do not uniformly benefit from seeing others’ webcams; other interface cues such as chat may be more useful and informative engagement indicators for both parties. We conclude with design recommendations for future systems to improve what is sensed and presented.

Abstract

In this paper, we propose a real-time classification scheme to cope with noisy Radio Signal Strength Indicator (RSSI) measurements utilized in indoor positioning systems. RSSI values are often converted to distances for position estimation. However due to multipathing and shadowing effects, finding a unique sensor model using both parametric and nonparametric methods is highly challenging. We learn decision regions using the Gaussian Processes classification to accept measurements that are consistent with the operating sensor model. The proposed approach can perform online, does not rely on a particular sensor model or parameters, and is robust to sensor failures. The experimental results achieved using hardware show that available positioning algorithms can benefit from incorporating the classifier into their measurement model as a meta-sensor modeling technique.

Abstract

Observe at a person pointing out and describing something. Where is that person looking? Chances are good that this person also looks at what she is talking about and pointing at. Gaze is naturally coordinated with our speech and hand movements. By utilizing this tendency, we can create a natural interaction with computing devices and environments. In this chapter, we will first briefly discuss some basic properties of the gaze signal we can get from eye trackers, followed by a review of a multimodal system utilizing the gaze signal as one input modality. In Multimodal Gaze Interaction, data from eye trackers is used as an active input mode where for instance gaze is used as an alternative, or complimentary, pointing modality along with other input modalities. Using gaze as an active or explicit input method is challenging for several reasons. One of them being that eyes are primarily used for perceiving our environment, so knowing when a person selects an item with gaze versus just looking around is an issue. Researchers have tried to solve this by combining gaze with various input methods, such as manual pointing, speech, touch, etc. However, gaze information can also be used in interactive systems, for other purposes than explicit pointing since a user's gaze is a good indication of the user's attention. In passive gaze interaction, the gaze is not used as the primary input method, but as a supporting input method. In these kinds of systems, gaze is mainly used for inferring and reasoning about the user's cognitive state or activities in a way that can support the interaction. These kinds of multimodal systems often combine gaze with a multitude of input modalities.
In this chapter we focus on interactive systems, exploring the design space for gaze-informed multimodal interaction spanning from gaze as active input mode to passive and if the usage scenario is stationary (at e.g. a desk) or mobile. There are a number of studies aimed at describing, detecting or modeling specific behaviors or cognitive states. We will touch on some of these works since they can guide us in how to build gaze-informed multimodal interaction.

Abstract

Work breaks can play an important role in the mental and physical well-being of workers and contribute positively to productivity. In this paper we explore the use of activity-, physiological-, and indoor-location sensing to promote mobility during work-breaks. While the popularity of devices and applications to promote physical activity is growing, prior research highlights important constraints when designing for the workplace. With these constraints in mind, we developed BreakSense, a mobile application that uses a Bluetooth beacon infrastructure, a smartphone and a smartwatch to encourage mobility during breaks with a game-like design. We discuss constraints imposed by design for work and the workplace, and highlight challenges associated with the use of noisy sensors and methods to overcome them. We then describe a short deployment of BreakSense within our lab that examined bound vs. unbound augmented breaks and how they affect users’ sense of completion and readiness to work.

Abstract

Users often use social media to share their interest in products. We propose to identify purchase stages from Twitter data following the AIDA model (Awareness, Interest, Desire, Action). In particular, we define a task of classifying the purchase stage of each tweet in a user's tweet sequence. We introduce RCRNN, a Ranking Convolutional Recurrent Neural Network which computes tweet representations using convolution over word embeddings and models a tweet sequence with gated recurrent units. Also, we consider various methods to cope with the imbalanced label distribution in our data and show that a ranking layer outperforms class weights.

Abstract

We present Lift, a visible light-enabled finger tracking and object localization technique that allows users to perform freestyle multi-touch gestures on any object’s surface in an everyday environment. By projecting encoded visible patterns onto an object’s surface (e.g. paper, display, or table), and localizing the user’s fingers with light sensors, Lift offers users a richer interactive space than the device’s existing interfaces. Additionally, everyday objects can be augmented by attaching sensor units onto their surface to accept multi-touch gesture input. We also present two applications as a proof of concept. Finally, results from our experiments indicate that Lift can localize ten fingers simultaneously with accuracy of 0.9 mm and 1.8 mm on two axes respectively and an average refresh rate of 84 Hz with 16.7ms delay on WiFi and 12ms delay on serial, making gesture recognition on noninstrumented objects possible.

Abstract

This is a summary of our participation in the TRECVID 2016 video hyperlinking task (LNK). We submitted four runs in total. A baseline system combined on established vectorspace text indexing and cosine similarity. Our other runs explored the use of distributed word representations in combination with fine-grained inter-segment text similarity measures.

Abstract

Videos recorded with current mobile devices are increasingly geotagged at fine granularity and used in various location based applications and services. However, raw sensor data collected is often noisy, resulting in subsequent inaccurate geospatial analysis. In this study, we focus on the challenging correction of compass readings and present an automatic approach to reduce these metadata errors. Given the small geo-distance between consecutive video frames, image-based localization does not work due to the high ambiguity in the depth reconstruction of the scene. As an alternative, we collect geographic context from OpenStreetMap and estimate the absolute viewing direction by comparing the image scene to world projections obtained with different external camera parameters. To design a comprehensive model, we further incorporate smooth approximation and feature-based rotation estimation when formulating the error terms. Experimental results show that our proposed pyramid-based method outperforms its competitors and reduces orientation errors by an average of 58.8%. Hence, for downstream applications, improved results can be obtained with these more accurate geo-metadata. To illustrate, we present the performance gain in landmark retrieval and tag suggestion by utilizing the accuracy-enhanced geo-metadata.

Abstract

Accurate map matching has been a fundamental but challenging problem that has drawn great research attention in recent years. It aims to reduce the uncertainty in a trajectory by matching the GPS points to the road network on a digital map. Most existing work has focused on estimating the likelihood of a candidate path based on the GPS observations, while neglecting to model the probability of a route choice from the perspective of drivers. Here we propose a novel feature-based map matching algorithm that estimates the cost of a candidate path based on both GPS observations and human factors. To take human factors into consideration is very important especially when dealing with low sampling rate data where most of the movement details are lost. Additionally, we simultaneously analyze a subsequence of coherent GPS points by utilizing a new segment-based probabilistic map matching strategy, which is less susceptible to the noisiness of the positioning data. We have evaluated the proposed approach on a public large-scale GPS dataset, which consists of 100 trajectories distributed all over the world. The experimental results show that our method is robust to sparse data with large sampling intervals (e.g., 60 s to 300 s) and challenging track features (e.g., u-turns and loops). Compared with two state-of-the-art map matching algorithms, our method substantially reduces the route mismatch error by 6.4% to 32.3% and obtains the best map matching results in all the different combinations of sampling rates and challenging features.

Abstract

Improvements in sensor and wireless network enable accurate, automated, instant determination and dissemination of a user's or
objects position. The new enabler of location-based services (LBSs) apart from the current ubiquitous networking infrastructure is
the enrichment of the different systems with semantics information, such as time, location, individual capability, preference and
more. Such semantically enriched system-modeling aims at developing applications with enhanced functionality and advanced
reasoning capabilities. These systems are able to deliver more personalized services to users by domain knowledge with advanced
reasoning mechanisms, and provide solutions to problems that were otherwise infeasible. This approach also takes user's preference
and place property into consideration that can be utilized to achieve a comprehensive range of personalized services, such as
advertising, recommendations, or polling. This paper provides an overview of indoor localization technologies, popular models for
extracting semantics from location data, approaches for associating semantic information and location data, and applications that
may be enabled with location semantics. To make the presentation easy to understand, we will use a museum scenario to explain
pros and cons of different technologies and models. More specifically, we will first explore users' needs in a museum scenario.
Based on these needs, we will then discuss advantages and disadvantages of using different localization technologies to meet these
needs. From these discussions, we can highlight gaps between real application requirements and existing technologies, and point
out promising localization research directions. By identifying gaps between various models and real application requirements,
we can draw a road map for future location semantics research.