Humans feel a growing need to engage with their computational world as much as they engage with the real physical world. Following this need to stay always connected, computers are becoming more and more wearable. Since the two primary senses involved in human-computer interaction are sight and sound, we need an interface that makes this everyday interaction easy, intuitive, and possible. We introduce an experience-capturing system known as the EyeTap. EyeTap devices cause the eye to, in effect, function as if it were both a camera and a display, by mapping an effective camera and display inside the eye. This paper discusses the evolution of the technology over the last 30 years, together with some new designs. We also discuss some of the applications that immediately arise from the use of such technology and the various new practices that EyeTap technology makes possible. The paper is divided into two sections: capture/sensors and experiential sampling; and applications, security, and privacy.

In this paper, we present continuous capture of our life log with various sensors and additional data, and propose effective retrieval using context and content based on them. Our life log system contains video, audio, acceleration sensor, gyro, and GPS data, together with annotations, documents, web pages, and emails. In our previous studies, we showed a retrieval methodology that mainly depends on context information from the sensor data. In this paper, we extend that methodology with two additional functions: (1) spatio-temporal sampling for the extraction of key frames for summarization, and (2) conversation scene detection. In the spatio-temporal sampling, key frames for the summarization are extracted using time and location (GPS) data. Because our life log captures dense location data, we can also make use of derivatives of the location data, that is, the speed and acceleration of the person's movement; the summarized key frames are selected using them. We also introduce content analysis, namely conversation scene detection. Our previous work investigated context-based retrieval, which differs from the majority of work in image/video retrieval, where the focus is content-based retrieval. In this paper, we introduce visual and audio content analysis for conversation scene detection; detected conversation scenes will serve as important tags for life log retrieval. We describe the present system and the additional functions, along with preliminary results for the latter. Slides (PDF 0.5 MB)
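The derivative-based key-frame selection can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the speed-jump threshold, and the track format are all assumptions; the abstract only states that speed and acceleration derived from dense GPS data drive the summarization.

```python
import math

def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points."""
    R = 6371000.0
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * R * math.asin(math.sqrt(a))

def key_frame_times(track, speed_jump=1.0):
    """track: time-ordered list of (t_seconds, lat, lon) GPS samples.

    Returns the timestamps where the wearer's speed changes sharply
    (stopping somewhere, or starting to move) -- one plausible way to
    turn the location derivatives into summarization key frames.
    """
    speeds = []
    for (t0, *p0), (t1, *p1) in zip(track, track[1:]):
        speeds.append((t1, haversine_m(p0, p1) / max(t1 - t0, 1e-6)))
    keys = []
    for (ta, va), (tb, vb) in zip(speeds, speeds[1:]):
        if abs(vb - va) > speed_jump:   # acceleration spike -> key frame
            keys.append(tb)
    return keys
```

In this sketch a person standing still and then walking away would yield one key frame at the moment movement begins.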

A Layered Interpretation of Human Interaction Captured by Ubiquitous Sensors

We are developing a machine-readable interaction corpus, a collection of human interaction data captured by multiple sensors, in order to use it in recording various episodes in our daily life. To develop such a corpus, we have prototyped ubiquitous/wearable sensor systems that collaboratively capture human interactions from multiple points of view in a poster exhibition site. An infrared ID system, which recognizes the presence of persons or objects, enables us to estimate the user’s state of gazing at a particular person or object, or of staying at a particular place. A throat microphone, which detects the volume of the user’s voice, tells us whether he or she is making an utterance. The purpose of this study is to interpret human interaction patterns automatically from the data of these sensors and to give the interaction corpus machine-readable indices that make it more useful. In this paper, we propose a layered model for these interpretations based on a bottom-up use of the sensors. In this model, interpretations of human interactions are hierarchically abstracted so that each layer has unique semantic/syntactic information represented by machine-readable indices. This layered model enables us to use various sensors and to refer to various indices according to the purpose at hand. Furthermore, we assume that the model can be applied to multiple domains because of its hierarchical structure. These indices enable us to use sources such as images and audio more efficiently, for example, to search for significant scenes. Accordingly, we introduce various applications that adaptively utilize such interaction indices at an exhibition site in order to demonstrate their effectiveness on a corpus. Finally, we introduce our further attempts to capture human interactions in another domain, a meeting situation, with the same system in order to assess its versatility. Paper (PDF 1.2 MB) Slides (PDF 6 MB)
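The bottom-up layering might be composed roughly as below. This is a hedged sketch: the layer functions, thresholds, and the "conversation" composition rule are illustrative inventions, grounded only in the abstract's description of the infrared ID system (gazing/staying) and the throat microphone (utterance).

```python
def primitives(frame):
    """Lower layer: per-user primitives from raw sensor readings.

    frame: {'ir_target': the person/object tag id currently seen by the
            user's infrared ID sensor (or None),
            'voice_volume': the throat-microphone level}
    """
    return {
        'gazing_at': frame['ir_target'],
        'talking': frame['voice_volume'] > 0.2,   # illustrative threshold
    }

def joint_events(users):
    """Higher layer: interaction indices composed from the primitives.

    users: {user_id: primitives-dict}.  Emits a machine-readable
    'conversation' index when two users gaze at each other and at
    least one of them is making an utterance.
    """
    events = []
    ids = list(users)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            mutual = (users[a]['gazing_at'] == b and
                      users[b]['gazing_at'] == a)
            if mutual and (users[a]['talking'] or users[b]['talking']):
                events.append(('conversation', a, b))
    return events
```

Each layer stays independently usable: an application can query the raw primitives, or only the abstracted indices, according to its purpose.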

10:30-11:00

Coffee break & Demos

11:00-12:00

Session 2

Minimal-Impact Audio-Based Personal Archives

Daniel P.W. Ellis & Keansub Lee

Collecting and storing continuous personal archives has become cheap and easy, but we are still far from creating a useful, ubiquitous memory aid. We view the inconvenience to the user of being 'instrumented' as one of the key barriers to the broader development and adoption of these technologies. Audio-only recordings, however, can have minimal impact, requiring only that a device the size and weight of a cellphone be carried somewhere on the person. We have conducted some small-scale experiments on collecting continuous personal recordings of this kind, and investigating how they can be automatically analyzed and indexed, visualized, and correlated with other minimal-impact, opportunistic data feeds (such as online calendars and digital photo collections). We describe our unsupervised segmentation and clustering experiments in which we can achieve good agreement with hand-marked environment/situation labels. We also discuss some of the broader issues raised by this kind of work including privacy concerns, and describe our future plans to address these and other questions. Paper (PDF) Slides (PDF)
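The unsupervised segmentation-and-clustering idea can be sketched in miniature. The sketch below is not the authors' method: it substitutes a single scalar feature per audio frame and a tiny 1-D k-means where the real system would use richer acoustic features and more careful clustering, but it shows how per-frame cluster labels collapse into environment/situation segments.

```python
def kmeans_1d(xs, k=2, iters=20):
    """Tiny 1-D k-means grouping audio frames by a scalar feature
    (e.g. log-energy); a stand-in for real acoustic clustering."""
    centres = sorted(xs)[:: max(len(xs) // k, 1)][:k]
    for _ in range(iters):
        groups = [[] for _ in centres]
        for x in xs:
            j = min(range(len(centres)), key=lambda j: abs(x - centres[j]))
            groups[j].append(x)
        centres = [sum(g) / len(g) if g else c
                   for g, c in zip(groups, centres)]
    return [min(range(len(centres)), key=lambda j: abs(x - centres[j]))
            for x in xs]

def segment(labels):
    """Collapse per-frame cluster labels into (start, end, label) runs --
    a crude environment/situation segmentation of a day's recording."""
    segs, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segs.append((start, i, labels[start]))
            start = i
    return segs
```

A quiet office followed by a noisy street would come out as two segments, each of which could then be matched against hand-marked labels for evaluation.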

Passive Capture and Ensuing Issues for a Personal Lifetime Store

Jim Gemmell, Lyndsay Williams, Ken Wood, Gordon Bell and Roger Lueder

Passive capture lets people record their experiences without having to operate recording equipment, and without even having to give recording conscious thought. The advantages are increased capture, and improved participation in the event itself. However, passive capture also presents many new challenges. One key challenge is how to deal with the increased volume of media for retrieval, browsing, and organizing. This paper describes the SenseCam device, which combines a camera with a number of sensors in a pendant worn around the neck. Data from SenseCam is uploaded into a MyLifeBits repository, where a number of features, but especially correlation and relationships, are used to manage the data. Paper (PDF 1 MB) PowerPoint Slides (7.5 MB)

One of the greatest challenges in enterprises today is the lack of dynamic and ongoing information about individuals' activities, interests, and expertise. Availability of such personal chronicles can provide rich benefits at both an individual and enterprise level. For example, personal chronicles can help individuals to far more effectively retrieve and review their activities and interactions, while at an enterprise level they can be data-mined to identify groups of common and complementary interests and skills, or to identify implicit work processes that are commonplace in every enterprise. Today's existing tools are very limited in their support for dynamic capture of ongoing activities, in the organization and presentation of captured information, and in supporting rich annotation, search, retrieval, and publication of this information. In this paper, we propose a set of Personal Chronicling Tools (PCT) to support enterprise knowledge workers in digital event archiving and collaboration-oriented publishing. PCT is composed of four primary tools with the following capabilities: (1) event monitoring, (2) interactive annotation, (3) browse/search, and (4) edit/publish. All are designed to exploit existing enterprise infrastructure, storing captured raw data and metadata in secure databases. The first tool is a group of event monitors. These run on user client devices and capture user events such as emails, web pages browsed, instant messaging sessions, and documents edited. Monitors for new event classes are easily added as plug-ins through an XML interface. The second tool, the event annotator, enables context-sensitive user tagging and bookmarking of interesting moments. The third is an event browser that extends corporate email tools, providing semantic search (by embedding WordNet as a common dictionary) and the ability to follow threads of many kinds.
Finally, a publishing tool facilitates the publication of relevant events with a fraction of the effort required to maintain a manual chronicle such as a weblog. This paper presents the overall system architecture, a prototype implementation, and preliminary results from field studies.
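The plug-in registration might look like the sketch below. The XML descriptor shape shown is invented for illustration; the abstract says only that monitors for new event classes are added as plug-ins through an XML interface, so the element and attribute names are assumptions.

```python
import xml.etree.ElementTree as ET

# A guessed descriptor shape for one monitor plug-in; the actual PCT
# schema is not given in the abstract.
DESCRIPTOR = """
<monitor name="web" eventClass="page-visit">
  <field name="url"/>
  <field name="title"/>
</monitor>
"""

class MonitorRegistry:
    """Holds event-monitor plug-ins declared through XML descriptors."""
    def __init__(self):
        self.monitors = {}

    def register(self, descriptor_xml):
        """Parse a descriptor and register its event class and fields."""
        root = ET.fromstring(descriptor_xml)
        fields = [f.get('name') for f in root.findall('field')]
        self.monitors[root.get('name')] = {
            'event_class': root.get('eventClass'),
            'fields': fields,
        }

    def capture(self, monitor_name, **data):
        """Build a raw event record, keeping only the declared fields."""
        spec = self.monitors[monitor_name]
        return {'class': spec['event_class'],
                **{k: v for k, v in data.items() if k in spec['fields']}}
```

The point of the design is that adding a monitor for a new event class is a matter of shipping a descriptor, not changing the capture pipeline.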

Uniscript: a model for persistent and incremental knowledge storage

Adorjan Kiss, Joel Quinqueton

We present in this paper a model of personal knowledge representation for lifetime storage. In the model we separate the knowledge layer from the resource layer. The knowledge layer consists of a network of atomic knowledge units situated in space and time. Resources are data packages (bit sequences) that can be rendered by some device into any human-perceivable form. The two parts complement each other: the knowledge network can be seen as annotations of the resource base (multimedia store), while resources can serve as means for the interpretation of knowledge units as well as a way to index and access them. For the knowledge network we propose a simple formalism that we believe could support the emergence of a language capable of describing increasingly complex situations of the real world and, over time, of representing any information that is expressible in natural language.
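The two-layer separation might be represented as below. This is only a data-structure sketch under assumptions: the field names and the annotation link are illustrative, since the abstract describes the layers but not their concrete encoding.

```python
from dataclasses import dataclass, field

@dataclass
class Resource:
    """Resource layer: an opaque data package (bit sequence) that some
    device can render into a human-perceivable form."""
    rid: str
    data: bytes

@dataclass
class KnowledgeUnit:
    """Knowledge layer: an atomic unit situated in space and time,
    annotating zero or more resources (field names are illustrative)."""
    kid: str
    when: float                                  # timestamp
    where: tuple                                 # (lat, lon)
    about: list = field(default_factory=list)    # resource ids

def annotations_for(resource_id, units):
    """The complementarity in one direction: the knowledge network
    read as annotations over the resource base."""
    return [u.kid for u in units if resource_id in u.about]
```

The reverse direction, using resources to interpret and index knowledge units, would follow the `about` links the other way.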

We advocate a new approach to meeting video retrieval based on the use of memory cues. First we present a new survey involving 519 people in which we investigate the types of items people use to review meeting contents (e.g., minutes, video). Then we present a novel memory study involving 15 subjects in which we investigate what people remember about past meetings (e.g., seating positions). Based on these studies and related research, we propose a novel framework for meeting video retrieval based on memory cues. Our proposed system graphically represents important memory retrieval cues such as room layout, participants' faces, and seating positions. Queries are formulated dynamically: as the user graphically manipulates the cues, the query results are shown. Our system (1) helps users easily express the cues they recall about a particular meeting; (2) helps users remember new cues for meeting video retrieval. Finally, we present our approach to automatic indexing of meeting videos. Paper (PDF 0.5 MB)
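The dynamic cue-based querying can be sketched as a filter that ignores unspecified cues, so results refine as the user manipulates more of them. The cue names and record layout below are illustrative assumptions, not the paper's schema.

```python
def match(meeting, cues):
    """A meeting matches when every supplied cue agrees with its record;
    cues the user has not yet set are simply ignored."""
    if 'participants' in cues:
        if not set(cues['participants']) <= set(meeting['participants']):
            return False
    if 'seat_of' in cues:          # cue: (person, seating position)
        person, seat = cues['seat_of']
        if meeting['seats'].get(person) != seat:
            return False
    return True

def retrieve(meetings, cues):
    """Return ids of the meetings consistent with the current cues."""
    return [m['id'] for m in meetings if match(m, cues)]
```

Re-running `retrieve` after each graphical manipulation gives the "results shown as cues are manipulated" behaviour the abstract describes.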

Total Recall: Are Privacy Changes Inevitable? a position paper

William Cheng, Leana Golubchik, David Kay

Total Recall is a system that records a personal version of the world using personal sensors such as a microphone array in a pair of glasses or a camera in a necklace. There are many applications of Total Recall, such as in health care, education, and so on, which can significantly improve people’s quality of life. However, data recorded by such a system may also be used by the legal system. Hence, pervasive use of such a system will likely change our social structure, as potentially there may be no question in the future as to who said what or who did what. It is natural then that privacy advocates might consider such technology dangerous, because such data can be misused or abused by law enforcement. In this paper, we discuss privacy concerns in the context of systems like Total Recall and propose a solution that may alleviate some of these concerns. We discuss the ramifications of this solution and its possible implementations.

We present a system that retrieves the voice part of human communications captured by our collaborative experience capturing system. For segmenting, interpreting, and retrieving past conversation scenes from a huge amount of captured data, the system focuses on the non-verbal aspects, i.e., the contextual information captured by ubiquitous sensors, rather than the verbal (semantic) aspects of the data. The retrieved communications are presented to other persons who are in situations similar to the communicators'. This experience sharing enables people to gain more information about their situation or surroundings. The system’s current domain is a poster exhibition at an academic conference, where the system provides a visitor with additional information about the exhibited posters. Paper (PDF 0.3 MB)

Our physical experiences are best represented by body movements. Many limitations of existing systems/devices, however, prevent their use in the archival of daily experiences. This paper proposes an integrated system composed of a PWS (Posture Web Server) and a PHA (Posture History Archiver). The PWS has a palm-size controller and 15 lightweight tilt sensor devices, newly developed by us. The distinguishing feature of our tilt device is its measurement of 360-degree inclinations in two directions. The PWS is worn by the user and continuously monitors his/her body posture; acting as a posture web server, it sends the current postural data upon request via a wireless network. The PHA, running on a PC, sends requests to the PWS periodically via the network, and archives a time series of postures called the posture history. The whole history, or any part of it, can be visualized according to the user's preferences. In this paper, system design issues, the development of the tilt sensor devices, the implementation, and our experimental results are described.
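The PWS/PHA request-archive loop can be sketched as below. The networking is deliberately simulated with direct calls; only the sensor count (15) and the two-direction angle encoding come from the abstract, while the class and method names are illustrative.

```python
class PostureWebServer:
    """Stands in for the wearable PWS: on request, returns the current
    reading of each tilt sensor as a (direction-1, direction-2) pair of
    0-360 degree inclinations, for 15 sensors."""
    def __init__(self, sensor_count=15):
        self.angles = [(0.0, 0.0)] * sensor_count

    def handle_request(self):
        return list(self.angles)      # snapshot, not a live reference

class PostureHistoryArchiver:
    """Stands in for the PHA: polls the server periodically and appends
    each response, with its poll index, to the posture history."""
    def __init__(self, server):
        self.server = server
        self.history = []

    def poll(self, tick):
        self.history.append((tick, self.server.handle_request()))
```

In the real system the `handle_request` hop is a wireless network round trip, and the accumulated history is what the visualization component renders.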

eyeBlog is an automatic personal video recording system. It consists of ECSGlasses [XX], a pair of glasses augmented with a wireless eye-contact and glyph sensing camera, and a web application that visualizes the video from the ECSGlasses camera as chronologically delineated blog entries. The blog format allows for easy annotation, grading, cataloging and searching of video segments by the wearer or anyone else with Internet access. eyeBlog reduces the editing effort of video bloggers by recording video only when something of interest is registered by the camera. Interest is determined by a combination of independent methods. For example, recording can be triggered upon detection of eye contact towards the wearer of the glasses, allowing all face-to-face interactions to be recorded. Recording can also be triggered by the detection of image patterns such as glyphs in the frame of the camera. This allows the wearer to record their interactions with any object that has an associated marker. Finally, by pressing a button the user can initiate recording manually. Paper (PDF 2.6 MB)
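The trigger combination described above amounts to recording whenever any of the three independent triggers is active. The sketch below is an illustrative reconstruction (the frame/trigger representation is assumed, not from the paper): it turns a per-frame trigger stream into the recorded segments that would become blog entries.

```python
def recording_segments(trigger_stream):
    """trigger_stream: one dict per camera frame holding the three
    independent triggers ('eye_contact', 'glyph', 'button').

    Returns (start, end) frame ranges eyeBlog would record: maximal
    runs of frames in which at least one trigger is active.
    """
    segs, start = [], None
    for i, t in enumerate(trigger_stream):
        active = t.get('eye_contact') or t.get('glyph') or t.get('button')
        if active and start is None:
            start = i                     # a trigger fired: start recording
        elif not active and start is not None:
            segs.append((start, i))       # all triggers clear: stop
            start = None
    if start is not None:
        segs.append((start, len(trigger_stream)))
    return segs
```

A real implementation would likely add hysteresis so brief trigger dropouts (a blink, a glyph leaving frame for a moment) do not split one interaction into many entries.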

Deja View Camwear Model 100 is the first in a family of wearable camcorders designed to free the user from being shackled to the viewfinder. The Model 100 is designed to ensure that the user never misses that important tidbit. While the initial use is for active lifestyles, we are exploring its use in security, military, training, automotive and sundry other vertical markets.

In this paper, we describe the Quindi Meeting Companion, a personal software tool for documenting content-rich meetings. We examine the principal motivations for the system, key design decisions, and new practices enabled by the technology. Paper (PDF 0.1 MB)

Only by making computing intimate to the body can products begin to know our states of mind, our contexts, our states of health, etc. and respond (or have other aspects of the world respond) in intelligent ways. Sympathetic products, driven by computers worn on the body, are coming and the industry will grow up with wearable body monitoring at its core. BodyMedia, Inc. has been building toward this vision since 1999 and today has a commercially available, clinically tested, consumer-acceptable, environmentally hardened body monitoring platform. This platform is available today or will soon be available in healthcare markets as well as tangential areas including safety, security, entertainment, and affective computing. This brief document will highlight the current state of the platform and some of its on-going evolution.