
Blog Category: Research

Touch input is now the preferred input method on mobile devices such as smartphones and tablets. Touch is also gaining traction on the desktop, and it is common for interaction with large table- or wall-based displays. At present, most touch displays can detect only the location of a touch. Some capacitive touch screens can also report the contact area of a touch, but usually no further information about individual touch inputs is available to developers of mobile applications.

It would, however, be beneficial to capture further properties of the user’s touch, for instance the finger’s rotation around the vertical axis (i.e., the axis orthogonal to the plane of the touch screen) as well as its tilt (see images above). Obtaining rotation and tilt information for a touch would allow for expressive localized input gestures as well as new types of on-screen widgets that make use of the additional local input degrees of freedom.

Finger pose information adds local degrees of freedom of input at each touch location. This, for instance, allows user interface designers to remap established multi-touch gestures such as pinch-to-zoom to other functions, or to free up screen space: input that usually requires (multi-)touch gestures spanning a significant amount of screen space (e.g., adjusting a slider value, scrolling a list, panning a map view, enlarging a picture) can instead be performed at a single touch location. New graphical user interface widgets that make use of finger pose information, such as rolling context menus, hidden flaps, or occlusion-aware widgets, have also been suggested.
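To make the single-location slider idea concrete, here is a minimal sketch of a hypothetical `RotarySlider` that maps changes in finger rotation (yaw, in degrees) to a slider value. The class, its `gain` parameter, and the touch-down/touch-move callbacks are illustrative assumptions, not part of PointPose or any toolkit. Rotation is accumulated incrementally so that wrap-around at ±180° does not cause jumps.

```python
def angle_delta(prev_deg: float, curr_deg: float) -> float:
    """Signed smallest rotation from prev_deg to curr_deg, in [-180, 180)."""
    return (curr_deg - prev_deg + 180.0) % 360.0 - 180.0


class RotarySlider:
    """Hypothetical slider driven by finger rotation (yaw) at one touch point."""

    def __init__(self, value=50.0, lo=0.0, hi=100.0, gain=0.5):
        self.value, self.lo, self.hi, self.gain = value, lo, hi, gain
        self.last_yaw = None

    def begin(self, yaw_deg: float) -> None:
        # Touch-down: remember the initial finger rotation.
        self.last_yaw = yaw_deg

    def update(self, yaw_deg: float) -> float:
        # Touch-move: apply the incremental rotation, scaled by gain,
        # and clamp the result to the slider's range.
        delta = angle_delta(self.last_yaw, yaw_deg)
        self.value = min(self.hi, max(self.lo, self.value + delta * self.gain))
        self.last_yaw = yaw_deg
        return self.value
```

A gain below 1 trades speed for precision; a real widget would probably also need some dead zone to ignore sensor jitter.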

Our PointPose prototype performs finger pose estimation at the touch location using a short-range depth sensor that views the touch screen of a mobile device. We use the point cloud generated by the depth sensor for finger pose estimation: PointPose estimates the finger pose of a user touch by fitting a cylindrical model to the subset of the point cloud that corresponds to the user’s finger. We use the spatial location of the user’s touch to seed the search for that subset.
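The following is a rough sketch of the pipeline described above, under simplifying assumptions. It is not PointPose’s implementation. Points are `(x, y, z)` tuples in screen coordinates (z is height above the screen), and the finger subset is found by region growing from points near the touch location. Instead of a full cylinder fit, it uses the principal axis of the segmented points (via power iteration on the covariance matrix) as a stand-in for the cylinder axis. The parameters `seed_radius` and `eps` are hypothetical values in the sensor’s units.

```python
import math
from collections import deque


def segment_finger(points, touch, seed_radius=10.0, eps=6.0):
    """Region-grow the finger's points outward from the touch location."""
    tx, ty = touch
    # Seeds: points whose projection onto the screen plane lies near the touch.
    seeds = [i for i, (x, y, _) in enumerate(points)
             if math.hypot(x - tx, y - ty) <= seed_radius]
    visited = set(seeds)
    queue = deque(seeds)
    while queue:
        i = queue.popleft()
        for j, q in enumerate(points):
            if j not in visited and math.dist(points[i], q) <= eps:
                visited.add(j)
                queue.append(j)
    return [points[i] for i in sorted(visited)]


def principal_axis(pts, iters=200):
    """Centroid and dominant covariance eigenvector (power iteration)."""
    n = len(pts)
    mean = [sum(p[k] for p in pts) / n for k in range(3)]
    centered = [[p[k] - mean[k] for k in range(3)] for p in pts]
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(3)]
           for a in range(3)]
    v = [1.0, 0.7, 0.3]  # arbitrary start vector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(3)) for a in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return mean, v


def estimate_pose(points, touch):
    """Return (yaw, pitch) in degrees, or None if no finger is found.

    Yaw is rotation around the screen normal; pitch (tilt) is the angle
    between the finger axis and the screen plane (0 = lying flat).
    """
    finger = segment_finger(points, touch)
    if len(finger) < 3:
        return None
    centroid, axis = principal_axis(finger)
    # Orient the axis from the fingertip (touch, assumed at z = 0) toward the hand.
    to_centroid = (centroid[0] - touch[0], centroid[1] - touch[1], centroid[2])
    if sum(a * b for a, b in zip(axis, to_centroid)) < 0:
        axis = [-a for a in axis]
    yaw = math.degrees(math.atan2(axis[1], axis[0]))
    pitch = math.degrees(math.atan2(axis[2], math.hypot(axis[0], axis[1])))
    return yaw, pitch
```

The brute-force neighbor search is O(n²). A real-time version would more likely use a spatial grid or the depth image’s pixel adjacency, and a proper cylinder fit would also recover finger radius.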

One advantage of our approach is that, unlike related work, it does not require complex external tracking hardware, and no external computation is needed: the finger pose extraction algorithm is efficient enough to run directly on the mobile device. This makes PointPose ideal for prototyping and developing novel mobile user interfaces that use finger pose estimation.

This week at the ACM Conference on Document Engineering, Laurent and Scott are presenting new work on direct manipulation of video. The ShowHow project is our latest activity involving expository or “how to” video creation and use. While watching videos of this genre, it is helpful to create annotations that identify useful frames or shots, either directly with ShowHow’s annotation capability or in a separate multimedia notes document. The primary purpose of such annotations is later reference, or incorporation into other videos or documents. While browser history might get you back to a specific video you watched previously, it won’t efficiently get you to a specific portion of a much longer source video, nor will it provide the broader context in which you found that portion noteworthy. ShowHow enables users to create rich annotations around expository video that optionally include image, audio, or text to preserve this contextual information.
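As a sketch of why a time-anchored annotation beats browser history, the hypothetical `VideoAnnotation` record below ties optional text, image, and audio to a time range of a source video. It produces a deep link using the temporal fragment syntax (`#t=start,end`) of the W3C Media Fragments URI specification. The field names are assumptions for illustration, not ShowHow’s data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VideoAnnotation:
    """Hypothetical annotation anchored to a time range of a source video."""
    video_url: str
    start_s: float
    end_s: float
    text: Optional[str] = None
    image_path: Optional[str] = None  # e.g. a captured (sub)frame
    audio_path: Optional[str] = None  # e.g. a spoken note

    def __post_init__(self):
        if self.end_s < self.start_s:
            raise ValueError("end_s must not precede start_s")

    def link(self) -> str:
        # Media Fragments temporal syntax: seconds, start and end separated by a comma.
        return f"{self.video_url}#t={self.start_s:g},{self.end_s:g}"
```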

For creating these annotations, copy-and-paste functionality from the source video is desirable. This could mean selecting a (sub)frame as an image, or even selecting text shown in the video. We also demonstrate capturing dynamic activity across frames as a simple animated GIF for easy copy and paste from video to the clipboard. These tasks pose interaction design challenges, and as more content is viewed on mobile/touch devices, direct manipulation provides a natural means for fine control of selection.
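One small but necessary step in copying a (sub)frame is mapping the user’s drag rectangle from on-screen display pixels to the video’s native resolution. The sketch below assumes the video is scaled to fill the display area without letterboxing, represents a frame as a list of pixel rows, and uses hypothetical function names.

```python
def drag_to_crop(start, end, display_size, video_size):
    """Map a drag rectangle in display pixels to a clamped crop box in video pixels.

    Returns (left, top, right, bottom); assumes no letterboxing.
    """
    (x0, y0), (x1, y1) = start, end
    sx = video_size[0] / display_size[0]
    sy = video_size[1] / display_size[1]
    left, right = sorted((x0, x1))  # drags can go in any direction
    top, bottom = sorted((y0, y1))

    def clamp(v, hi):
        return max(0, min(hi, v))

    return (clamp(round(left * sx), video_size[0]),
            clamp(round(top * sy), video_size[1]),
            clamp(round(right * sx), video_size[0]),
            clamp(round(bottom * sy), video_size[1]))


def crop(frame, box):
    """Cut the box out of a frame stored as a list of pixel rows."""
    left, top, right, bottom = box
    return [row[left:right] for row in frame[top:bottom]]
```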

Under the hood, content analysis is required to identify events in the video that help drive the user interaction. In this case, the analysis is implemented in JavaScript and runs in the browser in which the video is being played, so efficient implementations of standard image analysis techniques such as region segmentation, edge detection, and region tracking are required. There’s a natural tradeoff between robustness and efficiency here that constrains the content processing techniques.
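The sketch below illustrates the cheap end of that tradeoff: frame differencing to flag frames where the content changes, plus a bounding box of the changed pixels as a crude region estimate. It is written in Python for readability rather than the in-browser JavaScript the post describes, operates on grayscale frames stored as lists of rows, and uses hypothetical thresholds. It is not ShowHow’s actual analysis.

```python
def frame_difference(a, b):
    """Mean absolute pixel difference between two equally sized grayscale frames."""
    total = sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return total / (len(a) * len(a[0]))


def detect_events(frames, threshold=10.0):
    """Indices of frames that differ markedly from their predecessor."""
    return [i for i in range(1, len(frames))
            if frame_difference(frames[i - 1], frames[i]) > threshold]


def changed_region(a, b, pixel_threshold=30):
    """Bounding box (top, left, bottom, right) of changed pixels, or None."""
    rows = [r for r, (ra, rb) in enumerate(zip(a, b))
            if any(abs(pa - pb) > pixel_threshold for pa, pb in zip(ra, rb))]
    if not rows:
        return None
    cols = [c for c in range(len(a[0]))
            if any(abs(a[r][c] - b[r][c]) > pixel_threshold for r in range(len(a)))]
    return rows[0], cols[0], rows[-1], cols[-1]
```

Differencing is fast but fragile under camera motion or lighting changes; more robust techniques (edge-based comparison, tracking) cost more computation per frame.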

The interaction enabled by the system is probably best described in the video below:

It is fairly well known that people who examine search results often don’t go past the first few hits, perhaps stopping at the “fold” or at the end of the first page. It’s a habit we’ve acquired from getting high-quality results for precision-oriented information needs. Google has trained us well.

But this habit may not always be useful when we are confronted with uncommon, recall-oriented information needs; that is, when doing research. Looking only at the top few documents places too much trust in the ranking algorithm. In our SIGIR 2013 paper, we investigated what happens when a lightweight preview mechanism gives searchers a glimpse of the distribution of documents (new, re-retrieved but not seen, and seen) in the query they are about to execute.
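A preview of this kind can be sketched as bucketing a ranked result list by the searcher’s history. The function below is a hypothetical illustration, not the mechanism from the paper. It assumes the history is available as two sets of document IDs (previously retrieved, previously seen) and counts each category per block of `bucket_size` ranks, so a searcher can see where new documents lie.

```python
def preview_distribution(ranked_ids, retrieved_before, seen_before, bucket_size=10):
    """Per-block counts of new, re-retrieved-but-unseen, and seen documents."""
    buckets = []
    for start in range(0, len(ranked_ids), bucket_size):
        counts = {"new": 0, "retrieved_not_seen": 0, "seen": 0}
        for doc in ranked_ids[start:start + bucket_size]:
            # Check "seen" first: seen documents were necessarily retrieved too.
            if doc in seen_before:
                counts["seen"] += 1
            elif doc in retrieved_before:
                counts["retrieved_not_seen"] += 1
            else:
                counts["new"] += 1
        buckets.append(counts)
    return buckets
```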

Jony Ive is a fantastic designer. As a rule, his vision for a device sets the trend for that entire class of devices. Apparently, Jony Ive hates skeuomorphic design elements. Skeuomorphs are those sometimes corny bits of realism some designers add to user interfaces. These design elements reference an application’s analog embodiment. Apple’s desktop and mobile interfaces are littered with them. Their notepad application looks like a notepad. Hell, the hard drive icon on my desktop is a very nice rendering of the hard drive that is actually in my desktop.

When we rolled out the CHI 2013 previews site, we got a couple of requests to make the site searchable by keyword. Of course, interfaces for search are one of my core research interests, so that request got me thinking. How could we do search on this site? The problem with the conventional approach to search is that it requires some server-side code to do the searching and to return results to the client. That approach wouldn’t work for our simple website, because from the server’s perspective our site was static: just a few HTML files, a little bit of JavaScript, and about 600 videos. Using Google to search the site wouldn’t work either, because most of the searchable content is located on two pages, with hundreds of items on each page, so a hit would at best point to one of those pages rather than to the matching item. So what to do?
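One way to search a static site is to do all the work on the client: build (or precompute and ship) an inverted index over the items, and answer queries by intersecting posting sets. The sketch below shows that idea in Python for readability; in the browser it would be JavaScript. The item IDs, tokenizer, and AND semantics are illustrative assumptions, not a description of what the previews site actually did.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(items):
    """Map each term to the set of item IDs whose text contains it."""
    index = defaultdict(set)
    for item_id, text in items.items():
        for term in tokenize(text):
            index[term].add(item_id)
    return index


def search(index, query):
    """Items containing every query term (AND semantics), sorted by ID."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits)
```

With a few hundred items, an index like this is small enough to build on page load, so no server-side code is needed.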

At a PARC Forum a few years ago, I heard Marissa Mayer mention the work they did at Google to pick just the right shade of blue for link anchors to maximize click-through rates. It was an interesting, if somewhat bizarre, finding that shed more light on Google’s cognitive processes than on human ones. I suppose this stuff only really matters when you’re operating at Google scale; otherwise the effect, even if statistically significant, is practically meaningless. But I digress.
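The scale argument can be checked with a standard two-proportion z-test. In the sketch below, the click-through rates (2.00% versus 2.02%) and sample sizes are made-up numbers, not Google’s data. The same 0.02-point difference is far from significant with 10,000 impressions per variant but highly significant with 100 million.

```python
import math


def two_proportion_z_test(clicks_a, n_a, clicks_b, n_b):
    """Two-sided z-test for a difference between two click-through rates."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


# Same hypothetical CTRs (2.00% vs 2.02%) at two very different scales.
z_small, p_small = two_proportion_z_test(200, 10_000, 202, 10_000)
z_big, p_big = two_proportion_z_test(2_000_000, 100_000_000, 2_020_000, 100_000_000)
```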

I am writing a paper in which I would like to cite this work. Where do I find it? I tried a few obvious searches in the ACM DL and found nothing. I searched in Google Scholar, and I believe I found a book chapter that cited a Guardian article from 2009, which mentioned this work. But that was last night, and today I cannot re-find that book chapter, either by searching or by examining my browsing history. The Guardian article is still open in a tab, so I am pretty sure I didn’t dream up the episode, but it is somewhat disconcerting that I cannot retrace my steps.

The prolific Jaime Teevan has decided to blog, as evidenced by the creation of “Slow Searching” a few weeks ago. In a recent post, Jaime wrote about some ways in which Twitter search differed from web search, among which she included monitoring behavior, running “the same query over and over again just to see what is new.” Putting on my Lorite hat for a minute, this seems quite similar (albeit on a different timescale) to the “pre-web” concept of routing or standing queries. Later, Google introduced Alerts, which seemed to be its reinvention of the same concept. And of course tools like TweetDeck make it much easier to keep up with particular Twitter topics.
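A standing query is essentially a stored query plus memory of what the user has already been shown, so each run reports only what is new. The minimal sketch below uses a hypothetical `StandingQuery` class with simple all-terms-must-match semantics over a stream of `(id, text)` items; it is not how Google Alerts or Twitter search actually work.

```python
class StandingQuery:
    """Hypothetical standing (routing) query that reports only unseen matches."""

    def __init__(self, terms):
        self.terms = {t.lower() for t in terms}
        self.seen = set()  # IDs already delivered to the user

    def poll(self, stream):
        """Return IDs of new items in stream (iterable of (id, text)) matching all terms."""
        new = []
        for item_id, text in stream:
            words = set(text.lower().split())
            if item_id not in self.seen and self.terms <= words:
                self.seen.add(item_id)
                new.append(item_id)
        return new
```

Running the same query “over and over again” by hand amounts to calling `poll` while doing the `seen` bookkeeping in your head.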

We are looking for an intern to work with us this summer in the area of social media analysis. The project will involve understanding and mining patterns within Twitter data, in both text and images. An ideal candidate is a PhD student with strong machine learning skills. Prior experience in image understanding, text data mining, social network analysis, or statistical modeling is a plus. If you are interested in this project, please send your CV to Dhiraj (dhiraj@fxpal.com) or Francine (chen@fxpal.com).

One of the things we did slightly differently in this year’s HCIR Symposium was to introduce full-length, peer-reviewed, top-tier conference-quality papers. We received a number of submissions, each of which was read and discussed by three reviewers. We then rejected some of the papers, and sent several back for a rewrite-and-resubmit cycle. In the end, we accepted four papers, which have now been published in the ACM Digital Library.

Last week we held the HCIR 2012 Symposium in Cambridge, Mass. This is the sixth in a series that we have organized. We expanded the format of this year’s meeting to a day and a half, and in addition to the posters, search challenge reports, and short talks, we introduced full papers reviewed to first-tier conference standards. I will write more about these later; for details on other events at the Symposium, I refer you to the excellent blog post by one of the other co-organizers, Daniel Tunkelang.

In this post, I wanted to record my impressions of the keynote talk by Marti Hearst from UC Berkeley.