Blog Category: Web Technology

At the AAAI 2015 conference, we presented the work “Visually Interpreting Names as Demographic Attributes by Exploiting Click-Through Data,” a collaboration with a research team at National Taiwan University. This study aims to automatically associate a name with its likely demographic attributes, e.g., gender and ethnicity. More specifically, the associations are driven by web-scale search logs that a search engine collects when internet users retrieve images.

Demographic attributes are vital for semantically characterizing a person or a community, which makes them valuable for marketing, personalization, face retrieval, social computing, and other human-centric research. Since users tend to keep their online profiles private, a name is the most accessible piece of personal information in these contexts. The problem we address is this: given a name, predict its likely demographic attributes. For example, a person named “Amy Liu” is likely an Asian female. A name shapes the first impression of a person because naming conventions are strongly influenced by culture, e.g., first name and gender, last name and place of origin. Typically, associations between names and these two attributes are made by referring to demographics maintained by governments or by manually labeling attributes based on the given personal information (e.g., a photo). The former is limited to regional census data. The latter raises major concerns about time and cost when applied to large-scale data.

Unlike prior approaches, we propose to exploit click-throughs between text queries and retrieved face images in web search logs, where the names are extracted from queries and the attributes are detected from face images automatically. In this paper, a click-through occurs when a user clicks one of the URLs returned for a text query to view the web image it points to. The mechanism carries two implications: (1) the association between a query and an image is based on viewers’ clicks, that is, human intelligence from web-scale users; (2) users are likely to have considerable knowledge of the associations, because they are at least partially aware of what they are looking for and search engines are getting much better at satisfying user intent. Both characteristics of click-throughs reduce concerns about incorrect associations. Moreover, internet users’ knowledge enables discovering name-attribute associations that generalize to more countries.
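To make the idea concrete, here is a minimal Python sketch of how click counts might be aggregated into a name-to-attribute estimate. It is not the method from the paper: the log format, the toy records, the function names, and the rule that the first query token is the first name are all assumptions made purely for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy log: (query text, attribute detected on the clicked
# face image, number of clicks). All values are made up for illustration.
CLICK_LOG = [
    ("amy liu", "female", 12),
    ("amy adams", "female", 30),
    ("amy smith", "male", 2),
    ("john liu", "male", 9),
]


def aggregate_clicks(log):
    """Sum click counts per (first-name token, detected attribute)."""
    counts = defaultdict(Counter)
    for query, attribute, clicks in log:
        tokens = query.lower().split()
        if not tokens:
            continue
        # Naive assumption: the first token of the query is the first name.
        first_name = tokens[0]
        counts[first_name][attribute] += clicks
    return counts


def attribute_distribution(counts, name):
    """Return click-weighted attribute shares for a name, or {} if unseen."""
    if name not in counts:
        return {}
    total = sum(counts[name].values())
    if total == 0:
        return {}
    return {attr: n / total for attr, n in counts[name].items()}


if __name__ == "__main__":
    counts = aggregate_clicks(CLICK_LOG)
    print(attribute_distribution(counts, "amy"))
```

Weighting by clicks rather than by the number of returned images reflects the point above: a click is a small human judgment that the image matches the query, so heavily clicked pairs contribute more to the estimate.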

In the experiments, the proposed name-attribute associations achieve accuracy competitive with manual labeling. They also benefit profiling of social media users and keyword-based face image retrieval, especially in adapting to unseen names. This is the first work to interpret a name as demographic attributes in a visual-data-driven manner using web search logs. In the future, we plan to extend the visual interpretation of an abstract name to more targets for which naming conventions are highly influenced by visual appearance.

(Please be aware that some ChatRoulette links may contain mature content.)

Dear me. All those folks doing naughty things on ChatRoulette, secure in their Net-anonymity, may be in for a rude awakening: Chat Roulette Map, a new Google Maps mash-up, maps users’ chat images to their locations, based on IP address. Last week, it also showed users’ IP addresses.

Note that Chat Roulette Map has just added a new pop-up window when you first load the page:

Welcome To Chat Roulette Map
(snip)
We’d like to advise maine.edu to stop using
student’s names in their hostnames.

We’ve decided, at least for the time being, to
hide IP & host information as some user-identifiable
information was found in some entries.

No, you think? It’ll be interesting to see how this warning window evolves over the next few weeks.

Last week I saw a BayCHI talk by Elaine Wherry of Meebo, in which she used the history of classical music as an analogy for the evolution of user interface and interaction design on the web. The parallels she established between the Baroque era and what we have experienced in the last decade of “Web 2.0” interfaces are compelling.

Baroque music arose during the Renaissance as a reaction to the impoverished musical forms characteristic of the Middle Ages, an era dominated by monastic chants and the songs of troubadours. The Renaissance brought a revolution in music-making technology with the invention of a range of musical instruments. Unlike the sparse music that preceded it, Baroque music is characterized by a profusion of notes that, while initially interesting, tend to overwhelm and can render the composer’s melody unrecognizable.

The Data Liberation Front is an engineering team at Google whose singular goal is to make it easier for users to move their data in and out of Google products. We do this because we believe that you should be able to export any data that you create in (or import into) a product. We help and consult other engineering teams within Google on how to “liberate” their products. This is our mission statement: Users should be able to control the data they store in any of Google’s products. Our team’s goal is to make it easier for them to move data in and out.

This is a fantastically worthy goal, and I whole-heartedly applaud it. However, I am beginning to wonder: What data is yours to own, in the first place?

WebNC is a tool for sharing your browser window in real time with someone else. It’s similar to screen sharing tools like VNC or WebEx, except it’s built for sharing only web pages. This sounds limiting, but since a lot of work is done inside web browsers these days (browsing, editing documents, watching videos, booking reservations and vacations, reading email), we thought it would be useful. For example, my wife always calls me when she rents a car online: what car model should she pick? With WebNC, she can easily show me her browser window, and we can talk more efficiently because I can see what she sees on her screen.