06 April 2008

I recently attended ICWSM (International Conference on Weblogs and Social Media), which brought together an interesting mix of researchers from NLP, Data Mining, Psychology, Sociology, and Information Sciences. Social media (which, defined generally, can include blogs, newsgroups, and online communities like Facebook, Flickr, YouTube, and del.icio.us) now accounts for the majority of content produced and consumed on the Web. As the area grows in importance, people are getting really interested in finding ways to better understand the phenomenon and to build better applications on top of it. This conference, the second in the series, had nearly 200 participants this year. I think this is a rewarding area for NLPers and MLers to test their wits on: there are many interesting applications and open problems.

In the following, I'll pick out some papers, just to give a flavor of the range of work in this area. For a full list of papers, see the conference program. Most papers are available online (do a search); some are linked from the conference blog.

Interesting new applications:

1) International sentiment analysis for News and Blogs -- M. Bautin, L. Vijayarenu, S. Skiena (Stony Brook) Suppose you want to monitor the sentiment toward particular named entities (e.g. Bush, Putin) in news and blogs across different countries for comparison. This may be useful for, e.g., political scientists analyzing global reactions to the same event. There are two approaches: one is to apply sentiment analyzers trained in different languages; another is to apply machine translation to the foreign text, then apply an English sentiment analyzer. Their approach is the latter (using an off-the-shelf MT engine). Their system generates very-fun-to-watch "heat maps" of named entities that are popular/unpopular across the globe. I think this paper opens up a host of interesting questions for NLPers: Is sentiment polarity something that can be translated across languages? How would one modify an MT system for this particular task? Is it more effective to apply MT, or to build multilingual sentiment analyzers?

2) Recovering Implicit Thread Structure in Newsgroup Style Conversations -- Y-C. Wang, M. Joshi, C. Rose, W. Cohen (CMU) Internet newsgroups can be quite messy in terms of conversation structure. One long thread can actually contain several different conversations among multiple parties. This work aims to use natural language cues to tease apart the conversations of a newsgroup thread. Their output is a conversation graph that shows the series of posts and replies in a more coherent manner.

3) BLEWS: Using blogs to provide context for news articles -- M. Gamon, S. Basu, D. Belenko, D. Fisher, M. Hurst, C. Konig (Microsoft) Every news article has its bias (e.g. liberal vs. conservative). A reader who wishes to be well-educated on an issue should ideally peruse articles on all sides of the spectrum. This paper presents a system that helps the reader quickly understand the political leaning (and emotional charge) of an article. It does so by basically looking at how many conservative vs. liberal blogs link to a news article. I think this paper is a good example of how one can creatively combine a few existing technologies (NLP, visualization, link analysis) to produce an application with a lot of added value.
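
To make the link-counting idea concrete, here is a minimal sketch in Python. It is not the actual BLEWS scoring, which this post does not go into; the function name `leaning_score`, the two labels, and the normalization to the range [-1, 1] are all my own assumptions for illustration.

```python
from collections import Counter

def leaning_score(linking_blog_labels):
    """Hypothetical score in [-1, 1] for one news article.

    linking_blog_labels: one label per blog that links to the article,
    each either "liberal" or "conservative" (assumed to be known already).
    -1 means only liberal blogs link to it, +1 only conservative blogs.
    """
    counts = Counter(linking_blog_labels)
    liberal = counts["liberal"]
    conservative = counts["conservative"]
    total = liberal + conservative
    if total == 0:
        return 0.0  # no labeled in-links, so no evidence either way
    return (conservative - liberal) / total

# Three conservative blogs and one liberal blog link to the article.
print(leaning_score(["conservative"] * 3 + ["liberal"]))  # 0.5
```

A real system would also need to decide how each blog gets its label in the first place, which is the harder part of the problem.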

Methods and algorithms adapted for social media data:

4) Document representation and query expansion models for blog recommendation -- J. Arguello, J. Elsas, J. Callan, J. Carbonell (CMU) This is an information retrieval paper, where the goal is to retrieve blogs relevant to a user query. This is arguably a harder problem than traditional webpage retrieval, since a blog is composed of many posts, and they can be on slightly different topics. The paper adopts a language modeling approach and asks the question: should we model blogs at the blog level, or at the post level? They also explore what kind of query expansion works for blog retrieval. This paper is a nice example of how one can apply traditional methods to a new problem, and then discover a whole range of interesting new research problems due to domain differences.
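
The blog-level vs. post-level question can be sketched with a toy query-likelihood model. This is only an illustration under my own assumptions, not the models from the paper: I use a unigram model with Jelinek-Mercer smoothing, and at the post level I simply average the per-post log-likelihoods. The names `log_likelihood`, `blog_level_score`, and `post_level_score` are hypothetical.

```python
import math
from collections import Counter

def log_likelihood(query, tokens, collection, lam=0.5):
    """log P(query | text) for a unigram model, smoothed with the collection."""
    tf = Counter(tokens)
    cf = Counter(collection)
    score = 0.0
    for term in query:
        p_text = tf[term] / len(tokens) if tokens else 0.0
        p_coll = cf[term] / len(collection)
        p = lam * p_text + (1 - lam) * p_coll
        if p == 0.0:
            return float("-inf")  # term unseen anywhere in the collection
        score += math.log(p)
    return score

def blog_level_score(query, posts, collection):
    """Treat the whole blog as one big document."""
    merged = [tok for post in posts for tok in post]
    return log_likelihood(query, merged, collection)

def post_level_score(query, posts, collection):
    """Score each post separately, then average (an assumed aggregation)."""
    scores = [log_likelihood(query, post, collection) for post in posts]
    return sum(scores) / len(scores)

blog_a = [["nlp", "parsing"], ["nlp", "blogs"]]      # consistently on topic
blog_b = [["cooking", "pasta"], ["travel", "nlp"]]   # mentions the topic once
collection = [tok for blog in (blog_a, blog_b) for post in blog for tok in post]
query = ["nlp"]

for name, blog in (("A", blog_a), ("B", blog_b)):
    print(name,
          blog_level_score(query, blog, collection),
          post_level_score(query, blog, collection))
```

In this toy data both views rank blog A above blog B, but the post-level score penalizes B's off-topic post more directly, which is the kind of difference the blog-level vs. post-level question is about.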

Understanding and analyzing social communities:

5) Wikipedian Self-governance in action: Motivating the policy-lens -- I. Beschastnikh, T. Kriplean, D. McDonald (UW) [Best paper award] Wikipedia is an example of self-governance, where participant editors discuss/argue about what should and can be edited. Over the years, a number of community-generated policies and guidelines have formed. These include policies such as "all sources need to be verified" and "no original research should be included in Wikipedia". Policies are themselves subject to modification, and they are often used as justification by different editors with different perspectives. How are these policies used in practice? Are they being used by knowledgeable Wikipedian "lawyers" or administrators at the expense of everyday editors? This paper analyzes the Talk pages of Wikipedia to see how policies are used and draws some very interesting observations about the evolution of Wikipedia.

6) Understanding the efficiency of social tagging systems using information theory -- E. Chi, T. Mytkowicz (PARC) Social communities such as del.icio.us allow users to tag webpages with arbitrary terms; how efficient is this evolving vocabulary of tags for categorizing the webpages of interest? Is there a way to measure whether a social community is "doing well"? This paper looks at this problem with the tools of information theory. For example, they compute the conditional entropy of documents given tags, H(doc|tag), over time and observe that the efficiency is actually decreasing as popular tags become overused.
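
For readers who want to see what H(doc|tag) measures, here is a small sketch that estimates it from raw (document, tag) pairs using the standard definition H(D|T) = -sum over (d, t) of p(d, t) log2 p(d|t). The function name and the toy data are my own assumptions; the paper's actual estimation procedure may differ.

```python
import math
from collections import Counter

def conditional_entropy(pairs):
    """Estimate H(doc | tag) in bits from observed (doc, tag) pairs."""
    pairs = list(pairs)
    total = len(pairs)
    joint = Counter(pairs)                        # counts of (doc, tag)
    tag_counts = Counter(tag for _, tag in pairs)  # counts of each tag
    h = 0.0
    for (doc, tag), n in joint.items():
        p_joint = n / total                # p(doc, tag)
        p_doc_given_tag = n / tag_counts[tag]  # p(doc | tag)
        h -= p_joint * math.log2(p_doc_given_tag)
    return h

# Each tag points to exactly one page: knowing the tag pins down the page.
print(conditional_entropy([("page1", "python"), ("page2", "recipes")]))  # 0.0

# One overused tag spread over two pages: one bit of uncertainty remains.
print(conditional_entropy([("page1", "web"), ("page2", "web")]))  # 1.0
```

A higher value means that knowing the tag tells you less about which document you will get, which matches the observation that overused popular tags make the vocabulary less efficient.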

Overall, I see three general directions of research for an NLPer in this field: The first approach focuses on building novel web applications that require NLP as a sub-component for the added value. NLPers in industry or large research groups are well-suited to build these applications; this is where start-ups may spring up. The second approach is more technical: it focuses on how to adapt existing NLP techniques to new data such as blogs and social media. This is a great area for individual researchers and grad student projects, since the task is challenging but clearly defined: beat the baseline (old NLP technique) by introducing novel modifications, new features and models. Successes in this space may be picked up by the groups that build the large applications. The third avenue of research, which is less examined (as far as I know), is to apply NLP to help analyze social phenomena. The Web provides an incredible record of human artifacts. If we can study all that is said and written on the Web, we can really understand a lot about social systems and human behavior.

I don't know when NLP technology will be ready, but I think it would be really cool to use NLP to study language for language's sake, and more importantly, to study language in its social context--perhaps we could call that "Social Computational Linguistics". I imagine this area of research will require collaboration with social scientists; it is not yet clear what NLP technology is needed in this space, but papers (5) and (6) above may be a good place to start.

28 comments:

As to number three, I think it'd be "computational sociolinguistics" to remove the ambiguity. The sociolinguists have been using it for years. One of my favorites is Penelope Eckert's Jocks and Burnouts, where social network analysis is mixed with fine-grained phonetic analysis. I'd think blogs might also be amenable to computational historical linguistics techniques.
