CFP: Second International Conference on Weblogs and Social Media

March 31-April 2, 2008, Seattle, Washington, U.S.A.

Call For Papers

The rapid creation and consumption of social media content continues to drive the evolution of the Internet and the Web. Social media content now accounts for the majority of content published daily on the web.

As the space evolves, researchers and industrial practitioners find themselves at a key point for collaborating on research, implementation and deployment of a wide range of analyses and applications. The International Conference on Weblogs and Social Media invites researchers in the broad field of social media analysis to submit papers for its second meeting. Following in the tradition of earlier workshops and the first meeting in Boulder, USA in 2007, we anticipate an exciting, high-quality event which will bring together academic and industrial practitioners to present and discuss new research, applications, thoughts and ideas that are shaping the future of social media analysis.

Areas of interest

The conference aims to bring together researchers from different subject areas including computer science, linguistics, psychology, statistics, sociology, multimedia and semantic web technologies and foster discussions about ongoing research in the following areas:

People interested in participating should submit, through the conference website, a technical paper (up to 8 pages) or a poster or demo description (up to 2 pages) by the deadlines given above (midnight PST). Each submission should indicate a list of relevant areas from the list above.

On May 28th, Microsoft pushed out some nice updates to its Virtual Earth mapping system (my post here). The next day, Google dropped streetside imagery into Google maps (note that Microsoft had a demo of this a while back). Today, Microsoft announced Surface Computing.

From what I've seen of the surface computing product, it looks like Microsoft may have beaten Perceptive Pixel to the punch.

May 26, 2007

While it feels like online advertising has been around for a long time, it still behaves in a dawdling, infantile manner. Two brief examples:

Auto-advertising - I'm not talking about ads for cars. I'm talking about cases where a web site serves up some contextual ads that are there to draw the reader to the very same web page that is serving that ad. I saw this recently on DailyKos (no screen cap, you will have to trust me on this one).

RSS-advertising - because RSS is an information exchange or transmission mechanism, not an information presentation mechanism (which a web page is), it builds in any number of indirections between the content creator and the reader's client. Thus, when injecting ads into RSS feeds we get, by way of example, things like this (from Jeff Jarvis, as experienced in BlogLines); you can just make out the post content in between the repeated ads from HitWise:

May 24, 2007

Google Labs recently upgraded their Trends feature - a system which displays the volume of use of a query over time. The new feature mines this data set for query terms which are showing interesting upward changes. At first, when I looked through the top few trends, they all seemed to look the same - like this (note to Google - put the title of the graph on the graph):

In other words, a flat - almost zero - level of activity followed by a sharp upswing. However, today, it appears that there is a better variety of examples:

There still seems to be a bias towards jumps from a very low volume to - well, we don't know what the vertical scale of these graphs is, so we have no idea as to the significance of the trends. Providing a ranking function that attempts to optimize multiple variables is hard, and it appears that this algorithm discounts trends for terms which already have a reasonable volume. For example, if there were a sudden run on the term 'iraq', it might not appear, as (I assume) the existing search volume for iraq is already quite high.
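One plausible way such a ranking could discount already-popular terms - a sketch of the general idea only, not Google's actual algorithm; the function name and smoothing constant are my own - is to score relative rather than absolute growth:

```python
def trend_score(baseline, recent, smoothing=10.0):
    """Score a query's 'hotness' as growth relative to its baseline volume.

    Dividing by the baseline (plus a smoothing constant so near-zero
    baselines don't explode the score) means a term that already has high
    steady volume needs an enormous absolute jump to rank highly.
    """
    return (recent - baseline) / (baseline + smoothing)

# An obscure term going from ~0 to 500 queries outranks an established
# term jumping from 100,000 to 150,000, even though the absolute change
# of the latter is a hundred times larger.
obscure = trend_score(baseline=0, recent=500)                 # 50.0
established = trend_score(baseline=100_000, recent=150_000)   # ~0.5
```

Any scheme along these lines would naturally surface exactly the flat-then-spike shapes described above.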

Google trends - with the added hotness - are fun. One improvement they could make would be to include sparklines for each term so that one wouldn't have to click through to see the shape of the trend.
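Sparklines would be cheap for them to add; here is a minimal sketch of the idea using Unicode block characters (the function and its scaling choices are mine, purely illustrative):

```python
BARS = "▁▂▃▄▅▆▇█"  # eight block heights, lowest to highest

def sparkline(values):
    """Render a numeric series as a one-line text sparkline."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero on a flat series
    return "".join(BARS[int((v - lo) / span * (len(BARS) - 1))] for v in values)

# The typical 'hot trend' shape: flat near zero, then a sharp upswing.
print(sparkline([1, 1, 1, 2, 1, 1, 3, 20, 90, 100]))  # ▁▁▁▁▁▁▁▂▇█
```

One glyph per data point is enough to show a trend's shape in a results list without a click-through.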

I'm always on the lookout for new and interesting consumer deployments of time series in the search space. It is interesting to note that just as Google updates this feature, Technorati has sunk its time series feature, leaving BlogPulse as the only real tool for blog time series (and here, of course, I include the BlogPulse clone IceRocket).

May 23, 2007

Technorati launched a significant update yesterday. While I'll let others discuss the merits (or lack thereof) of the new design, I'd like to lament the new scoring that Technorati offers bloggers. Previously, there was a ranking score which produced a number from 1 (the highest) to <unbounded> (the lowest - or, effectively, unranked). With that form of metric, you could look at the number, get an idea of where you were, and track any changes you were making. With the new 'authority' score, we have a range from <unbounded> (the highest) to 0 (the lowest). Consequently, you can no longer look at your score and figure out where you are in the grand scheme of things; you would have to at least know the highest score.

This, to me, is a poor design of metric.
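The complaint can be made concrete: recovering a rank-like position from an unbounded score requires seeing the whole distribution, which is exactly what an individual blogger doesn't have. A minimal sketch (all scores invented):

```python
def authority_to_rank(my_score, all_scores):
    """Recover a rank (1 = highest) from an unbounded authority score.

    This only works given everyone else's scores -- precisely the
    information a lone blogger staring at their own number lacks.
    A rank, by contrast, is self-interpreting.
    """
    return 1 + sum(1 for s in all_scores if s > my_score)

population = [12_000, 4_500, 980, 310, 55]   # invented authority scores
print(authority_to_rank(310, population))    # 4: three blogs score higher
```

With the old metric the "4" arrives for free; with the new one it is unrecoverable from your own score alone.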

Interestingly, as the comments on the TechCrunch post indicate, reaction to the ticker at the top of the page has been somewhat mixed or negative. If you look at it closely, you will see that it isn't a real-time stream of terms, but a cycle of terms repeated over and over (refreshed at some point, I assume). In other words, the ticker is a presentation decision, not something necessitated by the underlying data actually being (or even potentially being) a real-time stream.

In the big picture, I think the new design is an overall improvement, and the search speeds appear to have improved.

May 22, 2007

I've been keeping an eye out for information on SentiMetrix for the past six months or so. Today I came across this PDF document from UMaryland which gives a little bit of concrete information about the company:

With a University of Maryland College Park technology spurring the idea, a group of local businessmen and scientists have launched a new company that measures a tough to quantify concept – worldwide opinion.

SentiMetrix, of Bethesda, Md., was launched in fall 2006 by university professor V.S. Subrahmanian, graduate student Diego Reforgiato, and two businessmen who spent time with online giant AOL LLC, Vadim Kagan and Michael Rozenman. The company is based on Subrahmanian’s Opinion Analysis SYStem, commonly known as OASYS. It was developed at the University of Maryland Institute for Advanced Computer Studies, and is based on a series of complex algorithms.

“Recent surveys show that the marketing research market, including opinion research and brand monitoring, is growing rapidly, as new technologies get applied to electronic media, both professionally and consumer generated media,” Rozenman said. “We have started SentiMetrix because we believe that the OASYS technology is the best response to what these markets need today: sentiment tracking in multi-lingual data, done in a timely, cost effective way.”

OASYS, which was a finalist for the OTC Invention of the Year Award, is capable of tracking the media on the Internet in many languages, measuring the intensity of sentiment expressed on a variety of subjects. For this reason, it is unique, the businessmen say: most programs detect just “polarity” on a subject (like/don’t like, for example), and most are not multi-lingual.

So far, the founders said they intend to build “an extensive data collection operation” for traditional and consumer-generated media, including mainstream news, blogs and message boards, starting first with English-language sources. They will then move on to other languages, starting with the most frequented Web sites.


A SentiMetrix customer will use OASYS through a controlled access Web site, with a search engine-like interface available to run queries. A visual representation and quantitative data is available, and a free Web site with limited options will be developed as a market tool, Rozenman said.

Thus far, OASYS has won Computerworld Magazine’s 2006 Horizon Award, which goes to the most innovative pre-commercial technology.

Opinion analysis program leads to new start-up company

The most interesting paragraph in this text to me is (my emphasis):

OASYS, which was a finalist for the OTC Invention of the Year Award, is capable of tracking the media on the Internet in many languages, measuring the intensity of sentiment expressed on a variety of subjects. For this reason, it is unique, the businessmen say: most programs detect just “polarity” on a subject (like/don’t like, for example), and most are not multi-lingual.

Much of the published work on sentiment/opinion fails to really define what sentiment or opinion is (taking the machine learner's path of least resistance: a data set, an algorithm and a result). As a customer, I'd first want to know their precise definition of sentiment, and then how they measure intensity. In addition, there are many types of expressions which, while not opinions or sentiment, still convey important information about topics and products. For example, 'my Hummer broke down' isn't a subjective, opinionated statement but closer to an objective report of fact. It is still an important class of statement to capture, as it reflects on the quality of the product.

One of the challenges of creating a single-valued metric for sentiment or opinion is that equally mixed (aggregated) opinion tends to score the same as neutral opinion. In the simple case, if you have one person expressing a strongly negative opinion and another expressing a strongly positive opinion, then the aggregate may be some number in the middle of the range (say, 0 on a -1 to +1 scale). However, let's say that for a topic there is no expression of opinion at all - that too would score a 0. From the description provided here of the system, it seems that OASYS may suffer from this problem.
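The ambiguity is easy to demonstrate, as is one common remedy (a sketch of my own, not anything OASYS is documented to do): report the volume and the positive/negative mass alongside the mean, rather than the mean alone.

```python
def aggregate(opinions):
    """Summarize a list of opinion scores, each in [-1, +1].

    A bare mean conflates 'strongly polarized' with 'nobody cares';
    returning the count and the positive/negative mass separately
    disambiguates the two cases.
    """
    n = len(opinions)
    mean = sum(opinions) / n if n else 0.0
    pos = sum(o for o in opinions if o > 0)
    neg = sum(-o for o in opinions if o < 0)
    return {"mean": mean, "n": n, "pos": pos, "neg": neg}

polarized = aggregate([+1.0, -1.0])  # mean 0.0, but pos and neg both 1.0
silent = aggregate([])               # mean 0.0 with nothing behind it
```

Both cases score a mean of zero, but the auxiliary fields make them trivially distinguishable.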

TracSense's subtitle is 'making sense of social media', but as the page is in Finnish, I'm not sure what the relationship between the two companies is. Certainly TracSense appears to be quite new - there are no real results for it in any of the major search engines. WhiteVector's site states that their product will launch at the end of Q1 2007, so perhaps TracSense is that product.

From the site, it seems pretty clear that TracSense/WhiteVector are in the same space as BuzzMetrics, Umbria, Cymfony et al.