Items by Chen, Zheng

Existing keyword suggestion tools from search engine companies can automatically suggest keywords related to an advertiser’s products or services, taking into account simple keyword statistics such as search volume and cost per click (CPC). However, the nature of the generalized second-price auction suggests that a better understanding of competitors’ keyword selection and bidding strategies helps more in winning the auction than relying only on general search statistics. In this paper, we propose a novel keyword suggestion strategy, called Competitive Analysis, to explore the keyword-based competition relationships among advertisers and ultimately help advertisers build better-performing campaigns. The experimental results demonstrate that the proposed Competitive Analysis can both help advertisers promote their products and generate more revenue for search engine companies.

Behavioral Targeting (BT) is a technique used by online advertisers to increase the effectiveness of their campaigns, and it is playing an increasingly important role in the online advertising market. However, how much BT can truly help online advertising in search engines remains underexplored in academia. In this paper we provide an empirical study on the click-through log of advertisements collected from a commercial search engine. From the experimental results over a period of seven days, we draw three important conclusions: (1) users who clicked the same ad do have similar behaviors on the Web; (2) the Click-Through Rate (CTR) of an ad can be improved by as much as 670% on average by properly segmenting users for behaviorally targeted advertising in sponsored search; (3) using short-term user behaviors to represent users is more effective for BT than using long-term user behaviors. We conducted statistical t-tests, which verified that all conclusions drawn in the paper are statistically significant. To the best of our knowledge, this work is the first empirical study of BT on the click-through log of real-world ads.
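As a rough illustration of how a relative CTR improvement like the one in conclusion (2) is computed, the sketch below compares the CTR of one user segment against the CTR over all users. The function names and the click and impression counts are hypothetical and are not taken from the paper.

```python
def ctr(clicks: int, impressions: int) -> float:
    """Click-through rate: clicks divided by impressions (0.0 if there are no impressions)."""
    return clicks / impressions if impressions else 0.0


def ctr_improvement(segment_ctr: float, baseline_ctr: float) -> float:
    """Relative improvement of a segment's CTR over the baseline CTR, in percent."""
    if baseline_ctr <= 0:
        raise ValueError("baseline CTR must be positive")
    return (segment_ctr - baseline_ctr) / baseline_ctr * 100.0


# Hypothetical counts for one ad (illustrative only, not data from the study).
baseline = ctr(clicks=100, impressions=10_000)  # the ad shown to all users
segment = ctr(clicks=30, impressions=1_000)     # the ad shown to one user segment
lift = ctr_improvement(segment, baseline)
print(f"baseline CTR={baseline:.3f}, segment CTR={segment:.3f}, improvement={lift:.0f}%")
```

How the segments themselves are built (for example, by clustering users on short-term behavior) is outside the scope of this sketch.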

A pressing task during the unification process is to identify a user’s vertical search intention based on the user’s query. In this paper, we propose a novel method to propagate social annotation, which includes user-supplied tag data, to both queries and vertical search engines (VSEs) in order to bridge them semantically. Our proposed algorithm consists of three key steps: query annotation, vertical annotation and query intention identification. Our algorithm, referred to as TagQV, demonstrates that social tags can be propagated to represent Web objects such as queries and VSEs, in addition to Web pages. Experiments on real Web search queries demonstrate the effectiveness of TagQV in query intention identification.
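The last of the three steps can be pictured with a small sketch. It assumes that query annotation and vertical annotation have already produced a bag of tags for the query and for each VSE, and it matches them by cosine similarity. The similarity measure, the function names and the toy tag counts are all assumptions made for illustration; the abstract does not specify how TagQV compares annotations.

```python
from collections import Counter
from math import sqrt
from typing import Dict


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bags of tags."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify_intention(query_tags: Counter, vertical_tags: Dict[str, Counter]) -> str:
    """Return the vertical whose tag profile is most similar to the query's tags."""
    return max(vertical_tags, key=lambda v: cosine(query_tags, vertical_tags[v]))


# Hypothetical tag profiles (illustrative only).
query_tags = Counter({"flight": 3, "hotel": 2, "cheap": 1})
vertical_tags = {
    "travel": Counter({"flight": 5, "hotel": 4, "trip": 2}),
    "jobs": Counter({"resume": 4, "career": 3, "cheap": 1}),
}
best = identify_intention(query_tags, vertical_tags)
print(best)
```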

In this paper, we try to leverage a large-scale, multilingual knowledge base, Wikipedia, to help effectively analyze and organize Web information written in different languages. Based on the observation that one Wikipedia concept may be described by articles in different languages, we adapt an existing topic modeling algorithm to mine multilingual topics from this knowledge base. The extracted “universal” topics have multiple types of representations, with each type corresponding to one language. Accordingly, new documents in different languages can be represented in a common space spanned by a group of universal topics, which makes various multilingual Web applications feasible.

In this paper, we propose to model the blended search problem by assuming conditional dependencies among queries, VSEs and search results. The probability distributions of this model are learned from the search engine query log through a unigram language model. Our experimental exploration shows that (1) a large number of queries in generic Web search have vertical search intentions; and (2) our proposed algorithm can effectively blend vertical search results into generic Web search, which can improve the Mean Average Precision (MAP) by as much as 16% compared to traditional Web search without blending.

[...] these components into a single list. However, in the classical meta-search setting, the query logs of the component search engines are not available for study. In this extended abstract, we model the blended search problem based on the conditional dependencies among queries, VSEs and all the search results. We utilize the usage information, i.e. the query logs, of all the VSEs, which is not available to traditional meta-search engines, to learn the model parameters with a smoothed unigram language model. Finally, given a user query, the search results from both generic Web search and the different VSEs are ranked together by inferring their probabilities of relevance to the given query. The main contributions of this work are: (1) by studying the query logs of the vertical search engines belonging to a commercial search engine, we show the importance of the blended search problem; (2) we propose a novel approach based on a probabilistic model to explore the blended search problem; and (3) we experimentally verify that our proposed algorithm can effectively blend vertical search results into generic Web search, which can improve the MAP by as much as 16% compared to traditional Web search without vertical search blending and by 10% compared to some other ranking baseline.
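To make the idea of a smoothed unigram language model learned from VSE query logs concrete, here is a minimal sketch that scores a query against each VSE's log and ranks the VSEs. It is not the model from the paper: the linear interpolation with a pooled background model, the add-one background estimate, the weight `lam` and the toy logs are all assumptions chosen for illustration, and the sketch ranks VSEs rather than individual search results.

```python
from collections import Counter
from math import log
from typing import Dict, List, Tuple


def train(logs: Dict[str, List[str]]) -> Tuple[Dict[str, Counter], Counter]:
    """Count query terms per VSE and pool all counts into a background model."""
    per_vse = {
        vse: Counter(w for q in queries for w in q.lower().split())
        for vse, queries in logs.items()
    }
    background: Counter = Counter()
    for counts in per_vse.values():
        background.update(counts)
    return per_vse, background


def log_prob(query: str, counts: Counter, background: Counter, lam: float = 0.5) -> float:
    """Log P(query | VSE) under a unigram model interpolated with the background.

    lam must be in (0, 1] so that unseen terms keep a nonzero probability.
    """
    total = sum(counts.values())
    bg_total = sum(background.values())
    vocab = len(background)
    score = 0.0
    for w in query.lower().split():
        p_ml = counts[w] / total if total else 0.0
        # Add-one estimate with one extra slot reserved for unknown terms.
        p_bg = (background[w] + 1) / (bg_total + vocab + 1)
        score += log((1 - lam) * p_ml + lam * p_bg)
    return score


def rank_vses(query: str, per_vse: Dict[str, Counter], background: Counter) -> List[str]:
    """Order VSEs from most to least likely to have generated the query."""
    return sorted(per_vse, key=lambda v: log_prob(query, per_vse[v], background), reverse=True)


# Hypothetical query logs (illustrative only).
logs = {
    "images": ["cat pictures", "sunset wallpaper"],
    "news": ["election results", "stock market news"],
}
per_vse, background = train(logs)
ranking = rank_vses("cat wallpaper", per_vse, background)
print(ranking)
```

Smoothing matters here because a query term that never appears in a VSE's log would otherwise give that VSE a probability of zero, no matter how well the other terms match.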

Both search engine click-through logs and social annotation have been utilized as user feedback for search result re-ranking. However, to the best of our knowledge, no previous study has explored the correlation between these two factors for the task of search result re-ranking. In this paper, we show that the gap between the search queries and the social tags of the same web page reflects its user preference score well. Motivated by this observation, we propose a novel algorithm, called Query-Tag-Gap (QTG), to re-rank search results for better user satisfaction. Intuitively, on the one hand, search users’ intentions are generally described by their queries before they read the search results. On the other hand, web annotators semantically tag web pages after they read the content of the pages. The difference between users’ understanding of the same page before and after they read it is a good reflection of user satisfaction. In this extended abstract, we formally define the query set and tag set of the same page as users’ pre- and post-knowledge respectively. We empirically show a strong correlation between user satisfaction and the user’s knowledge gap before and after reading the page. Based on this gap, experiments show outstanding performance of our proposed QTG algorithm in search result re-ranking.
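One simple way to picture a gap between a page's query set and its tag set is a set distance over terms. The sketch below uses Jaccard distance, which is purely an assumption: the abstract does not state how QTG measures the gap or how the gap is turned into a preference score, so neither is reproduced here. The function name and the example data are hypothetical.

```python
from typing import Iterable


def query_tag_gap(queries: Iterable[str], tags: Iterable[str]) -> float:
    """Jaccard distance between the terms of a page's queries and its tags.

    0.0 means the two term sets are identical, 1.0 means they share nothing.
    """
    query_terms = {w.lower() for q in queries for w in q.split()}
    tag_terms = {t.lower() for t in tags}
    union = query_terms | tag_terms
    if not union:
        return 0.0
    return 1.0 - len(query_terms & tag_terms) / len(union)


# Hypothetical page: queries that led to it, and tags users assigned to it.
gap = query_tag_gap(
    queries=["python tutorial", "learn python"],
    tags=["python", "programming", "tutorial"],
)
print(gap)
```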

Understanding the intent behind a user’s query can help a search engine automatically route the query to the corresponding vertical search engines to obtain particularly relevant content, thus greatly improving user satisfaction. There are three major challenges in the query intent classification problem: (1) intent representation; (2) domain coverage; and (3) semantic interpretation. Current approaches to predicting the user’s intent mainly utilize machine learning techniques. However, meeting all these challenges with statistical machine learning approaches is difficult and often requires much human effort. In this paper, we propose a general methodology for the problem of query intent classification. With very little human effort, our method can discover large quantities of intent concepts by leveraging Wikipedia, one of the best human knowledge bases. The Wikipedia concepts are used as the intent representation space; thus, each intent domain is represented as a set of Wikipedia articles and categories. The intent of any input query is identified by mapping the query into the Wikipedia representation space. Compared with previous approaches, our proposed method achieves much better coverage when classifying queries in an intent domain, even though the number of seed intent examples is very small. Moreover, the method is very general and can be easily applied to various intent domains. We demonstrate the effectiveness of this method in three different applications, i.e., travel, job, and person name. In each of the three cases, only a couple of seed intent queries are provided. We perform quantitative evaluations in comparison with two baseline methods, and the experimental results show that our method significantly outperforms the other approaches in each intent domain.
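The representation described above, where an intent domain is a set of Wikipedia articles and categories and a query is mapped into the same space, can be sketched as follows. The term-to-concept index, the scoring rule (fraction of mapped concepts that fall inside the domain) and all example entries are hypothetical stand-ins; the paper's actual mapping and concept discovery steps are not described in the abstract.

```python
from typing import Dict, Set


def intent_score(query: str, concept_index: Dict[str, Set[str]], domain: Set[str]) -> float:
    """Fraction of the query's mapped concepts that belong to the intent domain."""
    concepts: Set[str] = set()
    for term in query.lower().split():
        concepts |= concept_index.get(term, set())
    if not concepts:
        return 0.0
    return len(concepts & domain) / len(concepts)


# Hypothetical term-to-concept index and travel intent domain (illustrative only).
concept_index = {
    "flights": {"Airline", "Flight"},
    "paris": {"Paris"},
    "resume": {"Résumé"},
}
travel_domain = {"Airline", "Flight", "Paris", "Hotel", "Category:Tourism"}

travel_query = intent_score("flights paris", concept_index, travel_domain)
mixed_query = intent_score("resume paris", concept_index, travel_domain)
print(travel_query, mixed_query)
```

A query would then be assigned to a domain when its score passes some threshold, which would need to be tuned per domain.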

This list was generated on Fri Dec 9 13:16:48 2016 GMT.
