Keyword extraction on online advertisement using clustering and classification methodology

Author

Liu, Peng

Date of Issue

2017-04-17

School

School of Computer Science and Engineering

Related Organization

Optimate

Abstract

Keyword advertising is a form of online advertising in which an advertiser pays to have an
advertisement appear in the results listing when a person searches the web with a given phrase.
Keyword selection is particularly important because keywords summarize the key characteristics
of the advertised products and services and are an important factor for advertisers in
increasing the reach of the advertisement (ad) and, potentially, its conversion rate. In my
company, Optimate, we provide services that help clients optimize their online
marketing campaigns, advertisement placements and customer reach via multiple channels
such as Google AdWords and Facebook. Keyword selection remains a crucial component
in increasing the overall effectiveness and efficiency of these services. In this report, I aim to
propose a new approach to extracting keywords from advertisement text that takes into
account the grammatical patterns of the text, historical ads, and other attributes such as
industry and objective. The approach can be broadly divided into three phases:
keyword candidate generation, K-means clustering, and K-nearest-neighbour
classification. The selection rules for keyword candidates are based on the linguistic features
and Part-of-Speech (POS) patterns of the ad content. The aim of candidate generation is to
produce a comprehensive list of possible keywords for subsequent classification. K-means
clustering divides the ads into groups, and the subsequent classification is
performed only within the group to which the ad belongs. This reduces the computational
complexity and selects the group that can yield better keywords. Then the TF-IDF
feature of the keyword candidates is analysed. The cosine distance is also computed and
passed to the K-nearest-neighbour classifier. Based on the majority vote of the 20 nearest
neighbour keywords, each candidate keyword is classified as either a true keyword or a
false keyword. This approach achieves good results in extracting keywords, but there are
still issues limiting its effectiveness. Nevertheless, this approach offers a quick, highly
flexible, and easily implementable solution to keyword extraction.
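
The final phase described above (TF-IDF features, cosine distance, and a majority vote among
the nearest neighbours) can be sketched in plain Python as follows. This is a minimal
illustration under stated assumptions, not the implementation used in the report: the function
names, the smoothed IDF formula, the representation of each candidate as a bag of its own
words, and the toy phrases are all hypothetical. Candidate generation and K-means clustering
are omitted, so `labelled` stands in for the historical keywords of the one cluster that the
ad falls into.

```python
import math
from collections import Counter


def idf_table(docs):
    """Smoothed inverse document frequency for every term in the tokenised docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(doc, idf):
    """Sparse TF-IDF vector (dict) for one tokenised phrase; unseen terms are dropped."""
    tf = Counter(doc)
    return {t: (c / len(doc)) * idf[t] for t, c in tf.items() if t in idf}


def cosine_distance(a, b):
    """1 - cosine similarity of two sparse vectors; 1.0 if either vector is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def knn_is_keyword(candidate_vec, labelled, k=20):
    """Majority vote among the k labelled vectors nearest to the candidate."""
    nearest = sorted(labelled, key=lambda pair: cosine_distance(candidate_vec, pair[0]))[:k]
    votes = sum(1 for _, is_keyword in nearest if is_keyword)
    return votes > len(nearest) / 2


# Hypothetical labelled phrases from one cluster: (tokens, is_true_keyword).
phrases = [
    (["cheap", "flights"], True),
    (["flight", "deals"], True),
    (["cheap", "hotel", "deals"], True),
    (["click", "here"], False),
    (["learn", "more"], False),
    (["click", "now"], False),
]
idf = idf_table([tokens for tokens, _ in phrases])
labelled = [(tfidf(tokens, idf), label) for tokens, label in phrases]

# k=3 only because the toy set is tiny; the abstract uses 20 neighbours.
print(knn_is_keyword(tfidf(["cheap", "deals"], idf), labelled, k=3))
print(knn_is_keyword(tfidf(["click", "more"], idf), labelled, k=3))
```

Restricting `labelled` to a single cluster is what keeps the neighbour search small: the
distance is computed only against keywords from similar ads rather than against the whole
history.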