Dimension Reduction

• High dimensionality (e.g., 6,857 tokens)
  • Memory limitation
  • Possible under-fitting
• Dimension reduction: PCA (Principal Component Analysis)
  • An orthogonal linear transformation that maps the data to a new coordinate system
  • Retains the characteristics of the data set that contribute most to its variance
  • Keeps the most important features without losing generality
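As a minimal sketch of the PCA step, the projection onto the top-k principal components can be computed with a plain SVD (the random matrix below is only a toy stand-in for the real term matrix):

```python
import numpy as np

def pca(X, k):
    """Project X (n_samples x n_features) onto its top-k principal components."""
    # Center the data so the components capture variance, not the mean offset.
    Xc = X - X.mean(axis=0)
    # SVD of the centered matrix; rows of Vt are the principal directions,
    # ordered by decreasing singular value (i.e., decreasing variance).
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Keep only the k directions that explain the most variance.
    return Xc @ Vt[:k].T

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))  # toy stand-in for a high-dimensional feature matrix
Z = pca(X, 10)
print(Z.shape)  # (100, 10)
```

In practice a sparse term matrix of this size would typically use a truncated/randomized SVD rather than the full decomposition, but the projection is the same idea.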

What we do

• How do we learn sentiments from a large set of features with lots of noise?
  • Vector Space Model: an M×N entity-term matrix (e.g., 6,000 × 20,000)
  • Dimensionality reduction (PCA)
  • Supervised learning for sentiment learning
• Human labeling vs. average rating
  • An online entity usually has many reviews, each carrying a rating; the average rating is an alternative label for the entity
  • Manual labeling: 1 (least satisfactory) to 5 (most satisfactory); three annotators label each entity and the majority vote is adopted
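The two labeling schemes above can be sketched in a few lines; the function names here are illustrative, not from the original system:

```python
from collections import Counter
from statistics import mean

def majority_label(labels):
    """Adopt the label chosen by the most annotators (three labelers, most-vote-adopted)."""
    return Counter(labels).most_common(1)[0][0]

def average_rating_label(ratings):
    """Alternative labeling: round the mean review rating onto the 1-5 scale."""
    return round(mean(ratings))

print(majority_label([4, 4, 5]))           # 4
print(average_rating_label([5, 4, 4, 3]))  # 4
```

Ties among three annotators are rare on a 5-point scale but possible; the sketch breaks them by first occurrence, whereas a real pipeline would need an explicit rule.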

What we learned

• Dimensionality reduction is necessary
  • The term Vector Space Model (VSM) is inherently huge
• Human labeling is necessary
  • Sentiment learning involves subjective rather than objective judgment
  • Human ratings are noisy because they are not consistent across raters
  • More labeled data is needed
• Other methods to try
  • Unsupervised learning (clustering): a Gaussian Mixture Model is an alternative way to learn sentiments, though the number of hidden sentiments is difficult to determine
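On the last point, one common workaround for not knowing the number of hidden sentiments is to fit GMMs with several component counts and compare an information criterion such as BIC. A sketch on toy 2-D data (assuming scikit-learn is available; the data is synthetic, not from the original experiments):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Toy 2-D data with two well-separated clusters, standing in for
# PCA-reduced review vectors with two hidden sentiments.
X = np.vstack([rng.normal(-3, 1, (100, 2)), rng.normal(3, 1, (100, 2))])

# Fit a GMM for each candidate number of components and score it by BIC;
# lower BIC balances fit quality against model complexity.
bics = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
        for k in range(1, 6)}
best_k = min(bics, key=bics.get)
print(best_k)
```

BIC is only one heuristic; for real review data the candidate range and the chosen criterion (AIC, held-out likelihood) are design choices.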

How to use the learned sentiments

• Sentiment learning can be used to improve the ranking of local search
  • The sentiment value is an important metric for ranking an entity
  • Local search results are influenced by sentiment
• Sentiment ranking model (SRM):
  • SentiRank = a * ContentSim + (1 - a) * SentiValue
  • The parameter a is set empirically to 0.5
  • Similar in form to PageRank-style combination:
    PageRank = b * ContentSim + (1 - b) * PageImportance
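The SRM formula above is a simple convex combination; a minimal sketch, with the a = 0.5 default from the slide:

```python
def senti_rank(content_sim, senti_value, a=0.5):
    """SRM score: SentiRank = a * ContentSim + (1 - a) * SentiValue."""
    return a * content_sim + (1 - a) * senti_value

# With a = 0.5, two entities of equal content similarity are
# reordered purely by their sentiment values.
print(senti_rank(1.0, 0.0))  # 0.5
print(senti_rank(0.5, 1.0))  # 0.75
```

Both inputs are assumed to be normalized to the same [0, 1] range; otherwise one term would dominate regardless of a.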