Can We Trust the Crowd Miners?

by Trustiser

The digital world is caught in a data deluge, driven to a large extent by the vast stream of actions, ratings, recommendations, opinions, and plain information (in the form of text, audio, or video) generated every day by the citizens of the digital world. This phenomenon has not gone unnoticed by the research and commercial communities. As a result, many companies and universities have invested heavily in developing data mining techniques to harness the exaflood of data and discover valuable knowledge and relevant patterns.

Of particular interest is crowd mining, where gigantic databases of social information are mined to extract useful knowledge. One example is dishtip, a service offered by TipSense. TipSense devised a data mining algorithm that reveals the best dishes at restaurants by crunching millions of reviews, mentions, and photos of food.

Crowd mining looks very promising, but the data extracted from social databases can carry malicious content, such as fake ratings and recommendations, that corrupts the results of crowd mining tools. Several approaches have therefore been developed to fight malicious content by cleaning the data. In the realm of rating services, universities (e.g., Cornell University) and companies (e.g., Google) are working hard to detect fake ratings. However, we believe that fake rating detection algorithms are necessary but not sufficient to deliver high quality data to crowd mining tools. Not all ratings are equal, which is why each rating has to be weighted by the trust placed in the user who produced it. In this context, Trustiser will push the envelope by providing crowd mining engines with reliable ratings generated by a community of members arranged hierarchically; the hierarchy is based on the trust placed in raters with respect to each topic.
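To make the idea of trust-weighted ratings concrete, here is a minimal sketch in Python. All names (the function, the example raters, and the trust values) are illustrative assumptions, not Trustiser's actual algorithm: each rating is simply weighted by the trust placed in its author for the topic at hand, so low-trust raters contribute little to the final score.

```python
def trust_weighted_score(ratings, trust, default_trust=0.1):
    """Combine ratings into a single score, weighting each rater by trust.

    ratings: dict mapping rater -> rating (e.g. 1-5 stars)
    trust:   dict mapping rater -> trust weight in [0, 1] for this topic
    Raters with no recorded trust fall back to a small default weight
    (an assumption for this sketch).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for rater, rating in ratings.items():
        w = trust.get(rater, default_trust)
        weighted_sum += w * rating
        total_weight += w
    if total_weight == 0:
        raise ValueError("no ratings to aggregate")
    return weighted_sum / total_weight


# Hypothetical example: mallory is suspected of posting fake ratings,
# so her 5-star rating barely moves the aggregate.
ratings = {"alice": 5, "bob": 1, "mallory": 5}
trust = {"alice": 0.9, "bob": 0.8, "mallory": 0.05}
print(round(trust_weighted_score(ratings, trust), 2))  # prints 3.17
```

An unweighted average of the same ratings would be about 3.67; discounting the suspect rater pulls the score down toward the trusted opinions, which is the effect a trust hierarchy aims for.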