“Mo’ Data Mo’ Problems”: The Two Faces of Big Data

Big Data analysis requires not only optimized technical solutions but also a human component. At a meetup in Munich, the tech community gained insights into both sides.

Last Thursday, we invited the tech community to our Munich office for a meetup with the motto “The Two Faces of Big Data”. In a relaxed atmosphere, the visitors gained insights into the technical side of Big Data, including algorithms, applications, difficulties and solutions, and into the human side, i.e. manual quality testing for Big Data applications. Over delicious food and cool drinks, there was also ample opportunity to exchange ideas with Cliqz experts and other attendees.

The Technical Face of Big Data

At Cliqz, Big Data plays an important role, especially in our self-developed search engine, which works independently of other search engines thanks to its own index. In his technical talk “High-Dimensional Nearest Neighbor Search”, Erik Larsson, Software Engineer on the Search Backend Team at Cliqz, discussed the nearest neighbor problem and described various solutions and applications.

Cliqz’s search engine, for example, attempts to answer difficult queries by using nearest neighbor search to find similar, simpler queries. The algorithm groups queries that have the same (or a similar) meaning, even if they have only a few words in common. By identifying the similarity between two queries, such as “where to eat in Munich” and “best restaurants in Munich”, the results can be improved.
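To make the idea concrete, here is a minimal sketch of an exact nearest neighbor lookup between queries. It assumes each query has already been mapped to a vector; the three-dimensional vectors, the `QUERY_VECTORS` table and the function names are made up for illustration and are not taken from the Cliqz search backend.

```python
import math

# Hypothetical toy embeddings. In a real system these would come from a
# trained model and have far more dimensions; the numbers are invented.
QUERY_VECTORS = {
    "where to eat in Munich": [0.9, 0.8, 0.1],
    "best restaurants in Munich": [0.85, 0.9, 0.05],
    "weather in Berlin": [0.1, 0.2, 0.95],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_query(query, vectors):
    """Brute-force nearest neighbor: compare the query with every other one."""
    target = vectors[query]
    others = [q for q in vectors if q != query]
    return max(others, key=lambda q: cosine_similarity(target, vectors[q]))


print(nearest_query("where to eat in Munich", QUERY_VECTORS))
```

With these toy vectors the two Munich food queries end up closest to each other although they share few words. The brute-force scan compares against every stored query, which is fine for three entries but becomes the bottleneck for large collections.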

Furthermore, Erik explained why most exact standard solutions for low-dimensional data are not suitable for higher-dimensional spaces. One way to overcome this is to simplify the problem and use approximate methods instead (such as Annoy, HNSW or granne). In the end, it’s a matter of choosing the best method for the respective application, depending on dimensionality, data size and structure.
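As a rough illustration of what “approximate” means here, the following sketch uses random-hyperplane hashing, a simple form of locality-sensitive hashing. This is not how Annoy, HNSW or granne work internally; it is a deliberately small stand-in technique, and all function names and parameters are assumptions chosen for this example.

```python
import random


def make_hyperplanes(num_planes, dim, seed=0):
    """Draw random hyperplanes through the origin (one normal vector each)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]


def signature(vector, hyperplanes):
    """One bit per hyperplane: on which side of the plane the vector lies."""
    return tuple(
        sum(v * h for v, h in zip(vector, plane)) >= 0.0 for plane in hyperplanes
    )


def build_index(points, hyperplanes):
    """Group point indices into buckets that share the same signature."""
    buckets = {}
    for idx, point in enumerate(points):
        buckets.setdefault(signature(point, hyperplanes), []).append(idx)
    return buckets


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def approximate_nearest(query, points, buckets, hyperplanes):
    """Search only the query's bucket instead of scanning all points."""
    candidates = buckets.get(signature(query, hyperplanes), [])
    if not candidates:
        return None  # the approximation can miss: nothing shares this bucket
    return min(candidates, key=lambda i: squared_distance(query, points[i]))
```

The trade-off is visible in the last function: only points that share the query's bucket are compared, so the search is cheaper than a full scan, but the true nearest neighbor can be missed if it falls on the other side of one of the hyperplanes. Practical libraries reduce that risk with more elaborate structures, and the right choice depends on dimensionality, data size and structure, as noted above.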

The Human Face of Big Data

In addition to the technical component, the human component is also very important when it comes to evaluating vast amounts of data. In her presentation “The Human Face of Big Data – Organization and Scalability of Manual Testing for Big Data Applications”, Dr. Humera Noor Minhas, Team Lead of Quality Analysis at Cliqz, talked about the motivation and need for manual assessment of Cliqz search results, the challenges it brings and how the quality analysis team tackles them.

When you think of Big Data, you probably first think of powerful, fast and scalable machines. However, machines lack an important element: the human intellect. Problems that humans can easily solve with “common sense” are often hugely difficult for machines. This applies, for example, to object recognition, identification of people or content analysis.

Humera introduced the work of the quality analysis team, which represents the human face of Big Data at Cliqz.

Human judgment is essential for Cliqz to reliably identify, for example, adult content, to improve overall search quality, and to provide users with the most relevant results. The manual assessment of URLs and websites is therefore an important part of the work of the quality analysis team. The team brings together diverse backgrounds, including people who studied literature, philosophy, humanities, economics and biology, and who speak more than nine languages in total. Its members also need a high level of technical understanding in order to identify problems and either address them themselves or forward them to the responsible engineers.

Conclusion

Besides optimized technical solutions, Big Data always requires the human component. No matter how efficient and sophisticated the automated evaluation may be, human intellect is required to assess the results. We at Cliqz are convinced that the best possible outcome can only be achieved by combining both sides of Big Data. Therefore, you should never forget the human component when you think of Big Data.