Filter-Based Fuzzy Big Joins

Summary

In the Filter-Based Fuzzy Big Joins research project, part of my PhD study, we aim to improve several fuzzy join algorithms for distributed and parallel frameworks. We compare and evaluate the algorithms analytically, and validate the results on real datasets.

Excerpt

A fuzzy join query combines all pairs of tuples, drawn from one or several relations, whose distance is lower than or equal to a prespecified threshold $\varepsilon$. In this project, we run fuzzy join queries with many different algorithms and many different thresholds.
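
To make the definition concrete, here is a minimal single-machine sketch of a fuzzy join as a plain nested-loop baseline. It is not one of the project's filter-based or distributed algorithms. The function names `hamming` and `naive_fuzzy_join`, the choice of Hamming distance, and the toy relations `R` and `S` are all assumptions made for illustration only.

```python
from itertools import product
from typing import Callable, Iterable, List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def naive_fuzzy_join(
    r: Iterable[str],
    s: Iterable[str],
    distance: Callable[[str, str], int],
    epsilon: int,
) -> List[Tuple[str, str]]:
    """Return every pair (x, y) from r x s with distance(x, y) <= epsilon.

    This compares all |r| * |s| pairs, so it only serves as a reference
    point for what a fuzzy join query must return.
    """
    return [(x, y) for x, y in product(r, s) if distance(x, y) <= epsilon]


# Hypothetical toy relations (assumed data, not from the project).
R = ["0000", "0110", "1111"]
S = ["0001", "0111", "1000"]

if __name__ == "__main__":
    # Run the same query under several thresholds, as described above.
    for eps in (0, 1, 2, 3):
        pairs = naive_fuzzy_join(R, S, hamming, eps)
        print(f"epsilon={eps}: {len(pairs)} pairs")
```

Raising the threshold can only add pairs to the result, so the output size grows monotonically with $\varepsilon$. That growth is one reason the cost of a fuzzy join depends so strongly on the chosen threshold.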