Dataprep

Combining Filtergrams with the Cluster and Edit operation can make it considerably more powerful, because you control which rows each clustering pass applies to.

Consider a dataset with geographic data. Two states can have cities with nearly identical names, for example Pittsburg in California and Pittsburgh in Pennsylvania. Applying Cluster and Edit to the City column alone could merge these two cities into one value and produce incorrect data.
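The effect of scoping can be sketched outside Paxata in plain Python. This is only an illustration under stated assumptions: the sample rows, the function names, and the 0.85 threshold are hypothetical, and `difflib.SequenceMatcher` stands in for a fuzzy matching algorithm; it is not the algorithm Paxata uses. Clustering the City column on its own merges the two cities, while clustering within each State value (roughly what filtering on State first achieves) keeps them apart.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def similar(a, b, threshold):
    # Stand-in fuzzy match (assumption): case-insensitive similarity ratio.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def cluster(values, threshold=0.85):
    # Greedy clustering: each value joins the first cluster whose
    # first member it resembles, otherwise it starts a new cluster.
    clusters = []
    for value in values:
        for members in clusters:
            if similar(value, members[0], threshold):
                members.append(value)
                break
        else:
            clusters.append([value])
    return clusters


def cluster_within(rows, scope_col, value_col, threshold=0.85):
    # Cluster value_col separately for each distinct value of scope_col.
    by_scope = defaultdict(list)
    for row in rows:
        by_scope[row[scope_col]].append(row[value_col])
    return {scope: cluster(vals, threshold) for scope, vals in by_scope.items()}


# Hypothetical sample data with a misspelling in each state.
rows = [
    {"State": "CA", "City": "Pittsburg"},
    {"State": "CA", "City": "Pitsburg"},
    {"State": "PA", "City": "Pittsburgh"},
    {"State": "PA", "City": "Pitsburgh"},
]

unscoped = cluster([row["City"] for row in rows])
scoped = cluster_within(rows, "State", "City")

print(len(unscoped))  # a single cluster: both cities are merged
print(scoped)         # one cluster per state: the cities stay separate
```

Scoping by state keeps the misspelling fix in each state while preventing the cross-state merge.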

In general, we recommend that Paxata users perform the Cluster and Edit operation iteratively, using a combination of the available algorithms (Fingerprint, n-gram, and Metaphone). Each algorithm catches a different kind of variation, so running several Cluster and Edit passes increases the accuracy of the results of your data cleansing project.
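To illustrate why several passes help, here is a minimal Python sketch of two key-collision passes run one after the other. It assumes simplified fingerprint-style and n-gram-style keys as they are commonly described for this family of methods; the function names and sample values are hypothetical, and this is not Paxata's implementation. A phonetic pass such as Metaphone is left out because the Python standard library has no phonetic encoder.

```python
import re
from collections import defaultdict


def fingerprint_key(value):
    # Lowercase, strip punctuation, then sort the unique whitespace-separated tokens.
    cleaned = re.sub(r"[^\w\s]", "", value.strip().lower())
    return " ".join(sorted(set(cleaned.split())))


def ngram_key(value, n=2):
    # Lowercase, keep only ASCII letters and digits, then sort the unique character n-grams.
    cleaned = re.sub(r"[^a-z0-9]", "", value.lower())
    grams = {cleaned[i:i + n] for i in range(len(cleaned) - n + 1)}
    return "".join(sorted(grams))


def cluster_and_edit(values, key_func):
    # Group values that share a key, then rewrite each group to its most
    # frequent member (ties go to the member that appears first).
    groups = defaultdict(list)
    for value in values:
        groups[key_func(value)].append(value)
    mapping = {}
    for members in groups.values():
        canonical = max(members, key=members.count)
        for member in members:
            mapping[member] = canonical
    return [mapping[value] for value in values]


# Hypothetical sample column.
values = ["New York", "new york.", "York, New", "NewYork", "New  York", "Boston"]

pass1 = cluster_and_edit(values, fingerprint_key)  # fixes case, punctuation, token order
pass2 = cluster_and_edit(pass1, ngram_key)         # also catches the missing space

print(pass1)
print(pass2)
```

In this sketch the fingerprint pass leaves "NewYork" untouched because it is a single token, and the n-gram pass then folds it into the cluster built by the first pass.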

Example:

The goal is to execute the transformation shown in the image...