How machine learning filters clean your data

Put simply, filters remove information or data that are not wanted. More precisely, a filter takes one list of data, made up of one or several columns, and converts it into another, modified list. It does this by examining the content of the list and changing it with an algorithm, according to a set of criteria based on a particular goal, such as filtering out blanks or overlaps in an EPG (electronic programme guide) file.
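As a rough illustration of this idea, and not of Spotless's own patent-pending algorithm, the sketch below shows a filter that takes one list of EPG rows and returns another with blanks and overlaps removed. The field names (title, start, end), the ISO timestamp format and the rule of keeping the earlier programme when two overlap are all assumptions made for the example.

```python
from datetime import datetime


def filter_epg(rows):
    """Return a new list with blank rows and overlapping programmes removed.

    Hypothetical example: the field names and the 'keep the earlier
    programme' rule are assumptions, not Spotless's actual criteria.
    """
    parsed = []
    for row in rows:
        title = (row.get("title") or "").strip()
        start, end = row.get("start"), row.get("end")
        if not (title and start and end):
            continue  # blank: a required value is missing
        parsed.append({
            "title": title,
            "start": datetime.fromisoformat(start),
            "end": datetime.fromisoformat(end),
        })

    # Sort by start time so each row only needs comparing with the last kept row.
    parsed.sort(key=lambda r: r["start"])

    cleaned = []
    for r in parsed:
        if cleaned and r["start"] < cleaned[-1]["end"]:
            continue  # overlap: starts before the previous programme ends
        cleaned.append(r)
    return cleaned


schedule = [
    {"title": "News", "start": "2024-01-01T18:00", "end": "2024-01-01T18:30"},
    {"title": "", "start": "2024-01-01T18:30", "end": "2024-01-01T19:00"},
    {"title": "Film", "start": "2024-01-01T18:15", "end": "2024-01-01T20:00"},
    {"title": "Weather", "start": "2024-01-01T18:30", "end": "2024-01-01T18:35"},
]
cleaned = filter_epg(schedule)
```

Here the untitled row is dropped as a blank and "Film" is dropped because it starts before "News" has finished. A real filter would let the data owner choose which of two overlapping rows to keep.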

For Spotless, the fundamental goal is to convert the rogue data lying in a dataset into spotlessly clean data that are ready to enter your data platform. We do this using our machine learning filters, based on artificial intelligence, and our patent-pending algorithm.

Using Spotless Data's machine learning filters

To get a basic idea of how to use Spotless's rogue-data-removing filters, you can watch our YouTube video demos on cleaning an EPG file and cleaning a genre column. The process starts when you upload your data file to our My filters page (you need to be signed up to our service to see this). You then receive a report suggesting any modifications that our automated systems believe the data require. One great thing about our filtering process is that, except for the very largest files, this takes less than a minute. It is then you, the owner of the data, who is in charge: you customise the filters by setting the specifications of the data cleaning process, following the easy-to-understand instructions that explain how to control the way the filters clean your data. The scrubbed-up file is then sent back to you, meeting your requirements for data quality you can trust.

Thus it is you who defines the data quality you want, so that your datasets fit together before they enter your data pipeline. The cleaning itself takes no more than a minute or two. You can upload comma-separated value (CSV) as well as tab-separated value (TSV) files, which is useful when you have lots of data that you need to clean quickly. The whole process should normally take no longer than 5 minutes, as the YouTube videos demonstrate.
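If you want to check a comma-separated or tab-separated file on your own machine before uploading it, a minimal sketch using only Python's standard library might look like the following. The read_delimited helper and the header-based delimiter guess are assumptions made for illustration and are not part of the Spotless service.

```python
import csv
import io


def read_delimited(text):
    """Parse CSV or TSV text into a list of dicts.

    Hypothetical helper: the delimiter is guessed by counting tabs
    and commas in the header line, which is only a rough heuristic.
    """
    header = text.splitlines()[0] if text else ""
    delimiter = "\t" if header.count("\t") > header.count(",") else ","
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


csv_rows = read_delimited("title,genre\nNews,Factual\n")
tsv_rows = read_delimited("title\tgenre\nNews\tFactual\n")
```

Both calls produce the same list of rows, so any later checks can treat the two file types identically.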

Complex cases

There are occasions when the data cleaning is complicated or unusual and we are unable to clean the data using our automated processes alone. In these cases, we quarantine the data and quickly email you, using your contact details, so that we can discuss how to resolve the issue together. As data quality is our passion, we love nothing more than resolving a new and tricky data cleaning issue that defies even our machine learning filters, so you can be sure of our very best attention in solving whatever issues arise. You can also be sure that, once we have solved the issue to your satisfaction, our machine learning filters will have learnt something new to take forward into future tasks and will not have any problems dealing with this particular issue again.

The same is true if for some reason you are not satisfied with the results of our data cleaning and cannot tweak them to your satisfaction. We are giving away 500 MB of free data cleaning to each new customer precisely so that you can tweak the specifications of the cleaning processes until any rogue data in your datasets are eliminated entirely to your satisfaction, ensuring a seamless and complete data integration. If you are not fully happy with the results and cannot see an easy way to resolve the issue, please click on the icon in the bottom right-hand corner of any of our web pages and one of our team will quickly respond so you can chat to them. We refuse to accept defeat when it comes to cleaning any rogue data issue.

Machine learning: the key to our filters

Rogue data cost your business in terms of both money and reputation, as well as skewing your internal reporting. This in itself can have a catastrophic effect on how your business operates. Poor decisions follow, not because of any incompetence on the part of your decision makers but simply because they are deciding on the basis of faulty information. The cause is rogue data lying at the heart of datasets that lack data validation and need a thorough filtering process to ensure their cleanliness.

You will find it incredibly easy to integrate our Python API with Airflow, Celery or any other tools you are using to build your data pipeline.

We guarantee that your data are secure and not accessible by any unauthorised parties during the time they are in our care and we take this responsibility very seriously indeed.

Here is a quick link to our FAQ. You can also check out our range of subscription packages and pricing. If you would like to contact us you can speak to one of our team by pressing on the white square icon with a smile within a blue circle, which you can find in the bottom right-hand corner of any of the web pages on our site.

If data quality is an issue for you, or you have known sources of dirty data but your files are too big and the problems too numerous to fix manually, please do log in and try now.