Water Quality Anomaly Detection with SVM and K-Means

This model was developed as part of a Chantilly High School science fair project whose purpose was the quick detection of anomalies in water quality data.

This project explores cloud-based machine learning solutions for quickly detecting biochemical threats to the United States environment, whether from terrorism or other malicious activities. The materials used in the project include water quality monitoring data from an Environmental Protection Agency (EPA) monitoring station, Microsoft Azure Machine Learning Studio, personal laptops, WebEx for web meetings and collaboration, and telephones for conference calls with EPA and Microsoft Subject Matter Experts.

The steps of the experiment were: collecting data on free chlorine, pH, conductivity, turbidity, ultraviolet rays (UVA), and Oxidation-Reduction Potential (ORP); selecting algorithms; training; scoring; validating; reviewing results; refining models; and publishing the trained models. The outcome was a set of classifications, produced by machine learning algorithms, for detecting anomalies in water quality data. The Support Vector Machine (SVM) and K-Means clustering helped classify the data sets into two or three clusters that can support anomaly detection systems. The Two-Class classifiers helped predict likely values from the time-step data recorded at the monitoring stations.

We concluded that Microsoft Azure cloud-based machine learning made it easier to experiment with and train models on large data sets using various machine learning algorithms. These experiments did not require the large set-up of many computers that would otherwise be needed to process such large sets of numerical data with complex machine learning algorithms such as K-Means, Support Vector Machine (SVM), perceptron, Bayesian classifiers and neural networks. This simplifies the implementation of complex machine learning techniques for finding patterns in the millions of sensor readings received by the EPA and/or Homeland Security.
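
The clustering idea described above can be illustrated with a minimal sketch. It is not the project's Azure Machine Learning Studio experiment: it is a small standard-library Python version of K-Means, run on synthetic readings invented for this example (no EPA data is used). The feature list, the numeric values, the helper names (make_readings, standardize, kmeans) and the farthest-point initialization are all assumptions made for illustration. The sketch groups readings into two clusters and treats the smaller cluster as the set of candidate anomalies to review.

```python
import random

# Hypothetical feature order; all values below are synthetic, not EPA data.
FEATURES = ["free_chlorine", "pH", "conductivity", "turbidity"]


def make_readings(seed=0):
    """Generate 200 'typical' and 10 'unusual' synthetic sensor readings."""
    rng = random.Random(seed)
    typical = [[rng.gauss(1.0, 0.05), rng.gauss(7.5, 0.1),
                rng.gauss(300.0, 10.0), rng.gauss(0.2, 0.03)]
               for _ in range(200)]
    unusual = [[rng.gauss(0.2, 0.05), rng.gauss(6.0, 0.1),
                rng.gauss(450.0, 10.0), rng.gauss(2.0, 0.2)]
               for _ in range(10)]
    return typical + unusual


def standardize(rows):
    """Scale each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)]
            for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k=2, max_iter=100):
    """Lloyd's algorithm with a simple deterministic farthest-point start."""
    centroids = [rows[0]]
    while len(centroids) < k:
        # Next centroid: the row farthest from all centroids chosen so far.
        centroids.append(
            max(rows, key=lambda r: min(sq_dist(r, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each row joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(r, centroids[j]))
                      for r in rows]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids


readings = make_readings()
labels, _ = kmeans(standardize(readings), k=2)
sizes = [labels.count(j) for j in range(2)]
rare = sizes.index(min(sizes))  # treat the smaller cluster as suspicious
flagged = [i for i, lab in enumerate(labels) if lab == rare]
print(f"cluster sizes: {sizes}; flagged {len(flagged)} readings for review")
```

Standardizing first matters because conductivity is measured on a much larger numeric scale than pH or free chlorine and would otherwise dominate the distance calculation. Treating the smaller cluster as anomalous is a heuristic that only makes sense when anomalies are rare and well separated, as they are in this synthetic data; real monitoring data would need validation against known events.
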
This experimental methodology can be extended to support additional sensor data from homes, schools, community centers and other sites where water quality is measured with contemporary instruments. The additional sensor data can be collected through the Internet of Things (IoT) to quickly detect anomalies in water quality caused by potential biochemical terrorism or other malicious activities.