S3 Cloud Data Protection Using Machine Learning

March 26, 2018

Back in August 2017, AWS launched a new service called Amazon Macie. MACIE was the flagship AI-driven product of San Diego startup Harvest.ai; it analyzed behavior patterns across logins, remote network access, and data access to identify suspicious behavior. AWS acquired the security startup and turned the technology into a product that today monitors data stored in S3.

In addition to assigning customizable risk scores based on regex, themes, file extensions, and content types, the fully managed service automates monitoring of data access activity for anomalies and creates detailed alerts when it detects a risk of unauthorized access or data leaks. For example, it can fire an alert if someone mistakenly makes a file containing sensitive data public.

Under the hood, IAM roles created for the service allow it to review event data from AWS CloudTrail and identify PUT requests to S3. As data is added, a classification or risk score is determined from the content of the file being uploaded. More than 70 data types covering PII, PHI, API keys, and other sensitive groupings are used to determine a risk score between 1 and 10, with 10 being the highest risk. The product provides customizable drill-down dashboards that can be used to monitor alerts.
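To make the flow above concrete, here is a minimal Python sketch of the idea: filter CloudTrail-style records down to S3 PutObject events, then score the uploaded content on a 1 to 10 scale. This is not Macie's implementation. The patterns, the scores, and the helper names (`RISK_PATTERNS`, `is_s3_put`, `risk_score`, `score_uploads`, `fetch_content`) are all assumptions made up for illustration; only the record fields (`eventSource`, `eventName`, `requestParameters`) follow the shape of CloudTrail S3 data events.

```python
import re

# Hypothetical patterns and scores, chosen for illustration only;
# they are not Macie's actual rules.
RISK_PATTERNS = [
    (re.compile(r"AKIA[0-9A-Z]{16}"), 10),        # shape of an AWS access key ID
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), 8),    # US Social Security number format
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), 4),  # email address
]


def is_s3_put(record):
    """True when a CloudTrail-style record describes an S3 PutObject call."""
    return (record.get("eventSource") == "s3.amazonaws.com"
            and record.get("eventName") == "PutObject")


def risk_score(content):
    """Return the highest matching score, or 1 when nothing matches."""
    scores = [score for pattern, score in RISK_PATTERNS if pattern.search(content)]
    return max(scores, default=1)


def score_uploads(records, fetch_content):
    """Score each uploaded object.

    fetch_content(bucket, key) is a caller-supplied function (hypothetical)
    that returns the object's text.
    """
    results = {}
    for record in filter(is_s3_put, records):
        params = record["requestParameters"]
        bucket, key = params["bucketName"], params["key"]
        results[(bucket, key)] = risk_score(fetch_content(bucket, key))
    return results
```

Taking the highest matching score is one simple design choice; a real classifier would likely weigh how many matches occur and combine several signals.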

Pricing is based on GB analyzed, so when the service is first enabled against S3 buckets, expect the first month's bill to be much higher while baseline scans are performed. Additional pricing details can be found on the product’s pricing page.
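The first-month effect can be sketched with some simple arithmetic. The function below is a rough illustration, assuming that all existing data is scanned in the first month and only newly added data after that. The name `monthly_cost` and the `rate_per_gb` parameter are hypothetical; the rate is a placeholder to be filled in from the pricing page, not an actual Macie price.

```python
def monthly_cost(existing_gb, new_gb_per_month, rate_per_gb, month):
    """Estimated charge for a given month (1 = first month after enabling).

    Assumes the baseline scan of existing data happens entirely in month 1.
    rate_per_gb is a placeholder, not a real price.
    """
    baseline_gb = existing_gb if month == 1 else 0
    return (baseline_gb + new_gb_per_month) * rate_per_gb
```

With 1,000 GB already stored and 50 GB added per month, the first month is billed on 1,050 GB and each later month on 50 GB, whatever the actual rate turns out to be.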

While this service is currently limited to S3 and to the US East and US West regions, it is still a powerful tool for managing compliance and, if need be, incident response. Additional support is expected in 2018 across other AWS products, including EC2, DynamoDB, RDS, EFS, and Glue.