Blog

Rules and Best Practices Still Couldn’t Prevent the Docker Hub Breach

The recent Docker Hub breach hits home for anyone who develops and hosts code on Docker Hub, GitHub, or any other cloud-based repository. While the magnitude of the damage was significant, what's most remarkable is that these kinds of breaches continue to happen.

As of now, we know the Hub was not only exposed, but that privileged data appears to have been targeted. The full extent of the damage is not yet known, but organizations hosting images on Docker Hub will have to monitor access and activity in their repos. Attackers don’t usually break in just to get inside; this is typically the path through which malicious code is dropped, code that could reverberate into software that’s ultimately, and inadvertently, delivered to users. At the risk of stating the obvious, the effects of this kind of breach could be huge. Organizations using Docker Hub rely on a well-known but strange dynamic: open sharing within a trusted community, paired with vault-like security against everyone else.

That’s the environment we all operate within because, a) from an evolutionary standpoint, technology has afforded us the benefits of being both open and highly selective about access; and b) competition demands we move fast, and the adoption of practices like CI/CD has made speed a de facto element of the development process. Repositories like Docker Hub and GitHub enable us to achieve these things, but we’re still on the hook for ensuring that our code, images, and repos are secure. And in the rapid pace of pushing code, security can sometimes be overlooked.

This is not a new issue, and it’s why there are so many lists of prescribed best practices intended to save us from ourselves. Some of these practices seem so simple, yet as the Docker Hub breach suggests, they still aren’t being followed. At the identity management layer of your environment alone, we’re talking about things like:

Require Multi-factor Authentication (MFA) Everywhere: Having a strong password is not enough these days. You need multiple layers of protection. Using a second validation or authentication method provides another layer of protection around your user login.

Least Privilege Roles: Give users access to only the fewest accounts and systems they need to be productive. This limits the damage that can be done if an accident occurs or a bad actor gains access to an account.

Disable Dead Accounts: When people leave your organization, disable their access to all systems and revoke their access keys immediately. Dead accounts expand your attack surface and are not monitored the way live ones are.
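The dead-account check lends itself to simple automation. The sketch below is a minimal illustration, not any particular product's implementation; the account names, threshold, and activity data are all hypothetical, and a real environment would pull last-activity timestamps from an identity provider's audit logs:

```python
from datetime import datetime, timedelta

# Hypothetical cutoff: treat any account quiet for 90+ days as stale.
INACTIVITY_THRESHOLD = timedelta(days=90)

def stale_accounts(last_activity: dict, now: datetime) -> list:
    """Return account names whose last recorded activity is older than the threshold.

    last_activity maps account name -> datetime of most recent login or key use.
    """
    return sorted(
        name for name, seen in last_activity.items()
        if now - seen > INACTIVITY_THRESHOLD
    )

# Example: two recently active accounts, one that has been dark for months.
now = datetime(2019, 5, 1)
activity = {
    "alice": datetime(2019, 4, 30),
    "bob": datetime(2019, 4, 28),
    "former-employee": datetime(2018, 11, 2),
}
print(stale_accounts(activity, now))  # ['former-employee']
```

A check like this, run on a schedule, turns "disable dead accounts" from a policy statement into something that actually flags candidates for deactivation.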

Lacework offers a guide to a more complete set of practices that takes into account security for the identity, compute, storage, and services layers as well.

But breaches like Docker Hub continue to happen, and much of that is simply because following best practices is dependent upon human action; people have to actually create the process for the action, get cooperation among the organization to implement it, and ensure there is compliance. That’s leaving a lot to busy people who may be far more focused on pushing code.

These attacks are hard to define in a rules-based system because effective breaches are often based on impersonating something legitimate. Take unauthorized access, for example. Employees leave their jobs, but organizations don’t follow a discipline for removing them, so dead accounts remain active. Since those accounts still have access to cloud resources, someone using their credentials can operate freely within the environment while looking legitimate.

With automated anomaly detection, however, you are notified of events that are not normal for your environment. This ensures you learn about the activity you care about without having to sift through all the noise. In the case of dead accounts, anomaly detection will recognize and report accounts that are suddenly active after periods of inactivity. It can also identify when abnormally large amounts of data are extracted from databases, or when users access resources from unusual IP addresses.
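To make the idea concrete, here is a deliberately simplified sketch of the statistical core of anomaly detection: learn a baseline from past activity, then flag anything far outside it. The accounts, byte counts, and three-sigma threshold are illustrative assumptions, not how any specific product works:

```python
from statistics import mean, stdev

def volume_anomalies(history: dict, today: dict, sigma: float = 3.0) -> list:
    """Flag accounts whose activity today exceeds baseline mean + sigma * stdev.

    history maps account -> list of past daily byte counts (the learned baseline);
    today maps account -> today's observed byte count.
    """
    flagged = []
    for account, counts in history.items():
        if len(counts) < 2:
            continue  # not enough baseline data to judge
        threshold = mean(counts) + sigma * stdev(counts)
        if today.get(account, 0) > threshold:
            flagged.append(account)
    return flagged

# Hypothetical baselines: service-b suddenly moves ~100x its normal volume.
history = {
    "service-a": [100, 120, 110, 105, 115],
    "service-b": [50, 55, 60, 52, 58],
}
today = {"service-a": 118, "service-b": 5000}
print(volume_anomalies(history, today))  # ['service-b']
```

The point is that no one had to write a rule saying "alert when service-b moves 5,000 bytes"; the baseline itself defines what abnormal means, per account.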

Fine-tuning rules to alert accurately on critical events is a challenge. Write rules that are too granular and they miss events that are just as important, which can eventually have catastrophic consequences. Write rules that are near catch-alls and you get an overload of events and alert fatigue, where alerts are mostly ignored. Either way, the result can be a missed event that leads to a critical security incident: data is leaked, your business is compromised, and it all happens without any awareness by the security tool.

At the same time, teams are overwhelmed by the magnitude of the environments they manage, so they rely on shortcuts like dashboards and logs to make sense of activity and, specifically, whether that activity is threatening. Security dashboards are typically fueled by data generated from a rules-based approach, where activity that runs counter to structured rules is flagged through alerts. This approach limits visibility, and you can’t secure what you can’t see.

Human activity doesn’t scale to meet these demands, nor can it adapt to the complexity of continuously updating rules, and organizations need to know their security posture keeps pace with how fast they move. This may be the time for a new approach: using automated anomaly detection to identify bad actors within your environment.

The Lacework approach removes the rule-writing element because unsupervised machine learning performs automated anomaly detection. Once the product is deployed, it begins to learn and understand your environment by analyzing data from your cloud accounts and workloads. From there it creates a baseline and automatically alerts you to any anomalous behavior. You get value almost immediately, without waiting to find out whether your rules are working. You just identify the resources you’d like to monitor and let the product do the rest.