GitHub is a popular online code hosting service used by over 26 million people across the world for personal and enterprise projects. It offers a way for people to collaborate on a distributed code base with powerful versioning, merging, and branching features. GitHub has become a common way to outsource the logistics of managing a code repository so that teams can focus on the coding itself.

But as GitHub has become a de facto standard, even among software companies, it has also become a vector for data breaches. The code stored on GitHub ranges from simple student tests to proprietary corporate software worth millions of dollars. Like any server, network device, database, or other digital surface, a GitHub account can be misconfigured.

In June of this year, a third-party development firm left “development notes, raw source, internal reports on web banking code development plans, and records of telephone calls with outsourcing partners” in a public GitHub repository. This information belonged to “six big Canadian banks, two well-known American financial organizations, a multinational Japanese bank, and a multibillion dollar financial software company.” A few years ago, Uber put its database key in a public repo.

Although GitHub itself is designed to be secure, the work performed to maintain it varies from organization to organization. A single repository set to public could leak entire blueprints onto the internet. Poor user management leaves open doors to valuable data. The way GitHub is used becomes a type of business risk, one easily overlooked in a sea of servers and network gear.

GitHub and UpGuard

UpGuard can add GitHub organizations and repositories as nodes, just like a server, router, or website. Through GitHub’s API, configurations are automatically assessed and recorded, creating an audit trail for repository access control and configuration over time. With policies, UpGuard can detect when an organization or repo does not comply with your requirements. This automatic and continuous validation means that enterprises can use GitHub for their projects without risking unauthorized access.

GitHub Nodes

There are two ways UpGuard can look at GitHub: by organization and by repository. These two types of nodes allow you to control important top-level configurations as well as granular detail per repository.

The Organization Node

Top-level configuration information

Number of public and private repositories

All member profile details

Outside Collaborators

The Repository Node

Repo Configuration Settings

Collaborator Permissions

Merge Permissions

Fork Details

What We Analyze

What are we looking at when we scan these GitHub details?

Org Public Profile - This includes basic information like your org name, URL, email, and location.

Org Subscription Info - Most importantly, the number of repositories in your org, and how many of them are private.

Member and Outside Collaborator Profiles - Basic information like name and email address. Important details like which teams the person belongs to, whether they are an administrator, and if they have two-factor authentication enabled.

Team Permissions and Membership - Every team, who is on it, and what permission level the team has in the organization.

Repository Settings - Merge permission details, fork information, and whether or not the repository is private.

Collaborator Permissions per Repo - Individual and team permissions for every collaborator in every repository.
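To make the list above concrete, here is a minimal sketch of how records like these could be reduced to a short inventory summary. It is not UpGuard’s implementation. The `login`, `public_repos`, and `total_private_repos` keys are modeled on GitHub’s REST API organization object, while the `role` and `two_factor_enabled` keys on member records, and the function name `summarize_org`, are hypothetical and used only for illustration.

```python
from typing import Any


def summarize_org(org: dict[str, Any], members: list[dict[str, Any]]) -> dict[str, Any]:
    """Reduce API-shaped records to a few of the inventory facts listed above."""
    return {
        "name": org.get("login"),
        "public_repos": org.get("public_repos", 0),
        "private_repos": org.get("total_private_repos", 0),
        # "role" and "two_factor_enabled" are assumed keys on a flattened member record.
        "admins": sorted(m["login"] for m in members if m.get("role") == "admin"),
        "without_2fa": sorted(
            m["login"] for m in members if not m.get("two_factor_enabled", False)
        ),
    }


# Illustrative data only; no real organization is described here.
example_org = {"login": "example-org", "public_repos": 2, "total_private_repos": 5}
example_members = [
    {"login": "bo", "role": "admin", "two_factor_enabled": True},
    {"login": "al", "role": "member", "two_factor_enabled": False},
]
print(summarize_org(example_org, example_members))
```

A member record with no `two_factor_enabled` key is counted as lacking two-factor authentication, so missing data surfaces as a finding rather than passing silently.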

Visibility and Inventory

With your GitHub organization and repositories in UpGuard, you have total visibility into your GitHub footprint, including the configurations and permissions that keep your data private. In addition to point-in-time visibility, UpGuard also tracks GitHub configurations over time, so that repos and orgs can be compared for differences in any given timeframe.
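Comparing configurations over time comes down to diffing two snapshots of the same settings. The sketch below shows the idea with plain dictionaries. The function name `diff_snapshots` and the setting names in the example are assumptions for illustration, not UpGuard’s data model.

```python
from typing import Any


def diff_snapshots(before: dict[str, Any], after: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {setting: (old, new)} for every setting whose value differs.

    A setting that is absent from one snapshot is treated as None there.
    """
    changes = {}
    for key in sorted(before.keys() | after.keys()):
        old, new = before.get(key), after.get(key)
        if old != new:
            changes[key] = (old, new)
    return changes


# Two hypothetical scans of the same repository.
last_week = {"private": True, "allow_forking": False, "default_branch": "main"}
today = {"private": False, "allow_forking": False, "default_branch": "main"}
print(diff_snapshots(last_week, today))  # {'private': (True, False)}
```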

Managing an inventory of your GitHub instances can become difficult as the number of projects, collaborators, and repositories grows. UpGuard presents all of your GitHub assets so old repositories and collaborators aren’t overlooked during normal maintenance. But UpGuard’s real strength is the ability to automate control over settings that are important to you.

GitHub Policies

When you know exactly what you have, you can measure it against what you want. UpGuard does this with policies, allowing you to set expectations for every single configuration item, so that deviations from them can be found and remediated in as short a cycle as possible.

We’ll look at some specific use cases for how UpGuard’s policies can prevent critical misconfigurations from turning into data breaches or unauthorized use of GitHub. With UpGuard in place, organizations can take advantage of GitHub’s value without exposing themselves to its risks.

Maintaining GitHub Permissions

Since UpGuard monitors organization members, teams, and outside collaborators, policies can be established to control access and to prevent misconfigured permissions or user settings from going unchecked. For example:

Monitor membership for privileged teams, so that if an account is added without authorization, a flag is instantly raised.

Ensure that outside collaborators in your organization are on the right teams and have the right permissions.

Find accounts that don’t use two-factor authentication, a best practice to protect against compromised passwords.

The goal of these policies is twofold: drastically reduce the administrative overhead necessary to manage permissions across large GitHub environments, and protect the data inside those environments from exposure and misuse.
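As a rough illustration of two of the checks above (privileged team membership and two-factor authentication), the sketch below compares observed state against an approved list. The function name `audit_access` and the shape of its inputs are assumptions made for this example. UpGuard’s own policy format is not shown here.

```python
def audit_access(
    privileged_team: set[str],
    approved: set[str],
    two_factor: dict[str, bool],
) -> list[str]:
    """Return human-readable findings for access settings that break policy."""
    findings = []
    # Anyone on the privileged team who is not on the approved list is flagged.
    for login in sorted(privileged_team - approved):
        findings.append(f"unauthorized privileged member: {login}")
    # Any account with two-factor authentication turned off is flagged.
    for login, enabled in sorted(two_factor.items()):
        if not enabled:
            findings.append(f"two-factor authentication disabled: {login}")
    return findings


# Hypothetical accounts for illustration.
for finding in audit_access(
    privileged_team={"al", "eve"},
    approved={"al"},
    two_factor={"al": True, "eve": False},
):
    print(finding)
```

An empty result means both checks passed. Running the same function on every scan is what turns a one-time review into continuous validation.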

Keeping Repositories Private

There’s one particular configuration for every GitHub repository that absolutely must be enforced if sensitive or proprietary data is being stored there. The “private” setting on a repository is what restricts access from the world at large. When “private” is set to false, the repository and its contents are visible to the internet.

We create a simple policy in UpGuard for this one setting. We want to ensure that the repository “private” configuration is set to true.

Once we have the policy, we can apply it to our repos. We can create a node group for all of our private repositories, and apply the policy to this node group.

Running a policy report shows us how many of our repos comply with the policy. Any failures are currently exposed to the internet.
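The logic behind such a report is small enough to sketch. The example below assumes each repository is a dictionary with `full_name` and `private` keys, as in GitHub’s REST API repository object. The function name `private_policy_report` and the report layout are hypothetical and do not represent UpGuard’s output.

```python
from typing import Any


def private_policy_report(repos: list[dict[str, Any]]) -> dict[str, Any]:
    """Check that every repository has "private" set to true."""
    # A missing "private" value counts as a failure rather than a pass.
    failing = sorted(r["full_name"] for r in repos if r.get("private") is not True)
    return {
        "total": len(repos),
        "passing": len(repos) - len(failing),
        "failing": failing,
    }


# Hypothetical node group of repositories that are expected to be private.
node_group = [
    {"full_name": "example-org/billing", "private": True},
    {"full_name": "example-org/web-banking", "private": False},
]
print(private_policy_report(node_group))
```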

After our remediation efforts, we can check the policy results again, to ensure everything is secure.

To prevent configurations from drifting out of compliance, we’ll set up an alert so that if any of our existing repositories become public, we are immediately notified.
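An alert of this kind amounts to comparing the visibility recorded in the previous scan with the current one. The sketch below shows that comparison only. The function name `newly_public` and the mapping of repository name to its “private” flag are assumptions for illustration, and delivering the notification is left out.

```python
def newly_public(previous: dict[str, bool], current: dict[str, bool]) -> list[str]:
    """Return repositories that were private in the previous scan and are public now."""
    return sorted(
        name
        for name, was_private in previous.items()
        # A repository missing from the current scan (for example, deleted)
        # yields None here and is not reported as newly public.
        if was_private and current.get(name) is False
    )


# Hypothetical scans: repository name -> value of its "private" setting.
previous_scan = {"example-org/billing": True, "example-org/docs": False}
current_scan = {"example-org/billing": False, "example-org/docs": False}
print(newly_public(previous_scan, current_scan))  # ['example-org/billing']
```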

The key here is that cloud resources like GitHub repos and Amazon S3 storage buckets can be made internet-accessible. Using these resources for corporate purposes means the risk of internet exposure must be explicitly mitigated to protect the privacy of everyone involved.

Conclusion

Cloud resources offer enormous value. But it’s a mistake to assume that you can just start pushing sensitive information to them without considering the risk of a data breach through that vector. The same things that make GitHub powerful (distributed use, accessibility from anywhere, collaborative support) also create the risks it poses. That doesn’t mean you have to throw the baby out with the bathwater to maintain privacy. You just have to implement controls so that when an error does happen, and it always does, you have a system in place to catch it before it becomes a much bigger problem.

UpGuard supports GitHub as a node because every aspect of an organization’s digital footprint contributes to its overall risk. Servers and network devices are important, and UpGuard supports them in full, but just as important are the non-traditional spaces being used to store and process sensitive information. These spaces tend to be overlooked during security hardening and process control, and become blind spots for cyber risk, one mistake away from massive data exposure.