GitHub's new internal search has made it easy to uncover passwords, encryption keys, and other security missteps in software development projects that are hosted on the site.

GitHub, a popular Web-based hosting service for software development projects, announced its internal search in "A Whole New Code Search," posted on The GitHub blog on Jan. 23. Every time developers save changes to their source code on GitHub, the new search infrastructure automatically indexes the code, Tim Pease, a member of GitHub staff, wrote in the post. GitHub users can search for any string through public repositories and private repositories they have access to.

This can be useful for developers who are looking for open source libraries to use in their projects, or for code snippets that show how others solved a coding problem they are encountering. Users with multiple repositories can also use the search functionality to find specific lines of code in their own projects.

A few users discovered yet another way to use the search tool: finding files containing private encryption keys and source code with login credentials. Scarily enough, there were thousands of them.

"How not to use Github," Brian Aker, an HP fellow, posted on Twitter, along with a screenshot showing a list of files containing private encryption keys.

Searchers found that quite a large number of users had added private keys to their repositories and then pushed the files up to GitHub. Searching on id_rsa, a file which contains the private key for SSH logins, returned over 600 results. Projects had live configuration files from cloud services such as Amazon Web Services and Azure with the encryption keys still included. Configuration and private key files are intended to be kept secret: if one falls into the wrong hands, that person can impersonate the user (or at least, the user's machine) and easily connect to the remote machine.

Commenters on Y Combinator's Hacker News noted that in many cases, the exposed keys were actually public keys, and many others were for demonstration or testing purposes so the damage was limited.
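The difference between the two kinds of key file is visible in the content itself. The following is a minimal Python sketch of how a reviewer might tell them apart, assuming the usual PEM-style "BEGIN ... PRIVATE KEY" header for private keys and the one-line OpenSSH format for public keys. The function name and the list of prefixes are illustrative choices for this example, not part of any GitHub tooling.

```python
import re

# PEM-style header that marks private key material, for example
# "-----BEGIN RSA PRIVATE KEY-----" or "-----BEGIN PRIVATE KEY-----".
PRIVATE_KEY_HEADER = re.compile(r"-----BEGIN (?:[A-Z0-9]+ )*PRIVATE KEY-----")

# Common prefixes of one-line OpenSSH public keys (as found in id_rsa.pub).
# This tuple is an illustrative assumption, not an exhaustive list.
PUBLIC_KEY_PREFIXES = ("ssh-rsa ", "ssh-dss ", "ssh-ed25519 ", "ecdsa-sha2-")


def classify_key_text(text: str) -> str:
    """Return 'private', 'public', or 'unknown' for the given file content."""
    if PRIVATE_KEY_HEADER.search(text):
        return "private"
    if text.lstrip().startswith(PUBLIC_KEY_PREFIXES):
        return "public"
    if "-----BEGIN PUBLIC KEY-----" in text:
        return "public"
    return "unknown"
```

A check like this only looks at the format. It cannot tell whether a private key is a throwaway test key or one that still unlocks a live server.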

Even so, there were plenty of cases where developers had hardcoded passwords for privileged user accounts, such as root, sa, and admin. "The new github search makes finding people who use sa in their connection strings a lot easier," Justin Beckwith, a program manager on the WebMatrix team at Microsoft, wrote on Twitter.
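One common way to keep credentials out of source code is to read them from the environment at run time, so the repository never contains the password. The sketch below shows the idea in Python. The variable names DB_USER and DB_PASSWORD, the function name, and the connection string layout are assumptions made for this example.

```python
import os


def build_connection_string(server: str, database: str) -> str:
    """Assemble a connection string without hardcoding credentials.

    DB_USER and DB_PASSWORD are hypothetical environment variable names
    chosen for this sketch; the string layout is likewise illustrative.
    """
    user = os.environ.get("DB_USER")
    password = os.environ.get("DB_PASSWORD")
    if not user or not password:
        # Fail loudly instead of falling back to a default account.
        raise RuntimeError("DB_USER and DB_PASSWORD must be set in the environment")
    return f"Server={server};Database={database};User Id={user};Password={password};"
```

With this approach the code that ends up on GitHub names only the variables, and the actual values live on the machine that runs the application.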

"Many organizations simply don’t have proper tools, policies and procedures in place to manage their encryption keys including creation,rotation and removal," Jason Thompson, director of global marketing, SSH Communications Security told SecurityWeek. "In some organizations, tens of thousands of keys have become unaccounted for and many are often stored in an unsecure fashion somewhere on the network."

To be clear, GitHub is not at fault, since the company is just a hosting service. It just stores whatever files the developer wants to save. The search engine is not accidentally leaking confidential information. The data was already saved on GitHub; the search is just making it easier for someone to find these mistakes.

"GitHub ain't for storing credentials, particularly not in public repos!" Australian security researcher Troy Hunt wrote on Twitter. While many of the logins and passwords may be defaults or examples, or for internal systems outsiders shouldn't have access to, Hunt said not all of the IP addresses referenced in the source code belonged to internal subnets. He wondered how many connection strings harvested from the search results would actually work.

While many of them may be default or example values, there were plenty that were legitimate and could be used, according to a Twitter user named Douglas Dollars, who claimed to have tried out a few.

Several commenters on Y Combinator and Reddit noted that the information was previously visible with a judicious use of Google search, so there was "zero net harm" in GitHub search exposing these mistakes.

"Storing keys with code, as many GitHub developers are doing, is just one of many worst key management practices that exposes them to significant security and operational risks," Kevin Bocek, vice-president of product marketing at Venafi, told SecurityWeek. "Equally alarming is the fact that enterprise security and audit teams have almost no visibility into the extent of the problem. GitHub is just one repository that happens to be in the cloud."

However, SSH's Thompson did warn about the dangers of exposed keys, regardless of how they were discovered and exposed.

"With a simple script or tool, external hackers or malicious insiders can quickly discover these lost keys and use them to gain access to critical information assets," Thompson said. "If the key grants a high level of administrative access, such as root, the potential threat to the business grows exponentially."

GitHub actually has a very thorough Help page on how to make sure sensitive data is not saved to the repository. Developers should take a moment to review the page again.

"Once the commit has been pushed you should consider the data to be compromised," the Help page warns. If the user accidentally committed the key, then a new one should be generated. If a password is entered, it should be changed. Even if the file itself has been removed from the repository, references will remain in the repository's history.

Files with sensitive information can also be added to the .gitignore list. Git leaves untracked files that match the list out of commits, so they never get pushed up to the GitHub repository; files that are already tracked are not affected. While acknowledging that this wasn't necessarily GitHub's responsibility, several Y Combinator commenters wondered if the company could set up an automated rule where certain files, such as id_rsa, AWS configuration files, and others, are automatically blocked and added to .gitignore.
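Developers do not have to wait for GitHub to add such a rule. A rough version of the check the commenters describe can be sketched locally in a few lines of Python. The pattern list below is a hypothetical starting point chosen for this example, and the function name is likewise illustrative.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical blocklist for this sketch; a real project would tailor it.
SENSITIVE_PATTERNS = ("id_rsa", "id_dsa", "*.pem", ".env")


def find_sensitive_paths(paths):
    """Return the paths whose file name matches a sensitive pattern."""
    flagged = []
    for path in paths:
        # Compare only the final path component against each pattern.
        name = PurePosixPath(path).name
        if any(fnmatchcase(name, pattern) for pattern in SENSITIVE_PATTERNS):
            flagged.append(path)
    return flagged
```

A function like this could be fed the list of files staged for a commit, for instance from a Git pre-commit hook, and the commit rejected if anything is flagged. It matches on file names only, so a key pasted into an ordinary source file would still slip through.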

"Today it might be development and test code in GitHub, tomorrow it could be production code in Amazon AWS or Windows Azure," Bocek warned. Until enterprises have control of their keys and certificates, they won't be able to trust the cloud with their data. The first step is to discover the keys and certificates inside the enterprise and within the cloud. "Many enterprises find the number of keys and certificates is 4-5x the number they expected," Bocek said.

The mistakes may reflect a broader education problem among software developers. When expedited programs ("6 weeks and you'll be a real software developer") are used to teach development, security becomes an afterthought. And considering "90 percent of development is copying stuff you don't understand, I'd bet most of them simply don't know what id_rsa is," Mike Perham, director of engineering at TheClymb.com, tweeted.

"Do all of your beginner dev mates a favour, make sure they haven’t put their ssh keys and other secret files on github in a public repo," Jamie Van Dyke, a Ruby developer, wrote on Twitter.

Fahmida Y. Rashid is a Senior Contributing Writer for SecurityWeek. She has experience writing and reviewing security, core Internet infrastructure, open source, networking, and storage. Before hanging out her journalism shingle, she spent nine years as a help-desk technician, software and Web application developer, network administrator, and technology consultant.