Black Duck Open Hub Blog
https://blog.openhub.net
Spammers: Check. Projects next. And Other Goodies.
Tue, 24 Oct 2017

Hail Open Hubbites! We’ve been working hard and with focus over the past month and would like to share some updates.

The first, and biggest, news is that we ended our offshore partnership at the end of September. There were a number of drivers for this, but the immediate impact is that we are currently a four-person team. This explains why it’s been even more difficult than usual to keep up with all the ways folks get in touch with us: forum, blog posts, email, and tweets. We’re sorry about that and are working hard to keep up with everyone’s questions. We plan to add more members to the team over time, so this pressure will ease a bit.

We also completed our Machine Learning work on a trained dataset and permanently deleted some 60K spam accounts. We’re really grateful to our intern, Sourav Das, currently at MIT, for his amazing attitude and contribution. He led this work and produced great results. This effort caps the push we made to get spammers off our site; at one point we suspected that as many as two-thirds of our accounts were spam. We currently have some 232K accounts in good standing. Having cleaned out some 500K accounts, we were pretty close in our estimate. The good news is that the rate of spam account creation is now well within our ability to monitor.
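
For the curious, here’s a minimal sketch of the flavor of text classifier such an effort might use. The features, training data, and model choice are purely illustrative, not our actual pipeline.

```python
# Illustrative sketch of a spam-account classifier; the training data
# and model choice are hypothetical, not our production pipeline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: account "about me" text labeled spam (1) or not (0).
texts = [
    "Best cheap watches, click here for great deals!!!",
    "Kernel hacker; maintainer of a small C library.",
    "Buy followers now, limited offer, visit our site",
    "I contribute documentation fixes to several OSS projects.",
]
labels = [1, 0, 1, 0]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)

# Score a new account description; values near 1.0 suggest spam.
print(model.predict_proba(["Discount pharmacy, order online today"])[0][1])
```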

There is still more to do in this regard. There are accounts that created projects and made edits on the site, which makes them look like legitimate users, but the projects they created are really spammy advertisements. We plan to apply the ML work we’ve done to train the algorithm to detect spam projects and start cleaning them off the site. Let’s take a quick look at what that means:

On our home page, we say that we are indexing 472K projects. That’s the number of undeleted projects on the Open Hub. However, not every project has analyzable code. When we count projects that are not deleted and have had an analysis at some point in the past, we see there are 292K such projects, about 62% of all projects. This means that the difference, some 180K projects that have never been analyzed, could well be spam projects. Some of them are legitimate OSS projects that don’t have analyzable code, such as documentation projects, but there are only a tiny number of those. I’d wager that almost all of those 180K unanalyzed projects are junk. So the next ML project, to find them and get rid of them, is important so that we have real numbers about activity in the OSS community.
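
For the curious, the counts above boil down to queries along these lines; the table and column names below are hypothetical stand-ins, not our actual schema.

```python
# Hypothetical sketch of the project counts described above; the table
# and column names are invented for illustration. Requires psycopg2.
import psycopg2

conn = psycopg2.connect("dbname=openhub")
with conn, conn.cursor() as cur:
    cur.execute("SELECT count(*) FROM projects WHERE deleted = false")
    total = cur.fetchone()[0]
    cur.execute("""
        SELECT count(*) FROM projects
        WHERE deleted = false AND best_analysis_id IS NOT NULL
    """)
    analyzed = cur.fetchone()[0]
    print(f"{analyzed}/{total} projects ({analyzed / total:.0%}) have been analyzed")
```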

Let’s turn our attention back to the nearly 300K projects that have had an analysis. Of those, nearly 28K have a CodePlex repository. As you probably know, CodePlex has gone into read-only mode and will shut down entirely by the end of this year. Most of those projects have only CodePlex repositories. In accordance with our mandate to provide analytics on available and active Open Source Software projects, we will be deleting all those projects. (We’ll start a background effort to find any new locations for those projects.) This will drop the number of undeleted projects that have been analyzed at some point in the past to some 263K.

Finally, when we consider projects whose jobs are in a permanent state of failure, which blocks our ability to generate updated analytics, we have to remove an additional 40K projects, which drives down the number of active projects that can reliably be updated: we’re really looking at 223K projects.

We have two major strategies to increase the number of available and active OSS projects on the Open Hub. The first is to lower the barrier to getting new projects into the Open Hub. We are working on an overhauled and streamlined workflow for adding projects from GitHub. Right now, we support only a bulk upload of all repositories in a GitHub account into a single project. We plan to let users create new projects, assign repositories to those new projects, and assign repositories to existing projects. This will let users quickly get their projects into the Open Hub.

If this works as well as hoped, we can expand it to other forges, such as GitLab. Your requests for other forges are most welcome.

The other strategy is to continue the cleanup work to examine failed jobs and see what can be recovered. But our user community is the best strategy we have.

Even with “only” 223K projects, there is no way we can manually review even a majority of them. Therefore, we will make it possible to use your GitHub account to create an Open Hub account and sign in. By lowering the barrier to making edits on the Open Hub and relying on the maturity of the GitHub environment, we hope that more users will be willing to push their Open Source Software projects to the Open Hub and review existing projects to ensure we have the most up-to-date information.
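
For those wondering what “sign in with GitHub” involves under the hood, here’s a minimal sketch of GitHub’s standard OAuth web application flow. The client credentials are placeholders, and this illustrates the general flow, not our implementation.

```python
# Sketch of GitHub's OAuth web application flow; CLIENT_ID and
# CLIENT_SECRET are placeholders for an app's registered credentials.
import requests

CLIENT_ID = "your-app-client-id"          # placeholder
CLIENT_SECRET = "your-app-client-secret"  # placeholder

# Step 1: send the user to GitHub to authorize the application.
print("Visit: https://github.com/login/oauth/authorize"
      f"?client_id={CLIENT_ID}&scope=user")

# Step 2: GitHub redirects back with a one-time code; exchange it for a token.
code = input("Paste the ?code= value from the callback URL: ")
resp = requests.post(
    "https://github.com/login/oauth/access_token",
    data={"client_id": CLIENT_ID, "client_secret": CLIENT_SECRET, "code": code},
    headers={"Accept": "application/json"},
)
token = resp.json()["access_token"]

# Step 3: fetch the authenticated user's profile to create or sign in an account.
user = requests.get(
    "https://api.github.com/user",
    headers={"Authorization": f"token {token}"},
).json()
print("Signed in as:", user["login"])
```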

We are also working on capping the number of enlistments that any one project can have, which will make projects easier to review and maintain. We will introduce a new feature, “Open Hub Collections,” to gather blocks of related projects, like Linux distributions, so that users can quickly see which projects make up a collection and quickly navigate to related projects.

Gosh, there is so much more: ongoing database architecture upgrades, a rewrite of the analytics engine, a future overhaul of Ohloh SCM, ongoing reviews of pull requests against Ohcount, and the daily task of making sure everything we have is working the way it’s supposed to. Oh, and we updated the CSS so that the Open Hub is in line with the standards defined by the Black Duck Marketing team. But these will have to wait for another post.

Thank you for being part of the Open Source Community and the Open Hub. (And please provide your feedback in our survey, which will be open for only a little bit longer: https://www.surveymonkey.com/r/G5HN2JH)

Raise your voice!
Mon, 18 Sep 2017

With the achievement of Making Our Back End Screamingly Fast, we are shifting the focus of our little team to more UI features.

We opened the survey only a few days ago and already have a few dozen responses. So, we’re thinking that we’ll leave it open for a few months, then close it and write all about it.

Please fill out the survey and ask your friends and colleagues to do so as well. The link is available from every page on the Open Hub.

Thanks!

Four Months. 1/3 of a Year. About 123 Days Have Passed
Mon, 18 Sep 2017

Yeah, but it’s been only around 87 work days. On the other hand, we keep strange hours and regularly work Sunday mornings to perform upgrades and improvements. We’ve done a lot and would like to share some things with you.

At the end of May, we announced our FIS Ohloh Database Split (FODS) project (see About the FODS Architecture). We said there was still work to do, and we got busy on it. Here’s a quick punch list:

Fixed an issue in SVN where a “REPLACED” file was incorrectly counted

Did a round of performance improvements on the FODS architecture

Post-FODS, Ohloh UI website performance was poor; we brought server response times back to pre-deployment speeds.

Our goal over the past few months was to Make Our Back End Screamingly Fast. And we’ve achieved that.

About a year ago, we were coming off the massive Back-End Background issue, and project updates were in the double digits per hour, like 10 or 20. With the work we’ve done, we now consistently complete 5,000 updated analyses per hour.

We also dealt with the 200,000+ projects that had enlistments only at the long-defunct Google Code forge: we deleted them. We have a plan to search for those projects on GitHub, and if we find them, we can re-add them, but it was really important to clear out all those projects that would never be able to be updated again. This also let us clear out all the failed jobs related to those projects.

Up next is Microsoft’s CodePlex. In October, the site will switch to read-only mode. In December, the site shuts down. We’ll follow the same process: delete the projects and clear out the jobs.

Right now, we’re updating 68% of our projects every 3 days. When we drop the CodePlex projects, which are mostly broken because CodePlex broke their SVN implementation ages ago (Google it; it’s too painful for me to talk about), we expect to push that number over 80%.

Oh, yeah, on May 11, 2017, after I presented at OSCON, we switched the Ohloh UI repository from private to public. So yeah, the Open Hub is OSS. While I was at OSCON, I also had a chance to sit down and chat with the indomitable Randal Schwartz for TWIT’s FLOSS Weekly.

We’ve got more wonderful things planned, so more blog posts are coming. As always, thank you for being a member of the Open Source Community and the Open Hub.

About the FODS Architecture
Thu, 18 May 2017

Over the weekend starting on Friday, May 5, 2017, we deployed a significant upgrade to our architecture and we’d like to share some details.

In The Beginning

Before the weekend deployment, our architecture had four applications using the same database.

The database was a single PostgreSQL 9.6 database that was over 1.6 TB in size. With the delivery of the eFISbot features to support Fetch, Import and SLOC operations for the Knowledge Base team as well as our own Open Hub, we clearly saw that even a modest increase in eFISbot request processing impacted the database and resulted in poor performance for the web application. In brief, we couldn’t scale to support the anticipated load on eFISbot.

Current Architecture

In our plans for 2017, we committed to making the back end screamingly fast and talked about how we had gotten approval for new servers to support this. Starting at 8 PM EDT on Friday, May 5, we took a major step towards delivering on that commitment. We called it the “FIS Ohloh Database Split” (FODS).

We moved the four largest tables that are critical to Fetch, Import, and SLOC operations to a new FIS database and set up PostgreSQL Foreign Data Wrapper (FDW) to send data back and forth between the two. This moved the bulk of the 1.6 TB of the database over to the new (and powerful) servers, leaving only 65 GB on the original database servers.
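
For readers who haven’t worked with FDW, here’s a minimal sketch of the kind of wiring involved. The server name, credentials, and table definition are placeholders, not our actual configuration.

```python
# Sketch of wiring up postgres_fdw between two databases; host names,
# credentials, and the foreign table are placeholders. Requires psycopg2.
import psycopg2

DDL = """
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- Point the Ohloh database at the new FIS database server.
CREATE SERVER fis_server
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'fis-db.internal', dbname 'fis');

CREATE USER MAPPING FOR CURRENT_USER
    SERVER fis_server
    OPTIONS (user 'ohloh_app', password 'secret');

-- Expose a large FIS-side table locally as a foreign table.
CREATE FOREIGN TABLE sloc_metrics (
    id bigint,
    code_set_id bigint,
    language_id integer,
    code integer,
    comments integer,
    blanks integer
) SERVER fis_server OPTIONS (schema_name 'public', table_name 'sloc_metrics');
"""

conn = psycopg2.connect("dbname=ohloh")
with conn, conn.cursor() as cur:
    cur.execute(DDL)
```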

Not Yet Done

As is often the case with significant architectural upgrades, not everything worked smoothly out of the box. We are seeing two classes of problems. One is apparent when viewing Commit Summary pages for the largest projects, where queries take a massive amount of time. The other is the time it takes to execute project analysis jobs: analyze jobs that used to take a couple of minutes can run for more than half a day. Obviously, this is causing a massive backlog of projects that are not getting analyzed. Normally, we complete an AnalyzeJob in a few minutes and can process between 600 and 1,000 jobs per hour.

On top of the long analyze job run times, we are also seeing analyze jobs fail in their last step with a PostgreSQL serialization error, which means there are analyze jobs that have not been able to complete successfully. Right now (I just checked), we have over 131K AnalyzeJobs scheduled, with about 600 completed in the past few days and about 200 failed, 99% of them with the PostgreSQL serialization error, presumably related to our use of the FDW.
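
Serialization failures are, at least in principle, retryable, so one avenue we’re exploring is the standard retry-with-backoff pattern. Here’s a minimal sketch; the retry policy and job body are illustrative, not our actual job runner.

```python
# Sketch of the standard retry pattern for PostgreSQL serialization
# failures (SQLSTATE 40001); the policy here is illustrative only.
import time
import psycopg2
from psycopg2 import errorcodes

def run_with_retry(conn, job, max_attempts=5):
    """Run job(cursor) in a transaction, retrying on serialization failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            # `with conn` commits on success and rolls back on exception.
            with conn, conn.cursor() as cur:
                return job(cur)
        except psycopg2.OperationalError as e:
            if e.pgcode != errorcodes.SERIALIZATION_FAILURE or attempt == max_attempts:
                raise
            time.sleep(0.1 * 2 ** attempt)  # back off before retrying
```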

Both of these seem to be traceable to the FDW. I’m not blaming the FDW for anything; we are reasonably certain that we are not using it optimally. There were some obvious changes required by adopting FDW, and we made those during our development and testing cycle. Then there were things that we did not predict or that behaved differently in production than they did in staging, even though we did a lot of work to simulate our production environment there. But as is usually the case, some things are found only in production. The two cases described above fall into that category.

Even with these issues, we feel the FODS deployment was a tremendous success because the vast majority of pages display at least as fast as before, plus we now have tremendous capacity to grow the eFISbot service.

Here’s what we’re doing about it

For the project analysis jobs, we examined the issue from a number of perspectives and identified a few tables that we could migrate from the OH DB to the FIS DB. Initial tests show that Analyze queries that took 12K ms to run are now running in 1.6K ms, almost 8 times faster.

For the Commit pages, we are working with the SecondBase gem to allow the Ohloh UI to access the FIS DB directly for the data stored there, rather than push massive queries through the FDW. Initial tests show that this also yields a multi-fold performance improvement, although we’re still gathering the numbers.

While the use of a direct connection to the FIS DB will improve performance on the vast majority of Commit pages, the largest projects still represent a special problem. Right now we have just over 676K projects. Only 3 of them have more than 1,000 enlistments — Fedora Packages, KDE, and Debian. All three of these are Linux distributions. We briefly mentioned distributions in our post about 2016 plans and now is the time to implement them. The plan is to create a new entity called a “Distribution,” which represents a collection of Open Source Software projects. This is different from an Organization because the Distribution represents a packaged and related collection of OSS projects. By doing this, we can process each of the projects within the Distribution individually and then aggregate the analysis results for the Distribution.

The way this would work is that, in the case of Fedora Packages with its 11,956 enlistments, we would create a project for each enlistment and then group all those new projects into the Distribution. When looking at the Distribution, we can provide the list of projects, links to each of them, plus aggregate the data from those projects into a new “Distribution Analysis.” Most importantly, when displaying the Distribution, we won’t have to try to aggregate the commits from all 12K enlistments into a single view.
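
To make that concrete, here’s a hypothetical sketch of a Distribution entity and its aggregated analysis; the types and field names are invented for illustration, not our actual data model.

```python
# Hypothetical sketch of a "Distribution" that rolls up per-project
# analyses; field names are invented, not our actual data model.
from dataclasses import dataclass, field

@dataclass
class ProjectAnalysis:
    project_name: str
    commit_count: int
    contributor_count: int
    total_code_lines: int

@dataclass
class Distribution:
    name: str
    projects: list = field(default_factory=list)  # list of ProjectAnalysis

    def aggregate(self) -> dict:
        """Naive roll-up; note contributors may overlap across projects."""
        return {
            "projects": len(self.projects),
            "commits": sum(p.commit_count for p in self.projects),
            "contributors": sum(p.contributor_count for p in self.projects),
            "code_lines": sum(p.total_code_lines for p in self.projects),
        }

fedora = Distribution("Fedora Packages", [
    ProjectAnalysis("bash", 6_000, 90, 180_000),
    ProjectAnalysis("coreutils", 30_000, 250, 250_000),
])
print(fedora.aggregate())
```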

Next Steps

We are working quickly on testing and verifying behavior using the new distribution of tables and the second DB connection. We hope to have improvements deployed within a week.

Researching Project Security Data
Thu, 04 May 2017

This started with a message from the outstanding Marc Laporte about the Project Security Data for the Tiki Wiki CMS Groupware project. Marc took what looks like a healthy amount of time to carefully document the discrepancies and areas of confusion around the security report. In kind, we’ve taken a deep dive into the data.

The Problem (in Brief)

Marc highlighted a few problems. The first was that we were missing versions. We were able to address that problem.

The more complicated one is that there are discrepancies in the versions reported affected by a vulnerability as well as inexplicable ordering of the versions.

The Explanation (Not Brief)

The author sat down with one of the senior members of the Black Duck Software (BDS) Knowledge Base (KB) team to look at the data being presented and to start unwinding the trail of data production back to its beginnings.

We looked at the data in our KB, the channels through which those data are obtained, how we have gotten and are getting those data, and what we are doing with them. The issue mostly boils down to “dates are hard.” Note that we’re not talking about engineers getting dates — that’s a different topic altogether — but how a non-trivial system discovers, identifies, and documents dates that are connected to important events such as version releases of software.

Our story starts some 15 years ago in the early days of Black Duck, when ad hoc Open Source Software (OSS) standards were few and the forges were fewer. BDS engineers were interested in getting information about OSS fundamentally for license compliance. Releases were important, but licenses were more so. Dates were captured when available, typically from the date stamps of files after syncing them locally, but there wasn’t a focused interest in the dates of releases. It seemed like good metadata to have, and we like metadata.

One obvious challenge to this model arises when a team uploads a body of work onto a forge. File date stamps from the original system can be lost and replaced with the timestamp at which the files were created on the new system. At that point, the KB sees a number of release tags all with the same time stamp.

We layer onto this the reality that projects were often duplicated on different forges or through different release channels, for example, the source forge and the project’s download page. Over the years, the KB Research team has worked tirelessly and relentlessly to hunt down and correct duplications and merge projects together. Please recall that the KB tracks significantly more projects than the 675,000 we track on the OH. All that said, we believe we have an opportunity to re-examine the merge logic and attempt to improve the dates selected for version releases, and we have opened a ticket to do that work.

However, one of the most complicating factors is that, in spite of these methods, we don’t always know about all the releases, and sometimes learn about a release only when it’s mentioned in a CVE report. When this happens, we create a record for the release and, in lieu of any better information, record the date we learned about the release as the release date.

Add to this particular challenge that vulnerability feeds will often state that a vulnerability affects “this and all previous” versions. What exactly does that mean? Is version 6.15 before or after version 8.4? When one is confident in the dates of version releases, one can use those to determine what came before, but as we just examined, one cannot always be confident about such dates. What about applying the vulnerability to previous versions across branches? For example, a vulnerability affects 3.6 “and all previous versions.” We would all likely agree that it impacts versions 3.5, 3.4, and 3.3 — all previous 3.x versions. But what about version 2.10? Was that really affected as well? What about all the 2.x or 1.x releases? It just isn’t clear. And what if the vulnerability was in a component the project used? That isn’t clear from the available data feeds either.
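
To make the ambiguity concrete, here’s a minimal sketch of one possible policy for interpreting “and all previous versions”: limit it to the same major-version line. The release list is invented, and as the discussion above shows, any such policy is a guess that only works when version strings are cleanly numeric.

```python
# Sketch of one policy for "this and all previous versions": restrict
# the match to the same major-version line. The releases are invented.
def parse(version: str) -> tuple:
    return tuple(int(part) for part in version.split("."))

def affected(releases, reported: str, same_major_only: bool = True):
    """Releases at or before `reported`, optionally within its major line."""
    cutoff = parse(reported)
    hits = [r for r in releases if parse(r) <= cutoff]
    if same_major_only:
        hits = [r for r in hits if parse(r)[0] == cutoff[0]]
    return hits

releases = ["1.9", "2.9", "2.10", "3.3", "3.4", "3.5", "3.6", "3.7"]
print(affected(releases, "3.6"))                        # ['3.3', '3.4', '3.5', '3.6']
print(affected(releases, "3.6", same_major_only=False)) # also pulls in 1.x and 2.x
```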

Oh, and we should mention that vulnerability feeds, such as the NVD Data Feeds, change over time. For example, NVD version 1.2 provided this “affected versions” identifier, but it was dropped in version 2.0, although we expect that it will return in version 3.0.

The takeaway is that we think we can do something in the short term that might help clear up dates to make them more reasonable, but the real fix will come from improved efforts on identifying the actual versions that are affected by vulnerabilities so we can do away with blanket policies.

This is why Black Duck is making a concentrated effort to provide effective information on OSS vulnerabilities. We’ve assembled a dedicated research team that is focusing on this problem to ferret out in greater and more reliable detail the true relationships between vulnerabilities and the OSS projects and versions affected by them.

Open Hub Scheduled Downtime
Thu, 27 Apr 2017

There are two periods of scheduled downtime for the Open Hub over the next week or so.

The first will start tomorrow, Friday, April 28, when Black Duck’s IT team starts a network update project in our production data center. There may be a series of service interruptions through this weekend. Should the Open Hub become unavailable, we do not expect it to be down for long, although these service interruptions will be unpredictable.

The other service interruption will start on Friday, May 5 at 8 PM EDT. We will be taking the Open Hub offline at this time in order to perform a significant architecture upgrade. Unfortunately, this work cannot be done during the IT work window because the network upgrade will mean, by definition, that the network and some systems will be unavailable.

We’re really excited by this architecture upgrade, which we’re calling the FIS – OH Database Split (FODS) project. This work separates the back end that provides the Fetch, Import, and SLOC operations from the Analysis and UI operations. It is the culmination of a significant amount of work over more than a year as we took incremental steps to mitigate production risk, expand our capabilities and scalability, and improve our website performance. To do this, we need to pull the site offline, make a backup, restore it to two systems, and run a series of scripts to update the schemas for the new two-database architecture; then all the applications need to be deployed and verified before we bring the entire set of systems back online. We expect that all this work will take the majority of the weekend, and the Open Hub will be unavailable throughout.

As always, thank you so very much for being a member of the Open Source Software community and a part of the Open Hub.

New Feature: I Use This from Search Results Page
Wed, 22 Mar 2017

Our thanks to longtth for this suggestion in the Forums: “I think it would be very handy to have the ‘I use this’ button directly on the search result page.”

A few weeks ago we rolled out this feature on the Project Search Results page.

The new feature is available with every search result and opens the Stacks dialog box just as the button does on the project page.

Also, please don’t forget that you can add new projects from the Project Search Results page as well!

That was not fun
Fri, 03 Mar 2017

The Open Hub is up and running again after a full day of being unavailable. We apologize for any inconvenience this unexpected downtime caused and want to share what we know about what happened.

In brief: while upgrading our PostgreSQL database from version 9.4 to 9.6, the upgrade process failed catastrophically and we lost the entire database.

Fortunately, we had made a backup before starting the process and were able to restore from it. However, we did lose a few days of data and changes. For that we are truly sorry.

We’ve done these upgrades before. As a general rule, we don’t like to get more than 2 minor revisions behind on anything in our stack. So we planned for the upgrade, tested it rigorously in our staging environment, and carefully documented each step and command that would need to be executed. Normally, we only do this kind of work on a Sunday morning, when the Open Hub has the least amount of traffic.

The decision to proceed with the upgrade rests entirely with me as team lead.

We expected a 20-minute upgrade process, followed by an ANALYZE run to generate the necessary statistics, which could have taken up to an hour. We figured the site would be back up in less than 2 hours.

But very early in the process, one of the first pg_upgrade steps generated an error because the target data directory had been erroneously entered as the mount point, owned by root, instead of a subdirectory owned by postgres. That should simply have produced an error; we would have fixed the command and continued on our way.

However, when we checked the file systems, it was immediately apparent that the data directory in the original 9.4 location was completely gone, along with all our data. We’ve scoured the history files and the logs to see if there was anything else that could have been a factor, but we do not see anything. We have even read the source code of the pg_upgrade feature (available at https://doxygen.postgresql.org/pg__upgrade_8c.html#a3c04138a5bfe5d72780bb7e82a18e627).

We are now looking over the entire site and re-applying updates we know were made after the database backup was taken. Please don’t hesitate to ping us on Twitter at @bdopenhub, or contact us at info@openhub.net with any observations, insults, questions, comments, etc.

We’d love to share some of the things that have been going on and will be going on here in Open Hub Land. We accomplished some very significant work in 2016 and would like to take a moment to lay it out and then talk about what we’d like to accomplish in 2017.

2016 Review

Please recall from our 2016 Review what we did in 2015: we rebuilt the UI, addressed spam account creation, improved back-end performance (5X in some cases), and started inventing new security data features. The plan for 2016 was to create a new Project Vulnerability Report and Project Security Pages, run the Spammer Cleanup Program, virtualize the back end (the FISbot project), switch to Ohcount4J, and connect to other sites related to OSS. Here’s how we did:

Started running batches of accounts through the Spammer Cleanup Program. To date, we’ve cleared out some 350,000 spam accounts (YAY!!)

Designed and implemented a Prototype Project Security Page to report known vulnerabilities in OSS projects. Collected user feedback from that experiment

Explored using Ohcount4J instead of Ohcount. Decided to stay with Ohcount.

Added a feature to add an entire GitHub account to a single Open Hub project

Numerous back-end improvements and defect resolutions to consistently deliver web pages in under 200 ms (6X faster than 2015 on average)

Defended against a number of malicious attacks against our API service and web site (comes with the territory of running a non-trivial web application, amirite?)

There’s more though!

The FISbot was implemented as a stop-gap measure to address issues we had with the back-end bare-metal crawlers. We were waiting for another project to provide a central set of Fetch, Import, and SLOC services to the Black Duck enterprise; the plan was to shut down the FISbots and use this other service. However, after deploying our FISbots, it was decided that we should expand the FISbot to handle the additional enterprise scenarios. So, completely unplanned at the beginning of the year, we implemented the eFISbot Project, which we also delivered last year.

Last point: as we talked about in the Detail on the Infrastructure post, the migration of that 10TB collection of OSS project data onto the production server ran into serious issues that forced us to re-fetch every one of the nearly 600K code locations we monitor. This was a serious multi-month disruption, from which we have now almost entirely recovered. We have re-fetched all the repositories, but there are lingering issues in getting all those repositories and corresponding projects refreshed within the 24–72 hour window we’ve set for ourselves.

So, in summary, we’ll add to our 2016 Review:

Implemented and delivered eFISbot

Survived the treacherous NFS SNAFU and the Great Code Location ReFetch

I feel it is also important that we mention again the passing of our friend and colleague Pugalraj Inbasekaran in February. I still feel his absence as an ache near my heart and miss him.

2017 Plan

We have a few main focuses for 2017:

Make the back end screamingly fast

Make it wicked easy to add projects from GitHub to the Open Hub and get data from the Open Hub into your GitHub pages

Continue the UI update with wider pages and more responsive layouts

Add new languages to Ohcount

For that back end, we’ve been given permission to obtain a new set of servers. Currently, the Open Hub runs off a single database (we’ve talked about that over and over again). We’ve put in a purchase request for 2 database servers that have over 4X the CPU cores and 9X the RAM. One server will be the master and the other the replica. These servers will support only Fetch, Import, SLOC, and Analysis operations (write-intensive), so we’re calling this the FISA DB. The current database will remain with the purpose of only presenting generated analysis (read-intensive) through the Ohloh-UI application, so that will be the UI DB. We are SO VERY EXCITED!!! SQUEEEEEE!!! Ah. Sorry; sorry. Please excuse the author (but it’s SOO exciting!)

As always, thank you so very much for being part of the Open Source Software community and your continued support of the Open Hub.

Looking forward to this year’s Rookies and looking back at Rookies past

This time of year is one of great anticipation at Black Duck. We are eagerly anticipating a very special delivery. A crew of helpers is busy putting together a list. It will be thoroughly checked and even checked twice. I wouldn’t say any on this list are naughty – in fact, most are pretty good. But we’re looking for the ones that are really really nice.

I’m speaking, of course, about the candidate list for the Black Duck 2016 Open Source Rookies. Each year, we review the open source projects started during the last 12 months and recognize those that stand out because of their mission, community growth, and market impact. A lot of great software is being built using Open Source, as was demonstrated by the 2015 Open Source Rookie Class, and we’re looking forward to our review of this year’s candidates.

You Can’t Win if You Don’t Enter

I’ve previously written about how we select the Open Source Rookies, so I won’t go into detail about it here. Suffice it to say that it’s a thorough process that starts when we pull data from our open source project database, OpenHub. OpenHub allows open source project contributors and teams to aggregate data about their projects and communities. While this is not the only data source we use, the information in it helps us get a more complete picture of what’s happening with each project.

Here’s where you come in. Remember that Christmas when you didn’t write to Santa and instead of getting that cool new video game you got socks? This is kind of like that. If you participate in or know of a new open source project that deserves a place in the 2016 Open Source Rookies, it will significantly improve the project’s chances of being selected if it has been registered in OpenHub by December 15th.

A Look Back at Prior Rookies

This will be the 9th year for Open Source Rookies, and a quick look back shows you just how ambitious open source projects are and how mainstream they have become. Of course, we’d like to think that these projects were helped, at least a little, by having been recognized as Black Duck Open Source Rookies.

HashiCorp Vault – Class of 2015

Rising Star with Open Source in its DNA

https://www.hashicorp.com/
We recognized HashiCorp last year for the launch of Vault, a framework for securely storing, accessing, and managing secrets across an enterprise, but most people probably know them as the team behind the popular development environment management solution, Vagrant. 2016 has been a good year for HashiCorp, which in September announced a $24 million Series B funding round led by GGV Capital and Mayfield Fund. We’ll be watching for more news from them in 2017.

Kubernetes – Class of 2014

Container Orchestration at Scale

Google has been using containers for years to run its technology at massive scale. At the summer 2014 DockerCon, the Internet giant open sourced a container management tool, Kubernetes, developed specifically to meet the needs of the exponentially growing Docker ecosystem. Since then, use and development of Kubernetes has flourished, and it has become one of the most widely adopted orchestration solutions for managing large-scale container-based deployments.

Docker – Class of 2013

Has raised over $180M in venture funding

Docker was a clear stand-out for us back in 2013. Few projects outside the highly corporate-sponsored arena garner the level of excitement and attention that Docker did. While Docker was started by a small, commercial firm previously known as dotCloud, it quickly caused industry heavy hitters like RedHat and Google to take notice. Docker has revolutionized the way teams build scalable applications for the cloud. Since launch, Docker has raised an impressive $180M in venture funding. Many expect them to reach unicorn status if they go public.

Ansible – Class of 2012

Acquired by Red Hat in October 2015

Managing a large number of servers on site or in the cloud can be a complex, time-consuming task, but Michael DeHaan, founder of Ansible, didn’t think it had to be that way.

“System managers shouldn’t have to worry about lots of complicated syntax,” he said. With a simpler approach to system orchestration, part-time sys-admins can do what they need to do, getting in and out quickly. Apparently Red Hat agreed and acquired Ansible in October of 2015.

Bootstrap – Class of 2011

Ubiquitous toolkit for responsive websites

Do you remember the dark days when most websites were designed and built to look great on a desktop monitor, but many of them were practically unusable if viewed on the small screen of a tablet or mobile phone? Mobile visitors now account for the majority of traffic on many websites so it’s important that your website be “responsive,” adapting to the different screen sizes while remaining usable and engaging. Bootstrap, a toolkit originated by Twitter, has become the foundation of many responsive websites, with base CSS and HTML for typography, forms, buttons, tables, grids, navigation and more.

NuGet – Class of 2011

Universal package manager for .NET development

NuGet is a free, open source, developer-focused package management system for the .NET platform, designed to simplify the process of incorporating third-party libraries into a .NET application during development. Given that it was originally developed by developers from Microsoft and the .NET Foundation, it should come as no surprise that it has become a standard component of many Windows-based software development environments. NuGet is now pre-installed as part of current versions of Microsoft Visual Studio.

OpenStack – Class of 2010

Orchestration Framework for the World’s Largest Clouds

Originally developed as a collaboration between Rackspace Hosting and NASA, OpenStack is an open source, open standards platform for large-scale cloud computing. Since 2010, OpenStack has grown tremendously and gained active support from over 500 companies, including industry giants like Oracle, HP, and Cisco. Many of the world’s largest clouds are built using OpenStack. If you use any cloud-based applications or services, it’s almost certain that some of them are running on OpenStack.

By any measure, that’s a pretty impressive list. Are there any projects launching this year that will have a similar impact on the software development industry? History suggests yes, and maybe it’s a project you are working on? If so, make sure it gets noticed by registering it on OpenHub. Maybe you too can join this illustrious group of Rookies turned All Stars!