
It has been a very long journey, from the initial adoption of PolyGerrit at GerritHub to the epic moment when Gerrit's historic GWT UI was dropped with Gerrit v3.0 last month.

GerritHub.io has always been aligned with the latest and greatest of Gerrit Code Review, and thus the moment has come for us to upgrade to v3.0 and drop the GWT UI forever.

PolyGerrit vs. GWT adoption

The PolyGerrit UX was pretty much experimental until the beginning of 2018: the features were incomplete and people needed to go back to the old GWT UI for many of the basic use-cases.

However, things started to change radically in April 2018, when GerritHub.io adopted Gerrit v2.15, which had a 100% functionally complete PolyGerrit UI. The number of users choosing PolyGerrit jumped from 10% to 35% (a 3.5x increase), with a +70% growth in the overall number of accesses. That means the adoption was mainly driven by users attracted by the new UI.

In the past 12 months, PolyGerrit became the default user interface and was renamed simply the Gerrit UI. Gradually, more and more users abandoned the old GWT interface, which now represents 30% of the overall accesses.

Timeline of the upgrade

For the 70% of people who are already using the new Gerrit UI, the upgrade to Gerrit v3.0 will not be noticeable at all.

Can I still use GWT with GerritHub.io?

The simple answer is NO: Gerrit v3.0 does not contain any GWT code anymore and thus it is impossible for GerritHub.io to bring back the old UI.

The journey to fill the gaps and reach 100% feature and functional equivalence between the old GWT and the new Polymer-based UI took around 6 years, 18k commits and 1M lines of code written by 260+ contributors from 60+ different organizations. It has been tested by hundreds of thousands of developers across the globe and is 100% production-ready and functionally complete.

If you embrace GerritHub.io with Gerrit v3.0 and start using the new UI, the service will continue to be FREE for public and private repositories and for organizations of all types and sizes. You can optionally purchase Enterprise Support from one of our plans if you require extra help using and configuring your Gerrit projects with your tools and organization.

It can be downloaded from www.gerritcodereview.com/3.0.html and installed on top of any existing Gerrit v2.16/NoteDb installation. Native packages have been distributed through the standard channels, and upgrading is as simple as shutting down the service, running the rpm, deb or dnf upgrade command, and starting it again.
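For example, assuming the GerritForge package repositories are already configured, the upgrade on a native install boils down to commands along these lines (the package name and service name are assumptions and may differ on your distribution):

```shell
# Debian/Ubuntu (apt)
sudo systemctl stop gerrit
sudo apt-get update && sudo apt-get install --only-upgrade gerrit
sudo systemctl start gerrit

# CentOS/RHEL/Fedora (dnf, or 'yum update gerrit' on older systems)
sudo systemctl stop gerrit
sudo dnf upgrade gerrit
sudo systemctl start gerrit
```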

You can also try Gerrit v3.0 using Docker by simply running a single command:
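The official image is published as gerritcodereview/gerrit on Docker Hub; a one-liner along these lines brings up a throw-away instance (the exact 3.0.x tag may vary):

```shell
docker run -ti -p 8080:8080 -p 29418:29418 gerritcodereview/gerrit:3.0.0
```

Once started, the UI is reachable on http://localhost:8080 and SSH on port 29418.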

This article goes through the whole history of the Gerrit v3.0 development and highlights the differences from the previous releases.

Milestone for the Gerrit OpenSource Project

After 6 years, 18k commits and 1M lines of code written by 260+ contributors from 60+ different organizations, Gerrit v3.0 is finally out.

The event is a fundamental milestone for the project for two reasons:

The start of a new journey for Gerrit, without the legacy code of the old GUI based on Google Web Toolkit and without any relational database. Gerrit is now fully based on a Git repository and nothing else.

The definition of a clear community organization, with the foundation of a new Engineering Steering Committee and the role of Community Manager.

The new structure will drive the product forward for the years to come and will help to define a clear roadmap to bring Gerrit back to the center of the Software Development Pipeline.

Evolution vs. revolution

When a product release increments the first major number, it typically introduces a series of massive breaking changes and, unfortunately, a period of instability. Gerrit, however, is NOT a typical OpenSource product: since its inception back in 2008 it has been based on rigorous Code Review, which has brought stability and reliability from the very beginning. Gerrit v3.0 was developed over the years following a strict backward-compatibility rule that has made Gerrit one of the most reliable and scalable Code Review systems on the planet.

For all the existing Gerrit v2.16 installations, v3.0 will feel much more like a minor upgrade and may not even require any downtime or interruption of the incoming read/write traffic, assuming that you have at least a high-availability setup. How is this possible? Magic? Basically, yes, it's a "kind of magic" that made this happen, and it is all thanks to the new repository format for storing all the review meta-data: NoteDb.

Last but not least, all the features that Gerrit v3.0 brings to the table have been implemented iteratively over the last 6 years and released gradually from v2.13 onwards. Gerrit v3.0 is the "final step" of the implementation, filling the gaps left open in the previous v2.16 release.

In a nutshell, the Gerrit code-base has shrunk by 10k lines of code compared to v2.16. So, instead of talking about what's new in v3.0, we should instead describe what is inside the 72k lines removed.

Removal of the GWT UI

The GWT UI, also referred to as the "Old UI", has been around since the inception of the project back in 2008.

Back in 2008, it seemed a good idea to build the Gerrit UI on top of GWT, a Web Framework created by Google two years earlier and aimed at reusing the same Java language for both the backend and the Ajax front-end.

However, starting in 2012, things changed: the interest of the overall community in GWT decreased, as clearly shown by the StackOverflow trends.

commit ba698359647f565421880b0487d20df086e7f82a
Author: Andrew Bonventre <andybons@google.com>
Date: Wed Nov 4 11:14:54 2015 -0500
Add the skeleton of a new UI based on Polymer, PolyGerrit
This is the beginnings of an experimental new non-GWT web UI developed
using a modern JS web framework, http://www.polymer-project.org/. It
will coexist alongside the GWT UI until it is feature-complete.
The functionality of this change is light years from complete, with
a full laundry list of things that don't work. This change is simply
meant to get the starting work in and continue iteration afterward.
The contents of the polygerrit-ui directory started as the full tree of
https://github.com/andybons/polygerrit at 219f531, plus a few more
local changes since review started. In the future this directory will
be pruned, rearranged, and integrated with the Buck build.
Change-Id: Ifb6f5429e8031ee049225cdafa244ad1c21bf5b5

The PolyGerrit project introduced two major innovations:

Gerrit REST API: for the first time, the interactions of the code-review process were formalized in a stable and well-documented REST API that can be used as a "backend contract" for the design of the new GUI.

The PolyGerrit front-end Team: for the first time, a dedicated, experienced team focused on user experience and UI workflow was tasked with iteratively rethinking and redesigning all the components of the Gerrit Code Review interactions.

The GWT UI and PolyGerrit lived in the same "package" from v2.14 onwards for two years, with users left with the option to switch between the two. Then, in 2018, with v2.16, the PolyGerrit UI became the default interface and was thus renamed simply the "Gerrit" UI.

With Gerrit v3.0, the entire GWT code-base has been completely removed with the epic change by David Ostrovsky, "Remove GWT UI", which deleted 33k lines of code in one single commit.

The new Polymer-based UI of Gerrit Code Review is not very different from the one seen in Gerrit v2.16, but it includes more bug fixes and is 100% feature complete, including project administration and ACL configuration.

Removal of ReviewDb

Gerrit v3.0 does not have a DBMS anymore, not even for storing its schema version as was the case in v2.16. This means that almost everything gets stored in the Git repositories.

The journey started back in October 2013, when Shawn Pearce gave Dave Borowitz the task of converting all the review meta-data managed by Gerrit into a new format inside the Git repository, called NoteDb.

Google started adopting NoteDb in parallel with ReviewDb on their own internal setup and, in June 2017, the old changes table was finally removed. However, there was more on the to-do list: at the Gerrit User Summit 2017, Dave Borowitz presented the final roadmap to make ReviewDb disappear from everyone's Gerrit server.

In the initial plans, the first version with NoteDb fully working was supposed to be v2.15. However, things went a bit differently, and a new minor release was needed in 2018 to make the format really stable and reliable: v2.16.

Gerrit v2.16 is officially the last release that contains both code-bases and allows the migration from ReviewDb to NoteDb.

Dave Borowitz used the hashtag “RemoveReviewDb” to allow anyone to visualize the huge set of commits that removed 35k lines of code complexity from the Gerrit project.

Migrating to Gerrit v3.0, step-by-step

Gerrit v3.0 requires NoteDb as a pre-requisite: if you are on v2.16 with NoteDb, the migration to v3.0 is straightforward and can be done with the following simple steps:

Shutdown Gerrit

Upgrade Gerrit war and plugins

Run Gerrit init with the “batch” option

Start Gerrit
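Scripted, the four steps above look roughly like this (the site path /var/gerrit and the service name are assumptions to adapt to your installation; `init --batch` is the standard non-interactive init):

```shell
# 1. Shutdown Gerrit
sudo systemctl stop gerrit

# 2. Upgrade the Gerrit war and the plugins
cp gerrit-3.0.0.war /var/gerrit/bin/gerrit.war
cp plugins/*.jar /var/gerrit/plugins/

# 3. Run the init in batch (non-interactive) mode
java -jar /var/gerrit/bin/gerrit.war init --batch -d /var/gerrit

# 4. Start Gerrit
sudo systemctl start gerrit
```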

If you are running Gerrit in a high-availability configuration, the above process can be executed on the two nodes individually, with a rolling restart and without interrupting the incoming traffic.

If you are running an earlier version of Gerrit and are still on ReviewDb, then you should upgrade in three steps:

Migrate from your version v2.x (x < 16) to v2.16, staying on ReviewDb. Make sure to upgrade through all the intermediate versions (for example: migrate from v2.13 to v2.14, then from v2.14 to v2.15 and finally from v2.15 to v2.16).

Convert v2.16 from ReviewDb to NoteDb

Migrate v2.16 to v3.0
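Step 2, the ReviewDb-to-NoteDb conversion, can be done offline with the migration tool shipped in the v2.16 war (the site path below is an assumption; an online migration option also exists):

```shell
java -jar gerrit-2.16.war migrate-to-note-db -d /var/gerrit
```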

The leftovers of a DBMS stored in H2 files

Is Gerrit v3.0 completely running without any DBMS at all? Yes and no. There are some leftovers that aren't necessarily associated with the Code Review meta-data and thus did not make sense to store in NoteDb.

Persistent storage for in-memory caches.
Some of the Gerrit caches store their status on the filesystem as H2 tables, so that Gerrit can save a lot of CPU time after a restart by reusing the previous in-memory cache status.

Reviewed flag of changes.
This flag enables the "bold" rendering of a change, storing the viewed status for every user. It is stored by default on the filesystem as an H2 table; however, it can alternatively be stored on a remote DBMS or potentially managed by a plugin.
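For example, the reviewed-flag store can be redirected from the default H2 file to a remote DBMS via the accountPatchReviewDb section of gerrit.config (the JDBC URL below is purely illustrative):

```
[accountPatchReviewDb]
  url = jdbc:postgresql://dbhost:5432/reviewflags?user=gerrit
```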

Dave Borowitz contributed 316k lines of code in 3.6k commits over 36 repositories in 8 years. He also helped the development of the new Gerrit multi-site plugin by donating his ZooKeeper-based implementation of a global ref-database.

On behalf of GerritForge and the Gerrit Code Review community, I would like to thank all the past contributors and maintainers that made PolyGerrit and NoteDb code-base into Gerrit: Dave, Logan, Kasper, Becky, Viktar, Andrew and Wyatt.

Jenkins and Gerrit are the most critical components of the DevOps Pipeline because of their focus on people (the developers), their code and collaboration (the code review), and their builds and tests (the Jenkinsfile pipeline) that produce the value stream for the end user.

Accelerate the CI/CD pipeline

DevOps is all about iteration and fast feedback. That can be achieved by automating the build and verification of code changes into a target environment, allowing all the stakeholders to have early access to what the feature will look like, and validating the results with speed and quality at the same time.

Every development team wants to shorten the cycle time and spend less time on tedious work by automating it as much as possible. That trend has created a new explosion of fully automated processes called "Bots", which are more and more responsible for performing those tasks that developers are not interested in doing manually over and over again.

As a result, developers are doing more creative and design work, are more motivated and productive, can address technical debt a lot sooner and allow the business to go faster in more innovative directions.

As more and more companies adopt DevOps, it becomes more important to be better and faster than your competitors. The most effective way to accelerate is to extract your data, understand where your bottlenecks are, experiment with changes and measure progress.

By looking at the protocol and code statistics, we found out that bots are much harder workers than humans on GerritHub.io, which hosts, apart from the mirrored Gerrit Code Review projects, many other popular OpenSource projects.

That should not come as a surprise if you think of how many activities could potentially happen whenever a PatchSet is submitted in Gerrit: style checking, static code analysis, unit and integration testing, etc.

We also noticed that most of the activity of the bots is over SSH. We started to analyze what the Bots are doing, see what the impact is on our service, and find out whether there are any improvements we can make.

Build integration, the wrong way

GerritHub has an active site with multiple nodes serving read/write traffic and a disaster recovery site ready to take over whenever the active one has any problem.

Whenever we roll out a new version of Gerrit, using the so-called ping-pong technique, we swap the roles of the two sites (see here for more details). Within the same site, the traffic can also jump from one node to the other in the same cluster using active failover, based on health, load and availability. The issue is that we end up in a situation like the following:

The "old" instance still served SSH traffic after the switch. We noticed we had loads of long-lived SSH connections, mostly from integration tools keeping SSH connections open to listen to Gerrit events.

Long-lived SSH connections have several issues:

SSH traffic doesn't allow smart routing, hence we end up with the HTTP traffic going to the currently active node and most of the SSH traffic still on the old one

There is no resource pooling since the connections are not released

There is the potential loss of events when restarting the connections

That impacts the overall stability of the software delivery lifecycle, extending the feedback loop and slowing your DevOps pipeline down.

Then we started to drill down into the stateful connections to understand why they exist, where they are coming from and, most importantly, which part of the pipeline they belong to.

Jenkins Integration use-case

The Gerrit Trigger plugin for Jenkins is one of the integration tools that has historically suffered from those problems and, unfortunately, the initial tight integration has over the years become less effective, slower and more complex to use.

Configuring the Gerrit Trigger Plugin is error-prone because it requires a lot of parameters and settings. Here is how it compares with the Gerrit Code Review plugin, feature by feature.

Systems dependencies

- Gerrit Trigger Plugin: tightly coupled with Gerrit versions and plugins; an upgrade of Gerrit might break the integration.
- Gerrit Code Review plugin: uses Gerrit as a generic Git server, loosely coupled.

Gerrit knowledge

- Gerrit Trigger Plugin: admin-level. You need to know a lot of Gerrit-specific settings to integrate with Jenkins, and the plugin requires a special user and permissions to listen to Gerrit stream events.
- Gerrit Code Review plugin: user-level. You only need to know the Gerrit clone URL and credentials.

Fault tolerance to Jenkins restart

- Gerrit Trigger Plugin: missed events, unless you install a server-side DB to capture and replay the events.
- Gerrit Code Review plugin: transparent. All events are sent as soon as Jenkins is back, and the Gerrit webhook automatically tracks and retries events.

Tolerance to Gerrit rolling restart

- Gerrit Trigger Plugin: events stuck. Gerrit events are stuck until the connection is reset; the plugin is forced to terminate the stream with a watchdog, but will still miss events.
- Gerrit Code Review plugin: transparent. Any of the active Gerrit nodes continues to send events.

Differentiate actions per stage

- Gerrit Trigger Plugin: no flexibility to tailor the Gerrit labels to each stage of the Jenkinsfile pipeline.
- Gerrit Code Review plugin: full access to Gerrit labels and comments in the Jenkinsfile pipeline.

Multi-branch support

- Gerrit Trigger Plugin: custom. You need to use the plugin's environment variables to checkout the right branch.
- Gerrit Code Review plugin: native. It integrates with multi-branch projects and Jenkinsfile pipelines without having to set up anything special.

Gerrit and Jenkins friends again

After so many years of adoption, evolution and also struggles of using them together, Gerrit Code Review finally has a first-class integration with Jenkins, liberating the development team from the tedious configuration and BAU management of triggering a build from a change under review.

Jenkins users truly love using Gerrit and the other way around: friends and productive together, again.

Conclusion

Thanks to Gerrit DevOps Analytics (GDA), we managed to find one of the bottlenecks of the Gerrit DevOps Pipeline and make changes that made it faster, more effective and more reliable than ever before.

In this case, just by picking the right Jenkins integration plugin, your Gerrit Code Review master server will run faster and with less resource utilization. Your Jenkins pipeline is going to be simpler and more reliable, with the validation of each change under review, without delays or hiccups.

The Gerrit Code Review plugin for Jenkins is definitely the first-class integration with Gerrit. Give it a try yourself: you won't believe how easy it is to set up.

Luca Milanesio, Gerrit Code Review Maintainer and Release Manager, presented the current status of the support for multi-master and multi-site setups with the standard OpenSource Components, developed by GerritForge and the Gerrit Code Review Community.

Introduction

The focus of this talk is sharing with you an experience that we had with the Gerrit server that we maintain, GerritHub.

First of all, I’m just going to tell you how we went through the journey from a single master-slave installation back in 2013 to a fully multi-site setup across two continents.

The evolution of GerritHub to multi-site

GerritHub was born in November 2013. The idea was straightforward: take a single Gerrit server and use the replication plugin to push to GitHub.

To implement a good, scalable and reliable architecture, you don't need to design everything up front. At the beginning of your journey, you don't know who your users are, how many repos they are going to create, what the traffic looks like, what the latency looks like: you know nothing.

You need to start small, and we did back in 2013, with a single Gerrit master located in Germany, because we had no idea of where the users would have come from.

Would the people in Europe like it, or rather the people in the U.S. like it, or again the people in China like it? We did not know. So we started with one in Germany.

Because we wanted to build a self-service system, what we did was very simple: a simple plugin called "the GitHub plug-in", which was just a wizard to add an entry to the replication config.

You have Gerrit incoming traffic, then you configure the replication plugin and eventually push to GitHub. The only complicated part is that, if you do it as a Gerrit administrator, you have to define these remotes in replication.config, but you can express them in an optimized way. On a self-service system, you've got thousands of people who will create thousands of remotes automatically. Luckily, the replication plugin works very well and was able to cope with it.
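Thanks to the ${name} placeholder supported by the replication plugin, a single parametrized remote in replication.config can cover thousands of repositories; a sketch (the remote name and URL layout are illustrative):

```
[remote "github"]
  url = git@github.com:${name}.git
  push = +refs/heads/*:refs/heads/*
  push = +refs/tags/*:refs/tags/*
  timeout = 30
```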

Moving to Canada

Then we evolved. The reason why we changed is that people started saying: "Listen, GerritHub is cool, I can use it, it works well in the morning. Why is it so slow in the afternoon?". Uh oh.

We needed to do some data mining to see precisely who was using it, where they were coming from, and what operations they were doing. Then we realized that we had chosen the wrong location: we had put the Gerrit master in Germany, but the majority of people were coming from the USA.

Depending on how the transatlantic backbone was performing, GerritHub could be faster or slower. One of the complaints was that in the morning GitHub was slower than GerritHub, but in the afternoon it was exactly the opposite.

We were doing some performance tuning and analyzing the traffic, and even when people were saying that it was very slow, GerritHub was actually a lot faster than GitHub in terms of throughput. The problem was the number of hops between the end user and GerritHub.

We decided that we needed to move from Germany to the other side of the Atlantic Ocean. We could have moved the service to the USA, but we decided to go with Canada because the latency was precisely the same as hosting in the USA, but less expensive.

What we could have done is just move the master from Germany to the other side of the Atlantic Ocean; but because, from the beginning, we wanted to provide a service that is always available, we decided to keep both zones.

We didn't want any downtime, even during this migration, and we definitely wanted to do one step at a time: no changes in releases, no changes in configuration, only moving stuff around. Whenever you change something, even a small release change, you change the function, and that has to be properly communicated.

If we changed data center and version at the same time and something went wrong, we would be left with the doubt of which one it was. Is it the new version that is slower, or the new data center? You don't know. If you change one thing at a time, it must be that thing that isn't working.

We did the migration in two steps.

Step 1: the Gerrit master stays in Germany, still replicating to GitHub, and the new master in Canada is just one extra replication end.

The traffic was still coming to the German master, but it was replicated to both Canada and GitHub. Then, while we were doing all the testing, the other master was used as if it were a Gerrit slave, even though it was not a slave: all the nodes were masters, just with different roles.

Step 2: flip the switch, and the Gerrit master is in Canada. When replication was online and everything was aligned, we put a small read-only plug on the German side, which made the whole node read-only for a few minutes, to give time for the last replication queue to drain. When the replication queue was drained, we flipped the switch: traffic going to the new master was already read/write.

People didn't change their domain and didn't notice any difference, apart from the much-improved performance. The feedback was like: "Oh my, what have you done to GerritHub since yesterday? It's so much faster. It has never been so fast before."
Because it was the same version, and we had been testing in parallel for quite some time, nobody had a single disruption.

Zero-downtime migration leveraging multi-site

But that was not enough, because we wanted to keep Gerrit always up and running and always up to date with the latest and greatest version. Gerrit is typically released twice a year; however, the code-base is stable at every single commit.
Still, we were forced to ping-pong between the two data centers whenever we did a roll-out. It meant that every time an upgrade was done, users had a few minutes of read-only state. That was not good enough, because we wanted to release more frequently.

When you upgrade Gerrit within the same release, let's say between v2.15.4 and v2.15.5, the process is really straightforward: you just replace the .war, restart Gerrit, done.

However, if you don't have at least two nodes on each side, you need to ping-pong between the two different data centers and then apply the read-only window, which isn't great.
We started with a second server on the Canadian side, so each node could deal with the entire traffic of GerritHub. We were not concerned about the German side, because we were just using it as disaster recovery.

Going multi-site: issues

We started doubling the Canadian side with one extra server. Of course, if you do that with version v2.14, which problems do you have?

Sessions. How do you share the sessions? If you log in to one Gerrit server, you create one session; then you go to the other, and you don't have a session anymore.

Caches. That is easier to resolve: you just set the TTL of the cache to a very low value and add some stickiness, and you may sort this out. However, cache consistency is another problem and needs to be sorted out.

Indexes are the most painful one because, at that time, there was no support for ElasticSearch. Things are different now, but back in 2017 it wasn't there.
What happens is that every single node has its own index. If an index entry is stale, it's not right until someone re-indexes it.

The guys from Ericsson were developing a high-availability plugin. We said: instead of reinventing the wheel, why don't we use the high-availability plugin? We started rolling it out to GerritHub; the configuration is actually more complex and looks like this one.

So, imagine that in Canada you've got two different masters, and still only one in Germany. They align the consistency of the Lucene index and the cache through the HA plugin, and they still use the replication plugin.

How do you share the repository between the two? You need to have a shared file system. We used exactly the same one used by Ericsson: NFS.

Then, for exposing the service, we needed HAProxy; not just one, but at least two. If you put in only one HAProxy, you're not HA anymore, because if that HAProxy dies, your service goes down. So you have two HAProxies with a cross configuration: both of them can redirect traffic to the first or the second master. It is not one primary and one backup: they have exactly the same role. The two masters do exactly the same thing, contain exactly the same code, have exactly the same cache and exactly the same index. They're both running at the same time, and they're both accepting traffic.
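A minimal sketch of that active-active layout in haproxy.cfg could look like the following (hostnames, ports and the health-check URL, which assumes the Gerrit healthcheck plugin is installed, are all illustrative):

```
frontend gerrit_http
    bind *:443 ssl crt /etc/haproxy/certs/gerrithub.pem
    default_backend gerrit_masters

backend gerrit_masters
    balance roundrobin
    option httpchk GET /config/server/healthcheck~status
    server master1 10.0.0.1:8080 check
    server master2 10.0.0.2:8080 check
```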

This is something similar to what Martin Fick (Qualcomm) did, I believe, last year, with the only difference that they did not use HAProxy but only DNS round-robin.

Adoption of the high-availability plugin

Based on the experience of running this configuration on GerritHub, we started contributing a lot of fixes to the high-availability plugin.

A lot of people are asking “Oh, GerritHub is amazing. Can you sell me GerritHub?”. I reply with “What do you mean exactly?”

GerritHub is just a domain name that I own, with Gerrit v2.15 plus a bunch of plugins: the replication plugin, the GitHub plugin, the high-availability plugin (we use a fork), the flat-file web-session plugin, and a bunch of scripts to implement the health check.

If you want to build the same configuration on your own, you don't need to buy any commercial product. We don't sell commercial products. We just put the ideas into the OpenSource community to make it happen. Then, if you need Enterprise Support, we can help you implement it.

The need for a Gerrit disaster-recovery site

Then we needed to do something more, because we had one problem. OVH has historically been very reliable but, of course, shit happens sometimes.
One day, the OVH network backbone was down for a few hours.

That meant that any server in that data center was absolutely unreachable. You couldn't even connect to them or check their status: zero. We turned the traffic over to the disaster recovery side, but then we faced a different challenge, because there we had only one master.
If something happened to that master, say a load spike or whatever made it a little bit unhealthy, then we were going to have an outage. We didn't want to risk an outage in that situation.

Your disaster recovery site is never really safe until you need it and use it. So use it all the time, on a regular basis. This is what we ended up implementing.

We now have two different data centers, managed by two different cloud providers: one is still OVH in Canada, and the other is Hetzner in Germany. We still have the same configuration, the HA plugin over a shared NFS; everything is completely replicated to the disaster recovery site, and we use the disaster recovery site continuously to make sure it is always healthy and aligned.

Leverage Gerrit-DR site for Analytics

Because we didn't want to serve actual user traffic on the disaster recovery site, due to the synchronization lag between the two sides, we ended up using it for all the data-mining activities. There are a lot of things that we do on data, trying to understand how our system performs. There is a universe of data that you typically either never look at or don't extract and process in the right way.

Have you ever noticed how much data Gerrit generates under the logs directory? A tremendous amount, and that data tells you exactly the stories that you want to know: how Gerrit works today, what you need to do to make sure that Gerrit will work tomorrow, how functions are performing for the end users, whether something has blown up. It's all there.

We started working in that DevOps Analytics space really long ago, providing metrics and insights for the Gerrit Code Review project itself and reporting them back to the project through the service https://analytics.gerrithub.io.

Therefore, we started using the disaster recovery site for analytics traffic: if I extract and process the data in my logs and activities, do I really need to analyze a visit that happened 10 seconds ago? A small time lag in the data doesn't make any difference from an analytics perspective.

We were running Gerrit v2.15 at the time, so the HA plugin needed to be radically different from the one available today. We still run largely on our HA plugin fork, but all the changes have been pushed for review upstream to the high-availability plugin.

However, the solution was still not good enough, because there were still some problems. The HA plugin relies on a shared file system within the same data center; we knew that within the same data center this was not a problem. But what about creating an NFS across data centers on different continents? It would never work effectively, because of the latency limitations.

We then started with a low-tech solution: rely on the replication plugin for the Git data in the repositories. Then, every 30 minutes, a cronjob checked the consistency between the two sites and performed a delta re-index between them.
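The core of such a consistency check can be sketched as a small shell function that diffs the refs/changes meta refs of the two sites. Everything here (hostnames, admin user, the exact comparison) is illustrative, while `gerrit index changes` is the real SSH command used for re-indexing:

```shell
#!/bin/bash
# stale_changes: given two files with the 'git ls-remote ... refs/changes/*/meta'
# output of the primary and DR sites, print the numbers of the changes whose
# meta ref differs and therefore needs re-indexing on the DR site.
stale_changes() {
  comm -23 <(sort "$1") <(sort "$2") |
    sed -E 's#.*refs/changes/[0-9]+/([0-9]+)/meta#\1#' |
    sort -un
}

# The cronjob would then do something like (illustrative hostnames):
#   git ls-remote "https://$PRIMARY/$repo" 'refs/changes/*/meta' > primary.refs
#   git ls-remote "https://$DR/$repo"      'refs/changes/*/meta' > dr.refs
#   for c in $(stale_changes primary.refs dr.refs); do
#     ssh -p 29418 admin@$DR gerrit index changes "$c"
#   done
```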

Back in Gerrit v2.14, I also needed to do a database export and import, because the database still contained the reviews.

But there was also a timing problem: if a disaster occurred, people would have to wait up to half an hour to get the data after the re-index.
They would not have lost anything, because the index can be recreated at any time, but the user experience was not ideal. And there are also the DNS-related issues of moving from one zone to the other.

Sharding Gerrit across sites

First of all, we want to leverage sharding based on the repository, available from Gerrit v2.15, which includes the project name in each page and REST API URL. That allows achieving basic sharding across different nodes even with a simple OpenSource HAProxy or another workload balancer, without the magic of Google's intelligent Gerrit/Git proxy. Bear in mind that this is not pure sharding, because all the nodes keep all the repositories available on every node. HAProxy is clever: based on the project name and the action, it will use the most appropriate backend for the read and write operations. In that way, we make sure that you never push to the same branch of the same repository from two different nodes concurrently.
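Routing by project name with plain HAProxy can be sketched with ACLs on the URL path. Backend names, addresses and the sharding rule (first letter of the project) are all illustrative; real rules must cover the UI, REST and Git URL schemes consistently:

```
frontend gerrit
    bind *:443 ssl crt /etc/haproxy/certs/gerrithub.pem
    # git-receive-pack identifies a push; shard on the project name
    acl is_push path_end /git-receive-pack
    acl shard1  path_reg ^/(a/)?[a-m]
    use_backend gerrit_node1 if is_push shard1
    use_backend gerrit_node2 if is_push !shard1
    default_backend gerrit_all

backend gerrit_node1
    server node1 10.0.0.1:8080 check
backend gerrit_node2
    server node2 10.0.0.2:8080 check
backend gerrit_all
    balance roundrobin
    server node1 10.0.0.1:8080 check
    server node2 10.0.0.2:8080 check
```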

How does the magic work? The replication plugin takes care of syncing the nodes. Historically, with master-slave, the synchronization was unidirectional. With multi-site, instead, all masters replicate to each other: the first master replicates to the second, and vice versa. Thanks to the new URL scheme, Google made our lives so much easier.
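A bidirectional setup can be sketched with the standard replication plugin: each master lists the other as a remote in its `replication.config`, and the second master carries the mirror-image configuration pointing back. Host names below are hypothetical:

```
# etc/replication.config on the first master
[remote "site-b"]
  url = ssh://gerrit@site-b.example.com/var/gerrit/git/${name}.git
  push = +refs/*:refs/*
  replicationDelay = 0
```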

Introducing the multi-site plugin

We have already started writing a brand-new plugin called multi-site. Yes, it has a different name, because the direction it is taking is completely different in terms of decisions compared to the high-availability plugin, which required a shared file system. Secondly, the synchronous HTTP communication with the other site was not feasible anymore. What happens if, for instance, a remote site in India is not reachable for 50 seconds? I still want events and data to be put into a persistent queue and eventually sent to the remote site.

For the multi-site broker implementation, we have decided to start with Kafka, because there is already an implementation of Gerrit stream events for it as a plugin. The multi-site plugin, however, won’t be limited to Kafka and could be extended to support other popular brokers such as NATS.

We will adopt location-aware DNS because we want users to access the HAProxy that is closest to them. If I live in Germany, it makes sense for me to use only the European servers, and definitely not the servers in California.

You will go by default to the “location-aware” site that is closest to you. However, depending on what you do, you could still be later redirected to another server across zones.

For example, if I want to fetch data, I can still fetch everything on my local site. But, if I want to push data, the operation can either be executed locally or forwarded to the remote site, if the target repository has been sharded remotely.

The advantage is location-awareness and data-locality. The majority of data transfer will happen with my local site because, at the end of the day, the main complaint of people using Gerrit with remote masters is a sluggish GUI.

If you have Gerrit master/slave, for instance, a Gerrit master in San Francisco and your Gerrit slaves in India, you have the problem that everyone from India will still have to access a remote GUI from the server in San Francisco, and thus would experience a very slow GUI.

Even if only 10 percent of your traffic is write operations, you would suffer all the performance penalties of using a remote server in San Francisco. However, if I have an intelligent HAProxy, plus intelligent logic in Gerrit that understands where the traffic needs to go, I can always talk to the local server in my zone. Then, depending on what I do, it can use my local server or a remote server. This happens behind the scenes and is completely transparent to the user.

Q&A

Q: I just wanted to ask if you’re ignoring SSH?

SSH is a problem in this architecture: HAProxy supports SSH, but with a big limitation.

Q: You don’t know what the projects are; you don’t have any idea of the operation, whether it’s a read or a write.

Exactly. HAProxy works at the transport level, which means it sees a flow of encrypted data going back and forth to the Gerrit server but cannot see anything in clear text, and thus cannot make an educated decision based on it. Your data may end up being forwarded to a zone where your traffic is not optimized, and you will end up waiting a little bit longer. But bear in mind that the majority of complaints about Gerrit master/slave are not really about the speed of push/pull but rather about the sluggish Gerrit GUI.

Of course, we’ll find as a Community a solution to the SSH problem, I would be more than happy to address that in multi-site as well.

Q: Solution is to get a better LAN because we don’t have any complaints about the GUI across the continents.

Okay, yeah, get a super-fast network across the globe, so everyone, everywhere in the world, will be super fast in accessing a single central point. However, not everyone can have a super-fast network. For instance, at GerritHub we don’t own the infrastructure, so we cannot control the speed. Similarly, a lot of people are adopting cloud infrastructure, where you don’t own and control the infrastructure anymore.

Q: Would you also have some Gerrit slaves in this setup? Would there be several masters and some slaves? Or would everything become a master?

In a multi-site architecture, everything can be a master; the distinction between master and slave does not exist anymore. Functionally speaking, some nodes will be master for certain repositories and slave for others. Every single node is capable of managing any type of operation; but, based on this smart routing, it forwards the calls to wherever it is best to execute them.
If with this simple multi-site solution you can serve 99% of the traffic locally, and for that one percent we accept the compromise of going a bit slower, then it could still sound very good.

We implemented this architecture last year with a client that is distributed worldwide, and it just worked, because it is very simple and very effective. There isn’t any “magic product” involved, but simply standard OpenSource components.

We are on a journey, because as Martin said last year in his talk, “Don’t try to do multi-master all at once.” Every single situation is different, every client is different, every installation is different, and all your projects are different. You will always need to compromise, find the trade-off, and tailor the solution to your needs.

Q: I think in most of the cases that I saw up there, you were deploying a Gerrit master in multiple locations? If you want to deploy multi-master in the same sites, like say I want two Gerrit servers in Palo Alto, does that change the architecture at all? Basically, I am referring to shared NFS.

This is actually the point made by Martin: if you have a problem on multi-site, it’s not a problem of Gerrit, it’s a problem of your network, because your network is too slow. You should fix the network; you shouldn’t solve this problem in Gerrit.
Actually, the architecture is this one. Imagine you have a super-fast network all across the globe, and everyone reaches the same Gerrit in the same way. You have a super-fast direct connection to your shared NFS, so you can grow your master and scale horizontally.

The answer is: yes, if your company can do that, then absolutely, I wouldn’t recommend doing multi-site at all. But if you cannot do anything about the speed, and you have people that work remotely, and unfortunately you cannot put a cable under the sea between the U.S. and India just for your company, then maybe you want to address the problem in a multi-site fashion.

One more thing that I wanted to point out is that the multi-site plugin makes sense even within the same site. Why is this picture better than the previous one? I’m not talking about multi-site; I’m talking about multi-master in the same data center. Why is this one better?

You read and write into both: read and write traffic goes to both nodes. The difference from the previous picture is that there is no shared file system in this one. In the previous one, even within the same data center, if you use a shared file system you still have a single point of failure. Even if you are willing to buy the most expensive shared file system with the most expensive network and storage that exist in the world, they will still fail. Reliability cannot simply be achieved by throwing more money at it.
It’s not money that makes my system reliable. Machines and disks will fail one day or another. If you don’t plan for failures, you’re going to fail anyway.

Q: Let’s say you’re doing an upgrade on all of your Gerrit masters, right? Do you have an automatic mechanism to upgrade? I’m coming from a perspective where we have our Gerrit on Kubernetes. So, sure, if I’m doing a rolling upgrade, it’s kind of a piece of cake: HAProxy is taken care of by confd and all that stuff, right? In your situation, what would an upgrade look like, especially if you’re trying to upgrade all the masters across the globe? Do you pick and choose which site gets upgraded first and then let the replication plugins catch up after that?

First of all, the rolling upgrade with Kubernetes works if you do minor upgrades. If you’re doing major Gerrit upgrades, it doesn’t, because those change the data. With minor upgrades, you do exactly the same here: even if the nodes in the other zones have a different version, they are still interoperable, because they talk exactly the same schema, exactly the same API, in exactly the same way.

Q: I guess taking Kubernetes out of the picture, even, so with the current multi-master setup you have, if you’re not doing just a trivial upgrade, how do you approach that usually?

If you are doing a major version upgrade, you orchestrate it exactly like the GerritHub upgrade from v2.14 to v2.15. You need to use what I call the ping-pong technique, which you basically execute across data centers.

It’s not fully automated yet. I’m trying to automate it as much as possible and contribute back to the community.

Q: In multi-master, when you’re doing a major upgrade, even with the ping-pong, if the schema has changed and you’re adding events to the replication plugin, are you going to temporarily suspend replication during that period of time? Because the other nodes on the earlier version don’t understand that schema yet. Can you explain that a little?

Okay, when you do the ping-pong that was shown in this morning’s presentation, what happens is: you upgrade the first node, interrupt the traffic there, and do all the testing you want. You are behind the original master, but it catches up with replication while you do all the testing.

With regards to the replication events, they are not understood by clients such as the Jenkins Gerrit Trigger plugin. That point was raised this morning as well. If you go to YouTube.com/GerritForgeTV, there is the recording of my talk from last year about a new plugin that is not subject to this fluctuation of Gerrit versions.

This week we have published the recording of David Pursehouse’s (CollabNet) talk about Gerrit v2.14 and v2.15. Even though the releases are not new, many improvements were made recently, especially regarding the support for Elasticsearch as an engine to store the Gerrit secondary indexes.

Introduction

Hello, I am David Pursehouse, Gerrit Maintainer and release manager.

So, I’m going to talk about 2.14. Actually, this is the same presentation as I already gave last year, so some of you may have seen it already, although I have made some changes.

Basically, 2.14 isn’t new anymore. It’s already 18 months old, but even last year when I presented this, it was already six months old. And it has been in constant development since then; even this month we’ve been adding new stuff and making new releases.

And I’ll also touch a bit on 2.15 because yesterday evening we realized that there isn’t a 2.15 talk in the schedule today or tomorrow. So, I will briefly cover 2.15.

What’s new in v2.14

There are a number of new features in 2.14. I won’t read them all out because I’ve got separate slides for them, but there are also some new features, which I’ll talk a bit more about later, that were not in 2.14 last year.

Some notes on important things about upgrading to 2.14; the same things apply if you are upgrading to 2.15 or 2.16. We require Java 8, so I assume most of you are already on Java 8 at least, and that shouldn’t be an issue. But these are the things that people run into when they’re doing the upgrades, on the mailing list and issue tracker; these are the things that people have stumbled on that you should know about.

There’s no more HTTP digest authentication. This was basically because of moving the accounts to NoteDb and not being able to store the digest passwords there. If you upgrade, it will get migrated. I think there were some issues with this initially, but they’ve been fixed in the bug-fix releases. The main thing for users here is that you can’t see your password in the UI anymore; it doesn’t display it. If you want a new password, you have to reset it, and then it disappears.
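Since the UI no longer displays the password, a new one can be generated through the accounts REST API; the generated password comes back in the response. A sketch with a hypothetical server URL:

```shell
# Ask Gerrit to generate a new HTTP password for the calling account.
curl -s --user myuser:currentpassword \
     -X PUT -H 'Content-Type: application/json' \
     -d '{"generate": true}' \
     https://gerrit.example.com/a/accounts/self/password.http
```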

Also, there’s a new secondary index for groups, and you need to build it before you start the server. Otherwise your groups don’t work, and basically, Gerrit doesn’t work. There are more details about this in the release notes.
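The group index can be built offline with the reindex program before starting the upgraded server; the WAR file name and site path below are placeholders:

```shell
# Build the new secondary index for groups before first startup.
java -jar gerrit-2.14.war reindex --index groups -d /path/to/gerrit-site
```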

Speaking of which, probably you have seen this already because we’ve been sending out emails with the links, but we’ve moved the release notes to the new homepage at www.gerritcodereview.com. When I wrote this last year, changes would go live immediately after we submitted them, but we’ve changed the hosting, and that isn’t the case anymore. Luca was going to fix that, but I don’t think he has. No. That will get fixed soon, I hope.

New features in v2.14.

We move onto the actual new features. These are not in any particular order; it’s just a brain dump of the new stuff. Changes can now be assigned to people, you can search for them with the new operators, and assigned changes get highlighted. And that’s basically it: there’s no workflow around this, and Gerrit doesn’t enforce any workflow for assigned changes. You can use it as you like, although there’s been some discussion about making it a more defined workflow so that people are more aware of what assigned changes are. There’s confusion among people about what it means when a change gets assigned to somebody, and right now it means nothing: there’s no enforcement of that in Gerrit.
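For instance, assigned changes can be found with the new assignee search operator, here shown over the SSH query interface (host name hypothetical):

```shell
# List open changes assigned to the calling user.
ssh -p 29418 myuser@gerrit.example.com gerrit query 'status:open assignee:self'
```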

Changes can now be deleted if they’re open or abandoned. This is kind of similar to previously with draft changes. You could delete a draft change, but now you can delete any change that’s basically not merged. And that can be done by the administrator or the owner of the change, assuming they’ve been given the capability. And this basically deletes the change completely. It deletes the ref. It deletes the metadata. And you can’t delete merged changes for obvious reasons.

The next one is reviewer suggestions. If you click on the box to add a new reviewer, it will suggest users based on who has already reviewed your changes, and the suggestions can also be filtered. There’s also an extension point: plugins can inject suggestions, although I’m not sure whether any plugins actually implement that, but it’s there.

New email templates. We’re now using a different framework for templating, and it supports HTML. The previous templating engine is deprecated, still present in 2.14 and 2.15, and actually removed in 2.16.

Inbound email. This is a feature that was implemented by Patrick at Google. If you get a notification of a review from Gerrit, you can reply to that, and it will add the comments on the change. And if you click this link, you get more details about that.

Tagged comments. This enables you to filter the comments in the UI according to whether they’re by humans or robots, in this case meaning CI. For example, in the screenshot here you can see that all the CI comments are not shown on the right side. I think this only works in the new UI. I’m not … Is that right? I don’t know. Maybe it works in both UIs. I’m not sure.

So, the new UI in 2.14 has seen a lot of improvements compared to previously, and it’s basically usable for most of the common things you want to do, like reviewing changes, but there’s still a lot missing; a lot of the admin pages are missing. And you can switch between the two: if anything’s missing in PolyGerrit, you go to GWT and do it there. And if you look at the talk linked here, it will give you an overview of the development of PolyGerrit, the idea behind it, and so on.

The merge list is a dynamically generated list of the commits that will be included by a merge commit, and it appears in the UI as a file that you can review. This is really useful if the commit author didn’t use the --log option on the merge: you can see what really comes in with that merge, and maybe catch that someone merged against the wrong parent or something. This isn’t actually a real file; it doesn’t get included in your commit, it’s just generated on the fly by the server.

Support for Elasticsearch as a secondary index

This is new since the last time I gave this talk. And I want to thank Marco and David, who are the guys that really made this work; a lot of work went in by them to get Elasticsearch working. It has been included since 2.14.8, with minimal support for version 2 of Elasticsearch, and a few maintenance releases then added support for versions 5 and 6. If you look at the link here, you’ll see a more detailed list of compatibility between Gerrit and Elasticsearch. There actually was a new version of Elasticsearch this week, which is not planned to be supported in 2.14. I’ve got a change up for review that will support it in 2.15.8, or whatever the next one is. And I know Elasticsearch 7 is coming soon, but we haven’t started working on that yet. If we do, it’ll be in 2.15.whatever.

This is another new one since the last presentation. The object size limit is an option in Gerrit that’s been there for a long time; it allows you to prevent people from uploading big files. The problem was that it was kind of weird to configure: you couldn’t configure it on a project and have it inherited by the child projects; it was either global or per project. Now you can configure it and have it inherited, but we made it disabled by default, so it doesn’t mess up anyone’s existing configuration.
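The limit can now be set in a parent project’s project.config (on the refs/meta/config branch) and inherited by its children; the value below is just an example:

```
# project.config of a parent project (refs/meta/config branch):
# reject any pushed object larger than 10 MiB in this project and,
# with inheritance enabled in 2.14+, in its child projects.
[receive]
  maxObjectSizeLimit = 10m
```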

And there was one minor update of LFS in 2.14.1. Actually, we added LFS in 2.13, but then we added support for file locking. The support for Amazon S3 and the file system isn’t new, but previously these were separate plugins, and now we’ve rolled them together into one single LFS plugin.

Overview of new features in v2.15

And now, finally, just a brief overview of what’s in 2.15. I’ve taken this from Dave’s talk from last year, but he had a lot of details about NoteDb, which I’m not really qualified to talk about, so I’ll kind of gloss over it a bit. There is a lot more PolyGerrit UI functionality, but it’s still not on par with GWT; you need to wait for 2.16 to get that. The support for NoteDb allows some things that were not available before: the separation of reviewer and CC in the changes, with a history of adding reviewers; hashtags, which were actually introduced a long time ago, I think in 2.13, but required NoteDb, so they couldn’t really be used; and robot comments, which basically means CI can inject comments into the changes and suggest fixes. You can also ignore a change, which means it will just not appear in your dashboard, or mark it as reviewed, so that it will not be highlighted. There is also a much improved experience when diffing between rebased patch sets, and again, Dave’s presentation from last year has a lot more detail about that. So, if you want to know more, you can look at that.

We have a new change workflow. The draft changes are removed and replaced by private and work in progress, and David will talk about that this morning. And there’s a lot of other stuff that I kind of missed, but if you want to see that, you can look at Dave’s talk from last year, which is linked here.

Q&A

Q: We recently migrated to 2.14, and we often see the indexing being really slow and losing index entries. How do we diagnose those indexing issues? I don’t see very good documentation on this.

A: If you have specific problems or issues and you have data to support that, you can submit a bug report or you can submit any questions to the mailing list. I don’t really know if there’s any way you can tweak that to make it perform better, but if you have specific questions, we can take them on the mailing list or in the issue tracker.

Q: Gerrit indexing works very badly when we have close to 2000 open changes. Is there a specific configuration to improve the performance? We are using v2.14.10.

A: I don’t know if there are any improvements from .10 up to the latest patch release. One thing about 2.14 is that we have made a much larger number of maintenance releases than for previous versions, mainly because of the Elasticsearch support, but some pretty serious things have also been fixed in these releases, so you should probably check the release notes and see if anything related to your issue is mentioned after 2.14.10.

Q: You mentioned Java 8. I was wondering if any later Java would also work, or does it specifically have to be Java 8? And generally, what’s the thinking about newer versions of Java support?

A: Yeah. There is work ongoing to support later versions of Java, and David Ostrovsky will tell you about that in his presentation.

Accelerating your time to market while delivering high-quality products is vital for any company of any size. This fast-paced and always-evolving world relies on getting quicker and better at the production pipeline of products. The DevOps and Lean methodologies help to achieve the speed and quality needed by continuously improving the process in a so-called feedback loop. The faster the cycle, the quicker you gain the competitive advantage to outperform and beat the competition.

It is fundamental to have a scientific approach and put metrics in place to measure and monitor the progress of the different actors in the whole software lifecycle and delivery pipeline.

Gerrit DevOps Analytics (GDA) to the rescue

We need data to build metrics and design our continuous-improvement lifecycle around them. We need to extract information from all the components we use, directly or indirectly, on a daily basis:

SCM/VCS (Source and Configuration Management, Version Control System)
how many commits are going through the pipeline?

Code Review
what’s the lead time for a piece of code to get validated?
How are people interacting and cooperating around the code?

Issue tracker (e.g. Jira)
how long does the end-to-end lifecycle take outside development, from idea to production?

Getting logs from these sources and understanding what they are telling us is fundamental to anticipating delays in deliveries, evaluating the risk of a product release, and making changes in the organization to accelerate the teams’ productivity. That is not an easy task.

Gerrit DevOps Analytics (aka GDA) is an OpenSource solution for collecting data, aggregating it along different dimensions, and exposing meaningful metrics in a timely fashion.

GDA is part of the Gerrit Code Review ecosystem and was presented during the last Gerrit User Summit 2018 at Cloudera HQ in Palo Alto. However, GDA is not limited to Gerrit and aims to integrate and process information coming from other version control and code review systems, including GitLab, GitHub and BitBucket.

Case study: GDA applied to the Gerrit Code Review project

One of the golden rules of Lean and DevOps is continuous improvement: “eating your own dog food” is the perfect way to measure the progress of the solution, by using its outcome in our daily work of developing GDA.

As part of the Gerrit project, I have been working with GerritForge to create Open Source tools to develop the GDA dashboards. These are based on events coming from Gerrit and Git, but we also extract data coming from the CI system and the issue tracker. These tools include the ETL for the data extraction and the presentation layer for the data.

As you will see in the examples, Gerrit is not just the code review tool itself but also its plugin ecosystem; hence you might want to include the plugins as well in any collection and processing of analytics data.

Wanna try GDA? You are just one click away.

We made GDA more accessible to everybody, so more people can play with it and understand its potential. We created the Gerrit Analytics Wizard plugin so you can get insights into your data with just one click.

What you can do

With the Gerrit Analytics Wizard you can get started quickly: with only one click you get:

An initial setup of an Analytics playground with some default charts

A dashboard populated with data coming from one or more projects of your choice

The full GDA experience

When using the full GDA experience, you have full control of your data, and you can:

Schedule recurring data imports (the Wizard is just meant to run a one-off import of the data)

Create a production-ready environment (the Wizard is meant to build a playground to explore the potential of GDA)

One click to crunch loads of data

Once you have Gerrit and the GDA Analytics and Wizard plugins installed, choose the top menu item Analytics Wizard > Configure Dashboard.

You land on the Analytics Wizard and can configure the following parameters:

Dashboard name (mandatory): name of the dashboard to create

Projects prefix (optional): prefix of the projects to import, e.g. “gerrit” will match all the projects starting with the prefix “gerrit”. NOTE: the prefix does not support wildcards or regular expressions.

Date time-frame (optional): date and time interval of the data to import. If not specified the whole history will be imported without restrictions of date or time.

Username/Password (optional): credentials for Gerrit API, if basic auth is needed to access the project’s data.

Sample dashboard analytics wizard page:

Once you are done with the configuration, press the “Create Dashboard” button and wait for the dashboard, tailored to your data, to be created (beware: this operation will take a while, since it requires downloading several Docker images and running an ETL job to collect and aggregate the data).

At the end of the data crunching you will be presented with a Dashboard with some initial Analytics graphs like the one below:

You can now navigate among the different charts across different dimensions: time, projects, people and teams, uncovering the potential of your data thanks to GDA!

What has just happened behind the scenes?

When you press the “Create Dashboard” button, loads of magic happens behind the scenes. Several Docker images are downloaded to run an Elasticsearch and Kibana instance locally, set up the dashboard, and run the ETL job to import the data. Here is a sequence diagram illustrating the chain of events:

Conclusion

Getting insights into your data is so important, and it has never been so simple. GDA is an OpenSource and SaaS (Software as a Service) solution designed, implemented and operated by GerritForge. GDA lets you set up the extraction flows and gives you an “out-of-the-box” solution for accelerating your company’s business right now.

Contact us if you need any help with setting up a Data Analytics pipeline or if you have any feedback about Gerrit DevOps Analytics.

The time has come to migrate GerritHub.io to the latest Gerrit v2.16, from the outdated v2.15 we had so far. The big change between the two is the full adoption of NoteDb: the internal Gerrit groups were still kept in ReviewDb in v2.15, which forced us to keep a PostgreSQL instance active in production. This means we can finally say goodbye to ReviewDb 👋 and eliminate yet another SPoF (Single Point of Failure) from the GerritHub high-availability infrastructure.

Migrating to Gerrit v2.16 implies:

Gerrit WAR upgrade

GIT repos upgrade because of a change in the NoteDb format

Change in the database used, from PostgreSQL to H2 (for the schema_version)

Introduction of the new Projects index

The above is quite a complex process and, here at GerritForge, we executed the migration on a running GerritHub.io with 15k active users, avoiding any downtime.

Architecture

This is the initial architecture we are starting the GerritHub.io v2.15 migration from:

In this setup, we have two sites: one in Canada (active) and one in Germany (active for analytics and disaster recovery). The latter is kept aligned with the active master via the replication plugin.

The HA plugin used between the two Canadian nodes is a GerritForge fork, enhanced with the ability to align the Lucene indexes, caches and events when sharing repositories via NFS with caching enabled.

NOTE: The original high-availability plugin is certified and tested on Gerrit v2.14 / ReviewDb only, and requires the use of NFS without caching, which in turn requires a direct fibre-channel connection between the Gerrit nodes and the disks.

The traffic is routed with HAProxy to the active node. This allows easy code migrations with no downtime, using what we call the “ping-pong” technique between the Canadian and German sites, inspired by the classic Blue/Green deployment with some adjustments for the peculiarities of a Gerrit multi-site setup.

The migration pattern, in a nutshell, is composed of the following phases:

Upgrade code in Germany
The Gerrit site in Germany is used for Analytics and thus can be upgraded first with low risk associated.
German site -> passive, Canadian site -> active

Redirect traffic to Germany
Once the site in Germany is ready and warmed up, the GerritHub users are redirected to it. GerritHub is technically serving the v2.16 user experience to all users.
German site -> active, Canadian site -> passive

Upgrade code in Canada
The site in Canada is put offline and upgraded as well.
German site -> active, Canadian site -> passive

Redirect traffic back to Canada
Once the site in Canada is fully ready and warmed up, the entire user base is redirected back.
German site -> passive, Canadian site -> active

Each HAProxy has the same configuration, with a primary and two backups, as follows:
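A minimal sketch of what such a backend looks like, with hypothetical host names (the real GerritHub configuration carries additional health-check and timeout tuning):

```haproxy
backend gerrit_backend
    # All traffic goes to the primary; the two backups only take
    # over when the primary's health check fails.
    option httpchk GET /config/server/version
    server review-1  review-1.example.com:8080  check
    server review-2  review-2.example.com:8080  check backup
    server review-de review-de.example.com:8080 check backup
```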

Timeline of events – 2nd of Jan 2019

2/1/2019 – 8:00 GMT: Starting point of the GerritHub configuration

Review-1 – Gerrit 2.15 – active node

Review-2 – Gerrit 2.15 – failover node

Review-DE – Gerrit 2.15 – analytics node, used for disaster recovery

2/1/2019 – 10:10 GMT: Upgrade disaster recovery server

Stopped all services using Gerrit on review-de (we use the disaster recovery to crunch and serve the analytics dashboard)

Disabled replication plugin

Stopped Gerrit 2.15 and upgraded to Gerrit 2.16

Restarted Gerrit

2/1/2019 – 10:44 GMT: Re-enabled disaster recovery server

Re-enabled replication from review-1… boom!

First issue: the mirror option of the replication plugin was set to true, hence all the branches containing the groups on the All-Users repo had been dropped from the recovery server. All the groups were suddenly gone from the disaster recovery server

Removed the mirror option in the replication plugin

Re-enabled replication from review-1… this time everything was OK!

Migration re-executed and everything was fine

2/1/2019 – 11:00 GMT: Removed ReviewDB

Once we were happy with the replication of the Groups we could remove PostgreSQL

The only information left outside NoteDb is the schema_version table, which contains only one row and is static. We moved it into H2 by copying the DB from a vanilla v2.16 installation and changing the Gerrit config to use it.
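After copying the H2 files from a vanilla installation, pointing Gerrit at them is a small change in gerrit.config; the path below is illustrative:

```
# etc/gerrit.config: switch the (now almost empty) ReviewDb to local H2.
[database]
  type = h2
  database = /var/gerrit/db/ReviewDB
```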

Before the next step, we had to wait for the online reindexing on review-de to finish (~2 hours).

Note: we didn’t consider offline reindexing, since it is basically sequential and would have been way slower than the concurrent online one. Additionally, it does not compute all the Prolog rules in full.

2/1/2019 – 15:15 GMT: Reduce delta between masters

Reducing the delta of data between the 2 sites (Canada and Germany) allowed for a shorter read-only window when upgrading the currently active master.

Manually replicated and reindexed the misaligned repositories on review-de (see below the effect on the system load)

Pro tip: if you want to check the queue status to see, for example, whether the replication is still ongoing, this command can be used:
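That command is Gerrit's show-queue, available over the SSH command-line API (host and user below are placeholders):

```
ssh -p 29418 admin@gerrit.example.com gerrit show-queue --wide
```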

2/1/2019 – 15:50 GMT: Final master catchup

The service was degraded for a few minutes (i.e. Gerrit was read-only), but most of the operations were still available: Gerrit index/query/plugin/version commands, git-upload-pack and replication

Waited for review-de to catch up with the latest changes that came into review-1 (we monitored it using the above “gerrit show-queue” command)

2/1/2019 – 15:54 GMT: Made disaster recovery active

Changed the HAProxy configuration, and reloaded it, to redirect all the traffic to review-de, which became the active node in the cluster

See the transition of the traffic to review-de

Left review-de as the primary site for the whole night. This way we also tested the stability of the disaster recovery site.

2/1/2019 – 19:47 GMT: Upgrade review-1 and review-2 to Gerrit 2.16

Stopped Gerrit 2.15 and upgraded to Gerrit 2.16

Waited for the offline reindexing of Projects, Accounts and Groups

Started Gerrit 2.16 with online reindexing of the changes
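The offline reindexing step can be run per index with the reindex program; a sketch, assuming a standard site layout:

```
# Reindex only the NoteDb-backed indexes, one at a time
java -jar gerrit.war reindex -d $GERRIT_SITE --index projects
java -jar gerrit.war reindex -d $GERRIT_SITE --index accounts
java -jar gerrit.war reindex -d $GERRIT_SITE --index groups
```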

It was possible to see an expected increase in the system load due to the reindexing, which lasted for about 2 hours:

Furthermore, despite review-1 not being the active node, the HTTP workload grew disproportionately:

This was due to a well-known issue of the high-availability plugin, where the reindexing calls are forwarded to the passive nodes, creating an excessive workload on them.

3/1/2019 – 10:14 GMT: Made review-1 active

We used the same pattern used when upgrading review-de to align the data between masters

Changed the HAProxy configuration, and reloaded it, to redirect all the traffic back to review-1

Conclusions

The migration was completed and production is back to stable again with the latest and greatest Gerrit v2.16.2 and the full PolyGerrit UI. With the migration of the Groups to NoteDb, ReviewDB leaves the stage completely to NoteDb. PostgreSQL is no longer needed, simplifying the overall architecture.

The migration itself was quite smooth; the only issue was due to a plugin misconfiguration, nothing to do with Gerrit core. Thanks to the good monitoring we have in place, we managed to spot the issue straight away. Still, we will further automate our release process to prevent these issues from happening again.

Today GitHub has announced the extension of its free plan to include unlimited private repositories. This is great because it allows a lot more people to start experimenting with their side projects and keep them confidential until they are ready to be shared publicly.

GerritHub.io extends this amazing offer with a fully-featured code review process on top of GitHub private repositories, while keeping the confidentiality needed for early-stage projects. Differently from GitHub, however, GerritHub allows you to have an unlimited number of reviewers and collaborators, for free, forever.

A wonderful new 2019 is starting with two amazing free offers to allow everyone to experiment and unleash their potential:

Free unlimited repos from GitHub, limited to 3 collaborators

Free unlimited repos from GerritHub, with unlimited collaborators for reviews

That’s super-cool, how do I start?

Getting started with your private GitHub repositories on GerritHub is easy:

Want to use Gerrit into your Enterprise?

If you decide to use Gerrit Code Review in your Enterprise and you need a service level compliant with your company standards, you can get in touch with GerritForge, which offers the full coverage of the Enterprise Support you will need:

Silver: 8×5 Support, with 24h turnaround for P1 issues

Gold: 24×7 Support, with 8h turnaround for P1 issues

Platinum: 24x7x365 Support, with 4h turnaround for P1 issues

What’s next?

With GitHub and GerritHub you have no excuses anymore to start innovating right now, with free unlimited repositories and free unlimited Gerrit reviewers and contributors.

Git protocol v2 was released as an experimental feature in Gerrit v2.16. However, the introduction came with a quite serious unnoticed bug: all refs were visible to all users, regardless of the ACL configuration, giving any registered user complete access to all branches, tags and meta-data refs, their associated commit SHA-1s, and the ability to fetch them locally … ouch!

What is the Security impact?

If you were using Gerrit v2.16 for OpenSource projects, not much: everything is visible to everyone anyway, so what’s the point?

For not-so-OpenSource projects, well, it cannot be used as-is in production for sure. It was flagged as experimental after all, because it was intended more for early adopters to “have a go with it” rather than for full-scale production use on sensitive projects.

How did we get there?

The problem is mainly located in the JGit implementation of the refs filtering for the Git protocol v2. Gerrit ACLs are enforced through JGit's AdvertiseRefsHook, which calls a RefFilter; a Gerrit-specific implementation of it processes the access permissions associated with a user.

The AdvertiseRefsHook is usually set by UploadPack.setAdvertiseRefsHook but, if Gerrit has protocol v2 enabled in gerrit.config and the client is leveraging the Git protocol v2 feature, the hook is not invoked. The bug has already been fixed in JGit but, of course, before enabling the support again in v2.16 we need to be sure that no other vulnerabilities are exposed.

What if I have Gerrit v2.16 in production?

On Gerrit v2.16 and v2.16.1, the Git protocol v2 was disabled anyway by default.

If you had it enabled in production by mistake (or bravery?), then just set it as disabled explicitly.

[receive]
  enableProtocolV2 = false

If you can, just upgrade to v2.16.2, where the Git protocol v2 is always disabled.

What’s next?

Git protocol v2 is coming back to Gerrit v2.16 very soon, possibly as early as next week or the one after, ready for Christmas 🙂

The fix is ready, but the Gerrit and JGit teams are working hard to put more specific security testing in place so that the reintroduction can be rolled out safely.

Merry Christmas to everyone … and hope that Git Protocol v2 will be back with us very soon, just in time for the end of year celebrations.

The Gerrit User Summit 2018 has ended. It has been a truly memorable event and with strong emotions, gratitude and celebration of the success of the Gerrit OpenSource project.

During the hackathon, some major events happened:

The release of Gerrit v2.16

Support for Kubernetes

New plugins contributed by Qualcomm

The announcement of a new maintainer, Marco Miller from Ericsson, congrats!

Hackathon and Gerrit v2.16

The hackathon took place at SAP (2 days) and Cloudera (1 day) in Palo Alto, with the main focus of completing the migration to PolyGerrit and cutting the v2.16 release, three major milestones of the Gerrit project:

With regard to achievement #1: during the hackathon, the commit removing all the GWT-related code from the Gerrit master branch was finally merged.

PolyGerrit as default Gerrit Code Review UI

The PolyGerrit Team has reached the summit of the long and troubled journey of getting rid of the outdated GWT UI. Kasper and Logan (Google) burned every single bit of their development energy to fix, implement and fill all the remaining gaps in the user journeys.

On Gerrit v2.16, the PolyGerrit UI is the default and standard way of interacting with Gerrit Code Review. GWT is still there to allow people to adapt smoothly to the new interface. However, the forthcoming Gerrit v3.0 will not contain any reference to the old UI anymore. People still have a few more months to adapt and, judging from the feedback received on GerritHub.io, they do not seem to miss the GWT interface at all.

Well done to the entire PolyGerrit Team, starting from Andrew, who kicked off the project back in 2015, to Wyatt, Becky, Victor, Kasper and Logan, who brought it to the level of usability and portability across multiple devices that it has today. Thanks to their amazing effort, we can all interact with Gerrit Code Review wherever and whenever we want, without losing speed and effectiveness.

Migration of Gerrit groups to NoteDb

One more step towards the complete elimination of ReviewDb has been achieved in Gerrit v2.16: there is no longer any mutable data left in the (in)famous Gerrit DB. The new version stores all the information related to the groups created through the Gerrit UI in NoteDb, inside the All-Users git repository.

What’s left in ReviewDb? Not much: only the schema version. When you install Gerrit it is populated with a single row and a single column with the value ‘170’, and it will never change. There is no need to share it across masters or slaves: just set database.type = h2 and forget about it.

Git protocol v2

The Git wire protocol v2 is a major achievement of Gerrit v2.16, as it solves the main performance problem of large repositories: the refs filtering during the advertisement phase.

In a nutshell, if you were fetching a single branch from a repository with 500k refs, the server would have sent a huge payload with all the refs and their associated SHA-1s, even if at the end of the day you only wanted to fetch a single one. The delay could have been of several seconds, if not minutes, which is quite relevant if you are fetching all the time, like in a CI/CD pipeline.

Git protocol v2 reduced the refs advertisement phase by over 70%, which makes Gerrit v2.16 so appealing that you may want to place it in the top 5 tasks of your backlog.
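Client side, you can already experiment with the new protocol; the sketch below just demonstrates the option wiring against a local throw-away repository (git 2.18 or newer is assumed, and the client falls back transparently if the server does not support v2):

```shell
# Create a throw-away local repository to act as the "server"
tmp=$(mktemp -d)
git init -q "$tmp/repo"
git -C "$tmp/repo" -c user.email=ci@example.com -c user.name=ci \
    commit -q --allow-empty -m "initial commit"

# With protocol.version=2 the client sends a ref-prefix filter, so a
# v2-capable server advertises only the refs the client asked for
git -c protocol.version=2 ls-remote "$tmp/repo" HEAD
```

Setting `protocol.version = 2` in the client's git config makes this the default for every fetch, which is where CI/CD pipelines see the biggest win.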

Farewell to GWT UI code

Even though Gerrit v2.16 keeps the ability to display the old-fashioned GWT Gerrit UI, the code has been definitively removed from the Gerrit master branch, which will be the base for Gerrit v3.0, to be released in Spring 2019.

This is a memorable moment because it represents the final act of making the PolyGerrit UI the unified user experience of Gerrit moving forward.

Data and Insights of the Hackathon

This year the event has been truly focused as well on the analytics side of the Code Review and the entire CI/CD pipeline: was it possibly the location, Cloudera, the world leader in OpenSource BigData and Analytics, giving the right vibrations? Possibly, yes.

Numbers

Over 20 people have attended the Hackathon on-site and, for the first time, remotely via video-conference, from companies and countries all around the world:

Google (USA and Germany)

CollabNet (Japan and Germany)

GerritForge (UK, Ireland, Italy, and Germany)

SAP (Germany)

Qualcomm (USA)

Ericsson (Canada)

Wikimedia Foundation (UK)

Over 300 changes have been uploaded and 244 of them have already been merged.

They have been working on 44 projects, showing how diverse the universe of Gerrit Code Review and its associated plugins and integrations is today.

Gerrit DevOps Analytics

GerritForge has been hacking on improving the platform that powers the Gerrit Code Review Analytics Platform which has been serving the community for many years.

For the first time, thanks to the introduction of branch-level analytics, the community had data automatically crunched in near-real time of what was happening on the master branch and on the on-going stable 2.16 release.

As a matter of fact, the numbers mentioned before are directly coming from Gerrit DevOps Analytics. Thanks again GerritForge for innovating and improving the Gerrit Code Review platform.

Gerrit Analytics Wizard

For the first time, two new contributors from GerritForge, Tony and Ponch, finalized and demoed a new plugin called ‘analytics-wizard’ that allows anyone with a Gerrit server to set up a mini-version of the Gerrit DevOps Analytics, using Docker, ElasticSearch, and Kibana.

Two new plugins: batch and tasks

Qualcomm has released two new plugins based on their experience of running complex pipelines on a multi-master Gerrit setup. The majority of the code was developed in-house before the hackathon. However, the final release and announcement happened this week, and I am sure it will allow many other companies to benefit from their experience of validating complex multi-repo changes on a large-scale CI system.

Batch

This plugin provides a mechanism for building and previewing sets of proposed updates to multiple projects/refs that should be applied in a batch. These updates are built with Gerrit changes.

Task

The plugin provides a mechanism to manage tasks which need to be performed on changes along with a way to expose and query this information. Tasks are organized hierarchically, and task definitions use Gerrit queries to define which changes each task applies to, and how to define the status criteria for each task. An important use case of the task plugin is to have a common place for CI systems to define which changes they will operate on, and when they will do so.

Gerrit on Kubernetes from SAP and GerritForge

Luke (GerritForge), Matthias (SAP) and Thomas (SAP remotely from Germany) worked on a PoC for supporting Kubernetes deployments.

Last year SAP started using containerized slaves and decided to go cloud-native, basing their future Gerrit deployments on Kubernetes and leveraging project Gardener to support multiple cloud providers. Matthias worked with Thomas to prepare a PoC, demo the current work in progress and discuss plans moving forward.

The code has been published to the new k8s-gerrit repository and new commits will be subject to the standard Gerrit Code Review process.

Luke prepared an example of a Gerrit v2.15 deployment in Kubernetes (single master at the moment) with shared storage and showed how to upgrade easily by leveraging Helm Charts deployments through Tiller.

Even though the two projects started in parallel, they agreed to cooperate and work together with the community to have a unified Kubernetes deployment code-base for installing and upgrading multi-master Gerrit setups. This is definitely an area where we will see very interesting developments very soon.

What about the Summit?

The talks of the summit have been very interesting this year and have covered mostly three main topics: Gerrit roadmap and scalability, sharing of real-life migrations to Gerrit v2.14 and v2.15 and, as previously mentioned, Data Analytics & Insights including research work on using machine-learning for supporting the Code Review process.

More blog posts will follow with more background on each talk, stay tuned and watch this space to know more about what has been presented and announced at the Summit.

Proposals for the next Gerrit User Summit 2019

The overall feedback on the summit has been very positive, including the Birthday Party organized and sponsored by GerritForge for the 10th anniversary of the Gerrit Code Review project.

During the final closing keynote, many interesting proposals have been made:

Gerrit users’ wish-list session.
Any Gerrit user can feed the backlog of new features by proposing what they would like to see implemented

Gerrit plugin hackathon.
Learn how to implement a Gerrit plugin and try to put together, in one or two days, something useful using any programming language they wish, including Java, Groovy, Scala or anything else.

More details will come as soon as the closing keynote session recording is published.

Remembering Shawn Pearce and his legacy

Dave Borowitz, the leader of the Gerrit Code Review project, made a thoughtful and touching speech about Shawn Pearce, the father of Gerrit 2, who started the project exactly 10 years ago. The cake, the hackathon, the summit, and the v2.16 release have been fully dedicated to his memory, as appreciation for his humanity and passion for OpenSource.

Thanks, Shawn. We will continue on your legacy and improve the Gerrit Code Review platform as you would have wanted us to do.