Category Archives: ElasticSearch

This week we have published the recording of David Pursehouse’s (CollabNet) talk about Gerrit v2.14 and v2.15. Even though the releases are not new, many improvements were made recently, especially regarding the support for Elasticsearch as an engine to store the Gerrit secondary indexes.

Introduction

Hello, I am David Pursehouse, Gerrit Maintainer and release manager.

So, I’m going to talk about 2.14. Actually, this is the same presentation as I already gave last year. So, some of you may have seen it already. Although, I have made … how do I just … I have made some changes.

Basically, 2.14 isn’t new anymore. It’s already 18 months old, but even last year when I presented this, it was already six months old. And it has been in constant development since then. Even until this month we’ve been adding new stuff or making new releases.

And I’ll also touch a bit on 2.15 because yesterday evening we realized that there isn’t a 2.15 talk in the schedule today or tomorrow. So, I will briefly cover 2.15.

What’s new in v2.14

A number of new features in 2.14. I won’t read them all out because I’ve got separate slides for them, but there are some new features as well, which I’ll talk a bit more about later, which were not in 2.14 last year.

Some notes on important things about upgrading to 2.14, and actually the same things apply if you are upgrading to 2.15, 2.16. We require Java 8, so I assume most of you are already on Java 8 at least, so that shouldn’t be an issue, but these things are things that people run into when they’re doing the upgrades on the mailing list and issue tracker. So, these are the things that people have stumbled on that you should know about.

There’s no more HTTP digest authentication. This was basically because of moving the accounts to NoteDb and not being able to store them in NoteDb. If you upgrade, it will get migrated. I think there were some issues with this initially, but they’ve been fixed in the bug fix releases. And the main thing for users here is that you can’t see your password in the UI anymore. It doesn’t display it. If you want a new password, you have to reset it, and then it disappears.

Also, there’s a new secondary index for groups, and you need to reindex that before you start the server. Otherwise your groups don’t work, and basically, Gerrit doesn’t work. And there are more details about this in the release notes.
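A minimal sketch of that reindex step, assuming the offline reindex program is started from gerrit.war with its --index and -d options while the server is stopped. The site path and the helper name are hypothetical, and the release notes remain the reference for the exact procedure.

```python
import shlex


def groups_reindex_command(site_path, war_path="gerrit.war"):
    """Build the offline reindex command for the groups index only.

    site_path and war_path are placeholders for your own installation.
    """
    return ["java", "-jar", war_path, "reindex", "--index", "groups", "-d", site_path]


if __name__ == "__main__":
    # Hypothetical site location; print the command instead of running it.
    command = groups_reindex_command("/var/gerrit/review_site")
    print(" ".join(shlex.quote(part) for part in command))
```

Printing the command rather than executing it keeps the sketch safe to run anywhere; on a real upgrade it would be run before the server is started.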

Speaking of which, you have probably seen this already because we’ve been sending out emails with the links, but we’ve moved the release notes to the new homepage, which is on www.gerritcodereview.com. When I wrote this last year, it was the case that changes would go live immediately after we submitted them, but we’ve changed the hosting, and that isn’t the case anymore. Luca was going to fix that, but I don’t think he has. No. That will get fixed soon I hope.

New features in v2.14

We move onto the actual new features. These are not in any particular order. It’s just a brain dump of the new stuff. Changes can now be assigned to people, and you can search for them with the new operators and make it highlighted. And that’s basically it. There’s no workflow around this. Gerrit doesn’t enforce any workflow for assigned changes. You can use it as you like, although I think there’s been some discussion about changing that to make it a more defined workflow so that people are more aware of what assignable changes are. There’s confusion among people about what does it mean when a change gets assigned to somebody. And right now it means nothing. There’s no enforcement of that in Gerrit.
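As an illustration of searching for assigned changes, here is a small sketch that builds a query URL for the changes REST endpoint using the assignee operator and decodes the JSON reply, which Gerrit prefixes with a )]}' line. The host name, user name and function names are assumptions made for the example.

```python
import json
from urllib.parse import urlencode

# Gerrit prepends this line to JSON responses to defeat cross-site script inclusion.
XSSI_PREFIX = ")]}'"


def assigned_changes_url(base_url, assignee, status="open"):
    """Return the URL that lists changes assigned to a given user."""
    query = f"assignee:{assignee} status:{status}"
    return f"{base_url.rstrip('/')}/changes/?{urlencode({'q': query})}"


def parse_gerrit_json(body):
    """Strip the XSSI prefix, if present, and decode the JSON payload."""
    if body.startswith(XSSI_PREFIX):
        body = body[len(XSSI_PREFIX):]
    return json.loads(body)


if __name__ == "__main__":
    # Hypothetical server and user; no network request is made here.
    print(assigned_changes_url("https://gerrit.example.com", "jdoe"))
    print(parse_gerrit_json(")]}'\n[{\"_number\": 42}]"))
```

The sketch only constructs and parses; fetching the URL is left to whatever HTTP client and authentication your installation uses.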

Changes can now be deleted if they’re open or abandoned. This is kind of similar to previously with draft changes. You could delete a draft change, but now you can delete any change that’s basically not merged. And that can be done by the administrator or the owner of the change, assuming they’ve been given the capability. And this basically deletes the change completely. It deletes the ref. It deletes the metadata. And you can’t delete merged changes for obvious reasons.

The next one is reviewer suggestions. If you click on the box to add a new reviewer, it will suggest people based on previous users who have already reviewed your changes. And it can also be filtered. Also, there’s an extension point. Plugins can inject suggestions, although I’m not sure if any plugins actually implement that, but it’s there.

New email templates. Now we’re using a different framework for templating. It supports HTML, and the previous templating engine is going to be removed, actually is removed in 2.16. Still present in 2.14, 2.15, but deprecated.

Inbound email. This is a feature that was implemented by Patrick at Google. If you get a notification of a review from Gerrit, you can reply to that, and it will add the comments on the change. And if you click this link, you get more details about that.

Tagged comments. This is a way that enables you to filter the comments in the UI according to whether they’re by humans or robots, in this case meaning CI. So, for example, on the example here, you can see that all the CI comments are not shown on the right side. I think this only works in the new UI. I’m not … Is that right? I don’t know. Maybe it works in both UIs. I’m not sure.
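To make the idea concrete, here is a hedged sketch of the kind of filtering a client could do. It assumes each change message is a dictionary with an optional tag field and that automated systems use tags starting with autogenerated:, which is the convention described in the Gerrit REST documentation. The sample messages are invented.

```python
AUTOGENERATED_PREFIX = "autogenerated:"


def is_automated(message):
    """True when the message carries a tag marking it as machine generated."""
    tag = message.get("tag") or ""
    return tag.startswith(AUTOGENERATED_PREFIX)


def human_messages(messages):
    """Keep only messages that were not tagged as automated."""
    return [m for m in messages if not is_automated(m)]


if __name__ == "__main__":
    # Invented sample data shaped like change messages.
    sample = [
        {"id": "1", "message": "Looks good to me"},
        {"id": "2", "tag": "autogenerated:ci", "message": "Build successful"},
        {"id": "3", "tag": None, "message": "Please rebase"},
    ]
    for message in human_messages(sample):
        print(message["id"], message["message"])
```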

So, in the new UI in 2.14 there’s been a lot of improvements compared to previously, and it’s basically usable for most of the common things that you want to do like reviewing changes, but there’s still a lot missing. A lot of the admin pages are missing. And you can switch between the two. So, if anything’s missing in PolyGerrit, you go to GWT and do it there. And if you look at this talk that’s linked here, it will give you an overview of the development of PolyGerrit, the idea behind it and so on.

The merge list is a dynamically generated list of commits that are going to be included by a merge commit, and it appears in the UI as a file that you can review, which is really useful if the commit author didn’t use a log option on the merge, so you can see what really is coming in by that merge and you can maybe catch if someone’s done it against the wrong parent or something. This isn’t actually a real file. It doesn’t get included in your commit. It’s just generated on the fly by the server.
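The idea behind the merge list can be sketched on a toy commit graph: the commits a merge brings in are those reachable from the other parent but not from the parent the merge is reviewed against. This is only a conceptual model with invented commit names, not Gerrit’s implementation.

```python
def ancestors(commit, parents):
    """Return the commit and everything reachable from it through parent links."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, ()))
    return seen


def merge_list(merge_commit, parents, base_parent=1):
    """List commits brought in by a merge, relative to one of its parents.

    base_parent is 1-based: 1 means "compare against the first parent".
    """
    merge_parents = parents[merge_commit]
    already_there = ancestors(merge_parents[base_parent - 1], parents)
    incoming = set()
    for index, parent in enumerate(merge_parents, start=1):
        if index != base_parent:
            incoming |= ancestors(parent, parents)
    return sorted(incoming - already_there)


if __name__ == "__main__":
    # Toy history: A <- B <- C on the mainline, A <- D <- E on a feature branch,
    # and M merging E into C.
    history = {"A": [], "B": ["A"], "C": ["B"], "D": ["A"], "E": ["D"], "M": ["C", "E"]}
    print(merge_list("M", history))
    print(merge_list("M", history, base_parent=2))
```

Comparing against the other parent gives a very different list, which is how a merge made against the wrong parent tends to stand out in review.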

Support for Elasticsearch as a secondary index

This is new since the last time I did this talk. And I want to thank Marco and David, who are the guys that really made this work. A lot of work went in by these guys to get Elasticsearch working. This is included since 2.14.8 with minimal support for version 2 of Elasticsearch. And then there are a few maintenance releases which add support for versions 5 and 6. If you look at this link here, you’ll see the more detailed list of compatibility between Gerrit and Elasticsearch. There actually was this week a new version of Elasticsearch, which will not be supported in 2.14 as far as planned. I’ve got a change up for review that will support that in 2.15.8, whatever the next one is. And I know Elasticsearch 7 is coming soon, but we haven’t started working on that yet. If we do, it’ll be in 2.15.whatever.

This is another new one since the last presentation. This object size limit is an option in Gerrit that’s been there for a long time. It allows you to prevent people uploading big files, but the problem with that was it was kind of weird to configure it. You couldn’t configure it on a project and have it inherited to the children projects. It was either global or per project. So, now you can configure this and have it inherited, but we made it so it’s disabled by default, so it doesn’t mess up anyone’s existing configurations.
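A simplified model of how such an inherited limit could be resolved is sketched below. It assumes that a project without its own limit looks at its parents only when inheritance is switched on, and that a global limit acts as an upper bound. The project names, field names and function are hypothetical, so treat this as a reading aid rather than Gerrit’s actual logic.

```python
from typing import Dict, NamedTuple, Optional


class Project(NamedTuple):
    parent: Optional[str]           # name of the parent project, None at the root
    max_object_size: Optional[int]  # limit in bytes, None when not configured


def effective_limit(name: str,
                    projects: Dict[str, Project],
                    global_limit: Optional[int] = None,
                    inherit: bool = False) -> Optional[int]:
    """Resolve the object size limit for a project (None means unlimited)."""
    limit = None
    current = name
    while current is not None:
        limit = projects[current].max_object_size
        # Stop at the first configured limit, or immediately if inheritance is off.
        if limit is not None or not inherit:
            break
        current = projects[current].parent
    if global_limit is None:
        return limit
    # Assumption: the global limit caps whatever the project hierarchy says.
    return global_limit if limit is None else min(limit, global_limit)


if __name__ == "__main__":
    tree = {
        "All-Projects": Project(None, None),
        "team": Project("All-Projects", 10_000_000),
        "team/app": Project("team", None),
    }
    print(effective_limit("team/app", tree))                # inheritance off
    print(effective_limit("team/app", tree, inherit=True))  # inherited from "team"
```

Keeping inheritance off by default mirrors the point made in the talk: existing configurations behave as before until someone opts in.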

And there was one minor update of LFS in 2.14.1. Actually, we added LFS in 2.13, but then we added support for the file locking. And the support for the Amazon S3 and File-system isn’t new, but previously there were separate plugins, and now we’ve rolled them together into the one single LFS plugin.

Overview of new features in v2.15

And now finally just a brief overview of what’s in 2.15. I’ve taken this from Dave’s talk from last year, but he had a lot of details about NoteDb, which I’m not really qualified to talk about, so I’ll kind of gloss over it a bit. There is a lot more PolyGerrit UI functionality, but it’s still not equal with GWT. You need to wait for 2.16 to get that. The support for NoteDb allows having some things that were not available before. The separation of reviewer and CC in the changes with a history of adding reviewers. Hashtags, actually, were introduced a long time ago, I think in 2.13, but required NoteDb, so it couldn’t really be used. The robot comments, which is basically CI can inject comments into the changes and suggest changes. You can also ignore a change, which means it will just not appear in your dashboard. You can mark it as reviewed, so that it will not be highlighted. There is also a really improved experience in diff between rebased patch sets, and again, in Dave’s presentation from last year, there is a lot more detail about that. So, if you want to know more, you can look at that.

We have a new change workflow. The draft changes are removed and replaced by private and work in progress, and David will talk about that this morning. And there’s a lot of other stuff that I kind of missed, but if you want to see that, you can look at Dave’s talk from last year, which is linked here.

Q&A

Q: We recently migrated to 2.14, and we often see the index and losing the indexing being really slow. How do we diagnose those issues, the indexing issues? I don’t see very good documentation on this.

A: If you have specific problems or issues and you have data to support that, you can submit a bug report or you can submit any questions to the mailing list. I don’t really know if there’s any way you can tweak that to make it perform better, but if you have specific questions, we can take them on the mailing list or in the issue tracker.

Q: Gerrit indexing works very badly when we have close to 2000 open changes. Is there a specific configuration to improve the performance? We are using v2.14.10.

A: I don’t know if there are any improvements from 10 up to the latest patch release. One thing about this 2.14 is we have made a much larger number of maintenance releases than previous releases, mainly because of the Elasticsearch support, but also there have been some pretty serious things fixed in these releases, so you probably should check the release notes and see if anything related to your issue is mentioned after 2.14.10.

Q: You mentioned Java 8. I was wondering if any later Java would also work, or does it specifically have to be Java 8? And generally, what’s the thinking about newer versions of Java support?

A: Yeah. There is work ongoing to support later versions of Java, and David Ostrovsky will tell you about that in his presentation.

Accelerating your time to market while delivering high-quality products is vital for any company of any size. This fast-paced, ever-evolving world relies on the production pipeline getting quicker and better. The DevOps and Lean methodologies help to achieve the speed and quality needed by continuously improving the process in a so-called feedback loop. The faster the cycle, the sooner you gain the competitive advantage needed to outperform the competition.

It is fundamental to have a scientific approach and put metrics in place to measure and monitor the progress of the different actors in the whole software lifecycle and delivery pipeline.

Gerrit DevOps Analytics (GDA) to the rescue

We need data to build metrics and to design our continuous improvement lifecycle around them. We need to extract information from all the components we use, directly or indirectly, on a daily basis:

SCM/VCS (Source and Configuration Management, Version Control System)
How many commits are going through the pipeline?

Code Review
What’s the lead time for a piece of code to get validated?
How are people interacting and cooperating around the code?

Issue tracker (e.g. Jira)
How long does the end-to-end lifecycle take outside the development, from idea to production?

Getting logs from these sources and understanding what they are telling us is fundamental to anticipate delays in deliveries, evaluate the risk of a product release and make changes in the organization to accelerate the teams’ productivity. That is not an easy task.

Gerrit DevOps Analytics (aka GDA) is an Open Source solution for collecting data, aggregating it along different dimensions and exposing meaningful metrics in a timely fashion.

GDA is part of the Gerrit Code Review ecosystem and was presented at the Gerrit User Summit 2018 at Cloudera HQ in Palo Alto. However, GDA is not limited to Gerrit and aims at integrating and processing information coming from other version control and code-review systems, including GitLab, GitHub and BitBucket.

Case study: GDA applied to the Gerrit Code Review project

One of the golden rules of Lean and DevOps is continuous improvement: “eating your own dog food” is the perfect way to measure the progress of the solution, by using its outcome in our daily life of developing GDA.

As part of the Gerrit project, I have been working with GerritForge to create Open Source tools to develop the GDA dashboards. These are based on events coming from Gerrit and Git, but we also extract data coming from the CI system and the issue tracker. These tools include the ETL for the data extraction and the presentation of the data.

As you will see in the examples, Gerrit is not just the code review tool itself but also its plugin ecosystem, hence you might want to include the plugins as well in any collection and processing of analytics data.

Wanna try GDA? You are just one click away.

We made GDA more accessible to everybody, so more people can play with it and understand its potential. We created the Gerrit Analytics Wizard plugin so you can get some insights into your data with just one click.

What you can do

With the Gerrit Analytics Wizard you can get started quickly and with only one click you can get:

Initial setup with an Analytics playground with some default charts

Populate the Dashboard with data coming from one or more projects of your choice

The full GDA experience

When using the full GDA experience, you have full control of your data:

Schedule recurring data imports. The Wizard is only meant to run a one-off import of the data.

Create a production-ready environment. The Wizard is meant to build a playground to explore the potential of GDA.

One click to crunch loads of data

Once you have Gerrit and the GDA Analytics and Wizard plugins installed, choose the top menu item Analytics Wizard > Configure Dashboard.

You land on the Analytics Wizard and can configure the following parameters:

Dashboard name (mandatory): name of the dashboard to create

Projects prefix (optional): prefix of the projects to import, e.g. “gerrit” will match all the projects whose names start with “gerrit”. NOTE: The prefix does not support wildcards or regular expressions.

Date time-frame (optional): date and time interval of the data to import. If not specified the whole history will be imported without restrictions of date or time.

Username/Password (optional): credentials for Gerrit API, if basic auth is needed to access the project’s data.
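The parameters above can be pictured with a small sketch. It assumes a hypothetical request object, which is not the Wizard’s real API: the dashboard name is mandatory, the prefix is compared literally against the start of the project name, and a missing time-frame means the whole history is imported.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class DashboardRequest:
    """Hypothetical model of the Wizard form; field names are illustrative."""
    dashboard_name: str
    projects_prefix: Optional[str] = None
    since: Optional[datetime] = None
    until: Optional[datetime] = None

    def __post_init__(self):
        if not self.dashboard_name.strip():
            raise ValueError("dashboard name is mandatory")
        if self.since and self.until and self.since > self.until:
            raise ValueError("time-frame start must not be after its end")

    def matches_project(self, project: str) -> bool:
        # Literal prefix comparison: no wildcards, no regular expressions.
        return self.projects_prefix is None or project.startswith(self.projects_prefix)

    def in_time_frame(self, when: datetime) -> bool:
        # An unset boundary leaves that side of the interval open.
        if self.since and when < self.since:
            return False
        if self.until and when > self.until:
            return False
        return True


if __name__ == "__main__":
    request = DashboardRequest("demo", projects_prefix="gerrit")
    projects = ["gerrit", "gerrit-ci-scripts", "plugins/analytics"]
    print([p for p in projects if request.matches_project(p)])
```

Because the comparison is literal, a prefix such as “ger*” would only match projects whose names really begin with those four characters.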

Sample dashboard analytics wizard page:

Once you are done with the configuration, press the “Create Dashboard” button and wait for the Dashboard, tailored to your data, to be created (beware that this operation will take a while, since it requires downloading several Docker images and running an ETL job to collect and aggregate the data).

At the end of the data crunching you will be presented with a Dashboard with some initial Analytics graphs like the one below:

You can now navigate among the different charts along different dimensions, such as time, projects, people and teams, uncovering the potential of your data thanks to GDA!

What has just happened behind the scenes?

When you press the “Create Dashboard” button, loads of magic happen behind the scenes. Several Docker images are downloaded to run an ElasticSearch and a Kibana instance locally, to set up the Dashboard and to run the ETL job that imports the data. Here is a sequence workflow illustrating the chain of events:

Conclusion

Getting insights into your data is so important and has never been so simple. GDA is an OpenSource and SaaS (Software as a Service) solution designed, implemented and operated by GerritForge. GDA allows setting up the extraction flows and gives you the “out-of-the-box” solution for accelerating your company’s business right now.

Contact us if you need any help with setting up a Data Analytics pipeline or if you have any feedback about Gerrit DevOps Analytics.

A brand new version of Gerrit is out, and although only the minor version number has been incremented to 14, this release provides a set of unique innovations.

Gerrit Ver. 2.14 is most likely the last 2.x version before the introduction of Gerrit 3.0, which will change forever the way we look at and interact with code reviews. That means that even though 3.0 isn’t ready yet, some experimental features have already been introduced in Gerrit 2.14. Those will be tagged with the [exp] prefix in this article, but don’t be scared by the wording: all Gerrit features, including the experimental ones, are heavily used on a daily basis by large installations like Google’s and GerritHub.io.

Java 8

For the first time, Java 8 is a mandatory requirement to run Gerrit. It was previously a strongly recommended option, but both Java 7 and 8 were equally supported. The switch to Java 8 makes Gerrit incompatible with operating systems that do not support its latest version and updates, such as Ubuntu 15.x or CentOS 5.x, to name some of them.

PolyGerrit and review by e-mail [exp]

Gerrit includes a richer user experience with two major improvements: a redesigned HTML5 UX based on WebComponents (code-named PolyGerrit) and fully featured bidirectional HTML e-mails. Interacting with Gerrit is becoming easier and more intuitive.

With PolyGerrit, the change diffs are included in the main screen, and viewing them is as simple as expanding a div section. Page loading is much faster thanks to the browser caching of the core building blocks of the UX. Even though the UX isn’t complete yet, a lot of Google’s teams already use it on a daily basis, including the Chromium and Go-Lang projects.

The redesigned and richer HTML e-mails are now bidirectional and include all the information you need to perform an off-line review using your e-mail client. If you are on the move, just reply to the e-mail with your comments and Gerrit will pick them up and include them in the change review as messages. Amazing, isn’t it?

ElasticSearch [exp]

It is now possible to use an alternative indexing engine, ElasticSearch, which allows a clustered setup of distributed nodes holding the index data. That is a major stepping stone towards the full implementation of Gerrit multi-master, allowing multiple Gerrit masters to share the index data with replication over the network.

Out of the box UX and Plugin Manager [exp]

Installing Gerrit with the associated plugins is so much easier: there is no need to clone the code or google around for a compatible plugin build, because everything is included in Gerrit with an intuitive and user-friendly experience. Just use the search box to find the plugins compatible with Gerrit v2.14 and install them with a single click.

This new feature is provided by the new native packages (RPMs, Debian and Docker), which benefit from two new plugins (out-of-the-box and plugin-manager) that are included by default and executed as the first action after a fresh installation.

What’s next?

A lot more is coming, as the NoteDb support becomes more mature every day. Google has announced that it has switched off ReviewDb in production and is using NoteDb as the “unique source of truth” for all its projects. Gerrit 3.0 with 100% NoteDb support is coming very soon and will change the way you think about and interact with your code review forever.