My Life as a Sys Admin


In my previous post, I showed Bazel in action by building a Solr Cloud package. In this post I’m going to explain a bit more about Bazel.

Bazel is the open source version of Google’s internal build tool, Blaze. It is currently in beta, but a number of companies already use it in production. Bazel has some quite interesting features. First, it has a good caching mechanism: it caches all input files, all external dependencies and so on. Before running the actual build, Bazel first checks whether the existing cache is valid. If it is, Bazel then checks for changes to the input files and dependencies, and only if it detects a change does it start rebuilding the package. We can also use Bazel to build our test targets and have it run our unit/integration tests against the built targets. Bazel can also detect cyclic dependencies within the code.

Another important feature is sandboxing. On Linux, Bazel can run builds and tests inside a sandboxed environment, which lets it detect file leaks and broken dependencies. This works because, in sandbox mode, Bazel mounts only the declared input files and data dependencies into the sandbox.
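The cyclic dependency check mentioned above boils down to a graph problem: treat every target as a node and every dependency as an edge, then look for a loop. Here’s a minimal sketch of that idea, not Bazel’s actual implementation. The function name find_cycle and the target labels are made up for illustration.

```python
def find_cycle(deps):
    """Return one dependency cycle as a list of targets, or None.

    deps maps each target to the list of targets it depends on.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = {}
    path = []

    def visit(target):
        state[target] = GREY
        path.append(target)
        for dep in deps.get(target, []):
            if state.get(dep, WHITE) == GREY:
                # dep is already on the current path, so we closed a loop.
                return path[path.index(dep):] + [dep]
            if state.get(dep, WHITE) == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        state[target] = BLACK
        return None

    for target in deps:
        if state.get(target, WHITE) == WHITE:
            cycle = visit(target)
            if cycle:
                return cycle
    return None


if __name__ == "__main__":
    # Hypothetical targets: a -> b -> c -> a is a cycle.
    graph = {"//:a": ["//:b"], "//:b": ["//:c"], "//:c": ["//:a"]}
    print(find_cycle(graph))  # prints ['//:a', '//:b', '//:c', '//:a']
```

A build tool that finds such a loop can refuse to build and report the offending chain of targets, instead of recursing forever.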

Bazel Build Flow

Let’s see how the Bazel build flow works. The first thing we need is a WORKSPACE file. A Bazel workspace is a directory that contains the source files for one or more software projects, as well as a WORKSPACE file and BUILD files that contain the instructions Bazel uses to build the software. It also contains symbolic links to output directories in the Bazel home directory.

Let’s create a simple workspace for testing:

$ mkdir bazel-test && cd bazel-test
$ touch WORKSPACE

Now I’m going to build a simple Python package. hello.py is a simple Python script which imports a hello function from dep.py. So our primary script is hello.py, and it has a dependency on dep.py.
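The two scripts aren’t anything special. A minimal sketch of what they could contain is below. The greeting text and the name parameter are assumptions, and both files are shown in one listing so that it runs as-is.

```python
# --- dep.py (assumed contents) ---
def hello(name="Bazel"):
    """Return a greeting; the message text is made up for this sketch."""
    return "Hello, {}!".format(name)


# --- hello.py (assumed contents) ---
# In the real two-file layout this script would start with:
#     from dep import hello
def main():
    print(hello())


if __name__ == "__main__":
    main()
```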

Bazel’s build command looks for a BUILD file at the target’s location. This file should contain the necessary Bazel build rules. Bazel’s Python rule documentation lists the rules that are supported. Applying this to our test scripts, we are going to build a py_binary for hello.py, and this binary has a py_library dependency on dep.py. So our final BUILD file declares exactly those two targets.
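Based on that description, the BUILD file could look roughly like the sketch below. BUILD files use Bazel’s Python-like syntax. The target names hello and dep are assumptions; py_library and py_binary with their srcs and deps attributes are the standard Python rules.

```python
# BUILD (sketch; target names are assumptions)

# Library target wrapping the dependency.
py_library(
    name = "dep",
    srcs = ["dep.py"],
)

# Binary target for the primary script, depending on the library above.
py_binary(
    name = "hello",
    srcs = ["hello.py"],
    deps = [":dep"],
)
```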

Once we run bazel build on this target, woohoo, Bazel has built the package for us. Now if we check our workspace, we will see a bunch of bazel-* symlinks. These symlinks point into the Bazel home directory, where our final build output lies.

So our new Python binary is available at bazel-bin/hello. Bazel also creates something called runfiles, which lives next to the binary. Bazel places our dependencies (input files and data dependencies) in this runfiles folder; on Linux these are typically symlinks rather than real copies.

If we go through our Python binary bazel-bin/hello, it’s nothing but a wrapper script which identifies our runfiles directory path, adds this path to the PYTHONPATH env variable and then invokes our hello.py file.

In the beginning, I mentioned that Bazel has a good caching mechanism. Let’s re-run the build command and look at the output, especially the time taken to complete the build.

Let’s compare the build times of the two runs. The first build took ~4.5 sec, but the second took only ~0.2 sec. This is because Bazel didn’t run a real build the second time. It simply verified the input files against its cache and found no changes.
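To get a feel for why the second run is so cheap, here’s a toy sketch of the idea: fingerprint the inputs, and only do the expensive work when the fingerprint hasn’t been seen before. This is a simplification under my own assumptions, not how Bazel is actually implemented. BuildCache, digest_inputs and the fake build action are hypothetical names.

```python
import hashlib


def digest_inputs(inputs):
    """Fingerprint a dict of {file name: bytes content}."""
    h = hashlib.sha256()
    for name in sorted(inputs):
        # Hash name and content separately so the boundaries are unambiguous.
        h.update(hashlib.sha256(name.encode("utf-8")).digest())
        h.update(hashlib.sha256(inputs[name]).digest())
    return h.hexdigest()


class BuildCache:
    def __init__(self):
        self._outputs = {}   # input digest -> build output
        self.builds_run = 0  # how many real builds we had to do

    def build(self, inputs, action):
        key = digest_inputs(inputs)
        if key in self._outputs:
            # Nothing changed since a previous build: reuse its output.
            return self._outputs[key]
        self.builds_run += 1
        output = action(inputs)
        self._outputs[key] = output
        return output


if __name__ == "__main__":
    sources = {"hello.py": b"print('hi')", "dep.py": b"def hello(): pass"}
    fake_build = lambda inputs: "built {} files".format(len(inputs))

    cache = BuildCache()
    cache.build(sources, fake_build)   # real build
    cache.build(sources, fake_build)   # served from the cache
    sources["dep.py"] = b"def hello(): return 1"
    cache.build(sources, fake_build)   # input changed, so real build again
    print(cache.builds_run)            # prints 2
```

The second call does no real work because the inputs hash to the same digest; touching dep.py changes the digest and forces a rebuild.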

Bingo, Bazel detects the test failure too. During our build we saw that Bazel caches the build and doesn’t re-run it unless it detects changes to the dependencies. Now let’s see whether Bazel does the same with tests.

Bingo, we can see the (cached) line in the output of the second test run. So, just like the build, Bazel caches test results too.
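For reference, a test of the kind Bazel can run and cache might look like the sketch below. It assumes a hello function that returns a greeting string, which is my own stand-in rather than the original script, and it is defined inline here so the listing runs on its own. In a real workspace the test would import it from dep.py and be declared with a py_test rule.

```python
import sys
import unittest


def hello(name="Bazel"):
    # Stand-in for the function in dep.py; a real test would import it instead.
    return "Hello, {}!".format(name)


class HelloTest(unittest.TestCase):
    def test_default_greeting(self):
        self.assertEqual(hello(), "Hello, Bazel!")

    def test_custom_name(self):
        self.assertEqual(hello("Solr"), "Hello, Solr!")


def run_tests():
    """Run HelloTest and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(HelloTest)
    return unittest.TextTestRunner(verbosity=2).run(suite)


if __name__ == "__main__":
    # A non-zero exit status is what tells a test runner that the test failed.
    if not run_tests().wasSuccessful():
        sys.exit(1)
```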

Customizing Bazel Rules

py_binary, py_library etc. are the default Python rules that ship with Bazel. As with any other tool, we might end up in cases where we need custom rules to solve our specific needs. The good news is that Bazel comes with an extension language called Skylark. With Skylark, we can create custom build rules matching our requirements. Skylark’s syntax is pretty similar to Python’s. I’ll be writing a more detailed post on Skylark soon 🙂

Conclusion

Though Bazel is still in beta, it seems to be a really interesting tool for building hermetic packages. Bazel’s ability to detect cyclic dependencies and dependency leaks is really important. Its caching also helps us build packages faster than with traditional build tools.

It’s been a long time since I wrote my last post. This time I decided to write about something I’ve been working with for the last few months: Bazel. We have been using Bazel heavily in production for the last couple of months and the results seem to be pretty good. Leonid from our Infra team recently gave a talk about how we use Bazel to build hermetic packages. I’ll be writing some detailed posts on how to play with Bazel, but in this post we will only be seeing Bazel in action.

This time I’m going to build a Bazel package for Solr Cloud v6.3.0 and use it to spin up a Solr Cloud service.

Requirements

Solr uses ZooKeeper as a repository for cluster configuration and coordination. There are tons of blogs on how to set up a simple ZK cluster, so I’m gonna skip that part. My test setup has a single-node ZK cluster.

Setting up Bazel Solr Package

The Bazel install page documents well how to set up Bazel locally. If we go through the Solr documentation, there are a bunch of variables that the bin/solr wrapper script looks for, especially when we want to customize our Solr settings. On a local test setup we don’t care about such customization, but in a live environment we definitely need to tweak things like JAVA_HOME, the SOLR_HOME directory where Solr stores its data, or even the ZK hosts list.

My Bazel package is going to be pretty straightforward: it will have a shell binary, which is basically a wrapper script. This script will have two Bazel data dependencies: 1) the Solr Cloud source file and 2) the Solr config files. The wrapper binary makes sure that all the necessary runtime variables are set and that SOLR_HOME contains all the necessary config files, including the various configs for the collections. It uses the Solr source that is embedded within Bazel’s runfiles folder (where Bazel keeps all data dependencies for a specific build rule).
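The actual wrapper is a shell script, but the logic is easy to picture. Here’s a rough sketch of it in Python, under a few assumptions of mine: the default SOLR_HOME and ZK_HOST values, the location of Solr inside the runfiles folder, and the function name solr_launch_plan are all hypothetical. The bin/solr flags used (-c for cloud mode, -f for foreground, -z for the ZK hosts, -s for the Solr home) are the standard ones.

```python
import os


def solr_launch_plan(runfiles_dir, environ=None):
    """Work out the environment and command line for starting Solr Cloud.

    Returns (cmd, env) without starting anything, so it is easy to inspect.
    """
    env = dict(os.environ if environ is None else environ)

    # Fall back to defaults only when the operator has not set a value.
    env.setdefault("SOLR_HOME", "/var/lib/solr")   # hypothetical default
    env.setdefault("ZK_HOST", "localhost:2181")    # hypothetical default

    # Use the Solr distribution that ships inside the runfiles folder.
    solr_bin = os.path.join(runfiles_dir, "solr", "bin", "solr")  # assumed layout

    cmd = [
        solr_bin, "start",
        "-c",                      # cloud mode
        "-f",                      # stay in the foreground
        "-z", env["ZK_HOST"],      # ZooKeeper hosts list
        "-s", env["SOLR_HOME"],    # where Solr keeps configs and data
    ]
    return cmd, env


if __name__ == "__main__":
    cmd, env = solr_launch_plan("/tmp/solr.runfiles", environ={"ZK_HOST": "zk1:2181"})
    print(" ".join(cmd))
```

A real wrapper would also copy the config files into SOLR_HOME before handing control to bin/solr; that part is left out here to keep the sketch short.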

We need to teach Bazel where to fetch the Solr source file. So let’s create a workspace and declare the Solr source archive as an external dependency in the WORKSPACE file.
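A WORKSPACE entry along these lines would do it. This is a sketch, not the original file: the repository name solr, the download URL and the strip_prefix value are assumptions. new_http_archive is the rule older Bazel releases shipped for this; newer releases use http_archive loaded from @bazel_tools//tools/build_defs/repo:http.bzl instead.

```python
# WORKSPACE (sketch; name, url and strip_prefix are assumptions)
new_http_archive(
    name = "solr",
    url = "https://archive.apache.org/dist/lucene/solr/6.3.0/solr-6.3.0.tgz",
    strip_prefix = "solr-6.3.0",
    # BUILD file that Bazel drops into the extracted archive.
    build_file = "BUILD.extract",
    # Adding a sha256 attribute here pins the download to a known checksum.
)
```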

Let’s create a folder for keeping our various Solr config files like solr.xml, solrconfig.xml etc. Copy the necessary config files into it and expose them via a BUILD file. We can either use glob to expose everything blindly, or we can list just the files we want to expose. If we list specific file names, Bazel will expose only those files. In my case, I’m gonna use a glob similar to the one I’m using in the BUILD.extract file.
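As a rough illustration of the glob approach, a BUILD file in the config folder could look like the sketch below. The target name configs and the choice of a filegroup rule are my assumptions; glob with its exclude parameter and the visibility attribute are standard Bazel.

```python
# BUILD in the config folder (sketch; target name is an assumption)
filegroup(
    name = "configs",
    # Expose every file in this folder, except the BUILD file itself.
    srcs = glob(["**/*"], exclude = ["BUILD"]),
    # Alternative: list only the files we want, e.g.
    # srcs = ["solr.xml", "solrconfig.xml"],
    visibility = ["//visibility:public"],
)
```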

Woohoo, our Solr Cloud service is running with the newly created collection. And we now have a fully hermetic package with all the dependencies embedded within it. Bazel has a pretty good caching mechanism, so it will neither rebuild the package nor re-download the external dependencies when we run the same build command again and again. We can also use Bazel to bundle our packages. Currently Bazel can create tar/deb packages and even Docker images. Bazel has a lot of interesting features, which I’ll explain in detail in my upcoming posts on Bazel 🙂