Git's wire protocol defines how Git clients and servers communicate with each other. Protocol version 2, which Git 2.26 makes the default, improves performance by enabling server-side filtering of references, which include not only branches and tags but also other refs such as pull request heads. Clients using Git protocol version 2 can specify which references they are interested in, which reduces the amount of data the server sends back. In contrast, under Git's original protocol the server starts by sending a list of all references in the repository, which can amount to many megabytes.
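To make the filtering concrete, the following minimal sketch builds the kind of request a version 2 client could send to ask only for a single branch. It assumes the pkt-line framing and the ls-refs command with ref-prefix arguments as described in Git's protocol documentation; the helper names (pkt_line, ls_refs_request) are hypothetical, and the sketch omits capabilities that a real client would normally send.

```python
from __future__ import annotations

FLUSH_PKT = b"0000"  # special packet that ends a request
DELIM_PKT = b"0001"  # special packet that separates capabilities from arguments


def pkt_line(payload: str) -> bytes:
    """Encode one pkt-line: 4 hex digits giving the total length, then the payload."""
    data = payload.encode("utf-8")
    # The length prefix counts its own 4 bytes as well as the payload.
    return f"{len(data) + 4:04x}".encode("ascii") + data


def ls_refs_request(ref_prefixes: list[str]) -> bytes:
    """Build an ls-refs request asking only for refs that start with the given prefixes."""
    packets = [pkt_line("command=ls-refs\n"), DELIM_PKT]
    packets += [pkt_line(f"ref-prefix {prefix}\n") for prefix in ref_prefixes]
    packets.append(FLUSH_PKT)
    return b"".join(packets)


# Hypothetical example: the client only cares about the main branch.
request = ls_refs_request(["refs/heads/main"])
print(request)
```

With the original protocol there is no equivalent request: the server advertises every reference before the client has a chance to say what it wants.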

According to figures provided by Google engineer Brandon Williams at the time of the Git protocol version 2 announcement, the new protocol can be significantly faster than the old one, especially with large repositories such as Chrome's, which contains more than 500k references. Additionally, Williams noted:

"Protocol v2 has also enabled a reduction of 8x of the overhead bytes (non-packfile) sent from googlesource.com servers. A majority of this improvement is due to filtering references advertised by the server to the refs the client has expressed interest in."

The reason for the delay in making Git protocol version 2 the default, notes Taylor Blau on the GitHub blog, was to give developers enough time to catch any bugs in the protocol implementation. Notably, Git protocol version 2 is designed so that any client implementing it can still talk to a Git server that supports only the original protocol.
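One way to picture this compatibility is from the client's side: it asks for version 2, then looks at the first line of the server's reply to see which protocol it actually got. The sketch below assumes that a version 2 server opens its reply with a "version 2" line, while an older server simply sends its usual reference advertisement. The function name and the sample payloads are hypothetical.

```python
def detect_protocol_version(first_payload: bytes) -> int:
    """Guess the protocol version from the payload of the server's first pkt-line."""
    text = first_payload.rstrip(b"\n")
    prefix = b"version "
    if text.startswith(prefix):
        # A server that understood the request announces its version explicitly.
        return int(text[len(prefix):].decode("ascii"))
    # Anything else (typically a ref advertisement) means the original protocol.
    return 0


# Hypothetical first lines from two different servers.
new_server = b"version 2\n"
old_server = b"a" * 40 + b" HEAD\x00multi_ack side-band-64k\n"

print(detect_protocol_version(new_server))
print(detect_protocol_version(old_server))
```

A client built this way can keep a single code path for connecting and only branch once it knows which protocol the server speaks.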

As usual for Git releases, Git 2.26 includes a very long list of performance improvements, fixes, and new features. For example, git clone --recurse-submodules --single-branch now applies the --single-branch option to submodules as well.

Additionally, the git sparse-checkout command, introduced in Git 2.25, has a new add subcommand that can be used to add new directories to your sparse checkout one at a time. Sparse checkouts are mostly useful with large repositories, where you are not interested in the whole repository content but only in some of its subdirectories. In Git 2.25, adding a directory to those already included in your sparse checkout required listing all of them again with the set subcommand.
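The difference between the two subcommands can be modeled in a few lines. This is only an illustrative model of the semantics described above, not how Git implements sparse checkouts; the function names and directory names are hypothetical.

```python
from __future__ import annotations


def sparse_set(current: list[str], directories: list[str]) -> list[str]:
    """Model of `set`: the given list replaces whatever was configured before."""
    return sorted(set(directories))


def sparse_add(current: list[str], directories: list[str]) -> list[str]:
    """Model of `add`: the given directories are merged into the existing list."""
    return sorted(set(current) | set(directories))


checkout = sparse_set([], ["docs"])

# Git 2.25 style: to include src/core as well, repeat every directory.
checkout = sparse_set(checkout, ["docs", "src/core"])

# Git 2.26 style: name only the directory being added.
checkout = sparse_add(checkout, ["tests"])

print(checkout)  # ['docs', 'src/core', 'tests']
```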

Also worth mentioning is an improvement to the diff family of commands, which allows git add -p to show whitespace problems to the user.

As a final note, Git 2.26 makes it easier to find out where a given configuration option is defined, whether at the repository, user, or system level, using git config --show-scope, which complements the existing --show-origin option.
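For scripting, the annotated output of git config is straightforward to parse. The sketch below assumes the tab-separated layout printed by git config --list --show-origin, optionally preceded by a scope column when --show-scope is also given (available since Git 2.26). The sample lines, paths, and names are hypothetical, and the parser assumes values contain no tab characters.

```python
from typing import NamedTuple, Optional


class ConfigEntry(NamedTuple):
    scope: Optional[str]  # e.g. "system", "global", "local"; None if not printed
    origin: str           # e.g. "file:/home/alice/.gitconfig"
    key: str
    value: str


def parse_config_line(line: str) -> ConfigEntry:
    """Parse one line of tab-separated `git config --list` output with origin info."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) == 3:
        scope, origin, setting = fields
    elif len(fields) == 2:
        scope = None
        origin, setting = fields
    else:
        raise ValueError(f"unexpected line format: {line!r}")
    # Each setting is printed as key=value; split on the first '=' only.
    key, _, value = setting.partition("=")
    return ConfigEntry(scope, origin, key, value)


# Hypothetical sample lines, with and without the scope column.
with_scope = parse_config_line("global\tfile:/home/alice/.gitconfig\tuser.name=Alice Example")
without_scope = parse_config_line("file:.git/config\tcore.bare=false")

print(with_scope.scope, with_scope.key, with_scope.value)
print(without_scope.origin, without_scope.key, without_scope.value)
```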