Julien Couvreur's programming blog and more

Radiolab spun off a podcast series on Supreme Court cases. The most recent one was on racial discrimination in jury selection. It’s excellent, have a listen.

When it highlighted some relevant statistics about blacks being struck off juries by prosecutors, I thought “yep, in the aggregate, this is clear evidence of bigotry and prejudice”.
But upon reflection, I’m less sure.

Jury selection requires attorneys to be selective and to discriminate: they want to choose the jurors who will increase their chances of winning.
They will use many factors to evaluate good picks: answers provided, background, behavior, and more.
If an attorney is bigoted, he will prioritize his prejudice for skin color (or some other irrational factor) ahead of increasing his chances of winning. He will waste precious strikes that he could have used better. His selection will be worse.
So, assuming that excluding black jurors changes the perspectives represented on the jury and affects the outcomes of trials, then being bigoted must hurt his win rate.
This would hurt his career relative to more rational attorneys or firms.

Given that trials are highly rivalrous and competitive and that jury selection is an important part of the trial, it is hard for me to assume that attorneys are so incompetent and are granting the opposing party such an easy advantage. This makes the bigotry thesis less likely in my opinion.
The alternative thesis is that the observed outcome is driven by rational decisions. The podcast explained a few possible such reasons.

Taking a step back, what kind of evidence would let us evaluate and distinguish those two theses?

Although those may not be conclusive data points either (given how hard it is to tell bigotry apart from rational choice), here are some I’d be curious about:

Do prosecutors tend to reject black jurors more commonly than defense attorneys do?

Assuming they are less bigoted (which is not obvious), do black attorneys tend to reject black jurors less, on average?

Do attorneys that keep blacks on their juries win their cases more, on average?

In an experiment, if you train some attorneys with this knowledge, does their win rate improve?

Aside from sharing those thoughts, I want to mention some related problems which the episode illustrates: how to define and prove instances of discrimination (beyond aggregate and general evidence), and how people adjust their behavior to specific anti-discrimination rules (it’s not clear that you can control/reduce bigotry, even more so in a monopoly service which “customers” can’t avoid).

I watched three relatively recent movies about obedience to authority and the corruption of power. All three are quite chilling if not outright disturbing (not recommended for children). They show how far people can go (and how easily) when guided by “authority” or granted authority themselves.

The most obvious question is what factors (if any) shield individuals from such influence. But we know little about that, as ethical considerations have limited the pursuit of such studies.

How will you respond to such knowledge? Such studies and others show we are mistaken to think ourselves and people around us immune, even after learning of those results.
Rigorous statistical studies have little effect on the worldviews of people who learn about them, whereas people tend to integrate anecdotes better (as Veritasium’s Derek Muller recently discussed in Why Anecdotes Trump Data). Hopefully, seeing those experiments come to life as movies will be impactful in that way.

Experimenter (2015)

Experimenter depicts the famous Stanley Milgram experiments: unknowing participants are set up in a fake teacher-learner experiment where they are asked to shock the learner (a confederate with a recorded performance who fails to learn on purpose) with increasing voltage. The question is whether the participants in their roles as “teachers/zappers” will go all the way to the apparently fatal shocks.
It is probably the best established result of the experiments I’ll cover, due to its robustness (multiple variants producing similar results) and reproducibility (although very few replications have been attempted due to ethical concerns about the possible psychological effects on participants). Milgram was trying to understand how the atrocities of Nazi Germany could happen.
Read more in Milgram’s Obedience to Authority.

The Stanford Prison Experiment (2015)

The Stanford Prison Experiment tells the story of Philip Zimbardo’s experiment at Stanford. The synopsis: “Twenty-four male students out of seventy-five were selected to take on randomly assigned roles of prisoners and guards in a mock prison situated in the basement of the Stanford psychology building”.
The original experiment was interrupted early, as things turned bad very fast. I don’t know whether this experiment was repeated. The movie was very disturbing.
Find out more in Zimbardo’s The Lucifer Effect: Understanding How Good People Turn Evil.

Compliance (2012)

I won’t go into much detail to avoid spoilers, but Compliance describes a scam that was perpetrated on multiple fast food joints over 70 times.
In short: “When a prank caller claiming to be a police officer convinces a fast food restaurant manager to interrogate an innocent young employee, no one is left unharmed”.
Obviously, this is the sketchiest and most unethical of the three, and the weakest in terms of scientific rigor, but it took place in real-world conditions as opposed to a lab with volunteers.

My previous article on Git Internals described the object model for a single repository. But how do distributed repositories work together?
As I’ll try to explain, immutability is the foremost key.

DAG of commits

The core design of Git revolves around building a graph of commits where each commit points to its parent commit(s) and to a tree of objects (representing files and folders). Commits and tree objects are immutable; they can be added, but never modified.
This immutability (and the fact that all those objects have globally unique content-based identifiers) makes it safe for people to party on this graph across the world.
Each contributor is just adding new commits to the graph, each with a new object tree. The new commits can reference existing commits and the new object trees can reference existing tree objects. All those new objects can then be safely shared with others without conflicts. At the same time, no single Git instance has the complete view of the graph that is getting built.
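To make the content-based identifiers concrete, here is a minimal Python sketch. The hashing function follows the scheme Git uses for SHA-1 object ids (a header with the object kind and size, a NUL byte, then the content). The ObjectStore class and its method names are hypothetical, invented for illustration; real Git storage also compresses objects and handles trees and commits.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 over the header '<kind> <size>', a NUL byte, then the body."""
    header = f"{kind} {len(body)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


class ObjectStore:
    """Toy append-only store (hypothetical): objects are added, never modified."""

    def __init__(self):
        self._objects = {}  # object id -> (kind, body)

    def add(self, kind: str, body: bytes) -> str:
        oid = git_object_id(kind, body)
        # Same content always yields the same id, so re-adding is a no-op.
        self._objects.setdefault(oid, (kind, body))
        return oid

    def receive_from(self, other: "ObjectStore") -> None:
        # Copying objects cannot conflict: an id names exactly one content.
        for oid, obj in other._objects.items():
            self._objects.setdefault(oid, obj)

    def __len__(self):
        return len(self._objects)


alice, bob = ObjectStore(), ObjectStore()
shared_a = alice.add("blob", b"hello world\n")
shared_b = bob.add("blob", b"hello world\n")  # created independently, same id
only_bob = bob.add("blob", b"something new\n")
alice.receive_from(bob)
print(shared_a == shared_b, len(alice))  # True 2
```

Two contributors who create the same content independently end up with the same identifier, which is why exchanging objects never needs coordination.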

References

Not everything in Git is immutable though. Branch references, which are also simply called branches, are updateable references to commits.
The key to avoiding distributed conflicts is clear ownership: a repository can only modify branches it owns, and receive updates for other branches from their owners.
Branch names are namespaced, so you can tell which ones each remote repository owns and which ones your local instance owns. If your repository is linked to “remote1” and “remote2”, their branches will be named “remote1/blah” and “remote2/foo”, while your local branches will simply be named “bar”.
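As a rough illustration of this namespacing, the sketch below uses the full reference names that Git stores under the hood: local branches live under refs/heads/ and remote-tracking branches under refs/remotes/<remote>/. The helper functions and the commit ids are made up for the example.

```python
def local_ref(branch: str) -> str:
    return f"refs/heads/{branch}"


def remote_tracking_ref(remote: str, branch: str) -> str:
    return f"refs/remotes/{remote}/{branch}"


def owner_of(ref: str) -> str:
    """Return 'local' or the name of the remote that owns the ref."""
    if ref.startswith("refs/heads/"):
        return "local"
    if ref.startswith("refs/remotes/"):
        return ref.split("/")[2]
    raise ValueError(f"not a branch ref: {ref}")


# Placeholder commit ids; only the ref names matter here.
refs = {
    local_ref("bar"): "commit-1",
    remote_tracking_ref("remote1", "blah"): "commit-2",
    remote_tracking_ref("remote2", "foo"): "commit-3",
}

for ref in refs:
    print(ref, "is owned by", owner_of(ref))
```

Only the refs owned by "local" are ever moved by your own commits, merges and rebases; the others change only when their owner reports a new value.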

Fetch, merge, rebase, push and pull

We’ll now look at some operations and how they affect the commit graph and the branch references.

Fetch gets updates from a remote repository. You will get updated branch references and all the objects necessary to complete their history.
This does not update your own repository’s branches and therefore is conflict-free.
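A toy model of fetch might look like the following. The Repo class, its fields and the short commit labels are all assumptions made for the sketch (real commit ids are content hashes, and real Git negotiates which objects to transfer instead of walking the whole history).

```python
class Repo:
    """Toy repository (hypothetical): just enough state to illustrate fetch."""

    def __init__(self):
        self.commits = {}   # commit id -> tuple of parent ids (never modified)
        self.branches = {}  # local branch name -> commit id (owned by this repo)
        self.tracking = {}  # (remote name, branch) -> commit id (owned by the remote)

    def commit(self, branch, commit_id):
        parent = self.branches.get(branch)
        self.commits[commit_id] = (parent,) if parent else ()
        self.branches[branch] = commit_id


def history(repo, tip):
    """All commit ids reachable from tip by following parent links."""
    seen, stack = set(), [tip]
    while stack:
        cid = stack.pop()
        if cid not in seen:
            seen.add(cid)
            stack.extend(repo.commits[cid])
    return seen


def fetch(local, remote, remote_name):
    for branch, tip in remote.branches.items():
        # Copy the missing objects; existing ones are left untouched.
        for cid in history(remote, tip):
            local.commits.setdefault(cid, remote.commits[cid])
        # Only the remote-tracking reference moves, never a local branch.
        local.tracking[(remote_name, branch)] = tip


origin, mine = Repo(), Repo()
origin.commit("master", "C1")
origin.commit("master", "C2")
mine.commit("master", "C1")
mine.commit("master", "X")  # local work that origin has not seen

fetch(mine, origin, "origin")
print(mine.branches, mine.tracking)
```

After the fetch, the local master still points at X; only the tracking entry for origin's master has moved.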

On the other hand, merge and rebase will update one of your repository’s own branches. Both merge and rebase are designed to handle divergence between two branches. Those could be two local branches, but I’ll explain the case where your local branch added commits and its corresponding remote branch added other commits.

Merge will create a new commit with two parents: the commit referenced by the remote branch and the one referenced by your local branch. It is generated by considering all changes since their common commit ancestor, and may require manual intervention to resolve conflicts. Your local branch is then updated to reference this commit after it is created.
The degenerate case where your branch had no changes is simpler. Your local branch was the common ancestor and will be updated to match the remote branch, without the need to make a new commit. It is called a fast-forward.
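The two cases can be sketched on a bare commit graph. This is a simplified model, not Git's implementation: commits are reduced to their parent links, the commit labels are invented, and combining file contents (where conflicts arise) is left out entirely.

```python
def ancestors(commits, tip):
    """Every commit reachable from tip (tip included) through parent links."""
    seen, stack = set(), [tip]
    while stack:
        cid = stack.pop()
        if cid not in seen:
            seen.add(cid)
            stack.extend(commits[cid])
    return seen


def merge(commits, branches, ours, theirs, merge_id):
    """Bring `theirs` into `ours`; only the `ours` reference is moved."""
    our_tip, their_tip = branches[ours], branches[theirs]
    if their_tip in ancestors(commits, our_tip):
        return "already up to date"
    if our_tip in ancestors(commits, their_tip):
        branches[ours] = their_tip  # fast-forward: no new commit needed
        return "fast-forward"
    commits[merge_id] = (our_tip, their_tip)  # new commit with two parents
    branches[ours] = merge_id
    return "merge commit"


# A <- B, then the local side added L and the remote side added R.
commits = {"A": (), "B": ("A",), "L": ("B",), "R": ("B",)}

diverged = {"master": "L", "origin/master": "R"}
print(merge(commits, diverged, "master", "origin/master", "M"), diverged)

unchanged = {"master": "B", "origin/master": "R"}
print(merge(commits, unchanged, "master", "origin/master", "unused"), unchanged)
```

In both runs the origin/master entry is left alone, and in the fast-forward run no commit is added to the graph.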

A pull operation simply combines a fetch and a merge.

Rebase will create a chain of new commits which descend from the commit referenced by the remote branch and then update your branch to reference the last commit in that new chain.
Those new commits replay the changes you had in your local branch (since the common ancestor commit). The chain that is generated could be interactively tweaked during rebase, for instance to combine or split the original commits in some way.
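Here is a comparable sketch for rebase, under stated assumptions: history is linear (one parent per commit), both branches share a common ancestor, each commit carries an opaque "change" description instead of real file contents, and replayed commits are named by appending a prime to the old label. None of these names come from Git itself.

```python
def ancestors(commits, tip):
    """Commits reachable from tip in a single-parent history (tip included)."""
    seen = set()
    while tip is not None:
        seen.add(tip)
        tip = commits[tip]["parent"]
    return seen


def rebase(commits, branches, ours, onto):
    """Replay the commits unique to `ours` on top of `onto`, then move `ours`."""
    upstream = ancestors(commits, branches[onto])
    # Walk back from our tip to the common ancestor, then put oldest first.
    chain, cid = [], branches[ours]
    while cid not in upstream:
        chain.append(cid)
        cid = commits[cid]["parent"]
    chain.reverse()

    tip = branches[onto]
    for old_id in chain:
        new_id = old_id + "'"
        # A brand new commit: same change, different parent. The old one is untouched.
        commits[new_id] = {"parent": tip, "change": commits[old_id]["change"]}
        tip = new_id
    branches[ours] = tip
    return chain


commits = {
    "A": {"parent": None, "change": "add README"},
    "B": {"parent": "A", "change": "add main.py"},
    "R": {"parent": "B", "change": "fix typo"},    # added on the remote side
    "L1": {"parent": "B", "change": "add tests"},  # added locally
    "L2": {"parent": "L1", "change": "add docs"},  # added locally
}
branches = {"master": "L2", "origin/master": "R"}

replayed = rebase(commits, branches, "master", "origin/master")
print(replayed, branches)
```

The original L1 and L2 are still in the graph afterwards; they are simply no longer referenced by any branch.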

Both merge and rebase will only update one of the branches (the working branch) and leave the other(s) unchanged.

Push sends some new commits of yours to the remote repository and asks it to update one of its own branch references. The normal case (no forcing) is restricted to a fast-forward.
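The fast-forward restriction can be sketched as a check performed on the receiving side. The function names, the PushRejected exception and the commit labels are hypothetical, and the sketch sends every local commit instead of only the missing ones.

```python
class PushRejected(Exception):
    """Raised by this toy model when a push is not a fast-forward."""


def is_ancestor(commits, older, newer):
    """True if `older` is reachable from `newer` through parent links."""
    seen, stack = set(), [newer]
    while stack:
        cid = stack.pop()
        if cid == older:
            return True
        if cid not in seen:
            seen.add(cid)
            stack.extend(commits[cid])
    return False


def push(local_commits, remote_commits, remote_branches, branch, new_tip, force=False):
    # Adding objects is always harmless, since existing ones are never modified.
    for cid, parents in local_commits.items():
        remote_commits.setdefault(cid, parents)
    current = remote_branches.get(branch)
    if current is not None and not force:
        if not is_ancestor(remote_commits, current, new_tip):
            raise PushRejected(f"{branch}: {new_tip} does not descend from {current}")
    remote_branches[branch] = new_tip  # the remote updates its own reference


remote_commits = {"A": (), "B": ("A",), "R": ("B",)}
remote_branches = {"master": "R"}
local_commits = {"A": (), "B": ("A",), "L": ("B",)}

try:
    push(local_commits, remote_commits, remote_branches, "master", "L")
    outcome = "accepted"
except PushRejected:
    outcome = "rejected"  # L does not contain the remote's commit R

# After fetching R and merging it locally (merge commit M), the push goes through.
local_commits.update({"R": ("B",), "M": ("L", "R")})
push(local_commits, remote_commits, remote_branches, "master", "M")
print(outcome, remote_branches)
```

The rejected push is the situation where you need to fetch and then merge or rebase before trying again.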

Example

Let’s look at an example of divergence, merging and rebasing, using illustrations borrowed from Pro Git.
The first figure shows two local branches (master and experiment) that diverged by adding one commit each (C3 and C4).

Merging is one way to handle this divergence. It adds a new merge commit (C5) which has two parents and updates one of the branch references (master in this instance).

Another way to handle this same situation is to use rebase. Instead of creating a merge commit with two parents, it adds a new chain of commits to one side. Those new commits (C4') replay the changes on the other side of the divergence (C4) since the common ancestor (C2). Then it updates the other branch reference (experiment).
Some commits may be left hanging with no reference, such as C4 here.

After this rebase, if we try to update the master branch with a merge of the experiment branch, this will be a fast-forward merge. It simply updates the master reference and does not require creating any new commit.

This example used two local branch names, but the operations work exactly the same with one remote branch, which is read-only to you, and one local branch, which will be updated.

Summary

To recap, there are a few keys that illuminate Git’s design:

Commits and object trees are immutable.

Commits and objects have globally unique identifiers.

Branches are mutable references to commits, but are namespaced by repository and have clear ownership rules.

Although a couple of people have identified immutability in particular to be a key in Git’s design (Scott Chacon in his excellent Getting Git talk or Philip Nilsson), I’m surprised that this is not more commonly emphasized. With those keys, its design becomes much easier to understand in its simplicity and elegance.