Wednesday, April 26, 2017

Who Contributes to PostgreSQL Development?

In a talk which I gave at PGCONF.IN and, in a shorter version, at PGCONF.US, I had a few slides on who contributes to PostgreSQL development. Here, I'd like to present a slightly expanded version of the information which was in the talk. The information in this post considers calendar year 2016 and comes from two sources.
First, I went through the PostgreSQL commit log for 2016, manually tagged each commit by principal author, and recorded the number of new lines of code added by that commit based on git diff --stat -w -M, options which are intended to suppress (more or less successfully) whitespace-only changes and changes due to file renames. I also manually eliminated a few large mechanical commits, principally translation updates. Second, Thom Brown extracted the authors of every email sent to the pgsql-hackers mailing list during 2016, and I then cleaned that up and normalized the names in an attempt to make sure that all emails sent by the same person were counted under one name. Note that, because this data is all for calendar year 2016, it includes the end of the PostgreSQL 9.6 development cycle and the beginning of the PostgreSQL 10 development cycle.
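The line-counting step can be sketched in Python. This is a minimal illustration, not the script actually used for this post. It assumes that each commit has already been tagged by hand with its principal author and paired with the text that git diff --stat -w -M printed for it (for example from `git diff --stat -w -M <commit>^ <commit>`). The function names count_insertions and lines_by_author, and the sample authors, are hypothetical.

```python
import re
from collections import Counter

# Matches the insertions clause of git's summary line, for example
# " 3 files changed, 120 insertions(+), 4 deletions(-)" or
# " 1 file changed, 1 insertion(+)". Per-file lines such as
# " src/a.c | 10 +++" do not contain this text, so only the summary matches.
_INSERTIONS = re.compile(r"(\d+) insertions?\(\+\)")


def count_insertions(stat_output: str) -> int:
    """Return the number of added lines reported in `git diff --stat` output.

    A commit that only deletes lines has no insertions clause, so this returns 0.
    """
    match = _INSERTIONS.search(stat_output)
    return int(match.group(1)) if match else 0


def lines_by_author(tagged_commits):
    """Total new lines and commit counts per hand-tagged principal author.

    tagged_commits is an iterable of (author, stat_output) pairs.
    Returns two Counters: author -> lines and author -> commits.
    """
    lines, commits = Counter(), Counter()
    for author, stat_output in tagged_commits:
        lines[author] += count_insertions(stat_output)
        commits[author] += 1
    return lines, commits


# Illustrative input only: the author names and diffstats are made up.
sample = [
    ("Alice", " src/a.c | 10 ++++++++++\n 1 file changed, 10 insertions(+)\n"),
    ("Bob", " 2 files changed, 3 insertions(+), 7 deletions(-)\n"),
    ("Alice", " 1 file changed, 1 insertion(+)\n"),
]
lines, commits = lines_by_author(sample)
print(lines.most_common())    # [('Alice', 11), ('Bob', 3)]
print(commits.most_common())  # [('Alice', 2), ('Bob', 1)]
```

Keeping the author tag separate from the diffstat mirrors the manual tagging described above: git knows the commit's recorded author, but not necessarily the principal author of the underlying patch.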

I feel that this data, taken together, presents a reasonable view of who is contributing to PostgreSQL development. From the commit log data, we can see who is writing code, and also who is committing that code when it gets written. From the email counts, we can see who is participating in mailing list discussions, which captures - at least to some degree - the work of reviewing patches, providing feedback on designs, reporting problems, etc. Neither measure is perfect; notably, anyone who was frequently the second author on a patch might be under-represented in these numbers, and two people could have written the same number of emails yet one of them might have written much more detailed, thoughtful, and useful emails. Nonetheless, I believe these numbers do a fairly good job of capturing who did the work of moving PostgreSQL development forward during calendar year 2016.

Note that this considers only core development. Many other people contribute to projects such as pgAdmin, pgpool, pgbouncer, and various PostgreSQL connectors; others contribute to user education, advocacy, web site maintenance, and other efforts. I think it would be useful to see statistics on those types of contributions as well, but I leave it to people more familiar with those areas to judge how such contributions would be best measured.

Disclaimers aside, and before we get into the details, here are some quick overall statistics:

In 2016, 141 people contributed at least 1 new line of code to PostgreSQL. 37 of those people account for 90% of the new lines of code contributed to PostgreSQL in 2016, and 14 of them account for 66% of the new lines of code contributed to PostgreSQL during 2016.

In 2016, 18 committers committed at least one patch for which they were not the principal author. Of the lines of code in commits where the committer was not the principal author, 90% were committed by 6 committers, and 66% were committed by just 2 committers.

In 2016, 528 people (modulo duplicate email addresses that I couldn't identify as belonging to the same person) sent at least 1 email to pgsql-hackers. 90% of those emails were sent by 78 people, and 66% of those emails were sent by 23 people.
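The 90% and 66% figures above can be read as "the smallest number of top contributors whose combined total reaches that share." A minimal Python sketch of that calculation follows. The function name contributors_for_share and the toy counts are hypothetical, and this is one plausible way to compute such figures rather than the exact method used for this post.

```python
def contributors_for_share(counts, percent):
    """Smallest number of top contributors whose combined total reaches
    at least `percent` percent of the overall total.

    counts holds one number per person (lines of code, emails, ...).
    """
    total = sum(counts)
    if total == 0:
        return 0
    running = 0
    for rank, value in enumerate(sorted(counts, reverse=True), start=1):
        running += value
        # Compare with integers to avoid floating-point error at the boundary.
        if running * 100 >= percent * total:
            return rank
    return len(counts)


# Toy data: five contributors with 100 units of work in total.
toy = [5, 50, 10, 30, 5]
print(contributors_for_share(toy, 90))  # 3
print(contributors_for_share(toy, 66))  # 2
```

The same function applies unchanged to lines of code per author, lines committed per committer, or emails per sender, since each is just a list of per-person totals.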

Now, here are the detailed charts. First, here are the 37 people who were the principal authors of 90% of lines of new code contributed during 2016. Non-committers are marked with an asterisk. "lines" shows the number of lines of code for which that person was the principal author, "pct_lines" shows that as a percentage of the total lines contributed, and "commits" is the number of commits across which those lines were spread.
