Grok LOC?

This is the seventh post in my series on Grokking xv6. In this post we
will look at how I tested one of the hypotheses I started out with:
understanding every single line of code.

For the last two months I have been going through xv6. Here is one of
the hypotheses from my initial post:

With dedicated study, I’ll be able to grok xv6 in about a month.

To make the concept of grokking more concrete:

understand every single line in the source code

teach it or parts of it to someone else

rewrite or extend parts of the source code

In this post I want to talk specifically about understanding every
single line of code. Before we get into that, there are two
observations I want to make:

Time frame: I originally set out to go through xv6 in about a
month. In reality it took close to two months, or around six weeks
not counting vacation. That is still “about” a month, but it was a
bit longer than I expected or wanted.

Every single line of code: After a while I realized that this
was a bit over-ambitious, so I mentally lowered the bar to
understanding 95% of the code.

Methodology

With that said, how do we test a hypothesis like this? And what does
it mean to understand a line of code? I decided to use a sampling
method and classify lines of code into ones I either understood or
didn’t. This test was somewhat subjective: essentially I asked myself
whether I understood a line’s purpose and all of its components. For
example, it was fine to check how related functions were defined and
to look up basic documentation, but anything that required googling
was a ‘No’. The idea is that this is about the level of understanding
I would expect if I had written the code myself from scratch.

Since the xv6 code booklet was numbered with lines of code from about
1 to 10000, I simply generated a bunch of random numbers from 1 to
10000:

(repeatedly 50 #(inc (rand-int 10000))) ; ClojureScript: rand-int gives 0..9999, inc shifts it to 1..10000

This ensured uniform sampling. Next I removed all the lines that
matched one of the following: empty lines, comments, #include
statements, and lines with very few characters in them, such as
{ or }. The goal was to keep only real lines of code, not ones
with a very generic explanation.

I repeated the above process until I had 20 lines of real code. I
chose 20 as a good balance between getting a large enough sample for
my purposes and not taking too long to go through.
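The sampling and filtering steps above can be sketched in Python. This is a minimal sketch, not the script I actually used: the `booklet` dictionary (line number to source text), the `is_real_code` and `sample_lines` helpers, and the three-character cutoff for "very few characters" are all hypothetical. It also assumes duplicate draws should be skipped, and it only recognizes comment lines that begin with `//`, `/*` or `*/`, so the interior of a multi-line `/* */` comment would slip through.

```python
from __future__ import annotations

import random


def is_real_code(line: str) -> bool:
    """Hypothetical filter for the exclusion rules described above."""
    s = line.strip()
    if not s:                               # empty line
        return False
    if s.startswith(("//", "/*", "*/")):    # comment lines (simplified check)
        return False
    if s.startswith("#include"):            # include directives
        return False
    if len(s) < 3:                          # e.g. "{", "}", "};" (assumed cutoff)
        return False
    return True


def sample_lines(booklet: dict[int, str], want: int = 20, max_line: int = 10000,
                 batch: int = 50, seed: int | None = None) -> list[int]:
    """Draw batches of uniform line numbers in 1..max_line and keep the
    distinct ones that pass the filter, until `want` lines are collected."""
    eligible = sum(1 for n, text in booklet.items()
                   if 1 <= n <= max_line and is_real_code(text))
    if eligible < want:
        raise ValueError("not enough real lines of code to sample from")
    rng = random.Random(seed)
    chosen: list[int] = []
    while len(chosen) < want:
        for _ in range(batch):
            n = rng.randint(1, max_line)    # randint is inclusive on both ends
            if n in booklet and n not in chosen and is_real_code(booklet[n]):
                chosen.append(n)
                if len(chosen) == want:
                    break
    return chosen
```

Calling `sample_lines(booklet, seed=1)` returns 20 distinct line numbers; fixing the seed only makes a run reproducible, and leaving it as `None` gives a fresh sample each time.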

Since I would only be checking a sample as opposed to the whole
population (i.e. all ~10000 lines of code), I would only get an
estimate of my understanding of the code.

I decided that I would consider my hypothesis false if the observed
value was more than two standard deviations away from the expected
value (that is, outside the range that should contain about 95% of
outcomes). Assuming I understand 95% of the code and look at 20 lines,
I used rrange, a little utility I wrote, to see what that range
would be.

~$ rrange 0.95 20 # unix util
Around 19 ~ [17, 20]

This means that if I have a 95% understanding of the whole code base
and I look at 20 random lines of code, I should expect to understand
17 to 20 of them.
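For readers who want to reproduce these numbers, here is a minimal Python sketch of one way to compute such a range. It is an assumption, not rrange’s actual source: it uses a normal approximation to the binomial distribution, taking the mean np plus or minus two standard deviations sqrt(np(1 - p)), rounding to the nearest integer and clipping to [0, n]. With those choices it happens to reproduce both rrange outputs shown in this post. `rough_range` is a hypothetical name.

```python
import math


def rough_range(p: float, n: int, k: float = 2.0) -> tuple[float, tuple[int, int]]:
    """Mean and a mean +/- k*sd range for a Binomial(n, p) count,
    rounded to the nearest integer and clipped to [0, n]."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    lo = max(0, round(mean - k * sd))
    hi = min(n, round(mean + k * sd))
    return mean, (lo, hi)


for p in (0.95, 0.7):
    mean, (lo, hi) = rough_range(p, 20)
    print(f"Around {round(mean)} ~ [{lo}, {hi}]")
# Around 19 ~ [17, 20]
# Around 14 ~ [10, 18]
```

Clipping matters at p = 0.95: the upper bound 19 + 2 × 0.97 rounds to 21, which is more lines than were sampled, so it is capped at 20.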

When making statistical claims such as this, one has to be careful
not to confuse the sample with the population. For example, if I
really understand 70% of the code, it wouldn’t be that unlikely to get
18 ‘Yes’ answers out of 20.

~$ rrange 0.7 20
Around 14 ~ [10, 18]
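The plus-or-minus two standard deviations range is only a normal approximation, which is fairly rough for a sample of 20. As a cross-check under the same binomial assumption, one can compute the exact probability of getting at least 18 ‘Yes’ answers out of 20 for a given true level of understanding. This is a minimal sketch; `prob_at_least` is a hypothetical helper, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p), summing the upper tail."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


print(f"P(X >= 18 | n=20, p=0.70) = {prob_at_least(18, 20, 0.70):.3f}")  # 0.035
print(f"P(X >= 18 | n=20, p=0.95) = {prob_at_least(18, 20, 0.95):.3f}")  # 0.925
```

Comparing the two tails shows how much more consistent 18 out of 20 is with 95% understanding than with 70%.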

Another potential source of error is that I performed the test on
myself, and I have a vested interest in getting a good outcome. A more
objective test would be desirable.

Results

Here’s a list of the 20 samples, each with a note on its context and
a brief comment on my understanding of the line and its purpose. In
the case of a return statement, the test is whether I understand why
that particular value is being returned.

Conclusion and further work

My understanding of the sample code was within two standard deviations
of the expected value, so I failed to reject the hypothesis that I
understand 95% of the xv6 source code. Furthermore, it seems unlikely
that I understand less than 70% of the source code.

Initially I was skeptical about my ability to test this hypothesis,
but I’m pretty happy with the method used in this article. A few weeks
ago I did a trial run, and I got more ‘No’ answers on samples related
to the filesystem, which I had yet to study at that point. A similar
number of filesystem samples came up this time around, and I got more
‘Yes’ answers on them, which seems to reflect my deepened
understanding of that part of the code base. This suggests that the
test for understanding I’m using is not completely unreasonable.

The test still leaves a lot to be desired, though, primarily for two
reasons: (a) it lacks objectivity, and (b) it doesn’t touch on the
essence of programming, which is to write programs rather than to read
other people’s. A different direction that I think would be
interesting to pursue is to re-create an OS, or part of one, from
scratch. However, I think this test, along with the related homework
assignments and the other posts in this series, is good enough for my
present purposes.

The series is almost coming to an end. The next step will be to use
my knowledge to do something new.