jtolio.com (https://www.jtolio.com/): recent content on jtolio.com
Multinomial Logistic Classification
Sun, 06 May 2018 14:01:09 -0600, JT Olio (hello@jtolio.com), https://www.jtolio.com/2018/05/multinomial-logistic-classification
<p><em>This article was originally a problem I wrote for a coding competition I
hosted, Vivint&rsquo;s 2017 Game of Codes (now offline). The goal of this problem was
not only to be a fun challenge but also to teach contestants almost everything
they needed to know to build a neural network from scratch. I thought it
might be neat to revive on my site! If machine learning is still scary
sounding and foreign to you, you should feel much more at ease after working
through this problem. I left out the details of
<a href="https://en.wikipedia.org/wiki/Backpropagation">back-propagation</a>,
and a single-layer neural network isn&rsquo;t really a neural network, but in this
problem you can learn how to train and run a complete model! There&rsquo;s
lots of maybe scary-looking math but honestly if you can
<a href="https://en.wikipedia.org/wiki/Matrix_multiplication">multiply
matrices</a> you should be fine.</em></p>
<p>In this problem, you&rsquo;re going to build and train a machine learning model&hellip;
from scratch! Don&rsquo;t be intimidated - it will be much easier than it sounds!</p>
<h3 id="what-is-machine-learning:0f4d8484fd7f999da0a9c6d7316f575e">What is machine learning?</h3>
<p><em>Machine learning</em> is a broad and growing range of topics, but essentially
the idea is to teach the computer how to find patterns in large amounts of
data, then use those patterns to make predictions. Surprisingly, the techniques
that have been developed allow computers to translate languages, drive cars,
recognize cats, synthesize voice, understand your music tastes, cure diseases,
and even adjust your thermostat!</p>
<p>You might be surprised to learn that since about 2010, the entire artificial
intelligence and machine learning community has reorganized around a
surprisingly small and common toolbox for all of these problems. So, let&rsquo;s dive
in to this toolbox!</p>
<h3 id="classification:0f4d8484fd7f999da0a9c6d7316f575e">Classification</h3>
<p>One of the most fundamental ways of solving problems in machine learning is by
recasting problems as <em>classification</em> problems. In other words, if you can
describe a problem as data that needs labels, you can use machine learning!</p>
<p>Machine learning will go through a phase of <em>training</em>, where data and
existing labels are provided to the system. As a motivating example, imagine
you have a large collection of photos that either contain hot dogs or don&rsquo;t.
Some of your photos have already been labeled as containing a hot dog or not,
but for the rest we want to build a system that will automatically label
them &ldquo;hotdog&rdquo; or &ldquo;nothotdog.&rdquo; During training, we attempt to build a model of
what exactly the essence of each label is. In this case, we will run all of our
existing labeled photos through the system so it can learn what makes a hot dog
a hot dog.</p>
<p>After training, we run the unseen photos through the model and use the model to
generate classifications. If you provide a new photo to your hotdog/nothotdog
model, your model should be able to tell you if the photo contains a hot dog,
assuming your model had a good training data set and was able to capture the
core concept of what a hot dog is.</p>
<p>Many different types of problems can be described as classification problems.
As an example, perhaps you want to predict which word comes next in a sequence.
Given four input words, a classifier can label those four words as &ldquo;likely the
fourth word follows the first three words&rdquo; or &ldquo;not likely.&rdquo; Alternatively, the
classification label for three words could be the most likely word to follow
those three.</p>
<h3 id="how-i-learned-to-stop-worrying-and-love-multinomial-logistic-classification:0f4d8484fd7f999da0a9c6d7316f575e">How I learned to stop worrying and love multinomial logistic classification</h3>
<p>Okay, let&rsquo;s do the simplest thing we can think of to take input data and
classify it.</p>
<p>Let&rsquo;s imagine our data that we want to classify is a big list of values. If
what we have is a 16 by 16 pixel picture, we&rsquo;re going to just put all the
pixels in one big row so we have 256 pixel values in a row. So we&rsquo;ll say
\(\mathbf{x}\) is a vector in 256 dimensions, and each dimension is the
pixel value.</p>
<p>We have two labels, &ldquo;hotdog&rdquo; and &ldquo;nothotdog.&rdquo; Just like any other machine
learning system, our system will never be 100% confident with a classification,
so we will need to output confidence probabilities. The output of our system
will be a two-dimensional vector, \(\mathbf{p}\). \(p_0\) will represent
the probability that the input should be labeled &ldquo;hotdog&rdquo; and \(p_1\) will
represent the probability that the input should be labeled &ldquo;nothotdog.&rdquo;</p>
<p>How do we take a vector in 256 (or \(\dim(\mathbf{x})\)) dimensions and
make something in just 2 (or \(\dim(\mathbf{p})\)) dimensions? Why,
<a href="https://en.wikipedia.org/wiki/Matrix_multiplication">matrix multiplication</a>
of course! If you have a matrix with 2 rows and 256 columns,
multiplying it by a 256-dimensional vector will result in a 2-dimensional one.</p>
<p id="math1" class="katex-display">
<script>
katex.render(
"\\left[ \\begin{array}{cccc} "+
"W_{0,0} & W_{0,1} & \\ldots & W_{0,255} \\\\ "+
"W_{1,0} & W_{1,1} & \\ldots & W_{1,255} "+
"\\end{array} \\right] \\cdot "+
"\\left[ \\begin{array}{c} "+
"x_0 \\\\ "+
"x_1 \\\\ "+
"\\vdots \\\\ "+
"x_{255} "+
"\\end{array} \\right] = "+
"\\left[ \\begin{array}{c} "+
"? \\\\ "+
"? "+
"\\end{array} \\right]",
document.getElementById("math1"), {displayMode: true});
</script>
</p>
<p>Surprisingly, this is actually really close to the final construction of our
classifier, but there are two problems:</p>
<ol>
<li>If one of the input \(\mathbf{x}\)s is all zeros, the output will have
to be zeros. But we need one of the output dimensions to not be zero!</li>
<li>There&rsquo;s nothing guaranteeing the probabilities in the output will be
non-negative and all sum to 1.</li>
</ol>
<p>The first problem is easy: we add a bias vector \(\mathbf{b}\), turning our
matrix multiplication into a standard linear equation of the form
\(\mathbf{W}\cdot\mathbf{x}+\mathbf{b}=\mathbf{y}\).</p>
<p id="math2" class="katex-display">
<script>
katex.render(
"\\left[ \\begin{array}{cccc} "+
"W_{0,0} & W_{0,1} & \\ldots & W_{0,255} \\\\ "+
"W_{1,0} & W_{1,1} & \\ldots & W_{1,255} "+
"\\end{array} \\right] \\cdot "+
"\\left[ \\begin{array}{c} "+
"x_0 \\\\ "+
"x_1 \\\\ "+
"\\vdots \\\\ "+
"x_{255} "+
"\\end{array} \\right] + "+
"\\left[ \\begin{array}{c} "+
"b_0 \\\\ "+
"b_1 "+
"\\end{array} \\right] = "+
"\\left[ \\begin{array}{c} "+
"y_0 \\\\ "+
"y_1 "+
"\\end{array} \\right]",
document.getElementById("math2"), {displayMode: true});
</script>
</p>
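<p>As a minimal sketch of this linear step in plain Python: the function name
<code>linear</code> and the toy numbers are assumptions for illustration, not part of the
original problem. The matrix is a list of rows, and the input and bias are
plain lists.</p>

```python
def linear(W, x, b):
    """Compute y = W . x + b, where W is a list of rows and x, b are lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + bias
            for row, bias in zip(W, b)]

# A toy model with 2 labels and 3 input dimensions (the article uses 256).
W = [[0.5, -1.0, 0.25],
     [1.5, 0.0, -0.5]]
b = [0.1, -0.2]

y = linear(W, [2.0, 1.0, 4.0], b)        # one output value per label
y_zero = linear(W, [0.0, 0.0, 0.0], b)   # an all-zero input yields just the bias
```

<p>With an all-zero input the output is simply the bias, which is why adding
\(\mathbf{b}\) takes care of the first problem.</p>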
<p>The second problem can be solved by using the
<a href="https://en.wikipedia.org/wiki/Softmax_function">softmax function</a>. For a given
vector \(\mathbf{v}\), softmax is defined as:</p>
<p id="math3" class="katex-display">
<script>
katex.render(
"\\text{softmax}\\left(\\mathbf{v}\\right) = \\left[ "+
"\\begin{array}{c} "+
"S\\left(v_0\\right) \\\\ "+
"S\\left(v_1\\right) \\\\ "+
"\\vdots \\\\ "+
"S\\left(v_{n-1}\\right) "+
"\\end{array} "+
"\\right], \\text{ where }"+
"S\\left(v_i\\right) = "+
"\\frac{e^{v_i}}{\\sum\\limits_{j=0}^{n-1} e^{v_j}} ",
document.getElementById("math3"), {displayMode: true});
</script>
</p>
<p>In case the \(\sum\) scares you, \(\sum_{j=0}^{n-1}\) is basically
a math &ldquo;for loop.&rdquo; All it&rsquo;s saying is that we&rsquo;re going to add together
everything that comes after it (\(e^{v_j}\)) for every \(j\) value from
0 to \(n-1\).</p>
<p>Softmax is a neat function! The output will be a vector where the largest
dimension in the input will be the closest number to 1, no dimensions will be
less than zero, and all dimensions sum to 1. Here are some examples:</p>
<p id="math4" class="katex-display">
<script>
katex.render(
"\\begin{array}{l r}"+
"\\text{softmax}\\left(\\left[ \\begin{array}{c} "+
"5.4 \\\\ 1.3 "+
"\\end{array} \\right]\\right) = \\left[ \\begin{array}{c} "+
"0.983\\ldots \\\\ 0.016\\ldots "+
"\\end{array} \\right] & " +
"\\text{softmax}\\left(\\left[ \\begin{array}{c} "+
"2 \\\\ 2 "+
"\\end{array} \\right]\\right) = \\left[ \\begin{array}{c} "+
"0.5 \\\\ 0.5 "+
"\\end{array} \\right] \\end{array}" ,
document.getElementById("math4"), {displayMode: true});
</script>
</p>
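<p>Here is a small softmax sketch in plain Python that reproduces the two
examples. Subtracting the largest input before exponentiating is an extra
step not described in this article; it leaves the result unchanged but keeps
<code>math.exp</code> from overflowing on large inputs.</p>

```python
import math

def softmax(v):
    """Exponentiate every entry of v, then divide each by the total."""
    m = max(v)  # stability shift (an addition to the article's formula)
    exps = [math.exp(vi - m) for vi in v]
    total = sum(exps)
    return [e / total for e in exps]

first = softmax([5.4, 1.3])   # approximately [0.9837, 0.0163]
second = softmax([2, 2])      # [0.5, 0.5]
print(first, second)
```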
<p>Unbelievably, these are all the building blocks you need for a linear model!
Let&rsquo;s put all the blocks together. If you already have
\(\mathbf{W}\cdot\mathbf{x}+\mathbf{b}=\mathbf{y}\), your prediction
\(\mathbf{p}\) can be found as \(\text{softmax}\left(\mathbf{y}\right)\).
More fully, given an input \(\mathbf{x}\) and a trained model
\(\left(\mathbf{W},\mathbf{b}\right)\), your prediction \(\mathbf{p}\) is:</p>
<p id="math5" class="katex-display">
<script>
katex.render(
"\\text{softmax}\\left(\\mathbf{W}\\cdot\\mathbf{x} + \\mathbf{b}\\right) = "+
"\\mathbf{p}",
document.getElementById("math5"), {displayMode: true});
</script>
</p>
<p>Once again, in this context, \(p_0\) is the probability given the model that
the input should be labeled &ldquo;hotdog&rdquo; and \(p_1\) is the probability given the
model that the input should be labeled &ldquo;nothotdog.&rdquo;</p>
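<p>Putting the two pieces together, a prediction might be sketched like this.
The names <code>predict</code> and <code>LABELS</code> and the tiny 3-dimensional model are
hypothetical, chosen only to keep the example readable.</p>

```python
import math

def softmax(v):
    exps = [math.exp(vi) for vi in v]
    total = sum(exps)
    return [e / total for e in exps]

def predict(W, b, x):
    """Return p = softmax(W . x + b)."""
    y = [sum(w * xi for w, xi in zip(row, x)) + bm for row, bm in zip(W, b)]
    return softmax(y)

LABELS = ["hotdog", "nothotdog"]  # index 0 and index 1, as in the text

# Hypothetical model with 3 input dimensions instead of 256.
W = [[1.0, 0.0, -1.0],
     [0.0, 1.0, 0.0]]
b = [0.0, 0.0]

p = predict(W, b, [2.0, 0.5, 0.0])
best = LABELS[p.index(max(p))]  # the label with the highest probability
```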
<p>It&rsquo;s kind of amazing that all you need for good success with things even as
complex as handwriting recognition is a linear model such as this one.</p>
<h3 id="scoring:0f4d8484fd7f999da0a9c6d7316f575e">Scoring</h3>
<p>How do we find \(\mathbf{W}\) and \(\mathbf{b}\)? It might surprise you but
we&rsquo;re going to start off by guessing some random numbers and then changing
them until we aren&rsquo;t predicting things too badly (via a process known as
<a href="https://en.wikipedia.org/wiki/Gradient_descent">gradient descent</a>). But what
does &ldquo;too badly&rdquo; mean?</p>
<p>Recall that we have data that we&rsquo;ve already labeled. We already have photos
labeled &ldquo;hotdog&rdquo; and &ldquo;nothotdog&rdquo; in what&rsquo;s called our <em>training set</em>. For
each photo, we&rsquo;re going to take whatever our current model is
(\(\mathbf{W}\) and \(\mathbf{b}\)) and find \(\mathbf{p}\). Perhaps
for one photo (that really is of a hot dog) our \(\mathbf{p}\) looks like
this:</p>
<p id="math6" class="katex-display">
<script>
katex.render(
"\\begin{array}{l r}"+
"\\mathbf{p}=\\left[ \\begin{array}{c} "+
"0.4 \\\\ 0.6 "+
"\\end{array} \\right] & "+
"\\text{ the truth }=\\left[ \\begin{array}{c} "+
"1 \\\\ 0 "+
"\\end{array} \\right] \\end{array}",
document.getElementById("math6"), {displayMode: true});
</script>
</p>
<p>This isn&rsquo;t great! Our model says that the photo should be labeled &ldquo;nothotdog&rdquo;
with 60% probability, but it is a hot dog.</p>
<p>We need a bit more terminology. So far, we&rsquo;ve only talked about one
sample, one label, and one prediction at a time, but obviously we have lots of
samples, lots of labels, and lots of predictions, and we want to score how our
model does not just on one sample, but on all of our training samples. Assume
we have \(s\) training samples, each sample has \(d\) dimensions, and there
are \(l\) labels. In the case of our 16 by 16 pixel hot dog photos,
\(d = 256\) and \(l = 2\). We&rsquo;ll refer to sample \(i\) as
\(\mathbf{x}^{(i)}\), our prediction for sample \(i\) as
\(\mathbf{p}^{(i)}\), and the correct label vector for sample \(i\) as
\(\mathbf{L}^{(i)}\). \(\mathbf{L}^{(i)}\) is a vector that is all zeros
except for the dimension corresponding to the correct label, where that
dimension is a 1. In other words, we have
\(\text{softmax}\left(\mathbf{W}\cdot\mathbf{x}^{(i)}+\mathbf{b}\right) = \mathbf{p}^{(i)}\) and we want
\(\mathbf{p}^{(i)}\) to be as close to \(\mathbf{L}^{(i)}\) as possible,
for all \(s\) samples.</p>
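<p>Building \(\mathbf{L}^{(i)}\) from a label name takes only a few lines. This
is a sketch; the helper name <code>one_hot</code> is an assumption, and the order of
<code>labels</code> decides which dimension belongs to which label.</p>

```python
def one_hot(label, labels):
    """Return the label vector: 1.0 at the position of `label`, 0.0 elsewhere."""
    if label not in labels:
        raise ValueError(f"unknown label: {label!r}")
    return [1.0 if name == label else 0.0 for name in labels]

labels = ["hotdog", "nothotdog"]
truth = one_hot("hotdog", labels)   # [1.0, 0.0], "the truth" from the example above
```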
<p>To score our model, we&rsquo;re going to compute something called the <em>average cross
entropy loss</em>. In general, <a href="https://en.wikipedia.org/wiki/Loss_function">loss</a>
is used to mean how off the mark a machine learning model is. While there are
many ways of calculating loss, we&rsquo;re going to use average
<a href="https://en.wikipedia.org/wiki/Cross_entropy">cross entropy</a> because it has
some nice properties.</p>
<p>Here&rsquo;s the definition of the average cross entropy loss across all samples:</p>
<p id="math-650" class="katex-display">
<script>
katex.render(
"\\text{loss} = "+
"-\\frac{1}{s} \\cdot \\sum_{i=0}^{s-1} \\sum_{j=0}^{l-1} "+
"L_j^{(i)} \\cdot \\log \\left( p_j^{(i)} \\right)",
document.getElementById("math-650"), {displayMode: true});
</script>
</p>
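<p>As a rough sketch, the loss can be computed with two nested loops that
mirror the two sums. The function name is an assumption, and \(\log\) is taken
to be the natural logarithm. Because each label vector is all zeros except for
a single 1, only the probability assigned to the correct label contributes.</p>

```python
import math

def average_cross_entropy(predictions, truths):
    """predictions and truths are lists of equal-length vectors, one per sample."""
    s = len(predictions)
    total = 0.0
    for p, L in zip(predictions, truths):
        for pj, Lj in zip(p, L):
            if Lj:  # skip the zero terms of the label vector
                total += Lj * math.log(pj)
    return -total / s

# The single hot dog sample from above: p = [0.4, 0.6], truth = [1, 0].
loss = average_cross_entropy([[0.4, 0.6]], [[1, 0]])   # -log(0.4), about 0.916
```

<p>A confident correct prediction drives the loss toward zero, while a
confident wrong one makes it very large.</p>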
<p>All we need to do is find \(\mathbf{W}\) and \(\mathbf{b}\) that make this
loss smallest. How do we do that?</p>
<h3 id="training:0f4d8484fd7f999da0a9c6d7316f575e">Training</h3>
<p>As we said before, we will start \(\mathbf{W}\) and \(\mathbf{b}\) off with
random values. For each value, choose a floating-point random number between -1
and 1.</p>
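<p>A possible way to set up the starting values, assuming Python&rsquo;s standard
<code>random</code> module. The function name <code>init_model</code> and the optional
<code>seed</code> (handy for reproducible runs) are not part of the problem statement.</p>

```python
import random

def init_model(l, d, seed=None):
    """Return (W, b): an l-by-d weight matrix and an l-dimensional bias,
    each value drawn uniformly from the range -1 to 1."""
    rng = random.Random(seed)
    W = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(l)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(l)]
    return W, b

W, b = init_model(l=2, d=256, seed=42)
```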
<p>Of course, we&rsquo;ll need to correct these values given the training data, and we
now have enough information to describe how we will back-propagate corrections.</p>
<p>The plan is to process all of the training data enough times that the loss
drops to an &ldquo;acceptable level.&rdquo; Each time through the training data we&rsquo;ll
collect all of the predictions, and at the end we&rsquo;ll update \(\mathbf{W}\)
and \(\mathbf{b}\) with the information we&rsquo;ve found.</p>
<p>One problem that can occur is that your model might overcorrect after each run.
A simple way to limit overcorrection is to add a &ldquo;learning rate&rdquo;, usually
designated \(\alpha\), which is some small fraction. You get to choose the
learning rate! A good default choice for \(\alpha\) is 0.1.</p>
<p>At the end of each run through all of the training data, here&rsquo;s how you update
\(\mathbf{W}\) and \(\mathbf{b}\):</p>
<p id="math-700" class="katex-display">
<script>
katex.render(
"\\begin{array}{l l l} "+
"W_{m,n} & \\mathrel{-}=& \\frac{\\alpha}{s}\\cdot "+
"\\sum\\limits_{i=0}^{s-1} x_n^{(i)} \\cdot "+
"\\left( p_m^{(i)} - L_m^{(i)} \\right) "+
"\\\\\\\\ "+
"b_m & \\mathrel{-}=& \\frac{\\alpha}{s}\\cdot "+
"\\sum\\limits_{i=0}^{s-1} \\left( p_m^{(i)} - L_m^{(i)} \\right) "+
"\\end{array}",
document.getElementById("math-700"), {displayMode: true});
</script>
</p>
<p>Just because this syntax is starting to get out of hand, let&rsquo;s refresh what
each symbol means.</p>
<ul>
<li>\(W_{m,n}\) is the cell in weight matrix \(\mathbf{W}\) at row \(m\)
and column \(n\).</li>
<li>\(b_m\) is the \(m\)-th dimension in the &ldquo;bias&rdquo; vector \(\mathbf{b}\).</li>
<li>\(\alpha\) is again your learning rate, 0.1, and \(s\) is how many
training samples you have.</li>
<li>\(x_n^{(i)}\) is the \(n\)-th dimension of sample \(i\).</li>
<li>Likewise, \(p_m^{(i)}\) and \(L_m^{(i)}\) are the \(m\)-th dimensions
of our prediction and true labels for sample \(i\), respectively.
Remember that for each sample \(i\), \(L_m^{(i)}\) is zero for all but
the dimension corresponding to the correct label, where it is 1.</li>
</ul>
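<p>The two update rules translate fairly directly into loops. This is a
straightforward, unoptimized sketch; the function name <code>update</code> and its
argument order are assumptions. It expects the predictions <code>ps</code> to have been
collected with the current \(\mathbf{W}\) and \(\mathbf{b}\) before any value is
changed.</p>

```python
def update(W, b, xs, ps, Ls, alpha=0.1):
    """Apply one update to W and b in place.

    xs: samples, ps: predictions, Ls: one-hot label vectors (one per sample).
    """
    s = len(xs)
    for m in range(len(W)):
        for n in range(len(W[m])):
            # Sum over samples of x_n * (p_m - L_m).
            grad = sum(x[n] * (p[m] - L[m]) for x, p, L in zip(xs, ps, Ls))
            W[m][n] -= alpha / s * grad
        # Sum over samples of (p_m - L_m).
        b[m] -= alpha / s * sum(p[m] - L[m] for p, L in zip(ps, Ls))

# One sample that really is a hot dog, but was predicted as [0.4, 0.6].
W = [[0.0, 0.0], [0.0, 0.0]]
b = [0.0, 0.0]
update(W, b, xs=[[1.0, 2.0]], ps=[[0.4, 0.6]], Ls=[[1.0, 0.0]])
```

<p>After this single step the &ldquo;hotdog&rdquo; row of \(\mathbf{W}\) and its bias have
grown while the &ldquo;nothotdog&rdquo; row has shrunk, nudging the next prediction toward
the truth.</p>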
<p>If you&rsquo;re curious how we got these equations, we applied the
<a href="https://en.wikipedia.org/wiki/Chain_rule">chain rule</a> to calculate partial
derivatives of the total loss. It&rsquo;s hairy, and this problem description is
already too long!</p>
<p>Anyway, once you&rsquo;ve updated your \(\mathbf{W}\) and \(\mathbf{b}\), you
start the whole process over!</p>
<h3 id="when-do-we-stop:0f4d8484fd7f999da0a9c6d7316f575e">When do we stop?</h3>
<p>Knowing when to stop is a hard problem. How low your loss goes is a function
of your learning rate, how many iterations you run over your training data,
and a huge number of other factors. On the flip side, if you train your model
so your loss is too low, you run the risk of overfitting your model to your
training data, so it won&rsquo;t work well on data it hasn&rsquo;t seen before.</p>
<p>One of the more common ways of deciding when to
<a href="https://en.wikipedia.org/wiki/Early_stopping">stop training</a> is to have a
separate validation set of samples we check our success on and stop when
we stop improving. But for this problem, to keep things simple what we&rsquo;re going
to do is just keep track of how our loss changes and stop when the loss stops
changing as much.</p>
<p>After the first 10 iterations, your loss will have changed 9 times (there was
no change from the first time since it was the first time). Take the average
of those 9 changes and stop training when your latest loss change is less than a
hundredth of that average.</p>
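<p>One way to read this rule in code is sketched below. Comparing absolute
values of the changes and only checking once an eleventh loss value exists are
interpretations on my part, and the name <code>should_stop</code> is hypothetical.</p>

```python
def should_stop(losses):
    """losses holds one loss value per completed iteration, oldest first."""
    if len(losses) <= 10:
        return False  # need the first 9 changes plus at least one newer change
    changes = [abs(later - earlier) for earlier, later in zip(losses, losses[1:])]
    baseline = sum(changes[:9]) / 9   # average of the first 9 changes
    return changes[-1] < baseline / 100

history = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]   # nine changes of size 1
keep_going = should_stop(history + [0.5])   # latest change 0.5 is still large
finished = should_stop(history + [0.995])   # latest change 0.005 is below 0.01
```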
<h3 id="tie-it-all-together:0f4d8484fd7f999da0a9c6d7316f575e">Tie it all together</h3>
<p>Alright! If you&rsquo;ve stuck with me this far, you&rsquo;ve learned to implement a
multinomial logistic classifier using gradient descent,
<a href="https://en.wikipedia.org/wiki/Backpropagation">back-propagation</a>, and
<a href="https://en.wikipedia.org/wiki/One-hot">one-hot encoding</a>. Good job!</p>
<p>You should now be able to write a program that takes labeled training samples,
trains a model, then takes unlabeled test samples and predicts labels for them!</p>
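<p>For reference, here is one compact way the pieces might fit together. It is
a sketch under several assumptions rather than a reference solution: the names
<code>train</code> and <code>classify</code>, the fixed <code>seed</code>, the <code>max_iters</code> cap, the
max-subtraction inside softmax, and the tiny floor inside the logarithm are all
additions that the problem statement does not ask for.</p>

```python
import math
import random

def softmax(v):
    m = max(v)  # stability shift, not part of the article's formula
    exps = [math.exp(vi - m) for vi in v]
    total = sum(exps)
    return [e / total for e in exps]

def predict(W, b, x):
    return softmax([sum(w * xi for w, xi in zip(row, x)) + bm
                    for row, bm in zip(W, b)])

def train(samples, alpha=0.1, max_iters=10000, seed=0):
    """samples is a list of (x, label) pairs. Returns (W, b, labels)."""
    rng = random.Random(seed)
    labels = sorted({label for _, label in samples})
    xs = [x for x, _ in samples]
    Ls = [[1.0 if name == label else 0.0 for name in labels]
          for _, label in samples]
    s, d = len(xs), len(xs[0])
    W = [[rng.uniform(-1, 1) for _ in range(d)] for _ in labels]
    b = [rng.uniform(-1, 1) for _ in labels]
    losses = []
    for _ in range(max_iters):
        ps = [predict(W, b, x) for x in xs]
        # Average cross entropy; the 1e-12 floor only guards against log(0).
        losses.append(-sum(math.log(max(p[L.index(1.0)], 1e-12))
                           for p, L in zip(ps, Ls)) / s)
        if len(losses) > 10:
            changes = [abs(c - a) for a, c in zip(losses, losses[1:])]
            if changes[-1] < sum(changes[:9]) / 9 / 100:
                break  # loss has mostly stopped changing
        for m in range(len(labels)):
            for n in range(d):
                W[m][n] -= alpha / s * sum(x[n] * (p[m] - L[m])
                                           for x, p, L in zip(xs, ps, Ls))
            b[m] -= alpha / s * sum(p[m] - L[m] for p, L in zip(ps, Ls))
    return W, b, labels

def classify(W, b, labels, x):
    p = predict(W, b, x)
    return labels[p.index(max(p))]

# A made-up, easily separable data set with one dimension and two labels.
toy = [([1.0], "right"), ([2.0], "right"), ([-1.0], "left"), ([-2.0], "left")]
W, b, labels = train(toy)
guess_right = classify(W, b, labels, [3.0])
guess_left = classify(W, b, labels, [-3.0])
```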
<h3 id="your-program:0f4d8484fd7f999da0a9c6d7316f575e">Your program</h3>
<p>As input your program should take vectors of floating-point values, followed by
a label. Some of the labels will be question marks. Your program should output
the correct label for all of the question marks it sees. The label your program
should output will always be one it has seen training examples of.</p>
<p>Your program will pass the tests if it labels 75% or more of the unlabeled data
correctly.</p>
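<p>Reading that input could look something like the sketch below. It assumes
whitespace-separated fields with the label last, as in the example at the end
of this post; the function name <code>parse</code> is made up for illustration.</p>

```python
def parse(text):
    """Split input text into labeled training samples and unlabeled samples."""
    training, unlabeled = [], []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        *values, label = fields
        x = [float(v) for v in values]
        if label == "?":
            unlabeled.append(x)
        else:
            training.append((x, label))
    return training, unlabeled

sample_input = """ 0.93 -1.52  1.32  0.05  1.72 horse
 1.57 -1.74  0.92 -1.33 -0.68 staple
 1.26 -1.02 -1.08 -1.27  1.65 ?
"""
training, unlabeled = parse(sample_input)
```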
<h3 id="where-to-learn-more:0f4d8484fd7f999da0a9c6d7316f575e">Where to learn more</h3>
<p>If you want to learn more or dive deeper into optimizing your solution, you
may be interested in the first section of
<a href="https://classroom.udacity.com/courses/ud730">Udacity&rsquo;s free course on Deep Learning</a>,
or <a href="http://nbviewer.jupyter.org/github/domluna/labs/blob/master/Build%20Your%20Own%20TensorFlow.ipynb">Dom Luna&rsquo;s tutorial on building a mini-TensorFlow</a>.</p>
<h3 id="example:0f4d8484fd7f999da0a9c6d7316f575e">Example</h3>
<h4 id="input:0f4d8484fd7f999da0a9c6d7316f575e">Input</h4>
<pre><code> 0.93 -1.52 1.32 0.05 1.72 horse
1.57 -1.74 0.92 -1.33 -0.68 staple
0.18 1.24 -1.53 1.53 0.78 other
1.96 -1.29 -1.50 -0.19 1.47 staple
1.24 0.15 0.73 -0.22 1.15 battery
1.41 -1.56 1.04 1.09 0.66 horse
-0.70 -0.93 -0.18 0.75 0.88 horse
1.12 -1.45 -1.26 -0.43 -0.05 staple
1.89 0.21 -1.45 0.47 0.62 other
-0.60 -1.87 0.82 -0.66 1.86 staple
-0.80 -1.99 1.74 0.65 1.46 horse
-0.03 1.35 0.11 -0.92 -0.04 battery
-0.24 -0.03 0.58 1.32 -1.51 horse
-0.60 -0.70 1.61 0.56 -0.66 horse
1.29 -0.39 -1.57 -0.45 1.63 staple
0.87 1.59 -1.61 -1.79 1.47 battery
1.86 1.92 0.83 -0.34 1.06 battery
-1.09 -0.81 1.47 1.82 0.06 horse
-0.99 -1.00 -1.45 -1.02 -1.06 staple
-0.82 -0.56 0.82 0.79 -1.02 horse
-1.86 0.77 -0.58 0.82 -1.94 other
0.15 1.18 -0.87 0.78 2.00 other
1.18 0.79 1.08 -1.65 -0.73 battery
0.37 1.78 0.01 0.06 -0.50 other
-0.35 0.31 1.18 -1.83 -0.57 battery
0.91 1.14 -1.85 0.39 0.07 other
-1.61 0.28 -0.31 0.93 0.77 other
-0.11 -1.75 -1.66 -1.55 -0.79 staple
0.05 1.03 -0.23 1.49 1.66 other
-1.99 0.43 -0.99 1.72 0.52 other
-0.30 0.40 -0.70 0.51 0.07 other
-0.54 1.92 -1.13 -1.53 1.73 battery
-0.52 0.44 -0.84 -0.11 0.10 battery
-1.00 -1.82 -1.19 -0.67 -1.18 staple
-1.81 0.10 -1.64 -1.47 -1.86 battery
-1.77 0.53 -1.28 0.55 -1.15 other
0.29 -0.28 -0.41 0.70 1.80 horse
-0.91 0.02 1.60 -1.44 -1.89 battery
1.24 -0.42 -1.30 -0.80 -0.54 staple
-1.98 -1.15 0.54 -0.14 -1.24 staple
1.26 -1.02 -1.08 -1.27 1.65 ?
1.97 1.14 0.51 0.96 -0.36 ?
0.99 0.14 -0.97 -1.90 -0.87 ?
1.54 -1.83 1.59 1.98 -0.41 ?
-1.81 0.34 -0.83 0.90 -1.60 ?
</code></pre>
<h4 id="output:0f4d8484fd7f999da0a9c6d7316f575e">Output</h4>
<pre><code>staple
other
battery
horse
other
</code></pre>
A Hitchhiker's Guide to Distributed Systems
Mon, 26 Mar 2018 12:00:00 -0700, JT Olio (hello@jtolio.com), https://www.jtolio.com/2018/03/a-hitchhikers-guide-to-distributed-systems
<p><em><a href="https://blog.storj.io/post/172275100478/a-hitchhikers-guide-to-distributed-systems">This article was originally posted on the Storj Labs
blog.</a></em></p>
<p>Hello, I&rsquo;m JT Olio! I&rsquo;m excited to have the opportunity to introduce myself to
you as Storj&rsquo;s new Director of Engineering. This will be my fourth time
building a distributed storage system, and each time I&rsquo;ve been faced with new
challenges and learned new things.</p>
<p>Distributed systems are so cool. They&rsquo;re a complex intersection of security,
mechanism design, performance, game theory, engineering, distributed systems
research, economics, and so much more. If you like working on hard problems,
the number of puzzles to solve is fractal and never ending. Each system I&rsquo;ve
worked on has been like scaling a new mountain, and I&rsquo;m especially excited for
this latest one.</p>
<h2 id="from-mozy-to-space-monkey-and-beyond:ee3f07b99017f3fa20ca701de18edd42">From Mozy to Space Monkey and beyond</h2>
<p>I cut my teeth on distributed systems starting in 2005 at
<a href="https://mozy.com/">Mozy</a>, one of the
original online backup services. In a world before Amazon Web Services, selling
people on cloud-hosted backup was hard! Potential sales partners were extremely
skeptical about letting some other company manage all of their most important
data.</p>
<p>Because cloud platform providers like Amazon Web Services and Google Cloud
Platform didn&rsquo;t exist yet, we had no choice but to manage our own data centers.
And as we quickly grew into petabytes and petabytes of data, we hit smack into
a really interesting problem. Suppose you have a data center full of hard drives
with a five year mean time between failures and a 0.05 percent daily failure
probability. If you have 10,000 hard drives, about five drives will fail every day.
A common solution is to have a large staff of data center operations folks
running around and replacing drives, but this is costly in a number of ways.</p>
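<p>The arithmetic is easy to check. This back-of-the-envelope sketch assumes
failures are spread evenly over the mean time between failures, so the daily
probability is roughly one divided by the number of days in five years.</p>

```python
mtbf_days = 5 * 365                      # five year mean time between failures
daily_probability = 1 / mtbf_days        # about 0.00055, i.e. roughly 0.05 percent

drives = 10_000
expected_daily_failures = drives * 0.0005   # using the rounded 0.05 percent figure
print(round(expected_daily_failures, 6))    # about 5 drives per day
```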
<p>At Mozy, we were able to manage huge data centers with skeleton crews because
of architectural decisions we made. Data came in, we split it up with erasure
encoding, and then distributed those shards to storage nodes in our data center.
If a drive failed, it was no big deal! There was no urgency to replace it as
the distributed network we built managed the hardware failure for us. If this
sounds familiar, yep! Mozy and Storj have surprisingly similar architectures.</p>
<p>My next distributed storage gig was at
<a href="https://www.spacemonkey.com/">Space Monkey</a>, where I was the second
employee hired. In 2012, Space Monkey <a href="http://launch.co/blog/launch-festival-2012-winners.html">won best startup at the LAUNCH
Festival</a>
(yep! That&rsquo;s me in the very center). In 2013, we launched a <a href="https://www.kickstarter.com/projects/clintgc/space-monkey-taking-the-cloud-out-of-the-datacente">successful
Kickstarter</a>
(back in the days before token sales), and in 2014, we were acquired
by Vivint Smart Home. At Space Monkey, we created a distributed object storage
layer across every continent except Antarctica by linking our Space Monkey
hardware together via a state of the art protocol of our own creation. We
supported low overhead, high throughput, structured streaming object storage
for an active user base on a globally distributed storage system.</p>
<p>The visionaries at <a href="https://www.vivint.com/">Vivint Smart Home</a> saw our
potential, as the amount of storage
demand they had was increasing dramatically. After our acquisition, we soon
pivoted toward video storage. We took the lessons we learned from the Space
Monkey file storage product and made a new home security video storage platform.
I&rsquo;m not sure I can share full numbers, but let&rsquo;s just say Vivint gave our
distributed system 70 times larger scale. Our distributed streaming video
storage system now powers nearly all home security video clips that Vivint&rsquo;s
home security system records, with enough incoming video data that Vivint
stores significantly more footage than YouTube does every second.</p>
<h2 id="why-storj:ee3f07b99017f3fa20ca701de18edd42">Why Storj</h2>
<p>This history brings us to today, the first day of my second week as Director of
Engineering at Storj. I am so lucky to have had these past experiences, and
can&rsquo;t wait to take some of the lessons I&rsquo;ve learned from these previous projects
and bring them to the masses. I&rsquo;m excited about where Storj is today but I&rsquo;m
even more excited about getting to join the ride to
<a href="https://blog.storj.io/post/169896892413/getting-from-petabytes-to-exabytes-the-road-ahead">exabyte scale</a>.</p>
<p>To get to exabyte scale, we need to have a laser focus on the user experience.
As we move towards mass adoption, we need greater decentralization, top-flight
security, rock-solid reliability, yes, but we also need a dramatic increase in
performance, functionality, and features. We don&rsquo;t want to be the electric car
that people bought 10 years ago just because they were excited about electric
cars, even though they only went five miles. We will be the Tesla of
storage—better because it&rsquo;s decentralized, but better in all the other ways,
too.</p>
<p>One obvious difference between Storj and my past gigs is our token economy.
Tokens allow us to pay farmers to innovate on finding the most efficient storage
solution. With Space Monkey, we had dedicated hardware that fixed the costs of
what farmers could do. With Storj, we&rsquo;re essentially paying people to come up
with the most efficient ways to store data and they&rsquo;re massively incentivized
to do just that. We probably could have built a system like Storj using Chuck
E. Cheese tokens or U.S. dollars, but the cool thing about having a utility
token is that as the value of our marketplace rises, so does the value of our
token. It&rsquo;s a cool way to give back to the people who believed in us in the
beginning.</p>
<p>As Director of Engineering I wear a number of hats. With more than a decade&rsquo;s
worth of production distributed systems experience, I&rsquo;m of course always
excited to dive deep into our technical details as needed, but fundamentally,
being Director of Engineering really means my top priority is our people. Shawn
and company have built a fantastic team and it is my job to shine the spotlight
on them and block and tackle obstacles in the way. In my view, managers might
fail, but teams succeed. I will count myself lucky if I get to say I helped in
getting this excellent team to our upcoming position of dominance.</p>
<p>You&rsquo;ll be hearing more from me soon about what the team has cooking, but until
then, if you&rsquo;re passionate about distributed systems and solving hard problems,
come join us! If you&rsquo;re ever in Salt Lake City, make sure to hit up our
<a href="https://www.meetup.com/Utah-Distributed-Systems-Meetup-and-Reading-Group/">Distributed Systems Meetup and Reading Group</a>!</p>
How to not be a white male asshole, by a former offender
Sun, 18 Mar 2018 17:26:00 -0600, JT Olio (hello@jtolio.com), https://www.jtolio.com/2018/03/how-to-not-be-a-white-male-asshole-by-a-former-offender
<p><em>Huge thanks to Caitlin Jarvis for editing, contributing to, and proofreading
this post.</em></p>
<p>First off, let&rsquo;s start with some assumptions. You, dear reader, don&rsquo;t
intend to cause anyone harm. You have good intentions, see yourself as a good
person, and are interested in self improvement. That&rsquo;s great!</p>
<p>Second, I don&rsquo;t actually know for sure if I&rsquo;m not still a current offender.
I might be! It&rsquo;s certainly something I&rsquo;ll never be done working on.</p>
<h2 id="1-you-don-t-know-what-others-are-going-through:18324069561269a4d9e24bcc6a5c1ff9">1. You don&rsquo;t know what others are going through</h2>
<p>Unfortunately, your good intentions are not enough to make sure the experiences
of others are, in fact, good because we live in a world of asymmetric
information. If another person&rsquo;s dog just died unbeknownst to you and you start
talking excitedly about how great dogs are to try and cheer a sad person up, you
may end up causing them to be even sadder. You know things other people don&rsquo;t,
and others know things you don&rsquo;t.</p>
<p>So when I say that if you are a white man, there is an invisible world of
experiences happening all around you that you are inherently blind to, it&rsquo;s
because of asymmetric information. You can&rsquo;t know what others are going through
because you are not an impartial observer of a system. <em>You exist within the
system.</em></p>
<p><div style="text-align: center; padding: 20px;">
<img src="https://www.jtolio.com/images/mrmouse.jpg" width="500" margin="auto">
</div></p>
<p>Let me show you what I mean: did you know a recent survey found that
<em><a href="https://www.npr.org/sections/thetwo-way/2018/02/21/587671849/a-new-survey-finds-eighty-percent-of-women-have-experienced-sexual-harassment">81 percent of women have experienced sexual
harassment of some kind</a></em>?
Fully 1 out of every 2 women you know has had to deal specifically with
<em>unwanted sexual touching</em>.</p>
<p>What should have been most amazing about the
<a href="https://en.wikipedia.org/wiki/Me_Too_movement">#MeToo movement</a> was not how
many women reported harassment, but how many men were surprised.</p>
<h2 id="2-you-can-inadvertently-contribute-to-a-racist-sexist-or-prejudiced-society:18324069561269a4d9e24bcc6a5c1ff9">2. You can inadvertently contribute to a racist, sexist, or prejudiced society</h2>
<p>I <a href="https://www.jtolio.com/2015/03/what-riding-a-unicycle-can-teach-us-about-microaggressions/">previously wrote a lot about how small little interactions can add
up</a>,
illustrating that even if you don&rsquo;t intend to subject someone to racism, sexism,
or some other prejudice, you might be doing it anyway. Intentions are
meaningless when your actions amplify the negative experience of someone else.</p>
<p>An example from <a href="https://everydayfeminism.com/2015/09/dont-touch-black-womens-hair/">Maisha Johnson in
Everyday Feminism</a>:</p>
<blockquote>
<p>Black women deal with people touching our hair a lot. Now you know. Okay,
there&rsquo;s more to it than that: Black women deal with people touching our hair
a <em>hell</em> of a lot.</p>
<p>If you approach a Black woman saying &ldquo;I just have to feel your hair,&rdquo; it&rsquo;s
pretty safe to assume this isn&rsquo;t the first time she&rsquo;s heard that.</p>
<p>Everyone who asks me if they can touch follows a long line of people
othering me &ndash; including strangers who touch my hair without asking. The
psychological impact of having people constantly feel entitled to my personal
space has worn me down.</p>
</blockquote>
<p>Another example is that men frequently demand proof. Even though it makes sense
in general to check your sources for something, the predominant response of men
when confronted with claims of sexist treatment is to <a href="https://twitter.com/ArielDumas/status/970692180766490630">ask for
evidence</a>. Because
this happens so frequently, this action <em>itself</em> contributes to the sexist
subjugation of women. The parallel universe women live in is so distinct from
the experiences of men that men can&rsquo;t believe their ears, and treat the report
of a victim with skepticism.</p>
<p>As you might imagine, this sort of effect is not limited to asking women for
evidence or hair touching. Microaggressions are real and everywhere; the
accumulation of lots of small things can be enormous.</p>
<p>If you&rsquo;re someone in charge of building things, this can be even more important
and an even greater responsibility. If you build an app that is blind to the
experiences of people who don&rsquo;t look or act like you, you can significantly
amplify negative experiences for others by causing systemic and system-wide
issues.</p>
<h2 id="3-the-only-way-to-stop-contributing-is-to-continually-listen-to-others:18324069561269a4d9e24bcc6a5c1ff9">3. The only way to stop contributing is to continually listen to others</h2>
<p>If you don&rsquo;t already know what others are going through, you may be
subjecting them to prejudice even if you don&rsquo;t mean to. So what can you do to
help others avoid prejudice? You can listen to them! People who are
experiencing prejudice <em>don&rsquo;t want to be experiencing
prejudice</em> and tend to be vocal about the experience. It is your job to really
listen and then turn around and change the way you approach these situations in
the future.</p>
<h2 id="4-how-do-i-listen:18324069561269a4d9e24bcc6a5c1ff9">4. How do I listen?</h2>
<p>To listen to someone, you need to have empathy. You need to actually care about
them. You need to process what they&rsquo;re saying and not treat them with suspicion.</p>
<p>Listening is very different from interjecting and arguing. Listening to others
is different from making them do the work to educate you. It is your job to find
the experiences of others you haven&rsquo;t had and learn from them without demanding
a curriculum.</p>
<p>When people say you should just believe marginalized people, <a href="https://www.elle.com/culture/career-politics/a13977980/me-too-movement-false-accusations-believe-women/">no one is asking
you to check your critical thinking at the door</a>.
What you&rsquo;re being asked to do is to be aware that your incredulity is a further
reminder that you are not experiencing the same thing. Worse - white men acting incredulous is <em>so unbelievably common</em> that it itself is a microaggression.
Don&rsquo;t be a sea lion:</p>
<p><div style="text-align: center; padding: 20px;">
<img src="https://www.jtolio.com/images/sealion.png" width="600" margin="auto">
</div></p>
<h3 id="aside-about-diversity-of-experience-vs-diversity-of-thought:18324069561269a4d9e24bcc6a5c1ff9">Aside about diversity of experience vs. diversity of thought.</h3>
<p>When trying to find others to listen to, who should you find? Recently, a
growing number of people have echoed that all that&rsquo;s really required of
diversity is different viewpoints, and having diversity of thought is the
ultimate goal.</p>
<p>I want to point out that this is not the kind of diversity that will be useful
to you. It&rsquo;s easy to have a bunch of different opinions and then reject them
when they complicate your life. What you want to be listening to is diversity
of <em>experience</em>. Some experiences can&rsquo;t be chosen. You can choose to be
contrarian, but you can&rsquo;t choose the color of your skin.</p>
<h2 id="5-where-do-i-listen:18324069561269a4d9e24bcc6a5c1ff9">5. Where do I listen?</h2>
<p>What you need is a way to be a fly on the wall and observe the life experiences
of others through their words and perspectives. Being friends and hanging out
with people who are different from you is great. Getting out of monocultures is
fantastic. Holding your company to diversity and inclusion initiatives is
wonderful.</p>
<p>But what if you still need more, or you live somewhere like Utah?</p>
<p>What if there was a website where people from all walks of life opted in to
talking about their day and what they&rsquo;re feeling and experiencing from their
viewpoint in a way you could read? It&rsquo;d be almost like seeing the world through
their eyes.</p>
<p>Yep, this blog post is an unsolicited Twitter ad. Twitter definitely has its
share of problems, but after <a href="https://www.jtolio.com/2009/03/i-finally-figured-out-twitter/">writing about how I finally figured out
Twitter</a>, in 2014 I decided to embark
on a year-long effort to use Twitter (I wasn&rsquo;t really using it before) to
follow mostly women or people of color in my field and just see what the field
is like for them on a day to day basis.</p>
<p>Listening to others in this way blew my mind clean open. Suddenly I was aware
of this invisible world around me, much of which is still invisible. Now, I&rsquo;m
looking for it, and I catch glimpses. I would challenge anyone and everyone to
do this. Make sure the content you&rsquo;re consuming is predominantly viewpoints
from life experiences you haven&rsquo;t had.</p>
<p>If you need a start, here are some links to accounts to fill your Twitter feed
up with:</p>
<ul>
<li><a href="http://peopleofcolorintech.com/articles/a-list-of-200-women-of-color-on-twitter/">200 Women of Color in Tech on Twitter</a></li>
<li><a href="https://github.com/ryanburgess/female-engineers-twitter">Women Engineers on Twitter</a></li>
</ul>
<p>You can also check out <a href="https://twitter.com/jtolds/following">who I follow</a>,
though I should warn you that I also follow a lot of political and joke accounts,
and my following someone is not an endorsement.</p>
<p>It&rsquo;s also worth pointing out that no individual can possibly speak for an
entire class of people, but if 38 out of 50 women are saying they&rsquo;re dealing
with something, you should listen.</p>
<h2 id="6-does-this-work:18324069561269a4d9e24bcc6a5c1ff9">6. Does this work?</h2>
<p>Listening to others works, but you don&rsquo;t have to just take my word for it. Here
are two specific and recent experience reports of people changing their
worldview for the better by listening to others:</p>
<ul>
<li><a href="https://www.theglobeandmail.com/opinion/ill-start-2018-by-recognizing-my-white-privilege/article37472875/">A professor at the University of New Brunswick</a></li>
<li><a href="https://micahgodbolt.com/blog/changing-your-worldview/">A senior design developer at Microsoft</a></li>
</ul>
<p>You can see how much of a profound and fast impact this had on me because by
early 2015, only a few months into my Twitter experiment, I was worked up enough
to write <a href="https://www.jtolio.com/2015/03/what-riding-a-unicycle-can-teach-us-about-microaggressions/">my unicycle post</a>
in response to what I was reading on Twitter.</p>
<p>Having diverse perspectives in a workplace has even been shown to
<a href="http://edis.ifas.ufl.edu/hr022">increase productivity</a> and
<a href="https://faculty.insead.edu/william-maddux/documents/PSPB-learning-paper.pdf">increase creativity</a>.</p>
<h2 id="7-don-t-stop-there:18324069561269a4d9e24bcc6a5c1ff9">7. Don&rsquo;t stop there!</h2>
<p>Not everyone is as growth-oriented as you. Just because you&rsquo;re listening now
doesn&rsquo;t mean others are hearing the same distribution of experiences.</p>
<p>If this is new to you, it&rsquo;s not new to marginalized people. Imagine how tired
they must be in trying to convince everyone their experiences are real, valid,
and ongoing. Help get the word out! Repeat and retweet what women and minorities
say. Give them credit. In meetings at your work, give credit to others for their
ideas and amplify their voices.</p>
<p>Did you know that <a href="https://digest.bps.org.uk/2017/07/12/non-white-or-female-bosses-who-push-diversity-are-judged-negatively-by-their-peers-and-managers/">non-white or female bosses who push diversity are judged
negatively by their peers and managers</a> but white male bosses are not? If you&rsquo;re a white male, use your
position where others can&rsquo;t.</p>
<p>If you need an example list of things your company can do, <a href="https://www.susanjfowler.com/blog/2017/5/20/five-things-tech-companies-can-do-better">here&rsquo;s a list
Susan Fowler wrote after her experience at Uber</a>.</p>
<p>Speak up, and use your experiences to help others.</p>
<h2 id="8-am-i-not-prejudiced-now:18324069561269a4d9e24bcc6a5c1ff9">8. Am I not prejudiced now?</h2>
<p>The asymmetry of experiences we all have means we&rsquo;re all inherently prejudiced
to some degree and will likely continue to contribute to a prejudiced society.
That said, the first step to fixing it is admitting it!</p>
<p>There will always be work to do. You will always need to keep listening, keep
learning, and work to improve every day.</p>
Distributed System Talkshttps://www.jtolio.com/2018/02/distributed-system-talks
Sun, 25 Feb 2018 21:20:00 -0700hello@jtolio.com (JT Olio)https://www.jtolio.com/2018/02/distributed-system-talks<p>From 2014 until 2016, I ran the <a href="https://www.meetup.com/Utah-Distributed-Systems-Meetup-and-Reading-Group/">Utah Distributed Systems Meetup and
Reading Group</a>.</p>
<p>Sometimes I did a bad job at finding a better presenter and
ended up covering topics myself. Even though we <a href="https://www.youtube.com/playlist?list=PL6Bpysr3CEPQaVx6EO7g_FXJ-Mhv5jqVw">often recorded
video</a>,
all I have from my presentations are slide decks.</p>
<p>So, I thought they might be worth sharing:</p>
<ul>
<li><a href="https://www.jtolio.com/talks/raft.pdf">JT presents the Raft Consensus protocol</a></li>
<li><a href="https://www.jtolio.com/talks/kad.pdf">JT presents the Kademlia DHT</a></li>
<li><a href="https://www.jtolio.com/talks/metrics.pdf">JT presents an introduction to metrics and tracing</a></li>
<li><a href="https://www.jtolio.com/talks/mapreduce.pdf">JT presents MapReduce and Spark</a></li>
</ul>
Introduction to Reed-Solomonhttps://www.jtolio.com/2017/08/introduction-to-reed-solomon
Wed, 02 Aug 2017 10:00:00 -0600hello@jtolio.com (JT Olio)https://www.jtolio.com/2017/08/introduction-to-reed-solomon<p>My friend Jeff and I recently co-wrote two blog posts over on Vivint&rsquo;s
Innovation Blog (of which I am also the tech editor):</p>
<ul>
<li><a href="https://innovation.vivint.com/introduction-to-reed-solomon-bc264d0794f8">Introduction to Reed-Solomon</a></li>
<li><a href="https://medium.com/@jtolds/joseph-louis-lagrange-and-the-polynomials-499cf0742b39">Joseph Louis Lagrange and the Polynomials</a></li>
</ul>
<p>The first one got over 10k readers in a day!</p>
Sheepdahttps://www.jtolio.com/2017/03/sheepda
Mon, 20 Mar 2017 12:31:00 -0600hello@jtolio.com (JT Olio)https://www.jtolio.com/2017/03/sheepda
<p>I wrote a <a href="https://en.wikipedia.org/wiki/Lambda_calculus">lambda calculus</a>
interpreter called Sheepda, ha, ha.</p>
<p>It&rsquo;s written in Go but thanks to GopherJS it can be embedded in websites, such
as <a href="https://jtolds.github.io/sheepda/">this one</a>.</p>
<h3 id="project-links:cb2727ef9fdb5696d6eb9abbd33cd9c3">Project links</h3>
<ul>
<li><a href="https://www.jtolio.com/writing/2017/03/whiteboard-problems-in-pure-lambda-calculus/">Blog post describing lambda calculus with sheepda</a></li>
<li><a href="https://github.com/jtolds/sheepda">GitHub repo</a></li>
<li><a href="https://jtolds.github.io/sheepda/">Web playground</a></li>
</ul>
Whiteboard problems in pure Lambda Calculushttps://www.jtolio.com/2017/03/whiteboard-problems-in-pure-lambda-calculus
Sun, 19 Mar 2017 21:31:00 -0600hello@jtolio.com (JT Olio)https://www.jtolio.com/2017/03/whiteboard-problems-in-pure-lambda-calculus
<p>My team at <a href="https://www.vivint.com/">Vivint</a>, the
<a href="https://www.spacemonkey.com/">Space Monkey</a> group, stopped doing
whiteboard interviews a while ago. We certainly used to do them, but
we&rsquo;ve transitioned to homework problems or actually just hiring a candidate as
a short term contractor for a day or two to solve real work problems and see
how that goes. Whiteboard interviews are kind of like
<a href="https://en.wikipedia.org/wiki/Festivus">Festivus</a> but in a bad way: you get
the feats of strength and then the airing of grievances. Unfortunately, modern
programming is nothing like writing code in front of a roomful of strangers
with only a whiteboard and a marker, so it&rsquo;s probably not best to optimize for
that.</p>
<p>Nonetheless, <a href="https://twitter.com/aphyr">Kyle</a>&rsquo;s recent (wonderful, amazing)
post titled <a href="https://aphyr.com/posts/340-acing-the-technical-interview">acing the technical
interview</a>
got me thinking about fun ways to approach whiteboard problems as an
interviewee. Kyle&rsquo;s
<a href="https://en.wikipedia.org/wiki/Church_encoding">Church-encodings</a> made me
wonder how many &ldquo;standard&rdquo; whiteboard problems you could solve in pure
lambda calculus. If this isn&rsquo;t seen as a feat of strength by your interviewers,
there will certainly be some airing of grievances.</p>
<p>➡️️ <strong>Update</strong>: I&rsquo;ve made a lambda calculus web playground so you can run lambda
calculus right in your browser! I&rsquo;ve gone through and made links to examples
in this post with it. Check it out at
<a href="https://jtolds.github.io/sheepda/">https://jtolds.github.io/sheepda/</a></p>
<h2 id="lambda-calculus:ce1f6433c81ef1ac55c7793a299a68cb">Lambda calculus</h2>
<p>Wait, what is lambda calculus? Did I learn that in high school?</p>
<p>Big-C &ldquo;Calculus&rdquo; of course usually refers to derivatives, integrals, Taylor
series, etc. You might have learned about Calculus in high school, but this
isn&rsquo;t that.</p>
<p>More generally, a little-c &ldquo;calculus&rdquo; is really just any system of calculation.
The <a href="https://en.wikipedia.org/wiki/Lambda_calculus">lambda calculus</a> is
essentially a formalization of the smallest set of primitives needed to make a
completely <a href="https://en.wikipedia.org/wiki/Turing_completeness">Turing-complete</a>
programming language. Expressions in the language can only be one of three things.</p>
<ul>
<li>An expression can define a function that takes exactly one argument
(no more, no less) and then has another expression as the body.</li>
<li>An expression can call a function by applying one subexpression (the function) to another (the argument).</li>
<li>An expression can reference a variable.</li>
</ul>
<p>Here is the entire grammar:</p>
<pre><code>&lt;expr&gt; ::= &lt;variable&gt;
| `λ` &lt;variable&gt; `.` &lt;expr&gt;
| `(` &lt;expr&gt; &lt;expr&gt; `)`
</code></pre>
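<p>To make the three rules concrete, here is a rough JavaScript sketch of how an
expression tree for this grammar might be represented and printed. The names
<code>variable</code>, <code>lambda</code>, <code>apply</code>, and <code>show</code> are made up for illustration and
are not part of any interpreter mentioned in this post.</p>

```javascript
// Hypothetical constructors, one per rule in the grammar above.
const variable = (name) => ({ type: "variable", name });
const lambda = (param, body) => ({ type: "lambda", param, body });
const apply = (fn, arg) => ({ type: "apply", fn, arg });

// Render an expression tree back into the notation used in this post.
function show(expr) {
  switch (expr.type) {
    case "variable":
      return expr.name;
    case "lambda":
      return "λ" + expr.param + "." + show(expr.body);
    case "apply":
      return "(" + show(expr.fn) + " " + show(expr.arg) + ")";
    default:
      throw new Error("unknown expression type: " + expr.type);
  }
}

// A one-argument function that returns its argument, applied to the variable y.
const example = apply(lambda("x", variable("x")), variable("y"));
console.log(show(example)); // (λx.x y)
```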
<p>That&rsquo;s it. There&rsquo;s nothing else you can do. There are no numbers, strings,
booleans, pairs, structs, anything. Every value is a function that takes one
argument. All variables refer to these functions, and all functions can do
is return another function, either directly, or by calling yet another
function. There&rsquo;s nothing else to help you.</p>
<p>To be honest, it&rsquo;s a little surprising that this is even Turing-complete. How
do you do branches or loops or recursion? This seems too simple to work, right?</p>
<p>A common whiteboard problem is the
<a href="https://imranontech.com/2007/01/24/using-fizzbuzz-to-find-developers-who-grok-coding/">fizz buzz problem</a>.
The goal is to write a function that prints out all the numbers from 0 to 100,
but instead of printing numbers divisible by 3 it prints &ldquo;fizz&rdquo;, and instead of
printing numbers divisible by 5 it prints &ldquo;buzz&rdquo;, and in the case of both it
prints &ldquo;fizzbuzz&rdquo;. It&rsquo;s a simple toy problem but it&rsquo;s touted as a good
whiteboard problem because evidently many self-proclaimed programmers can&rsquo;t
solve it. Maybe part of that is cause whiteboard problems suck? I dunno.</p>
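<p>For reference, here is a sketch of the same problem in ordinary JavaScript,
following the rules as described above (0 through 100, inclusive). The function
name <code>fizzbuzzLines</code> is just an illustrative choice, and it returns the lines
instead of printing them so they are easy to inspect.</p>

```javascript
// Decide what fizz buzz prints for a single number.
function lineFor(i) {
  const byThree = i % 3 === 0;
  const byFive = i % 5 === 0;
  if (byThree) {
    return byFive ? "fizzbuzz" : "fizz";
  }
  return byFive ? "buzz" : String(i);
}

// Returns the lines fizz buzz prints for 0 through limit, inclusive.
function fizzbuzzLines(limit) {
  return Array.from({ length: limit + 1 }, (_, i) => lineFor(i));
}

// Print a short prefix; pass 100 for the full problem.
console.log(fizzbuzzLines(15).join("\n"));
```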
<p>Anyway, here&rsquo;s fizz buzz in pure lambda calculus:</p>
<pre><code>(λU.(λY.(λvoid.(λ0.(λsucc.(λ+.(λ*.(λ1.(λ2.(λ3.(λ4.(λ5.(λ6.(λ7.(λ8.(λ9.(λ10.(λnum.(λtrue.(λfalse.(λif.(λnot.(λand.(λor.(λmake-pair.(λpair-first.(λpair-second.(λzero?.(λpred.(λ-.(λeq?.(λ/.(λ%.(λnil.(λnil?.(λcons.(λcar.(λcdr.(λdo2.(λdo3.(λdo4.(λfor.(λprint-byte.(λprint-list.(λprint-newline.(λzero-byte.(λitoa.(λfizzmsg.(λbuzzmsg.(λfizzbuzzmsg.(λfizzbuzz.(fizzbuzz (((num 1) 0) 1)) λn.((for n) λi.((do2 (((if (zero? ((% i) 3))) λ_.(((if (zero? ((% i) 5))) λ_.(print-list fizzbuzzmsg)) λ_.(print-list fizzmsg))) λ_.(((if (zero? ((% i) 5))) λ_.(print-list buzzmsg)) λ_.(print-list (itoa i))))) (print-newline nil)))) ((cons (((num 0) 7) 0)) ((cons (((num 1) 0) 5)) ((cons (((num 1) 2) 2)) ((cons (((num 1) 2) 2)) ((cons (((num 0) 9) 8)) ((cons (((num 1) 1) 7)) ((cons (((num 1) 2) 2)) ((cons (((num 1) 2) 2)) nil))))))))) ((cons (((num 0) 6) 6)) ((cons (((num 1) 1) 7)) ((cons (((num 1) 2) 2)) ((cons (((num 1) 2) 2)) nil))))) ((cons (((num 0) 7) 0)) ((cons (((num 1) 0) 5)) ((cons (((num 1) 2) 2)) ((cons (((num 1) 2) 2)) nil))))) λn.(((Y λrecurse.λn.λresult.(((if (zero? n)) λ_.(((if (nil? result)) λ_.((cons zero-byte) nil)) λ_.result)) λ_.((recurse ((/ n) 10)) ((cons ((+ zero-byte) ((% n) 10))) result)))) n) nil)) (((num 0) 4) 8)) λ_.(print-byte (((num 0) 1) 0))) (Y λrecurse.λl.(((if (nil? l)) λ_.void) λ_.((do2 (print-byte (car l))) (recurse (cdr l)))))) PRINT_BYTE) λn.λf.((((Y λrecurse.λremaining.λcurrent.λf.(((if (zero? remaining)) λ_.void) λ_.((do2 (f current)) (((recurse (pred remaining)) (succ current)) f)))) n) 0) f)) λa.do3) λa.do2) λa.λb.b) λl.(pair-second (pair-second l))) λl.(pair-first (pair-second l))) λe.λl.((make-pair true) ((make-pair e) l))) λl.(not (pair-first l))) ((make-pair false) void)) λm.λn.((- m) ((* ((/ m) n)) n))) (Y λ/.λm.λn.(((if ((eq? m) n)) λ_.1) λ_.(((if (zero? ((- m) n))) λ_.0) λ_.((+ 1) ((/ ((- m) n)) n)))))) λm.λn.((and (zero? ((- m) n))) (zero? 
((- n) m)))) λm.λn.((n pred) m)) λn.(((λn.λf.λx.(pair-second ((n λp.((make-pair (f (pair-first p))) (pair-first p))) ((make-pair x) x))) n) succ) 0)) λn.((n λ_.false) true)) λp.(p false)) λp.(p true)) λx.λy.λt.((t x) y)) λa.λb.((a true) b)) λa.λb.((a b) false)) λp.λt.λf.((p f) t)) λp.λa.λb.(((p a) b) void)) λt.λf.f) λt.λf.t) λa.λb.λc.((+ ((+ ((* ((* 10) 10)) a)) ((* 10) b))) c)) (succ 9)) (succ 8)) (succ 7)) (succ 6)) (succ 5)) (succ 4)) (succ 3)) (succ 2)) (succ 1)) (succ 0)) λm.λn.λx.(m (n x))) λm.λn.λf.λx.((((m succ) n) f) x)) λn.λf.λx.(f ((n f) x))) λf.λx.x) λx.(U U)) (U λh.λf.(f λx.(((h h) f) x)))) λf.(f f))
</code></pre>
<p>➡️️ <a href="https://jtolds.github.io/sheepda/#JTdCJTIyc3RkbGliJTIyJTNBZmFsc2UlMkMlMjJvdXRwdXQlMjIlM0ElMjJvdXRwdXQlMjIlMkMlMjJjb2RlJTIyJTNBJTIyKCVDRSVCQlUuKCVDRSVCQlkuKCVDRSVCQnZvaWQuKCVDRSVCQjAuKCVDRSVCQnN1Y2MuKCVDRSVCQiUyQi4oJUNFJUJCKi4oJUNFJUJCMS4oJUNFJUJCMi4oJUNFJUJCMy4oJUNFJUJCNC4oJUNFJUJCNS4oJUNFJUJCNi4oJUNFJUJCNy4oJUNFJUJCOC4oJUNFJUJCOS4oJUNFJUJCMTAuKCVDRSVCQm51bS4oJUNFJUJCdHJ1ZS4oJUNFJUJCZmFsc2UuKCVDRSVCQmlmLiglQ0UlQkJub3QuKCVDRSVCQmFuZC4oJUNFJUJCb3IuKCVDRSVCQm1ha2UtcGFpci4oJUNFJUJCcGFpci1maXJzdC4oJUNFJUJCcGFpci1zZWNvbmQuKCVDRSVCQnplcm8lM0YuKCVDRSVCQnByZWQuKCVDRSVCQi0uKCVDRSVCQmVxJTNGLiglQ0UlQkIlMkYuKCVDRSVCQiUyNS4oJUNFJUJCbmlsLiglQ0UlQkJuaWwlM0YuKCVDRSVCQmNvbnMuKCVDRSVCQmNhci4oJUNFJUJCY2RyLiglQ0UlQkJkbzIuKCVDRSVCQmRvMy4oJUNFJUJCZG80LiglQ0UlQkJmb3IuKCVDRSVCQnByaW50LWJ5dGUuKCVDRSVCQnByaW50LWxpc3QuKCVDRSVCQnByaW50LW5ld2xpbmUuKCVDRSVCQnplcm8tYnl0ZS4oJUNFJUJCaXRvYS4oJUNFJUJCZml6em1zZy4oJUNFJUJCYnV6em1zZy4oJUNFJUJCZml6emJ1enptc2cuKCVDRSVCQmZpenpidXp6LihmaXp6YnV6eiUyMCgoKG51bSUyMDEpJTIwMCklMjAxKSklMjAlQ0UlQkJuLigoZm9yJTIwbiklMjAlQ0UlQkJpLigoZG8yJTIwKCgoaWYlMjAoemVybyUzRiUyMCgoJTI1JTIwaSklMjAzKSkpJTIwJUNFJUJCXy4oKChpZiUyMCh6ZXJvJTNGJTIwKCglMjUlMjBpKSUyMDUpKSklMjAlQ0UlQkJfLihwcmludC1saXN0JTIwZml6emJ1enptc2cpKSUyMCVDRSVCQl8uKHByaW50LWxpc3QlMjBmaXp6bXNnKSkpJTIwJUNFJUJCXy4oKChpZiUyMCh6ZXJvJTNGJTIwKCglMjUlMjBpKSUyMDUpKSklMjAlQ0UlQkJfLihwcmludC1saXN0JTIwYnV6em1zZykpJTIwJUNFJUJCXy4ocHJpbnQtbGlzdCUyMChpdG9hJTIwaSkpKSkpJTIwKHByaW50LW5ld2xpbmUlMjBuaWwpKSkpJTIwKChjb25zJTIwKCgobnVtJTIwMCklMjA3KSUyMDApKSUyMCgoY29ucyUyMCgoKG51bSUyMDEpJTIwMCklMjA1KSklMjAoKGNvbnMlMjAoKChudW0lMjAxKSUyMDIpJTIwMikpJTIwKChjb25zJTIwKCgobnVtJTIwMSklMjAyKSUyMDIpKSUyMCgoY29ucyUyMCgoKG51bSUyMDApJTIwOSklMjA4KSklMjAoKGNvbnMlMjAoKChudW0lMjAxKSUyMDEpJTIwNykpJTIwKChjb25zJTIwKCgobnVtJTIwMSklMjAyKSUyMDIpKSUyMCgoY29ucyUyMCgoKG51bSUyMDEpJTIwMiklMjAyKSklMjBuaWwpKSkpKSkpKSklMjAoKGNvbnMlMjAoKChudW0lMjAwKSUyMDYpJTIwNikpJTIwKChjb25zJTIwKCgobnVtJTIwMSklMjAxKSUyMDcpKSUyMCgoY29ucyUyMCgoKG51bSUyMDEpJTIwMiklMjAyKSklMjAoKG
NvbnMlMjAoKChudW0lMjAxKSUyMDIpJTIwMikpJTIwbmlsKSkpKSklMjAoKGNvbnMlMjAoKChudW0lMjAwKSUyMDcpJTIwMCkpJTIwKChjb25zJTIwKCgobnVtJTIwMSklMjAwKSUyMDUpKSUyMCgoY29ucyUyMCgoKG51bSUyMDEpJTIwMiklMjAyKSklMjAoKGNvbnMlMjAoKChudW0lMjAxKSUyMDIpJTIwMikpJTIwbmlsKSkpKSklMjAlQ0UlQkJuLigoKFklMjAlQ0UlQkJyZWN1cnNlLiVDRSVCQm4uJUNFJUJCcmVzdWx0LigoKGlmJTIwKHplcm8lM0YlMjBuKSklMjAlQ0UlQkJfLigoKGlmJTIwKG5pbCUzRiUyMHJlc3VsdCkpJTIwJUNFJUJCXy4oKGNvbnMlMjB6ZXJvLWJ5dGUpJTIwbmlsKSklMjAlQ0UlQkJfLnJlc3VsdCkpJTIwJUNFJUJCXy4oKHJlY3Vyc2UlMjAoKCUyRiUyMG4pJTIwMTApKSUyMCgoY29ucyUyMCgoJTJCJTIwemVyby1ieXRlKSUyMCgoJTI1JTIwbiklMjAxMCkpKSUyMHJlc3VsdCkpKSklMjBuKSUyMG5pbCkpJTIwKCgobnVtJTIwMCklMjA0KSUyMDgpKSUyMCVDRSVCQl8uKHByaW50LWJ5dGUlMjAoKChudW0lMjAwKSUyMDEpJTIwMCkpKSUyMChZJTIwJUNFJUJCcmVjdXJzZS4lQ0UlQkJsLigoKGlmJTIwKG5pbCUzRiUyMGwpKSUyMCVDRSVCQl8udm9pZCklMjAlQ0UlQkJfLigoZG8yJTIwKHByaW50LWJ5dGUlMjAoY2FyJTIwbCkpKSUyMChyZWN1cnNlJTIwKGNkciUyMGwpKSkpKSklMjBQUklOVF9CWVRFKSUyMCVDRSVCQm4uJUNFJUJCZi4oKCgoWSUyMCVDRSVCQnJlY3Vyc2UuJUNFJUJCcmVtYWluaW5nLiVDRSVCQmN1cnJlbnQuJUNFJUJCZi4oKChpZiUyMCh6ZXJvJTNGJTIwcmVtYWluaW5nKSklMjAlQ0UlQkJfLnZvaWQpJTIwJUNFJUJCXy4oKGRvMiUyMChmJTIwY3VycmVudCkpJTIwKCgocmVjdXJzZSUyMChwcmVkJTIwcmVtYWluaW5nKSklMjAoc3VjYyUyMGN1cnJlbnQpKSUyMGYpKSkpJTIwbiklMjAwKSUyMGYpKSUyMCVDRSVCQmEuZG8zKSUyMCVDRSVCQmEuZG8yKSUyMCVDRSVCQmEuJUNFJUJCYi5iKSUyMCVDRSVCQmwuKHBhaXItc2Vjb25kJTIwKHBhaXItc2Vjb25kJTIwbCkpKSUyMCVDRSVCQmwuKHBhaXItZmlyc3QlMjAocGFpci1zZWNvbmQlMjBsKSkpJTIwJUNFJUJCZS4lQ0UlQkJsLigobWFrZS1wYWlyJTIwdHJ1ZSklMjAoKG1ha2UtcGFpciUyMGUpJTIwbCkpKSUyMCVDRSVCQmwuKG5vdCUyMChwYWlyLWZpcnN0JTIwbCkpKSUyMCgobWFrZS1wYWlyJTIwZmFsc2UpJTIwdm9pZCkpJTIwJUNFJUJCbS4lQ0UlQkJuLigoLSUyMG0pJTIwKCgqJTIwKCglMkYlMjBtKSUyMG4pKSUyMG4pKSklMjAoWSUyMCVDRSVCQiUyRi4lQ0UlQkJtLiVDRSVCQm4uKCgoaWYlMjAoKGVxJTNGJTIwbSklMjBuKSklMjAlQ0UlQkJfLjEpJTIwJUNFJUJCXy4oKChpZiUyMCh6ZXJvJTNGJTIwKCgtJTIwbSklMjBuKSkpJTIwJUNFJUJCXy4wKSUyMCVDRSVCQl8uKCglMkIlMjAxKSUyMCgoJTJGJTIwKCgtJTIwbSklMjBuKSklMjBuKSkpKSkpJTIwJUNFJUJCbS4lQ0UlQkJuLigoYW5kJTIwKHplcm8lM0YlMjAoKC0lMjBtKS
UyMG4pKSklMjAoemVybyUzRiUyMCgoLSUyMG4pJTIwbSkpKSklMjAlQ0UlQkJtLiVDRSVCQm4uKChuJTIwcHJlZCklMjBtKSklMjAlQ0UlQkJuLigoKCVDRSVCQm4uJUNFJUJCZi4lQ0UlQkJ4LihwYWlyLXNlY29uZCUyMCgobiUyMCVDRSVCQnAuKChtYWtlLXBhaXIlMjAoZiUyMChwYWlyLWZpcnN0JTIwcCkpKSUyMChwYWlyLWZpcnN0JTIwcCkpKSUyMCgobWFrZS1wYWlyJTIweCklMjB4KSkpJTIwbiklMjBzdWNjKSUyMDApKSUyMCVDRSVCQm4uKChuJTIwJUNFJUJCXy5mYWxzZSklMjB0cnVlKSklMjAlQ0UlQkJwLihwJTIwZmFsc2UpKSUyMCVDRSVCQnAuKHAlMjB0cnVlKSklMjAlQ0UlQkJ4LiVDRSVCQnkuJUNFJUJCdC4oKHQlMjB4KSUyMHkpKSUyMCVDRSVCQmEuJUNFJUJCYi4oKGElMjB0cnVlKSUyMGIpKSUyMCVDRSVCQmEuJUNFJUJCYi4oKGElMjBiKSUyMGZhbHNlKSklMjAlQ0UlQkJwLiVDRSVCQnQuJUNFJUJCZi4oKHAlMjBmKSUyMHQpKSUyMCVDRSVCQnAuJUNFJUJCYS4lQ0UlQkJiLigoKHAlMjBhKSUyMGIpJTIwdm9pZCkpJTIwJUNFJUJCdC4lQ0UlQkJmLmYpJTIwJUNFJUJCdC4lQ0UlQkJmLnQpJTIwJUNFJUJCYS4lQ0UlQkJiLiVDRSVCQmMuKCglMkIlMjAoKCUyQiUyMCgoKiUyMCgoKiUyMDEwKSUyMDEwKSklMjBhKSklMjAoKColMjAxMCklMjBiKSkpJTIwYykpJTIwKHN1Y2MlMjA5KSklMjAoc3VjYyUyMDgpKSUyMChzdWNjJTIwNykpJTIwKHN1Y2MlMjA2KSklMjAoc3VjYyUyMDUpKSUyMChzdWNjJTIwNCkpJTIwKHN1Y2MlMjAzKSklMjAoc3VjYyUyMDIpKSUyMChzdWNjJTIwMSkpJTIwKHN1Y2MlMjAwKSklMjAlQ0UlQkJtLiVDRSVCQm4uJUNFJUJCeC4obSUyMChuJTIweCkpKSUyMCVDRSVCQm0uJUNFJUJCbi4lQ0UlQkJmLiVDRSVCQnguKCgoKG0lMjBzdWNjKSUyMG4pJTIwZiklMjB4KSklMjAlQ0UlQkJuLiVDRSVCQmYuJUNFJUJCeC4oZiUyMCgobiUyMGYpJTIweCkpKSUyMCVDRSVCQmYuJUNFJUJCeC54KSUyMCVDRSVCQnguKFUlMjBVKSklMjAoVSUyMCVDRSVCQmguJUNFJUJCZi4oZiUyMCVDRSVCQnguKCgoaCUyMGgpJTIwZiklMjB4KSkpKSUyMCVDRSVCQmYuKGYlMjBmKSklNUNuJTIyJTdE">Try it out in your browser!</a></p>
<p><span style="font-size: 75%">(This program expects a function to
be defined called <code>PRINT_BYTE</code> which takes a Church-encoded numeral, turns it
into a byte, writes it to <code>stdout</code>, and then returns the same Church-encoded
numeral. Expecting a function that has side-effects might arguably disqualify
this from being pure, but it&rsquo;s definitely arguable.)</span></p>
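<p>As a rough idea of what such a host-provided function could look like, here is
a sketch in JavaScript for Node.js. It assumes a Church numeral is a curried
function <code>n(f)(x)</code> that applies <code>f</code> to <code>x</code> <em>n</em> times. The helpers
<code>toInt</code>, <code>fromInt</code>, and <code>makePrintByte</code> are hypothetical names invented
for this sketch, not the actual implementation used by any interpreter.</p>

```javascript
// Assumption: a Church numeral n is called as n(f)(x) and applies f to x n times.
// Counting the applications recovers an ordinary integer.
const toInt = (n) => n((count) => count + 1)(0);

// Hypothetical helper: build a Church numeral from a native integer k.
const fromInt = (k) => (f) => (x) =>
  Array.from({ length: k }).reduce((acc) => f(acc), x);

// Hypothetical PRINT_BYTE factory: write the numeral as one byte, then
// return the same numeral. `write` is injectable so the sketch is easy to test.
const makePrintByte = (write) => (n) => {
  write(Buffer.from([toInt(n) % 256]));
  return n;
};

const PRINT_BYTE = makePrintByte((bytes) => process.stdout.write(bytes));

// Byte 10 is a newline, so this prints an empty line.
const ten = fromInt(10);
PRINT_BYTE(ten);
```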
<p>Don&rsquo;t be deceived! I said there were no native numbers or lists or control
structures in lambda calculus and I meant it. <code>0</code>, <code>7</code>, <code>if</code>, and <code>+</code> are
all <em>variables</em> that represent <em>functions</em> and have to be constructed before
they can be used in the code block above.</p>
<h2 id="what-what-s-happening-here:ce1f6433c81ef1ac55c7793a299a68cb">What? What&rsquo;s happening here?</h2>
<p>Okay let&rsquo;s start over and build up to fizz buzz. We&rsquo;re going to need a lot.
We&rsquo;re going to need to build up concepts of numbers, logic, and lists all from
scratch. Ask your interviewers if they&rsquo;re comfortable cause this might be a
while.</p>
<p>Here is a basic lambda calculus function:</p>
<pre><code>λx.x
</code></pre>
<p>This is the identity function and it is equivalent to the following JavaScript:</p>
<pre><code>function(x) { return x; }
</code></pre>
<p>It takes an argument and returns it! We can call the identity function with
another value. Function calling in many languages looks like <code>f(x)</code>, but in
lambda calculus, it looks like <code>(f x)</code>.</p>
<pre><code>(λx.x y)
</code></pre>
<p>This will return <code>y</code>. Once again, here&rsquo;s equivalent JavaScript:</p>
<pre><code>(function(x) { return x; })(y)
</code></pre>
<p><span style="font-size: 75%">
Aside: If you&rsquo;re already familiar with lambda calculus, my formulation of
precedence is such that <code>(λx.x y)</code> is not the same as <code>λx.(x y)</code>. <code>(λx.x y)</code>
calls the identity function <code>λx.x</code> with <code>y</code>, and <code>λx.(x y)</code> is a function
that calls its argument <code>x</code> with <code>y</code>. Perhaps not what you&rsquo;re used to, but the
parser was way more straightforward, and programming with it this way seems a
bit more natural, believe it or not.
</span></p>
<p>Okay, great. We can call functions. What if we want to pass more than one
argument?</p>
<h2 id="currying:ce1f6433c81ef1ac55c7793a299a68cb">Currying</h2>
<p>Imagine the following JavaScript function:</p>
<pre><code>let s1 = function(f, x) { return f(x); }
</code></pre>
<p>We want to call it with two arguments, another function and a value, and we
want the function to then be called on the value, and have its result returned.
Can we do this while using only one argument?</p>
<p><a href="https://en.wikipedia.org/wiki/Currying">Currying</a> is a technique for dealing
with this. Instead of taking two arguments, take the first argument and return
another function that takes the second argument. Here&rsquo;s the JavaScript:</p>
<pre><code>let s2 = function(f) {
  return function(x) {
    return f(x);
  }
};
</code></pre>
<p>Now, <code>s1(f, x)</code> is the same as <code>s2(f)(x)</code>. So the equivalent lambda calculus
for <code>s2</code> is then</p>
<pre><code>λf.λx.(f x)
</code></pre>
<p>Calling this function with <code>g</code> for <code>f</code> and <code>y</code> for <code>x</code> is like so:</p>
<pre><code>((s2 g) y)
</code></pre>
<p>or</p>
<pre><code>((λf.λx.(f x) g) y)
</code></pre>
<p>The equivalent JavaScript here is:</p>
<pre><code>(function(f) {
  return function(x) {
    return f(x);
  }
})(g)(y)
</code></pre>
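<p>If you want to convince yourself that the curried and uncurried forms agree,
here is a small runnable check. The <code>curry2</code> helper and the <code>double</code> function
are made up for this example. <code>curry2</code> builds something equivalent to <code>s2</code>
from <code>s1</code>.</p>

```javascript
// Hypothetical helper: turn any two-argument function into a curried one.
const curry2 = (fn) => (a) => (b) => fn(a, b);

// s1 takes both arguments at once; s2 takes them one at a time.
const s1 = function (f, x) { return f(x); };
const s2 = curry2(s1);

const double = (n) => n * 2;

console.log(s1(double, 21));  // 42
console.log(s2(double)(21));  // 42
```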
<h2 id="numbers:ce1f6433c81ef1ac55c7793a299a68cb">Numbers</h2>
<p>Since everything is a function, we might feel a little stuck with what to do
about numbers. Luckily,
<a href="https://en.wikipedia.org/wiki/Alonzo_Church">Alonzo Church</a> already figured it
out for us! When you have a number, often what you want to do is represent how
many times you might do something.</p>
<p>So let&rsquo;s represent a number as how many times we&rsquo;ll apply a function to a
value. This is called a
<a href="https://en.wikipedia.org/wiki/Church_encoding#Church_numerals">Church numeral</a>.
If we have <code>f</code> and <code>x</code>, <code>0</code> will mean we don&rsquo;t call <code>f</code> at all, and just return
<code>x</code>. <code>1</code> will mean we call <code>f</code> one time, <code>2</code> will mean we call <code>f</code> twice, and
so on.</p>
<p>Here are some definitions! (N.B.: assignment isn&rsquo;t actually part of lambda
calculus, but it makes writing down definitions easier)</p>
<pre><code>0 = λf.λx.x
</code></pre>
<p>Here, <code>0</code> takes a function <code>f</code>, a value <code>x</code>, and never calls <code>f</code>. It just
returns <code>x</code>. <code>f</code> is called 0 times.</p>
<pre><code>1 = λf.λx.(f x)
</code></pre>
<p>Like <code>0</code>, <code>1</code> takes <code>f</code> and <code>x</code>, but here it calls <code>f</code> exactly once. Let&rsquo;s see
how this continues for other numbers.</p>
<pre><code>2 = λf.λx.(f (f x))
3 = λf.λx.(f (f (f x)))
4 = λf.λx.(f (f (f (f x))))
5 = λf.λx.(f (f (f (f (f x)))))
</code></pre>
<p><code>5</code> is a function that takes <code>f</code>, <code>x</code>, and calls <code>f</code> 5 times!</p>
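<p>These definitions carry over to JavaScript almost symbol for symbol. The sketch
below writes a few of them as curried arrow functions. The <code>toInt</code> helper is an
addition for inspection only: it cheats by using native JavaScript numbers, which
lambda calculus doesn&rsquo;t have.</p>

```javascript
// Church numerals as curried functions: n(f)(x) applies f to x n times.
const zero = (f) => (x) => x;
const one = (f) => (x) => f(x);
const two = (f) => (x) => f(f(x));
const five = (f) => (x) => f(f(f(f(f(x)))));

// Inspection helper (not lambda calculus): count how many times f is applied.
const toInt = (n) => n((count) => count + 1)(0);

console.log(toInt(zero), toInt(one), toInt(two), toInt(five)); // 0 1 2 5

// A numeral works with any function, not just counting.
console.log(five((s) => s + "!")("hi")); // hi!!!!!
```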
<p>Okay, this is convenient, but how are we going to do math on these numbers?</p>
<h2 id="successor:ce1f6433c81ef1ac55c7793a299a68cb">Successor</h2>
<p>Let&rsquo;s make a <em>successor</em> function that takes a number and returns a new number
that calls <code>f</code> just one more time.</p>
<pre><code>succ = λn. λf.λx.(f ((n f) x))
</code></pre>
<p><code>succ</code> is a function that takes a Church-encoded number, <code>n</code>. The spaces after
<code>λn.</code> are ignored. I put them there to indicate that we expect to usually call
<code>succ</code> with one argument, curried or no. <code>succ</code> then returns another
Church-encoded number, <code>λf.λx.(f ((n f) x))</code>. What is it doing? Let&rsquo;s break it
down.</p>
<ul>
<li><code>((n f) x)</code> looks like that time we needed to call a function that took
two &ldquo;curried&rdquo; arguments. So we&rsquo;re calling <code>n</code>, which is a Church numeral,
with two arguments, <code>f</code> and <code>x</code>. This is going to call <code>f</code> <code>n</code> times!</li>
<li><code>(f ((n f) x))</code> This is calling <code>f</code> again, one more time, on the result of
the previous value.</li>
</ul>
<p>So does <code>succ</code> work? Let&rsquo;s see what happens when we call <code>(succ 1)</code>. We should
get the <code>2</code> we defined earlier!</p>
<pre><code> (succ 1)
-&gt; (succ λf.λx.(f x)) # resolve the variable 1
-&gt; (λn.λf.λx.(f ((n f) x)) λf.λx.(f x)) # resolve the variable succ
-&gt; λf.λx.(f ((λf.λx.(f x) f) x)) # call the outside function. replace n
# with the argument
let's sidebar and simplify the subexpression
(λf.λx.(f x) f)
-&gt; λx.(f x) # call the function, replace f with f!
now we should be able to simplify the larger subexpression
((λf.λx.(f x) f) x)
-&gt; (λx.(f x) x) # sidebar above
-&gt; (f x) # call the function, replace x with x!
let's go back to the original now
λf.λx.(f ((λf.λx.(f x) f) x))
-&gt; λf.λx.(f (f x)) # subexpression simplification above
</code></pre>
<p>and done! That last line is identical to the <code>2</code> we defined originally! It
calls <code>f</code> twice.</p>
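<p>The same check can be run mechanically. Here is a sketch of <code>succ</code> in
JavaScript, again using a hypothetical <code>toInt</code> helper (which relies on native
numbers) purely to look at the result.</p>

```javascript
// succ = λn. λf.λx.(f ((n f) x))
const succ = (n) => (f) => (x) => f(n(f)(x));

// 1 = λf.λx.(f x)
const one = (f) => (x) => f(x);

// Inspection helper (not lambda calculus): count how many times f is applied.
const toInt = (n) => n((count) => count + 1)(0);

console.log(toInt(succ(one)));             // 2
console.log(toInt(succ(succ(succ(one))))); // 4
```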
<h2 id="math:ce1f6433c81ef1ac55c7793a299a68cb">Math</h2>
<p>Now that we have the successor function, if your interviewers haven&rsquo;t checked
out, tell them that fizz buzz isn&rsquo;t too far away now; we have
<a href="https://en.wikipedia.org/wiki/Peano_axioms#Arithmetic">Peano Arithmetic</a>!
They can then check their interview bingo cards and see if they&rsquo;ve increased
their winnings.</p>
<p>No but for real, since we have the successor function, we can now easily do
addition and multiplication, which we will need for fizz buzz.</p>
<p>First, recall that a number <code>n</code> is a function that takes another function <code>f</code>
and an initial value <code>x</code> and applies <code>f</code> <em>n</em> times. So to add two numbers
<em>m</em> and <em>n</em>, what you want to do is apply <code>succ</code> to <code>m</code> <em>n</em> times!</p>
<pre><code>+ = λm.λn.((n succ) m)
</code></pre>
<p>Here, <code>+</code> is a variable. If it&rsquo;s not a lambda expression or a function call,
it&rsquo;s a variable!</p>
<p>Multiplication is similar, but instead of applying <code>succ</code> to <code>m</code> <em>n</em> times,
we&rsquo;re going to add <code>m</code> to <code>0</code> <code>n</code> times.</p>
<p>First, note that if <code>((+ m) n)</code> is adding <code>m</code> and <code>n</code>, then that means that
<code>(+ m)</code> is a <em>function</em> that adds <code>m</code> to its argument. So we want to apply
the function <code>(+ m)</code> to <code>0</code> <code>n</code> times.</p>
<pre><code>* = λm.λn.((n (+ m)) 0)
</code></pre>
<p>Yay! We have multiplication and addition now.</p>
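<p>As a quick sanity check, the two definitions can be transcribed into Python lambdas. This is a sketch under the assumption that curried Python functions behave like the lambda terms above; <code>to_int</code> is a hypothetical helper for reading the result.</p>

```python
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))
two = succ(succ(zero))
three = succ(two)

# + = λm.λn.((n succ) m): apply succ to m, n times
add = lambda m: lambda n: n(succ)(m)
# * = λm.λn.((n (+ m)) 0): apply "add m" to 0, n times
mul = lambda m: lambda n: n(add(m))(zero)

to_int = lambda n: n(lambda k: k + 1)(0)  # hypothetical decoding helper
print(to_int(add(two)(three)), to_int(mul(two)(three)))  # 5 6
```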
<h2 id="logic:ce1f6433c81ef1ac55c7793a299a68cb">Logic</h2>
<p>We&rsquo;re going to need booleans and if statements and logic tests and so on. So,
let&rsquo;s talk about booleans. Recall how with numbers, what we kind of wanted with
a number <code>n</code> is to do something <em>n</em> times. Similarly, what we want with
booleans is to do one of two things, either/or, but not both. Alonzo Church to
the rescue again.</p>
<p>Let&rsquo;s have booleans be functions that take two arguments (curried of course),
where the <code>true</code> boolean will return the first option, and the <code>false</code> boolean
will return the second.</p>
<pre><code>true = λt.λf.t
false = λt.λf.f
</code></pre>
<p>So that we can demonstrate booleans, we&rsquo;re going to define a simple sample
function called <code>zero?</code> that returns <code>true</code> if a number <code>n</code> is zero, and
<code>false</code> otherwise:</p>
<pre><code>zero? = λn.((n λ_.false) true)
</code></pre>
<p>To explain: if we have a Church numeral for 0, it will call the first argument
it gets called with 0 times and just return the second argument. In other
words, 0 will just return the second argument and that&rsquo;s it.
Otherwise, any other number will call the first argument at least once. So,
<code>zero?</code> will take <code>n</code> and give it a function that throws away its argument and
always returns <code>false</code> whenever it&rsquo;s called, and start it off with <code>true</code>.
Only zero values will return <code>true</code>.</p>
<p>➡️️ <a href="https://jtolds.github.io/sheepda/#JTdCJTIyc3RkbGliJTIyJTNBZmFsc2UlMkMlMjJvdXRwdXQlMjIlM0ElMjJyZXN1bHQlMjIlMkMlMjJjb2RlJTIyJTNBJTIyMCUyMCUzRCUyMCVDRSVCQmYuJUNFJUJCeC54JTVDbjElMjAlM0QlMjAlQ0UlQkJmLiVDRSVCQnguKGYlMjB4KSU1Q24yJTIwJTNEJTIwJUNFJUJCZi4lQ0UlQkJ4LihmJTIwKGYlMjB4KSklNUNuc3VjYyUyMCUzRCUyMCVDRSVCQm4uJUNFJUJCZi4lQ0UlQkJ4LihmJTIwKChuJTIwZiklMjB4KSklNUNuJTVDbnRydWUlMjAlMjAlM0QlMjAlQ0UlQkJ0LiVDRSVCQmYudCU1Q25mYWxzZSUyMCUzRCUyMCVDRSVCQnQuJUNFJUJCZi5mJTVDbiU1Q256ZXJvJTNGJTIwJTNEJTIwJUNFJUJCbi4oKG4lMjAlQ0UlQkJfLmZhbHNlKSUyMHRydWUpJTVDbiU1Q24lMjMlMjB0cnklMjBjaGFuZ2luZyUyMHRoZSUyMG51bWJlciUyMHplcm8lM0YlMjBpcyUyMGNhbGxlZCUyMHdpdGglNUNuKHplcm8lM0YlMjAwKSU1Q24lNUNuJTIzJTIwdGhlJTIwb3V0cHV0JTIwd2lsbCUyMGJlJTIwJTVDJTIyJUNFJUJCdC4lQ0UlQkJmLnQlNUMlMjIlMjBmb3IlMjB0cnVlJTIwYW5kJTIwJTVDJTIyJUNFJUJCdC4lQ0UlQkJmLmYlNUMlMjIlMjBmb3IlMjBmYWxzZS4lMjIlN0Q=">Try it out in your browser!</a></p>
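<p>The same idea can be sketched in Python, assuming lambdas as a stand-in for lambda terms. <code>is_zero</code> plays the role of <code>zero?</code>, and <code>to_bool</code> is a hypothetical helper that turns a Church boolean into a Python one.</p>

```python
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))

true = lambda t: lambda f: t
false = lambda t: lambda f: f

# zero? = λn.((n λ_.false) true)
is_zero = lambda n: n(lambda _: false)(true)

# Hypothetical helper: pick between two Python values to read a Church boolean.
to_bool = lambda b: b(True)(False)

print(to_bool(is_zero(zero)), to_bool(is_zero(succ(zero))))  # True False
```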
<p>We can now write an <code>if'</code> function to make use of these boolean values. <code>if'</code>
will take a predicate value <code>p</code> (the boolean) and two options <code>a</code> and <code>b</code>.</p>
<pre><code>if' = λp.λa.λb.((p a) b)
</code></pre>
<p>You can use it like this:</p>
<pre><code>((if' (zero? n)
(something-when-zero x))
(something-when-not-zero y))
</code></pre>
<p>One thing that&rsquo;s weird about this construction is that the interpreter is going
to evaluate both branches (my lambda calculus interpreter is
<a href="https://en.wikipedia.org/wiki/Eager_evaluation">eager</a> instead of
<a href="https://en.wikipedia.org/wiki/Lazy_evaluation">lazy</a>). Both
<code>something-when-zero</code> and <code>something-when-not-zero</code> are going to be called to
determine what to pass in to <code>if'</code>. To make it so that we don&rsquo;t actually call
the function in the branch we don&rsquo;t want to run, let&rsquo;s protect the logic in
another function. We&rsquo;ll name the argument to the function <code>_</code> to indicate that
we want to just throw it away.</p>
<pre><code>((if (zero? n)
λ_. (something-when-zero x))
λ_. (something-when-not-zero y))
</code></pre>
<p>This means we&rsquo;re going to have to make a new <code>if</code> function that calls the
correct branch with a throwaway argument, like <code>0</code> or something.</p>
<pre><code>if = λp.λa.λb.(((p a) b) 0)
</code></pre>
<p>Okay, now we have booleans and <code>if</code>!</p>
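<p>Python is also eagerly evaluated, so it makes a reasonable place to watch the difference between the two versions. In this sketch, <code>if_eager</code> stands in for <code>if'</code>, <code>if_</code> stands in for <code>if</code>, and <code>branch</code> and <code>calls</code> are hypothetical instrumentation that records which branches actually ran.</p>

```python
true = lambda t: lambda f: t
false = lambda t: lambda f: f

# if' = λp.λa.λb.((p a) b): both branches are evaluated before the call
if_eager = lambda p: lambda a: lambda b: p(a)(b)
# if = λp.λa.λb.(((p a) b) 0): branches are wrapped, only the chosen one is called
if_ = lambda p: lambda a: lambda b: p(a)(b)(0)

calls = []  # records which branches actually ran

def branch(name):
    calls.append(name)
    return name

if_eager(true)(branch("then"))(branch("else"))
print(calls)  # ['then', 'else']

calls.clear()
result = if_(true)(lambda _: branch("then"))(lambda _: branch("else"))
print(result, calls)  # then ['then']
```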
<h2 id="currying-part-deux:ce1f6433c81ef1ac55c7793a299a68cb">Currying part deux</h2>
<p>At this point, you might be getting sick of how calling something with multiple
curried arguments involves all these extra parentheses. <code>((f a) b)</code> is
annoying, can&rsquo;t we just do <code>(f a b)</code>?</p>
<p>It&rsquo;s not part of the strict grammar, but my interpreter makes this small
concession. <code>(a b c)</code> will be expanded to <code>((a b) c)</code> by the parser.
<code>(a b c d)</code> will be expanded to <code>(((a b) c) d)</code> by the parser, and so on.</p>
<p>So, for the rest of the post, for ease of explanation, I&rsquo;m going to use this
<a href="https://en.wikipedia.org/wiki/Syntactic_sugar">syntax sugar</a>. Observe how
using <code>if</code> changes:</p>
<pre><code>(if (zero? n)
λ_. (something-when-zero x)
λ_. (something-when-not-zero y))
</code></pre>
<p>It&rsquo;s a little better.</p>
<h2 id="more-logic:ce1f6433c81ef1ac55c7793a299a68cb">More logic</h2>
<p>Let&rsquo;s talk about <code>and</code>, <code>or</code>, and <code>not</code>!</p>
<p><code>and</code> returns true if and only if both <code>a</code> and <code>b</code> are true. Let&rsquo;s define it!</p>
<pre><code>and = λa.λb.
(if (a)
λ_. b
λ_. false)
</code></pre>
<p><code>or</code> returns true if <code>a</code> is true or if <code>b</code> is true:</p>
<pre><code>or = λa.λb.
(if (a)
λ_. true
λ_. b)
</code></pre>
<p><code>not</code> just returns the opposite of whatever it was given:</p>
<pre><code>not = λa.
(if (a)
λ_. false
λ_. true)
</code></pre>
<p>It turns out these can be written a bit more simply, but they&rsquo;re basically
doing the same thing:</p>
<pre><code>and = λa.λb.(a b false)
or = λa.λb.(a true b)
not = λp.λt.λf.(p f t)
</code></pre>
<p>➡️️ <a href="https://jtolds.github.io/sheepda/#JTdCJTIyc3RkbGliJTIyJTNBZmFsc2UlMkMlMjJvdXRwdXQlMjIlM0ElMjJyZXN1bHQlMjIlMkMlMjJjb2RlJTIyJTNBJTIyMCUyMCUzRCUyMCVDRSVCQmYuJUNFJUJCeC54JTVDbjElMjAlM0QlMjAlQ0UlQkJmLiVDRSVCQnguKGYlMjB4KSU1Q24yJTIwJTNEJTIwJUNFJUJCZi4lQ0UlQkJ4LihmJTIwKGYlMjB4KSklNUNuMyUyMCUzRCUyMCVDRSVCQmYuJUNFJUJCeC4oZiUyMChmJTIwKGYlMjB4KSkpJTVDbnN1Y2MlMjAlM0QlMjAlQ0UlQkJuLiVDRSVCQmYuJUNFJUJCeC4oZiUyMCgobiUyMGYpJTIweCkpJTVDbiU1Q250cnVlJTIwJTIwJTNEJTIwJUNFJUJCdC4lQ0UlQkJmLnQlNUNuZmFsc2UlMjAlM0QlMjAlQ0UlQkJ0LiVDRSVCQmYuZiU1Q24lNUNuemVybyUzRiUyMCUzRCUyMCVDRSVCQm4uKChuJTIwJUNFJUJCXy5mYWxzZSklMjB0cnVlKSU1Q24lNUNuaWYlMjAlM0QlMjAlQ0UlQkJwLiVDRSVCQmEuJUNFJUJCYi4oKChwJTIwYSklMjBiKSUyMDApJTVDbmFuZCUyMCUzRCUyMCVDRSVCQmEuJUNFJUJCYi4oYSUyMGIlMjBmYWxzZSklNUNub3IlMjAlM0QlMjAlQ0UlQkJhLiVDRSVCQmIuKGElMjB0cnVlJTIwYiklNUNubm90JTIwJTNEJTIwJUNFJUJCcC4lQ0UlQkJ0LiVDRSVCQmYuKHAlMjBmJTIwdCklNUNuJTVDbiUyMyUyMHRyeSUyMGNoYW5naW5nJTIwdGhpcyUyMHVwISU1Q24oaWYlMjAob3IlMjAoemVybyUzRiUyMDEpJTIwKHplcm8lM0YlMjAwKSklNUNuJTIwJTIwJTIwJTIwJUNFJUJCXy4lMjAyJTVDbiUyMCUyMCUyMCUyMCVDRSVCQl8uJTIwMyklMjIlN0Q=">Try it out in your browser!</a></p>
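<p>To convince yourself the shorter forms are right, you can print their truth tables. Here&rsquo;s a minimal sketch with Python lambdas standing in for the lambda terms; the trailing underscores only avoid Python keywords, and <code>to_bool</code> is a hypothetical decoding helper.</p>

```python
true = lambda t: lambda f: t
false = lambda t: lambda f: f

and_ = lambda a: lambda b: a(b)(false)        # and = λa.λb.(a b false)
or_ = lambda a: lambda b: a(true)(b)          # or  = λa.λb.(a true b)
not_ = lambda p: lambda t: lambda f: p(f)(t)  # not = λp.λt.λf.(p f t)

to_bool = lambda b: b(True)(False)  # hypothetical decoding helper

for a in (true, false):
    for b in (true, false):
        print(to_bool(a), to_bool(b), to_bool(and_(a)(b)), to_bool(or_(a)(b)))
print(to_bool(not_(true)), to_bool(not_(false)))  # False True
```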
<h2 id="pairs:ce1f6433c81ef1ac55c7793a299a68cb">Pairs!</h2>
<p>Sometimes it&rsquo;s nice to keep data together. Let&rsquo;s make a little 2-tuple type!
We want three functions. We want a function called <code>make-pair</code> that will take
two arguments and return a &ldquo;pair&rdquo;, we want a function called <code>pair-first</code> that
will return the first element of the pair, and we want a function called
<code>pair-second</code> that will return the second element. How can we achieve this?
You&rsquo;re almost certainly in the interview room alone, but now&rsquo;s the time to yell
&ldquo;Alonzo Church&rdquo;!</p>
<pre><code>make-pair = λx.λy. λa.(a x y)
</code></pre>
<p><code>make-pair</code> is going to take two arguments, <code>x</code> and <code>y</code>, and they will be the
elements of the pair. The pair itself is a function that takes an &ldquo;accessor&rdquo;
<code>a</code> that will be given <code>x</code> and <code>y</code>. All <code>a</code> has to do is take the two arguments
and return the one it wants.</p>
<p>Here is someone making a pair with variables <code>1</code> and <code>2</code>:</p>
<pre><code>(make-pair 1 2)
</code></pre>
<p>This returns:</p>
<pre><code>λa.(a 1 2)
</code></pre>
<p>There&rsquo;s a pair! Now we just need to access the values inside.</p>
<p>Remember how <code>true</code> takes two arguments and returns the first one and <code>false</code>
takes two arguments and returns the second one?</p>
<pre><code>pair-first = λp.(p true)
pair-second = λp.(p false)
</code></pre>
<p><code>pair-first</code> is going to take a pair <code>p</code> and give it <code>true</code> as the accessor
<code>a</code>. <code>pair-second</code> is going to give the pair <code>false</code> as the accessor.</p>
<p>Voilà, you can now store 2-tuples of values and recover the data from them.</p>
<p>➡️️ <a href="https://jtolds.github.io/sheepda/#JTdCJTIyc3RkbGliJTIyJTNBZmFsc2UlMkMlMjJvdXRwdXQlMjIlM0ElMjJyZXN1bHQlMjIlMkMlMjJjb2RlJTIyJTNBJTIyMCUyMCUzRCUyMCVDRSVCQmYuJUNFJUJCeC54JTVDbjElMjAlM0QlMjAlQ0UlQkJmLiVDRSVCQnguKGYlMjB4KSU1Q24yJTIwJTNEJTIwJUNFJUJCZi4lQ0UlQkJ4LihmJTIwKGYlMjB4KSklNUNuMyUyMCUzRCUyMCVDRSVCQmYuJUNFJUJCeC4oZiUyMChmJTIwKGYlMjB4KSkpJTVDbiU1Q250cnVlJTIwJTIwJTNEJTIwJUNFJUJCdC4lQ0UlQkJmLnQlNUNuZmFsc2UlMjAlM0QlMjAlQ0UlQkJ0LiVDRSVCQmYuZiU1Q24lNUNubWFrZS1wYWlyJTIwJTNEJTIwJUNFJUJCeC4lQ0UlQkJ5LiUyMCVDRSVCQmEuKGElMjB4JTIweSklNUNucGFpci1maXJzdCUyMCUzRCUyMCVDRSVCQnAuKHAlMjB0cnVlKSU1Q25wYWlyLXNlY29uZCUyMCUzRCUyMCVDRSVCQnAuKHAlMjBmYWxzZSklNUNuJTVDbiUyMyUyMHRyeSUyMGNoYW5naW5nJTIwdGhpcyUyMHVwISU1Q25wJTIwJTNEJTIwKG1ha2UtcGFpciUyMDIlMjAzKSU1Q24ocGFpci1zZWNvbmQlMjBwKSUyMiU3RA==">Try it out in your browser!</a></p>
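<p>For comparison, here&rsquo;s the pair encoding sketched with Python lambdas. This assumes curried Python functions as a stand-in for lambda terms, and it stores plain strings instead of Church values so the output is easy to read.</p>

```python
true = lambda t: lambda f: t
false = lambda t: lambda f: f

# make-pair = λx.λy. λa.(a x y)
make_pair = lambda x: lambda y: lambda a: a(x)(y)
pair_first = lambda p: p(true)    # pair-first  = λp.(p true)
pair_second = lambda p: p(false)  # pair-second = λp.(p false)

p = make_pair("left")("right")  # plain strings stand in for Church values
print(pair_first(p), pair_second(p))  # left right
```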
<h2 id="lists:ce1f6433c81ef1ac55c7793a299a68cb">Lists!</h2>
<p>We&rsquo;re going to construct
<a href="https://en.wikipedia.org/wiki/Linked_list">linked lists</a>. Each list item needs
two things: the value at the current position in the list and a reference
to the rest of the list.</p>
<p>One additional caveat is we want to be able to identify an empty list, so we&rsquo;re
going to store whether or not the current value is the end of a list as well.
In <a href="https://en.wikipedia.org/wiki/Lisp_%28programming_language%29">LISP</a>-based
programming languages, the end of the list is the special value <code>nil</code>, and
checking if we&rsquo;ve hit the end of the list is accomplished with the <code>nil?</code>
predicate.</p>
<p>Because we want to distinguish <code>nil</code> from a list with a value, we&rsquo;re going to
store three things in each linked list item: whether or not the list is empty
and, if not, the value and the rest of the list. So we need a 3-tuple.</p>
<p>Once we have pairs, other-sized tuples are easy. For instance, a 3-tuple is
just one pair with another pair inside for one of the slots.</p>
<p>For each list element, we&rsquo;ll store:</p>
<pre><code>[not-empty [value rest-of-list]]
</code></pre>
<p>As an example, a list element with a value of <code>1</code> would look like:</p>
<pre><code>[true [1 remainder]]
</code></pre>
<p>whereas <code>nil</code> will look like</p>
<pre><code>[false whatever]
</code></pre>
<p>That second part of <code>nil</code> just doesn&rsquo;t matter.</p>
<p>First, let&rsquo;s define <code>nil</code> and <code>nil?</code>:</p>
<pre><code>nil = (make-pair false false)
nil? = λl. (not (pair-first l))
</code></pre>
<p>The important thing about <code>nil</code> is that the first element in the pair is
<code>false</code>.</p>
<p>Now that we have an empty list, let&rsquo;s define how to add something to the front
of it. In LISP-based languages, the operation to <em>construct</em> a new list element
is called <code>cons</code>, so we&rsquo;ll call this <code>cons</code>, too.</p>
<p><code>cons</code> will take a value and an existing list and return a new list with
the given value at the front of the list.</p>
<pre><code>cons = λvalue.λlist.
(make-pair true (make-pair value list))
</code></pre>
<p><code>cons</code> is returning a pair where, unlike <code>nil</code>, the first element of the pair
is <code>true</code>. This represents that there&rsquo;s something in the list here. The second
pair element is what we wanted in our linked list: the value at the current
position, and a reference to the rest of the list.</p>
<p>So how do we access things in the list? Let&rsquo;s define two functions called
<code>head</code> and <code>tail</code>. <code>head</code> is going to return the value at the front of the
list, and <code>tail</code> is going to return everything but the front of the list. In
LISP-based languages, these functions are sometimes called <code>car</code> and <code>cdr</code> for
surprisingly <a href="https://en.wikipedia.org/wiki/CAR_and_CDR#Etymology">esoteric reasons</a>.
<code>head</code> and <code>tail</code> have undefined behavior here when called on <code>nil</code>, so let&rsquo;s
just assume <code>nil?</code> is false for the list and keep going.</p>
<pre><code>head = λlist. (pair-first (pair-second list))
tail = λlist. (pair-second (pair-second list))
</code></pre>
<p>Both <code>head</code> and <code>tail</code> first get <code>(pair-second list)</code>, which returns the tuple
that has the value and reference to the remainder. Then, they use either
<code>pair-first</code> or <code>pair-second</code> to get the current value or the rest of the list.</p>
<p>Great, we have lists!</p>
<p>➡️️ <a href="https://jtolds.github.io/sheepda/#JTdCJTIyc3RkbGliJTIyJTNBZmFsc2UlMkMlMjJvdXRwdXQlMjIlM0ElMjJyZXN1bHQlMjIlMkMlMjJjb2RlJTIyJTNBJTIyMCUyMCUzRCUyMCVDRSVCQmYuJUNFJUJCeC54JTIwJTIwJTIwJTIwJTIwJTIwJTIwJTIwMSUyMCUzRCUyMCVDRSVCQmYuJUNFJUJCeC4oZiUyMHgpJTIwJTIwJTIwJTIwJTIwMiUyMCUzRCUyMCVDRSVCQmYuJUNFJUJCeC4oZiUyMChmJTIweCkpJTIwJTIwJTIwJTIwMyUyMCUzRCUyMCVDRSVCQmYuJUNFJUJCeC4oZiUyMChmJTIwKGYlMjB4KSkpJTVDbnRydWUlMjAlMjAlM0QlMjAlQ0UlQkJ0LiVDRSVCQmYudCUyMCUyMCUyMCUyMGZhbHNlJTIwJTNEJTIwJUNFJUJCdC4lQ0UlQkJmLmYlNUNuJTVDbm1ha2UtcGFpciUyMCUzRCUyMCVDRSVCQnguJUNFJUJCeS4lMjAlQ0UlQkJhLihhJTIweCUyMHkpJTIwJTIwJTIwJTIwJTIwJTIwJTIwJTIwJTIwJTIwcGFpci1maXJzdCUyMCUzRCUyMCVDRSVCQnAuKHAlMjB0cnVlKSUyMCUyMCUyMCUyMCUyMHBhaXItc2Vjb25kJTIwJTNEJTIwJUNFJUJCcC4ocCUyMGZhbHNlKSU1Q24lNUNubmlsJTIwJTNEJTIwKG1ha2UtcGFpciUyMGZhbHNlJTIwZmFsc2UpJTIwJTIwJTIwJTIwJTIwbmlsJTNGJTIwJTNEJTIwJUNFJUJCbC4lMjAobm90JTIwKHBhaXItZmlyc3QlMjBsKSklNUNuY29ucyUyMCUzRCUyMCVDRSVCQnZhbHVlLiVDRSVCQmxpc3QuKG1ha2UtcGFpciUyMHRydWUlMjAobWFrZS1wYWlyJTIwdmFsdWUlMjBsaXN0KSklNUNuJTVDbmhlYWQlMjAlM0QlMjAlQ0UlQkJsaXN0LiUyMChwYWlyLWZpcnN0JTIwKHBhaXItc2Vjb25kJTIwbGlzdCkpJTVDbnRhaWwlMjAlM0QlMjAlQ0UlQkJsaXN0LiUyMChwYWlyLXNlY29uZCUyMChwYWlyLXNlY29uZCUyMGxpc3QpKSU1Q24lNUNuJTIzJTIwdHJ5JTIwY2hhbmdpbmclMjB0aGlzJTIwdXAhJTVDbmwlMjAlM0QlMjAoY29ucyUyMDElMjAoY29ucyUyMDIlMjAoY29ucyUyMDMlMjBuaWwpKSklNUNuKGhlYWQlMjAodGFpbCUyMGwpKSUyMiU3RA==">Try it out in your browser!</a></p>
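<p>The whole list encoding fits in a few lines of Python, if you want to poke at it outside the interpreter. This is a sketch that assumes Python lambdas as a stand-in for lambda terms; it stores Python ints rather than Church numerals, and <code>to_bool</code> is a hypothetical decoding helper.</p>

```python
true = lambda t: lambda f: t
false = lambda t: lambda f: f
not_ = lambda p: lambda t: lambda f: p(f)(t)
make_pair = lambda x: lambda y: lambda a: a(x)(y)
pair_first = lambda p: p(true)
pair_second = lambda p: p(false)

nil = make_pair(false)(false)
is_nil = lambda lst: not_(pair_first(lst))  # nil?
cons = lambda value: lambda lst: make_pair(true)(make_pair(value)(lst))
head = lambda lst: pair_first(pair_second(lst))
tail = lambda lst: pair_second(pair_second(lst))

to_bool = lambda b: b(True)(False)  # hypothetical decoding helper

numbers = cons(1)(cons(2)(cons(3)(nil)))  # Python ints stand in for Church numerals
print(head(tail(numbers)))                                # 2
print(to_bool(is_nil(numbers)), to_bool(is_nil(nil)))     # False True
```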
<h2 id="recursion-and-loops:ce1f6433c81ef1ac55c7793a299a68cb">Recursion and loops</h2>
<p>Let&rsquo;s make a simple function that sums up a list of numbers.</p>
<pre><code>sum = λlist.
(if (nil? list)
λ_. 0
λ_. (+ (head list) (sum (tail list))))
</code></pre>
<p>If the list is empty, let&rsquo;s return 0.
If the list has an element, let&rsquo;s add that element to the sum of the rest of
the list. <a href="https://en.wikipedia.org/wiki/Recursion">Recursion</a> is a cornerstone
tool of computer science, and being able to assume a solution to a subproblem
to solve a problem is super neat!</p>
<p>Okay, except, this doesn&rsquo;t work like this in lambda calculus. Remember how I
said assignment wasn&rsquo;t something that exists in lambda calculus? If you have:</p>
<pre><code>x = y
&lt;stuff&gt;
</code></pre>
<p>This really means you have:</p>
<pre><code>(λx.&lt;stuff&gt; y)
</code></pre>
<p>In the case of our sum definition, we have:</p>
<pre><code>(λsum.
&lt;your-program&gt;
λlist.
(if (nil? list)
λ_. 0
λ_. (+ (head list) (sum (tail list)))))
</code></pre>
<p>What that means is <code>sum</code> doesn&rsquo;t have any access to itself. It can&rsquo;t call
itself like we&rsquo;ve written, because when it tries to call <code>sum</code>, it&rsquo;s undefined!</p>
<p>This is a pretty crushing blow, but it turns out there&rsquo;s a mind bending and
completely unexpected trick the universe has up its sleeve.</p>
<p>Assume we wrote <code>sum</code> so that it takes two arguments: a reference to something
like <code>sum</code>, which we&rsquo;ll call <code>helper</code>, and then the list. If we could figure out how to
solve the recursion problem, then we could use this <code>sum</code>. Let&rsquo;s do that.</p>
<pre><code>sum = λhelper.λlist.
(if (nil? list)
λ_. 0
λ_. (+ (head list) (helper (tail list))))
</code></pre>
<p>But hey! When we call <code>sum</code>, we have a reference to <code>sum</code> then! Let&rsquo;s just
give <code>sum</code> itself before the list.</p>
<pre><code>(sum sum list)
</code></pre>
<p>This seems promising, but unfortunately now the <code>helper</code> invocation inside of
<code>sum</code> is broken. <code>helper</code> is just <code>sum</code> and <code>sum</code> expects a reference to
itself. Let&rsquo;s try again, changing the <code>helper</code> call:</p>
<pre><code>sum = λhelper.λlist.
(if (nil? list)
λ_. 0
λ_. (+ (head list) (helper helper (tail list))))
(sum sum list)
</code></pre>
<p>We did it! This actually works! We engineered recursion out of math!
At no point does <code>sum</code> refer to itself inside of itself, and yet we managed to
make a recursive function anyways!</p>
<p>➡️️ <a href="https://jtolds.github.io/sheepda/#JTdCJTIyc3RkbGliJTIyJTNBdHJ1ZSUyQyUyMm91dHB1dCUyMiUzQSUyMm91dHB1dCUyMiUyQyUyMmNvZGUlMjIlM0ElMjJzdW0lMjAlM0QlMjAlQ0UlQkJoZWxwZXIuJUNFJUJCbGlzdC4lNUNuJTIwJTIwKGlmJTIwKG5pbCUzRiUyMGxpc3QpJTVDbiUyMCUyMCUyMCUyMCUyMCUyMCVDRSVCQl8uJTIwMCU1Q24lMjAlMjAlMjAlMjAlMjAlMjAlQ0UlQkJfLiUyMCglMkIlMjAoaGVhZCUyMGxpc3QpJTIwKGhlbHBlciUyMGhlbHBlciUyMCh0YWlsJTIwbGlzdCkpKSklNUNuJTVDbnJlc3VsdCUyMCUzRCUyMChzdW0lMjBzdW0lMjAoY29ucyUyMDElMjAoY29ucyUyMDIlMjAoY29ucyUyMDMlMjBuaWwpKSkpJTVDbiU1Q24lMjMlMjB3ZSdsbCUyMGV4cGxhaW4lMjBob3clMjBwcmludC1udW0lMjB3b3JrcyUyMGxhdGVyJTJDJTIwYnV0JTIwd2UlMjBuZWVkJTIwaXQlMjB0byUyMHNob3clMjB0aGF0JTIwc3VtJTIwaXMlMjB3b3JraW5nJTVDbihwcmludC1udW0lMjByZXN1bHQpJTIyJTdE">Try it out in your browser!</a></p>
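<p>The self-passing trick carries over to any language with first-class functions. Here&rsquo;s a minimal Python sketch; to keep it short it assumes native Python ints, lists, and the conditional expression in place of the Church encodings and <code>if</code>. Note that the lambda body never mentions <code>sum_</code>.</p>

```python
# sum = λhelper.λlist. ... (helper helper (tail list))
# Python ints, lists, and the conditional expression stand in for the Church encodings.
sum_ = lambda helper: lambda lst: (
    0 if not lst else lst[0] + helper(helper)(lst[1:])
)

print(sum_(sum_)([1, 2, 3]))  # 6
```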
<p>Despite the minor miracle we&rsquo;ve just performed, we&rsquo;ve made recursion awkward:
every recursive function now has to be called with itself as an argument. This isn&rsquo;t
the end of the world, but it&rsquo;s a little annoying. Luckily for us, there&rsquo;s a
function that cleans this all right up called the
<a href="https://en.wikipedia.org/wiki/Fixed-point_combinator#Fixed_point_combinators_in_lambda_calculus">Y combinator</a>.</p>
<p>The <em>Y combinator</em> is probably now more famously known as
<a href="https://www.ycombinator.com/">a startup incubator</a>, or perhaps even more so as
the domain name for one of the most popular sites that has a different name
than its URL, <a href="https://news.ycombinator.com/">Hacker News</a>, but fixed point
combinators such as the Y combinator have had a longer history.</p>
<p>The Y combinator can be defined in different ways, but the definition I&rsquo;m using is:</p>
<pre><code>Y = λf.(λx.(x x) λx.(f λy.((x x) y)))
</code></pre>
<p>You might consider reading more about how the Y combinator can be derived
from an excellent tutorial such as
<a href="http://matt.might.net/articles/implementation-of-recursive-fixed-point-y-combinator-in-javascript-for-memoization/">this one</a>
or
<a href="http://kestas.kuliukas.com/YCombinatorExplained/">this one</a>.</p>
<p>Anyway, <code>Y</code> will make our original <code>sum</code> work as expected.</p>
<pre><code>sum = (Y λhelper.λlist.
(if (nil? list)
λ_. 0
λ_. (+ (head list) (helper (tail list)))))
</code></pre>
<p>We can now call <code>(sum list)</code> without any wacky doubling of the function name,
either inside or outside of the function. Hooray!</p>
<p>➡️️ <a href="https://jtolds.github.io/sheepda/#JTdCJTIyc3RkbGliJTIyJTNBdHJ1ZSUyQyUyMm91dHB1dCUyMiUzQSUyMm91dHB1dCUyMiUyQyUyMmNvZGUlMjIlM0ElMjJZJTIwJTNEJTIwJUNFJUJCZi4oJUNFJUJCeC4oeCUyMHgpJTIwJUNFJUJCeC4oZiUyMCVDRSVCQnkuKCh4JTIweCklMjB5KSkpJTVDbiU1Q25zdW0lMjAlM0QlMjAoWSUyMCVDRSVCQmhlbHBlci4lQ0UlQkJsaXN0LiU1Q24lMjAlMjAoaWYlMjAobmlsJTNGJTIwbGlzdCklNUNuJTIwJTIwJTIwJTIwJTIwJTIwJUNFJUJCXy4lMjAwJTVDbiUyMCUyMCUyMCUyMCUyMCUyMCVDRSVCQl8uJTIwKCUyQiUyMChoZWFkJTIwbGlzdCklMjAoaGVscGVyJTIwKHRhaWwlMjBsaXN0KSkpKSklNUNuJTVDbiUyMyUyMHdlJ2xsJTIwZXhwbGFpbiUyMGhvdyUyMHRoaXMlMjB3b3JrcyUyMGxhdGVyJTJDJTIwYnV0JTIwd2UlMjBuZWVkJTIwaXQlMjB0byUyMHNob3clMjB0aGF0JTIwc3VtJTIwaXMlMjB3b3JraW5nJTVDbnByaW50LW51bSUyMCUzRCUyMCVDRSVCQm4uKHByaW50LWxpc3QlMjAoaXRvYSUyMG4pKSU1Q24lNUNuKHByaW50LW51bSUyMChzdW0lMjAoY29ucyUyMDElMjAoY29ucyUyMDIlMjAoY29ucyUyMDMlMjBuaWwpKSkpKSUyMiU3RA">Try it out in your browser!</a></p>
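<p>This particular definition of <code>Y</code> also works as written in an eager language like Python. The sketch below assumes native Python ints, lists, and the conditional expression in place of the Church encodings, so only the combinator itself is a direct transcription.</p>

```python
# Y = λf.(λx.(x x) λx.(f λy.((x x) y)))
Y = lambda f: (lambda x: x(x))(lambda x: f(lambda y: x(x)(y)))

sum_ = Y(lambda helper: lambda lst: 0 if not lst else lst[0] + helper(lst[1:]))

print(sum_([1, 2, 3]))  # 6
```

<p>The inner <code>λy</code> wrapper is what makes this safe under eager evaluation: it delays <code>(x x)</code> until the helper is actually called. Without it, Python would keep expanding <code>x(x)</code> until it hit its recursion limit.</p>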
<h2 id="more-math:ce1f6433c81ef1ac55c7793a299a68cb">More math</h2>
<p>&ldquo;Get ready to do more math! We now have enough building blocks to do
subtraction, division, and modulo, which we&rsquo;ll need for fizz buzz,&rdquo; you tell
the security guards that are approaching you.</p>
<p>Just as we defined a successor function before addition, we&rsquo;ll define a
predecessor function before subtraction. Unfortunately, the predecessor function
<code>pred</code> is much more complicated than the successor function <code>succ</code>.</p>
<p>The basic idea is we&rsquo;re going to create a pair to keep track of the previous
value. We&rsquo;ll start from zero and build up <code>n</code> but also drag the previous value
such that at <code>n</code> we also have <code>n - 1</code>. Notably, this solution does not figure
out how to deal with negative numbers. The predecessor of 0 will be 0, and
negatives will have to be dealt with some other time and some other way.</p>
<p>First, we&rsquo;ll make a helper function that takes a pair of numbers and returns a
new pair where the first number in the old pair is the second number in the new
pair, and the new first number is the successor of the old first number.</p>
<pre><code>pred-helper = λpair.
(make-pair (succ (pair-first pair)) (pair-first pair))
</code></pre>
<p>Make sense? If we call <code>pred-helper</code> on a pair <code>[0 0]</code>, the result will be
<code>[1 0]</code>. If we call it on <code>[1 0]</code>, the result will be <code>[2 1]</code>. Essentially
this helper slides older numbers off to the right.</p>
<p>Okay, so now we&rsquo;re going to call <code>pred-helper</code> <em>n</em> times, with a starting
pair of <code>[0 0]</code>, and then get the <em>second</em> value, which should be <code>n - 1</code> when
we&rsquo;re done, from the pair.</p>
<pre><code>pred = λn.
(pair-second (n pred-helper (make-pair 0 0)))
</code></pre>
<p>We can combine these two functions now for the full effect:</p>
<pre><code>pred = λn.
(pair-second
(n
λpair.(make-pair (succ (pair-first pair)) (pair-first pair))
(make-pair 0 0)))
</code></pre>
<p>➡️️ <a href="https://jtolds.github.io/sheepda/#JTdCJTIyc3RkbGliJTIyJTNBdHJ1ZSUyQyUyMm91dHB1dCUyMiUzQSUyMm91dHB1dCUyMiUyQyUyMmNvZGUlMjIlM0ElMjIwJTIwJTNEJTIwJUNFJUJCZi4lQ0UlQkJ4LnglNUNuMSUyMCUzRCUyMCVDRSVCQmYuJUNFJUJCeC4oZiUyMHgpJTVDbjIlMjAlM0QlMjAlQ0UlQkJmLiVDRSVCQnguKGYlMjAoZiUyMHgpKSU1Q24zJTIwJTNEJTIwJUNFJUJCZi4lQ0UlQkJ4LihmJTIwKGYlMjAoZiUyMHgpKSklNUNuJTVDbnByZWQlMjAlM0QlMjAlQ0UlQkJuLiU1Q24lMjAlMjAocGFpci1zZWNvbmQlNUNuJTIwJTIwJTIwJTIwKG4lNUNuJTIwJTIwJTIwJTIwJTIwJUNFJUJCcGFpci4obWFrZS1wYWlyJTIwKHN1Y2MlMjAocGFpci1maXJzdCUyMHBhaXIpKSUyMChwYWlyLWZpcnN0JTIwcGFpcikpJTVDbiUyMCUyMCUyMCUyMCUyMChtYWtlLXBhaXIlMjAwJTIwMCkpKSU1Q24lNUNuJTIzJTIwd2UnbGwlMjBleHBsYWluJTIwaG93JTIwcHJpbnQtbnVtJTIwd29ya3MlMjBsYXRlciElNUNuKHByaW50LW51bSUyMChwcmVkJTIwMykpJTVDbiUyMiU3RA==">Try it out in your browser!</a></p>
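<p>Here&rsquo;s the same sliding-pair trick sketched with Python lambdas, assuming they behave like the lambda terms above. <code>to_int</code> is a hypothetical decoding helper that isn&rsquo;t part of the post.</p>

```python
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))
true = lambda t: lambda f: t
false = lambda t: lambda f: f
make_pair = lambda x: lambda y: lambda a: a(x)(y)
pair_first = lambda p: p(true)
pair_second = lambda p: p(false)

# pred-helper: [a b] becomes [a+1 a]
pred_helper = lambda pair: make_pair(succ(pair_first(pair)))(pair_first(pair))
# pred = λn.(pair-second (n pred-helper (make-pair 0 0)))
pred = lambda n: pair_second(n(pred_helper)(make_pair(zero)(zero)))

to_int = lambda n: n(lambda k: k + 1)(0)  # hypothetical decoding helper
three = succ(succ(succ(zero)))
print(to_int(pred(three)), to_int(pred(zero)))  # 2 0
```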
<p>Now that we have <code>pred</code>, subtraction is easy! To subtract <code>n</code> from <code>m</code>, we&rsquo;re
going to apply <code>pred</code> to <code>m</code> <em>n</em> times.</p>
<pre><code>- = λm.λn.(n pred m)
</code></pre>
<p>Keep in mind that if <code>n</code> is equal to <em>or greater than</em> <code>m</code>, the result of
<code>(- m n)</code> will be zero, since there are no negative numbers and the predecessor
of <code>0</code> is <code>0</code>. This fact means we can implement some new logic tests.
Let&rsquo;s make <code>(ge? m n)</code> return <code>true</code> if <code>m</code> is greater than or equal to <code>n</code> and
make <code>(le? m n)</code> return <code>true</code> if <code>m</code> is less than or equal to <code>n</code>.</p>
<pre><code>ge? = λm.λn.(zero? (- n m))
le? = λm.λn.(zero? (- m n))
</code></pre>
<p>If we have greater-than-or-equal-to and less-than-or-equal-to, then we can
make equal!</p>
<pre><code>eq? = λm.λn.(and (ge? m n) (le? m n))
</code></pre>
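<p>Saturating subtraction and the comparisons built on it can be checked with a short Python sketch. It assumes Python lambdas as a stand-in for lambda terms; <code>to_int</code>, <code>to_bool</code>, and <code>church</code> are hypothetical helpers for converting to and from Python values.</p>

```python
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))
true = lambda t: lambda f: t
false = lambda t: lambda f: f
and_ = lambda a: lambda b: a(b)(false)
is_zero = lambda n: n(lambda _: false)(true)
make_pair = lambda x: lambda y: lambda a: a(x)(y)
pred = lambda n: n(lambda p: make_pair(succ(p(true)))(p(true)))(make_pair(zero)(zero))(false)

sub = lambda m: lambda n: n(pred)(m)               # - = λm.λn.(n pred m)
ge = lambda m: lambda n: is_zero(sub(n)(m))        # ge?
le = lambda m: lambda n: is_zero(sub(m)(n))        # le?
eq = lambda m: lambda n: and_(ge(m)(n))(le(m)(n))  # eq?

# Hypothetical helpers for reading and building values.
to_int = lambda n: n(lambda k: k + 1)(0)
to_bool = lambda b: b(True)(False)
church = lambda k: zero if k == 0 else succ(church(k - 1))

print(to_int(sub(church(5))(church(3))), to_int(sub(church(3))(church(5))))  # 2 0
print(to_bool(ge(church(5))(church(3))), to_bool(le(church(5))(church(3))),
      to_bool(eq(church(4))(church(4))))  # True False True
```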
<p>Now we have enough for integer division! The idea for integer division of <code>m</code>
by <code>n</code> is that we keep count of the times we can subtract <code>n</code> from <code>m</code> without
going past zero.</p>
<pre><code>/ = (Y λ/.λm.λn.
(if (eq? m n)
λ_. 1
λ_. (if (le? m n)
λ_. 0
λ_. (+ 1 (/ (- m n) n)))))
</code></pre>
<p>Once we have subtraction, multiplication, and integer division, we can create
modulo.</p>
<pre><code>% = λm.λn. (- m (* (/ m n) n))
</code></pre>
<p>➡️️ <a href="https://jtolds.github.io/sheepda/#JTdCJTIyc3RkbGliJTIyJTNBdHJ1ZSUyQyUyMm91dHB1dCUyMiUzQSUyMm91dHB1dCUyMiUyQyUyMmNvZGUlMjIlM0ElMjIlMkIlMjAlM0QlMjAlQ0UlQkJtLiVDRSVCQm4uKG0lMjBzdWNjJTIwbiklNUNuKiUyMCUzRCUyMCVDRSVCQm0uJUNFJUJCbi4obiUyMCglMkIlMjBtKSUyMDApJTVDbi0lMjAlM0QlMjAlQ0UlQkJtLiVDRSVCQm4uKG4lMjBwcmVkJTIwbSklNUNuJTJGJTIwJTNEJTIwKFklMjAlQ0UlQkIlMkYuJUNFJUJCbS4lQ0UlQkJuLiU1Q24lMjAlMjAoaWYlMjAoZXElM0YlMjBtJTIwbiklNUNuJTIwJTIwJTIwJTIwJTIwJTIwJUNFJUJCXy4lMjAxJTVDbiUyMCUyMCUyMCUyMCUyMCUyMCVDRSVCQl8uJTIwKGlmJTIwKGxlJTNGJTIwbSUyMG4pJTVDbiUyMCUyMCUyMCUyMCUyMCUyMCUyMCUyMCUyMCUyMCUyMCUyMCUyMCUyMCVDRSVCQl8uJTIwMCU1Q24lMjAlMjAlMjAlMjAlMjAlMjAlMjAlMjAlMjAlMjAlMjAlMjAlMjAlMjAlQ0UlQkJfLiUyMCglMkIlMjAxJTIwKCUyRiUyMCgtJTIwbSUyMG4pJTIwbikpKSkpJTVDbiUyNSUyMCUzRCUyMCVDRSVCQm0uJUNFJUJCbi4lMjAoLSUyMG0lMjAoKiUyMCglMkYlMjBtJTIwbiklMjBuKSklNUNuJTVDbihwcmludC1udW0lMjAoJTI1JTIwNyUyMDMpKSUyMiU3RA==">Try it out in your browser!</a></p>
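<p>Putting the pieces together, here&rsquo;s a sketch of division and modulo with Python lambdas standing in for the lambda terms. <code>eq</code> is written with two <code>le</code> calls, which is equivalent to the <code>ge?</code>/<code>le?</code> version above, and <code>to_int</code> and <code>church</code> are hypothetical conversion helpers.</p>

```python
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))
one = succ(zero)
true = lambda t: lambda f: t
false = lambda t: lambda f: f
if_ = lambda p: lambda a: lambda b: p(a)(b)(zero)
and_ = lambda a: lambda b: a(b)(false)
is_zero = lambda n: n(lambda _: false)(true)
make_pair = lambda x: lambda y: lambda a: a(x)(y)
pred = lambda n: n(lambda p: make_pair(succ(p(true)))(p(true)))(make_pair(zero)(zero))(false)
add = lambda m: lambda n: n(succ)(m)
mul = lambda m: lambda n: n(add(m))(zero)
sub = lambda m: lambda n: n(pred)(m)
le = lambda m: lambda n: is_zero(sub(m)(n))
eq = lambda m: lambda n: and_(le(m)(n))(le(n)(m))
Y = lambda f: (lambda x: x(x))(lambda x: f(lambda y: x(x)(y)))

# / : 1 if m == n, 0 if m is smaller, otherwise 1 + ((m - n) / n)
div = Y(lambda rec: lambda m: lambda n:
        if_(eq(m)(n))(
            lambda _: one)(
            lambda _: if_(le(m)(n))(
                lambda _: zero)(
                lambda _: add(one)(rec(sub(m)(n))(n)))))
# % = λm.λn.(- m (* (/ m n) n))
mod = lambda m: lambda n: sub(m)(mul(div(m)(n))(n))

to_int = lambda n: n(lambda k: k + 1)(0)                    # hypothetical helper
church = lambda k: zero if k == 0 else succ(church(k - 1))  # hypothetical helper
print(to_int(div(church(7))(church(3))), to_int(mod(church(7))(church(3))))  # 2 1
```

<p>One thing the sketch makes easy to see: dividing a nonzero number by <code>0</code> never terminates, because subtracting <code>0</code> never makes <code>m</code> any smaller.</p>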
<h2 id="aside-about-performance:ce1f6433c81ef1ac55c7793a299a68cb">Aside about performance</h2>
<p>You might be wondering about performance at this point. Every time we subtract
one from 100, we count up from 0 to 100 to generate 99. This effect compounds
itself for division and modulo. The truth is that Church numerals and other
encodings aren&rsquo;t very performant! Just like how tapes in Turing machines aren&rsquo;t
a particularly efficient way to deal with data, Church encodings are most
interesting from a theoretical perspective for proving facts about computation.</p>
<p>That doesn&rsquo;t mean we can&rsquo;t make things faster though!</p>
<p>Lambda calculus is purely functional and side-effect free, which means that all
sorts of optimizations can be applied. Functions can be aggressively memoized.
In other words, once a function&rsquo;s result has been computed for specific arguments,
there&rsquo;s no need to compute it ever again. The result of that function will
always be the same anyway. Further, functions can be computed lazily and only
if needed. What this means is if a branch of your program&rsquo;s execution renders
a result that&rsquo;s never used, the compiler can decide to just not run that part
of the program and end up with the exact same result.</p>
<p><a href="https://github.com/jtolds/sheepda/">My interpreter</a> does have side effects,
since programs written in it can cause the system to write output to the user
via the special built-in function <code>PRINT_BYTE</code>. As a result, I didn&rsquo;t choose
lazy evaluation. The only optimization I chose was aggressive memoization for
all functions that are side-effect free. The memoization still has room for
improvement, but the result is much faster than a naive implementation.</p>
<h2 id="output:ce1f6433c81ef1ac55c7793a299a68cb">Output</h2>
<p>&ldquo;We&rsquo;re rounding the corner on fizz buzz!&rdquo; you shout at the receptionist as
security drags you around the corner on the way to the door. &ldquo;We just need to
figure out how to communicate results to the user!&rdquo;</p>
<p>Unfortunately, lambda calculus can&rsquo;t communicate with your operating system
kernel without some help, but a small concession is all we need.
<a href="https://github.com/jtolds/sheepda/">Sheepda</a> provides a single built-in
function <code>PRINT_BYTE</code>. <code>PRINT_BYTE</code> takes a number as its argument (a Church
encoded numeral) and prints the corresponding byte to the configured output
stream (usually <code>stdout</code>).</p>
<p>With <code>PRINT_BYTE</code>, we&rsquo;re going to need to reference a number of different
<a href="https://en.wikipedia.org/wiki/ASCII#Code_chart">ASCII bytes</a>, so we should
make writing numbers in code easier. Earlier we defined numbers 0 - 5, so let&rsquo;s
continue and define numbers 6 - 10.</p>
<pre><code>6 = (succ 5)
7 = (succ 6)
8 = (succ 7)
9 = (succ 8)
10 = (succ 9)
</code></pre>
<p>Now let&rsquo;s define a helper to create three digit decimal numbers.</p>
<pre><code>num = λa.λb.λc.(+ (+ (* (* 10 10) a) (* 10 b)) c)
</code></pre>
<p>The newline byte is decimal 10. Here&rsquo;s a function to print newlines!</p>
<pre><code>print-newline = λ_.(PRINT_BYTE (num 0 1 0))
</code></pre>
<h2 id="doing-multiple-things:ce1f6433c81ef1ac55c7793a299a68cb">Doing multiple things</h2>
<p>Now that we have this <code>PRINT_BYTE</code> function, we have functions that can cause
side-effects. We want to call <code>PRINT_BYTE</code> but we don&rsquo;t care about its return
value. We need a way to call multiple functions in sequence.</p>
<p>What if we make a function that takes two arguments and throws away the first
one again?</p>
<pre><code>do2 = λ_.λx.x
</code></pre>
<p>Here&rsquo;s a function to print every value in a list:</p>
<pre><code>print-list = (Y λrecurse.λlist.
(if (nil? list)
λ_. 0
λ_. (do2 (PRINT_BYTE (head list))
(recurse (tail list)))))
</code></pre>
<p>And here&rsquo;s a function that works like a for loop. It calls <code>f</code> with every
number from <code>0</code> up to, but not including, <code>n</code>. It uses a small helper function that
continues to call itself until <code>i</code> is equal to <code>n</code>, and starts <code>i</code> off at <code>0</code>.</p>
<pre><code>for = λn.λf.(
(Y λrecurse.λi.
(if (eq? i n)
λ_. void
λ_. (do2 (f i)
(recurse (succ i)))))
0)
</code></pre>
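<p>Sequencing with <code>do2</code> leans entirely on eager evaluation, which a Python sketch can show. The assumptions here: <code>PRINT_BYTE</code> is a hypothetical stand-in that appends to a list instead of writing a byte, and native Python ints, lists, and conditional expressions replace the Church encodings and <code>if</code>.</p>

```python
output = []  # stand-in for the output stream

# Hypothetical stand-in for the built-in PRINT_BYTE: records a byte value.
PRINT_BYTE = lambda n: output.append(n)

do2 = lambda _: lambda x: x  # do2 = λ_.λx.x

Y = lambda f: (lambda x: x(x))(lambda x: f(lambda y: x(x)(y)))

# print-list, with Python lists and the conditional expression standing in
# for the Church-encoded list and `if`.
print_list = Y(lambda recurse: lambda lst:
               0 if not lst else do2(PRINT_BYTE(lst[0]))(recurse(lst[1:])))

# for: call f with i = 0, 1, ..., n - 1
for_ = lambda n: lambda f: Y(lambda recurse: lambda i:
                             None if i == n else do2(f(i))(recurse(i + 1)))(0)

print_list([72, 105])
for_(3)(PRINT_BYTE)
print(output)  # [72, 105, 0, 1, 2]
```

<p>In Python the first argument to <code>do2</code> is evaluated before the second, so the side effects happen in the written order even though <code>do2</code> itself ignores its first argument.</p>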
<h2 id="converting-an-integer-to-a-string:ce1f6433c81ef1ac55c7793a299a68cb">Converting an integer to a string</h2>
<p>The last thing we need to complete fizz buzz is a function that turns a number
into a string of bytes to print. You might have noticed the <code>print-num</code> calls
in some of the web-based examples above. We&rsquo;re going to see how to make it!
Writing this function is sometimes a whiteboard problem in its own right. In C,
this function is conventionally called <code>itoa</code>, for integer to ASCII, although it
isn&rsquo;t part of the standard library.</p>
<p>Here&rsquo;s an example of how it works. Imagine the number we&rsquo;re converting to bytes
is <code>123</code>. We can get the <code>3</code> out by doing <code>(% 123 10)</code>, which will be <code>3</code>. Then
we can divide by <code>10</code> to get <code>12</code>, and then start over. <code>(% 12 10)</code> is <code>2</code>.
We&rsquo;ll loop down until we hit zero.</p>
<p>Once we have a digit, we can convert it to ASCII by adding the value of the
<code>'0'</code> ASCII byte. Then we can make a list of ASCII bytes for use with
<code>print-list</code>.</p>
<pre><code>zero-char = (num 0 4 8) # the ascii code for the byte that represents 0.
itoa = λn.(
(Y λrecurse.λn.λresult.
(if (zero? n)
λ_. (if (nil? result)
λ_. (cons zero-char nil)
λ_. result)
λ_. (recurse (/ n 10) (cons (+ zero-char (% n 10)) result))))
n nil)
print-num = λn.(print-list (itoa n))
</code></pre>
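<p>For comparison, here&rsquo;s a minimal sketch of the same digit-peeling loop in Go
(the language Sheepda itself is written in). The function name and the
non-negative-input assumption are mine; it just mirrors the steps described above:
take <code>n % 10</code>, add the <code>'0'</code> byte, prepend, divide by <code>10</code>, and repeat.</p>

```go
package main

import "fmt"

// itoa converts a non-negative integer to its decimal ASCII bytes.
// Assumption: n >= 0, matching the lambda calculus version above.
func itoa(n int) []byte {
	var result []byte
	for n != 0 {
		// Peel off the lowest digit and shift it into ASCII range.
		digit := byte('0' + n%10)
		// Prepend, so the most significant digit ends up first.
		result = append([]byte{digit}, result...)
		n /= 10
	}
	// Zero never enters the loop, so it needs its own case.
	if len(result) == 0 {
		return []byte{'0'}
	}
	return result
}

func main() {
	fmt.Println(string(itoa(123)))
	fmt.Println(string(itoa(0)))
}
```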
<h2 id="fizz-buzz:ce1f6433c81ef1ac55c7793a299a68cb">Fizz buzz</h2>
<p>&ldquo;Here we go,&rdquo; you shout at the building you just got kicked out of, &ldquo;here&rsquo;s how
you do fizz buzz.&rdquo;</p>
<p>First, we need to define three strings: &ldquo;Fizz&rdquo;, &ldquo;Buzz&rdquo;, and &ldquo;Fizzbuzz&rdquo;.</p>
<pre><code>fizzmsg = (cons (num 0 7 0) # F
(cons (num 1 0 5) # i
(cons (num 1 2 2) # z
(cons (num 1 2 2) # z
nil))))
buzzmsg = (cons (num 0 6 6) # B
(cons (num 1 1 7) # u
(cons (num 1 2 2) # z
(cons (num 1 2 2) # z
nil))))
fizzbuzzmsg = (cons (num 0 7 0) # F
(cons (num 1 0 5) # i
(cons (num 1 2 2) # z
(cons (num 1 2 2) # z
(cons (num 0 9 8) # b
(cons (num 1 1 7) # u
(cons (num 1 2 2) # z
(cons (num 1 2 2) # z
nil))))))))
</code></pre>
<p>Okay, now let&rsquo;s define a function that will run from 0 up to (but not including)
<code>n</code> and output numbers, fizzes, and buzzes:</p>
<pre><code>fizzbuzz = λn.
(for n λi.
(do2
(if (zero? (% i 3))
λ_. (if (zero? (% i 5))
λ_. (print-list fizzbuzzmsg)
λ_. (print-list fizzmsg))
λ_. (if (zero? (% i 5))
λ_. (print-list buzzmsg)
λ_. (print-list (itoa i))))
(print-newline 0)))
</code></pre>
<p>Let&rsquo;s do the first 20!</p>
<pre><code>(fizzbuzz (num 0 2 0))
</code></pre>
<p>➡️️ <a href="https://jtolds.github.io/sheepda/#JTdCJTIyc3RkbGliJTIyJTNBdHJ1ZSUyQyUyMm91dHB1dCUyMiUzQSUyMm91dHB1dCUyMiUyQyUyMmNvZGUlMjIlM0ElMjIlMjMlMjBkZWZpbmUlMjB0aGUlMjBtZXNzYWdlcyU1Q25maXp6bXNnJTIwJTNEJTIwKGNvbnMlMjAobnVtJTIwMCUyMDclMjAwKSUyMChjb25zJTIwKG51bSUyMDElMjAwJTIwNSklMjAoY29ucyUyMChudW0lMjAxJTIwMiUyMDIpJTIwKGNvbnMlMjAobnVtJTIwMSUyMDIlMjAyKSUyMG5pbCkpKSklNUNuYnV6em1zZyUyMCUzRCUyMChjb25zJTIwKG51bSUyMDAlMjA2JTIwNiklMjAoY29ucyUyMChudW0lMjAxJTIwMSUyMDcpJTIwKGNvbnMlMjAobnVtJTIwMSUyMDIlMjAyKSUyMChjb25zJTIwKG51bSUyMDElMjAyJTIwMiklMjBuaWwpKSkpJTVDbmZpenpidXp6bXNnJTIwJTNEJTIwKGNvbnMlMjAobnVtJTIwMCUyMDclMjAwKSUyMChjb25zJTIwKG51bSUyMDElMjAwJTIwNSklMjAoY29ucyUyMChudW0lMjAxJTIwMiUyMDIpJTIwKGNvbnMlMjAobnVtJTIwMSUyMDIlMjAyKSU1Q24lMjAlMjAlMjAlMjAoY29ucyUyMChudW0lMjAwJTIwOSUyMDgpJTIwKGNvbnMlMjAobnVtJTIwMSUyMDElMjA3KSUyMChjb25zJTIwKG51bSUyMDElMjAyJTIwMiklMjAoY29ucyUyMChudW0lMjAxJTIwMiUyMDIpJTIwbmlsKSkpKSkpKSklNUNuJTVDbiUyMyUyMGZpenpidXp6JTVDbmZpenpidXp6JTIwJTNEJTIwJUNFJUJCbi4lNUNuJTIwJTIwKGZvciUyMG4lMjAlQ0UlQkJpLiU1Q24lMjAlMjAlMjAlMjAoZG8yJTVDbiUyMCUyMCUyMCUyMCUyMCUyMChpZiUyMCh6ZXJvJTNGJTIwKCUyNSUyMGklMjAzKSklNUNuJTIwJTIwJTIwJTIwJTIwJTIwJTIwJTIwJTIwJTIwJUNFJUJCXy4lMjAoaWYlMjAoemVybyUzRiUyMCglMjUlMjBpJTIwNSkpJTVDbiUyMCUyMCUyMCUyMCUyMCUyMCUyMCUyMCUyMCUyMCUyMCUyMCUyMCUyMCUyMCUyMCUyMCUyMCVDRSVCQl8uJTIwKHByaW50LWxpc3QlMjBmaXp6YnV6em1zZyklNUNuJTIwJTIwJTIwJTIwJTIwJTIwJTIwJTIwJTIwJTIwJTIwJTIwJTIwJTIwJTIwJTIwJTIwJTIwJUNFJUJCXy4lMjAocHJpbnQtbGlzdCUyMGZpenptc2cpKSU1Q24lMjAlMjAlMjAlMjAlMjAlMjAlMjAlMjAlMjAlMjAlQ0UlQkJfLiUyMChpZiUyMCh6ZXJvJTNGJTIwKCUyNSUyMGklMjA1KSklNUNuJTIwJTIwJTIwJTIwJTIwJTIwJTIwJTIwJTIwJTIwJTIwJTIwJTIwJTIwJTIwJTIwJTIwJTIwJUNFJUJCXy4lMjAocHJpbnQtbGlzdCUyMGJ1enptc2cpJTVDbiUyMCUyMCUyMCUyMCUyMCUyMCUyMCUyMCUyMCUyMCUyMCUyMCUyMCUyMCUyMCUyMCUyMCUyMCVDRSVCQl8uJTIwKHByaW50LWxpc3QlMjAoaXRvYSUyMGkpKSkpJTVDbiUyMCUyMCUyMCUyMCUyMCUyMChwcmludC1uZXdsaW5lJTIwbmlsKSkpJTVDbiU1Q24lMjMlMjBydW4lMjBmaXp6YnV6eiUyMDIwJTIwdGltZXMlNUNuKGZpenpidXp6JTIwKG51bSUyMDAlMjAyJTIwMCkpJTIyJTdE">Try it 
out in your browser!</a></p>
<h2 id="reverse-a-string:ce1f6433c81ef1ac55c7793a299a68cb">Reverse a string</h2>
<p>&ldquo;ENCORE!&rdquo; you shout to no one as the last cars pull out of the company parking
lot. Everyone&rsquo;s gone home but this is your last night before the restraining
order goes through.</p>
<pre><code>reverse-list = λlist.(
(Y λrecurse.λold.λnew.
(if (nil? old)
λ_.new
λ_.(recurse (tail old) (cons (head old) new))))
list nil)
</code></pre>
<p>➡️️ <a href="https://jtolds.github.io/sheepda/#JTdCJTIyc3RkbGliJTIyJTNBdHJ1ZSUyQyUyMm91dHB1dCUyMiUzQSUyMm91dHB1dCUyMiUyQyUyMmNvZGUlMjIlM0ElMjJoZWxsby13b3JsZCUyMCUzRCUyMChjb25zJTIwKG51bSUyMDAlMjA3JTIwMiklMjAoY29ucyUyMChudW0lMjAxJTIwMCUyMDEpJTIwKGNvbnMlMjAobnVtJTIwMSUyMDAlMjA4KSUyMChjb25zJTIwKG51bSUyMDElMjAwJTIwOCklMjAoY29ucyUyMChudW0lMjAxJTIwMSUyMDEpJTVDbiUyMCUyMCUyMCUyMCUyMCUyMCUyMCUyMCUyMCUyMCUyMCUyMCUyMCUyMChjb25zJTIwKG51bSUyMDAlMjA0JTIwNCklMjAoY29ucyUyMChudW0lMjAwJTIwMyUyMDIpJTIwKGNvbnMlMjAobnVtJTIwMSUyMDElMjA5KSUyMChjb25zJTIwKG51bSUyMDElMjAxJTIwMSklMjAoY29ucyUyMChudW0lMjAxJTIwMSUyMDQpJTVDbiUyMCUyMCUyMCUyMCUyMCUyMCUyMCUyMCUyMCUyMCUyMCUyMCUyMCUyMChjb25zJTIwKG51bSUyMDElMjAwJTIwOCklMjAoY29ucyUyMChudW0lMjAxJTIwMCUyMDApJTIwKGNvbnMlMjAobnVtJTIwMCUyMDMlMjAzKSUyMG5pbCkpKSkpKSkpKSkpKSklNUNuJTVDbnJldmVyc2UtbGlzdCUyMCUzRCUyMCVDRSVCQmxpc3QuKCU1Q24lMjAlMjAoWSUyMCVDRSVCQnJlY3Vyc2UuJUNFJUJCb2xkLiVDRSVCQm5ldy4lNUNuJTIwJTIwJTIwJTIwKGlmJTIwKG5pbCUzRiUyMG9sZCklNUNuJTIwJTIwJTIwJTIwJTIwJTIwJTIwJTIwJUNFJUJCXy5uZXclNUNuJTIwJTIwJTIwJTIwJTIwJTIwJTIwJTIwJUNFJUJCXy4ocmVjdXJzZSUyMCh0YWlsJTIwb2xkKSUyMChjb25zJTIwKGhlYWQlMjBvbGQpJTIwbmV3KSkpKSU1Q24lMjAlMjBsaXN0JTIwbmlsKSU1Q24lNUNuKGRvNCU1Q24lMjAlMjAocHJpbnQtbGlzdCUyMGhlbGxvLXdvcmxkKSU1Q24lMjAlMjAocHJpbnQtbmV3bGluZSUyMHZvaWQpJTVDbiUyMCUyMChwcmludC1saXN0JTIwKHJldmVyc2UtbGlzdCUyMGhlbGxvLXdvcmxkKSklNUNuJTIwJTIwKHByaW50LW5ld2xpbmUlMjB2b2lkKSklMjIlN0Q=">Try it out in your browser!</a></p>
<h2 id="sheepda:ce1f6433c81ef1ac55c7793a299a68cb">Sheepda</h2>
<p>As I mentioned, I wrote a lambda calculus interpreter called
<a href="https://github.com/jtolds/sheepda/">Sheepda</a> for playing around. It&rsquo;s
worth a look on its own if you&rsquo;re interested in learning more about how to
write programming language interpreters. Lambda calculus is about as simple as
a language can get, so the interpreter itself is very simple!</p>
<p>It&rsquo;s written in Go and thanks to <a href="https://github.com/gopherjs/gopherjs">GopherJS</a>
it&rsquo;s what powers the <a href="https://jtolds.github.io/sheepda/">web playground</a>.</p>
<p>There are some fun projects if someone&rsquo;s interested in getting more involved.
Using the library to prune lambda expression trees and simplify expressions
if possible would be a start! I&rsquo;m sure my fizz buzz implementation isn&rsquo;t as
minimal as it could be, and playing
<a href="https://en.wikipedia.org/wiki/Code_golf">code golf</a> with it would be pretty
neat!</p>
<p>Feel free to fork
<a href="https://github.com/jtolds/sheepda/">https://github.com/jtolds/sheepda/</a>,
star it, bop it, twist it, or even pull it!</p>
Magic GOPATH
https://www.jtolio.com/2017/01/magic-gopath
Sat, 14 Jan 2017 18:47:00 -0700
hello@jtolio.com (JT Olio)
<p><em><strong>Update:</strong> With the advent of Go 1.11 and <a href="https://golang.org/cmd/go/#hdr-Modules__module_versions__and_more">Go modules</a>, this whole post is now
useless. Unset your GOPATH entirely and switch to Go modules today!</em></p>
<p>Maybe someday I&rsquo;ll start writing about things besides Go again.</p>
<p>Go requires that you set an environment variable for your workspace called
your <code>GOPATH</code>. The <code>GOPATH</code> is one of the most confusing aspects of Go to
newcomers and even relatively seasoned developers alike. It&rsquo;s not immediately
clear what would be better, but finding a good <code>GOPATH</code> value has implications
for your source code repository layout, how many separate projects you have on
your computer, how default project installation instructions work
(via <code>go get</code>), and even how you interoperate with other projects and
libraries.</p>
<p>It&rsquo;s taken until Go 1.8 to decide to
<a href="https://rakyll.org/default-gopath/">set a default</a> and that small change
was one of
<a href="https://go-review.googlesource.com/32019/">the most talked about code reviews</a>
for the 1.8 release cycle.</p>
<p>After
<a href="https://dave.cheney.net/2016/12/20/thinking-about-gopath">writing about GOPATH himself</a>,
<a href="https://dave.cheney.net/">Dave Cheney</a>
<a href="https://twitter.com/davecheney/status/811334240247812097">asked me</a>
to write a blog post about what I do.</p>
<h2 id="my-proposal:ed846704857e7729362d04cdc690c1c4">My proposal</h2>
<p>I set my <code>GOPATH</code> to always be the current working directory, unless a parent
directory is clearly the <code>GOPATH</code>.</p>
<p>Here&rsquo;s the relevant part of my <code>.bashrc</code>:</p>
<p><div class="highlight" style="background: #ffffff"><pre style="line-height: 125%"><span></span><span style="color: #008000"># bash command to output calculated GOPATH.</span>
calc_gopath() {
local dir=<span style="color: #a31515">&quot;</span>$PWD<span style="color: #a31515">&quot;</span>
<span style="color: #008000"># we&#39;re going to walk up from the current directory to the root</span>
<span style="color: #0000ff">while</span> true; <span style="color: #0000ff">do</span>
<span style="color: #008000"># if there&#39;s a &#39;.gopath&#39; file, use its contents as the GOPATH relative to</span>
<span style="color: #008000"># the directory containing it.</span>
<span style="color: #0000ff">if</span> [ -f <span style="color: #a31515">&quot;</span>$dir<span style="color: #a31515">/.gopath&quot;</span> ]; <span style="color: #0000ff">then</span>
( cd <span style="color: #a31515">&quot;</span>$dir<span style="color: #a31515">&quot;</span>;
<span style="color: #008000"># allow us to squash this behavior for cases we want to use vgo</span>
<span style="color: #0000ff">if</span> [ <span style="color: #a31515">&quot;</span><span style="color: #0000ff">$(</span>cat .gopath<span style="color: #0000ff">)</span><span style="color: #a31515">&quot;</span> != <span style="color: #a31515">&quot;&quot;</span> ]; <span style="color: #0000ff">then</span>
cd <span style="color: #a31515">&quot;</span><span style="color: #0000ff">$(</span>cat .gopath<span style="color: #0000ff">)</span><span style="color: #a31515">&quot;</span>;
echo <span style="color: #a31515">&quot;</span>$PWD<span style="color: #a31515">&quot;</span>;
<span style="color: #0000ff">fi</span>; )
<span style="color: #0000ff">return</span>
<span style="color: #0000ff">fi</span>
<span style="color: #008000"># if there&#39;s a &#39;src&#39; directory, the parent of that directory is now the</span>
<span style="color: #008000"># GOPATH</span>
<span style="color: #0000ff">if</span> [ -d <span style="color: #a31515">&quot;</span>$dir<span style="color: #a31515">/src&quot;</span> ]; <span style="color: #0000ff">then</span>
echo <span style="color: #a31515">&quot;</span>$dir<span style="color: #a31515">&quot;</span>
<span style="color: #0000ff">return</span>
<span style="color: #0000ff">fi</span>
<span style="color: #008000"># we can&#39;t go further, so bail. we&#39;ll make the original PWD the GOPATH.</span>
<span style="color: #0000ff">if</span> [ <span style="color: #a31515">&quot;</span>$dir<span style="color: #a31515">&quot;</span> == <span style="color: #a31515">&quot;/&quot;</span> ]; <span style="color: #0000ff">then</span>
echo <span style="color: #a31515">&quot;</span>$PWD<span style="color: #a31515">&quot;</span>
<span style="color: #0000ff">return</span>
<span style="color: #0000ff">fi</span>
<span style="color: #008000"># now we&#39;ll consider the parent directory</span>
dir=<span style="color: #a31515">&quot;</span><span style="color: #0000ff">$(</span>dirname <span style="color: #a31515">&quot;</span>$dir<span style="color: #a31515">&quot;</span><span style="color: #0000ff">)</span><span style="color: #a31515">&quot;</span>
<span style="color: #0000ff">done</span>
}
my_prompt_command() {
export GOPATH=<span style="color: #a31515">&quot;</span><span style="color: #0000ff">$(</span>calc_gopath<span style="color: #0000ff">)</span><span style="color: #a31515">&quot;</span>
<span style="color: #008000"># you can have other neat things in here. I also set my PS1 based on git</span>
<span style="color: #008000"># state</span>
}
<span style="color: #0000ff">case</span> <span style="color: #a31515">&quot;</span>$TERM<span style="color: #a31515">&quot;</span> in
xterm*|rxvt*)
<span style="color: #008000"># Bash provides an environment variable called PROMPT_COMMAND. The contents</span>
<span style="color: #008000"># of this variable are executed as a regular Bash command just before Bash</span>
<span style="color: #008000"># displays a prompt. Let&#39;s only set it if we&#39;re in some kind of graphical</span>
<span style="color: #008000"># terminal I guess.</span>
PROMPT_COMMAND=my_prompt_command
;;
*)
;;
<span style="color: #0000ff">esac</span>
</pre></div>
</p>
<p>The benefits are fantastic. If you want to quickly <code>go get</code> something and not
have it clutter up your workspace, you can do something like:</p>
<p><div class="highlight" style="background: #ffffff"><pre style="line-height: 125%"><span></span>cd <span style="color: #0000ff">$(</span>mktemp -d<span style="color: #0000ff">)</span> &amp;&amp; go get github.com/the/thing
</pre></div>
</p>
<p>On the other hand, if you&rsquo;re jumping between multiple projects (whether or not
they have the full workspace checked in or are just library packages), the
<code>GOPATH</code> is set accurately.</p>
<p>More flexibly, if you have a tree where some parent directory is outside of the
<code>GOPATH</code> but you want to set the <code>GOPATH</code> anyways, you can create a <code>.gopath</code>
file and it will automatically set your <code>GOPATH</code> correctly any time your shell
is inside that directory.</p>
<p>The whole thing is super nice. I kinda can&rsquo;t imagine doing something else
anymore.</p>
<h2 id="fin:ed846704857e7729362d04cdc690c1c4">Fin.</h2>
Writing Advanced Web Applications with Go
https://www.jtolio.com/2017/01/writing-advanced-web-applications-with-go
Wed, 11 Jan 2017 19:44:00 -0700
hello@jtolio.com (JT Olio)
<p>Web development in many programming environments often requires subscribing to
some full framework ethos. With <a href="https://www.ruby-lang.org/">Ruby</a>, it&rsquo;s
usually <a href="http://rubyonrails.org/">Rails</a> but could be
<a href="http://www.sinatrarb.com/">Sinatra</a> or something else. With
<a href="https://www.python.org/">Python</a>, it&rsquo;s often
<a href="https://www.djangoproject.com/">Django</a> or <a href="http://flask.pocoo.org/">Flask</a>.
With <a href="https://golang.org/">Go</a>, it&rsquo;s&hellip;</p>
<p>If you spend some time in Go communities like the
<a href="https://groups.google.com/d/forum/golang-nuts">Go mailing list</a> or the
<a href="https://www.reddit.com/r/golang/">Go subreddit</a>, you&rsquo;ll find Go newcomers
frequently wondering what web framework is best to use.
<a href="https://revel.github.io/">There</a> <a href="https://gin-gonic.github.io/gin/">are</a>
<a href="http://iris-go.com/">quite</a> <a href="https://beego.me/">a</a>
<a href="https://go-macaron.com/">few</a> <a href="https://github.com/go-martini/martini">Go</a>
<a href="https://github.com/gocraft/web">frameworks</a>
(<a href="https://github.com/urfave/negroni">and</a> <a href="https://godoc.org/goji.io">then</a>
<a href="https://echo.labstack.com/">some</a>), so which
one is best seems like a reasonable question. Without fail, though, the strong
recommendation of the Go community is to
<a href="https://medium.com/code-zen/why-i-don-t-use-go-web-frameworks-1087e1facfa4">avoid web frameworks entirely</a>
and just stick with the standard library as long as possible. Here&rsquo;s
<a href="https://groups.google.com/forum/#!topic/golang-nuts/R_lqsTTBh6I">an example from the Go mailing list</a>
and here&rsquo;s <a href="https://www.reddit.com/r/golang/comments/1yh6gm/new_to_go_trying_to_select_web_framework/">one from the subreddit</a>.</p>
<p>It&rsquo;s not bad advice! The Go standard library is very rich and flexible, much
more so than many other languages, and designing a web application in Go with
just the standard library is definitely a good choice.</p>
<p>Even when these Go frameworks call themselves minimalistic, they can&rsquo;t seem to
resist using a different request handler interface than the
default standard library
<a href="https://golang.org/pkg/net/http/#Handler">http.Handler</a>, and I think this is
the biggest source of angst about why frameworks should be avoided. If everyone
standardized on <a href="https://golang.org/pkg/net/http/#Handler">http.Handler</a>, then
dang, all sorts of things would be interoperable!</p>
<p>Before Go 1.7, it made some sense to give in and use a different interface for
handling HTTP requests. But now that
<a href="https://golang.org/pkg/net/http/#Request">http.Request</a> has the
<a href="https://golang.org/pkg/net/http/#Request.Context">Context</a> and
<a href="https://golang.org/pkg/net/http/#Request.WithContext">WithContext</a> methods,
there truly isn&rsquo;t a good reason any longer.</p>
<p>I&rsquo;ve done a fair share of web development in Go and I&rsquo;m here to share with you
both some standard library development patterns I&rsquo;ve learned and some code I&rsquo;ve
found myself frequently needing. The code I&rsquo;m sharing is not for use instead of
the standard library, but to augment it.</p>
<p>Overall, if this blog post feels like it&rsquo;s predominantly plugging various
little standalone libraries from my
<a href="https://godoc.org/gopkg.in/webhelp.v1">Webhelp non-framework</a>, that&rsquo;s because
it is. It&rsquo;s okay, they&rsquo;re little standalone libraries. Only use the ones
you want!</p>
<p>If you&rsquo;re new to Go web development, I suggest reading the Go documentation&rsquo;s
<a href="https://golang.org/doc/articles/wiki/">Writing Web Applications</a> article
first.</p>
<h2 id="middleware:f9c06ffbaee33bff858b5f3a6f296479">Middleware</h2>
<p>A frequent design pattern for server-side web development is the concept of
<em>middleware</em>, where some portion of the request handler wraps some other
portion of the request handler and does some preprocessing or routing or
something. This is a big component of how <a href="http://expressjs.com/">Express</a> is
organized on <a href="https://nodejs.org/en/">Node</a>, and Express middleware and
<a href="https://github.com/urfave/negroni">Negroni</a> middleware are almost
line-for-line identical in design.</p>
<p>Good use cases for middleware are things such as:</p>
<ul>
<li>making sure a user is logged in, redirecting if not,</li>
<li>making sure the request came over HTTPS,</li>
<li>making sure a session is set up and loaded from a session database,</li>
<li>making sure we logged information before and after the request was handled,</li>
<li>making sure the request was routed to the right handler,</li>
<li>and so on.</li>
</ul>
<p>Composing your web app as essentially a chain of middleware handlers is a very
powerful and flexible approach. It allows you to avoid a lot of
<a href="https://en.wikipedia.org/wiki/Cross-cutting_concern">cross-cutting concerns</a>
and have your code factored in very elegant and easy-to-maintain ways. By
wrapping a set of handlers with middleware that ensures a user is logged in
prior to actually attempting to handle the request, the individual handlers no
longer need mistake-prone copy-and-pasted code to ensure the same thing.</p>
<p>So, middleware is good. However, if Negroni or other frameworks are any
indication, you&rsquo;d think the standard library&rsquo;s <code>http.Handler</code> isn&rsquo;t up to the
challenge. Negroni adds its own <code>negroni.Handler</code> just for the sake of making
middleware easier. There&rsquo;s no reason for this.</p>
<p>Here is a full middleware implementation for ensuring a user is logged in,
assuming a <code>GetUser(*http.Request)</code> function but otherwise just using the
standard library:</p>
<p><div class="highlight" style="background: #ffffff"><pre style="line-height: 125%"><span></span><span style="color: #0000ff">func</span> RequireUser(h http.Handler) http.Handler {
<span style="color: #0000ff">return</span> http.HandlerFunc(<span style="color: #0000ff">func</span>(w http.ResponseWriter, req *http.Request) {
user, err := GetUser(req)
<span style="color: #0000ff">if</span> err != <span style="color: #0000ff">nil</span> {
http.Error(w, err.Error(), http.StatusInternalServerError)
<span style="color: #0000ff">return</span>
}
<span style="color: #0000ff">if</span> user == <span style="color: #0000ff">nil</span> {
http.Error(w, <span style="color: #a31515">&quot;unauthorized&quot;</span>, http.StatusUnauthorized)
<span style="color: #0000ff">return</span>
}
h.ServeHTTP(w, req)
})
}
</pre></div>
</p>
<p>Here&rsquo;s how it&rsquo;s used (just wrap another handler!):</p>
<p><div class="highlight" style="background: #ffffff"><pre style="line-height: 125%"><span></span><span style="color: #0000ff">func</span> main() {
http.ListenAndServe(<span style="color: #a31515">&quot;:8080&quot;</span>, RequireUser(http.HandlerFunc(myHandler)))
}
</pre></div>
</p>
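<p>The same wrapping shape covers other items from the list of use cases. Here&rsquo;s
a hedged sketch of an HTTPS check and a logging wrapper chained together. The
names <code>RequireHTTPS</code>, <code>Logged</code>, <code>newApp</code>, and <code>statusFor</code> are made up for
this example, and the HTTPS check assumes TLS terminates in this process (behind
a proxy you&rsquo;d need to check something else).</p>

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
)

// RequireHTTPS rejects requests that did not arrive over TLS.
// Assumption: TLS terminates in this process, so req.TLS is populated.
func RequireHTTPS(h http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		if req.TLS == nil {
			http.Error(w, "https required", http.StatusForbidden)
			return
		}
		h.ServeHTTP(w, req)
	})
}

// Logged logs before and after the wrapped handler runs.
func Logged(h http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		log.Printf("start %s %s", req.Method, req.URL.Path)
		h.ServeHTTP(w, req)
		log.Printf("done %s %s", req.Method, req.URL.Path)
	})
}

func hello(w http.ResponseWriter, req *http.Request) {
	fmt.Fprint(w, "hello, world!")
}

// newApp chains the middleware: logging on the outside, HTTPS check inside.
func newApp() http.Handler {
	return Logged(RequireHTTPS(http.HandlerFunc(hello)))
}

// statusFor runs one in-memory request through h and returns the status code.
func statusFor(h http.Handler, url string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest("GET", url, nil))
	return rec.Code
}

func main() {
	fmt.Println(statusFor(newApp(), "http://example.com/"))  // 403
	fmt.Println(statusFor(newApp(), "https://example.com/")) // 200
}
```

<p>Every layer is still just an <code>http.Handler</code>, so the order of the chain is
decided entirely by how the calls are nested.</p>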
<p>Express, Negroni, and other frameworks expect this kind of signature for a
middleware-supporting handler:</p>
<p><div class="highlight" style="background: #ffffff"><pre style="line-height: 125%"><span></span><span style="color: #0000ff">type</span> Handler <span style="color: #0000ff">interface</span> {
<span style="color: #008000">// don&#39;t do this!</span>
ServeHTTP(rw http.ResponseWriter, req *http.Request, next http.HandlerFunc)
}
</pre></div>
</p>
<p>There&rsquo;s really no reason for adding the <code>next</code> argument - it reduces
cross-library compatibility. So I say, don&rsquo;t use <code>negroni.Handler</code>
(or similar). Just use <code>http.Handler</code>!</p>
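<p>If you already have middleware written in the three-argument style, it can
be adapted to plain <code>http.Handler</code> wrapping with a few lines. This is only a
sketch: <code>nextStyle</code>, <code>adapt</code>, and <code>addHeader</code> are hypothetical names, not
part of Negroni or any other library.</p>

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// nextStyle is a hypothetical name for the three-argument shape shown above.
type nextStyle func(w http.ResponseWriter, req *http.Request, next http.HandlerFunc)

// adapt turns a nextStyle function into ordinary wrapping middleware.
func adapt(m nextStyle) func(http.Handler) http.Handler {
	return func(h http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
			// The wrapped handler's ServeHTTP method plays the role of "next".
			m(w, req, h.ServeHTTP)
		})
	}
}

// addHeader is an example next-style middleware.
func addHeader(w http.ResponseWriter, req *http.Request, next http.HandlerFunc) {
	w.Header().Set("X-Wrapped", "yes")
	next(w, req)
}

func hello(w http.ResponseWriter, req *http.Request) {
	fmt.Fprint(w, "hello, world!")
}

func newApp() http.Handler {
	return adapt(addHeader)(http.HandlerFunc(hello))
}

// run sends one in-memory request and reports the header and body it got back.
func run(h http.Handler) (header, body string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest("GET", "/", nil))
	return rec.Header().Get("X-Wrapped"), rec.Body.String()
}

func main() {
	header, body := run(newApp())
	fmt.Println(header, body)
}
```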
<h2 id="composability:f9c06ffbaee33bff858b5f3a6f296479">Composability</h2>
<p>Hopefully I&rsquo;ve sold you on middleware as a good design philosophy.</p>
<p>Probably the most commonly-used type of middleware is request routing, or
muxing (seems like we should call this demuxing but what do I know). Some
frameworks are almost solely focused on request routing.
<a href="https://github.com/gorilla/mux">gorilla/mux</a> seems more popular than
any other part of the <a href="https://github.com/gorilla/">Gorilla</a> library. I
think the reason for this is that even though the Go standard library is
completely full featured and has a good
<a href="https://golang.org/pkg/net/http/#ServeMux">ServeMux</a> implementation, it
doesn&rsquo;t make the right thing the default.</p>
<p>So! Let&rsquo;s talk about request routing and consider the following problem. You,
web developer extraordinaire, want to serve some HTML from your web server at
<code>/hello/</code> but also want to serve some static assets from <code>/static/</code>. Let&rsquo;s
take a quick stab.</p>
<p><div class="highlight" style="background: #ffffff"><pre style="line-height: 125%"><span></span><span style="color: #0000ff">package</span> main
<span style="color: #0000ff">import</span> (
<span style="color: #a31515">&quot;net/http&quot;</span>
)
<span style="color: #0000ff">func</span> hello(w http.ResponseWriter, req *http.Request) {
w.Write([]byte(<span style="color: #a31515">&quot;hello, world!&quot;</span>))
}
<span style="color: #0000ff">func</span> main() {
mux := http.NewServeMux()
mux.Handle(<span style="color: #a31515">&quot;/hello/&quot;</span>, http.HandlerFunc(hello))
mux.Handle(<span style="color: #a31515">&quot;/static/&quot;</span>, http.FileServer(http.Dir(<span style="color: #a31515">&quot;./static-assets&quot;</span>)))
http.ListenAndServe(<span style="color: #a31515">&quot;:8080&quot;</span>, mux)
}
</pre></div>
</p>
<p>If you visit <code>http://localhost:8080/hello/</code>, you&rsquo;ll be rewarded with a friendly
&ldquo;hello, world!&rdquo; message.</p>
<p>If you visit <code>http://localhost:8080/static/</code> on the other hand (assuming you
have a folder of static assets in <code>./static-assets</code>), you&rsquo;ll be surprised and
frustrated. This code tries to find the source content for the request
<code>/static/my-file</code> at <code>./static-assets/static/my-file</code>! There&rsquo;s an extra
<code>/static</code> in there!</p>
<p>Okay, so this is why <code>http.StripPrefix</code> exists. Let&rsquo;s fix it.</p>
<p><div class="highlight" style="background: #ffffff"><pre style="line-height: 125%"><span></span> mux.Handle(<span style="color: #a31515">&quot;/static/&quot;</span>, http.StripPrefix(<span style="color: #a31515">&quot;/static&quot;</span>,
http.FileServer(http.Dir(<span style="color: #a31515">&quot;./static-assets&quot;</span>))))
</pre></div>
</p>
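<p>To see exactly what the wrapped handler receives, here&rsquo;s a small sketch that
swaps the file server for a handler that echoes <code>req.URL.Path</code>. The helper
names (<code>echoPath</code>, <code>newMux</code>, <code>seenPath</code>) are made up for illustration.</p>

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// echoPath writes the path the handler actually sees.
func echoPath(w http.ResponseWriter, req *http.Request) {
	fmt.Fprint(w, req.URL.Path)
}

// newMux registers echoPath under /static/, optionally behind StripPrefix.
func newMux(strip bool) http.Handler {
	var h http.Handler = http.HandlerFunc(echoPath)
	if strip {
		h = http.StripPrefix("/static", h)
	}
	mux := http.NewServeMux()
	mux.Handle("/static/", h)
	return mux
}

// seenPath sends an in-memory GET for target through h and returns the body.
func seenPath(h http.Handler, target string) string {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest("GET", target, nil))
	return rec.Body.String()
}

func main() {
	fmt.Println(seenPath(newMux(false), "/static/my-file")) // /static/my-file
	fmt.Println(seenPath(newMux(true), "/static/my-file"))  // /my-file
}
```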
<p><code>mux.Handle</code> combined with <code>http.StripPrefix</code> is such a common pattern that I
think it should be the default. Whenever a request router processes a certain
amount of URL elements, it should strip them off the request so the wrapped
<code>http.Handler</code> doesn&rsquo;t need to know its absolute URL and only needs to be
concerned with its relative one.</p>
<p>In <a href="https://swtch.com/~rsc/">Russ Cox</a>&rsquo;s recent
<a href="https://github.com/rsc/tiddly">TiddlyWeb backend</a>, I would argue that every
time <code>strings.TrimPrefix</code> is needed to remove the full URL from the handler&rsquo;s
incoming path arguments, it is an unnecessary cross-cutting concern,
unfortunately imposed by <code>http.ServeMux</code>. (An example is
<a href="https://github.com/rsc/tiddly/blob/8f9145ac183e374eb95d90a73be4d5f38534ec47/tiddly.go#L201">line 201 in tiddly.go</a>.)</p>
<p>I&rsquo;d much rather have the default <code>mux</code> behavior work more like a directory
of registered elements that by default strips off the ancestor directory
before handing the request to the next middleware handler. It&rsquo;s much more
composable. To this end, I&rsquo;ve written a simple muxer that works in this
fashion called <a href="https://godoc.org/gopkg.in/webhelp.v1/whmux#Dir">whmux.Dir</a>.
It is essentially <code>http.ServeMux</code> and <code>http.StripPrefix</code> combined. Here&rsquo;s the
previous example reworked to use it:</p>
<p><div class="highlight" style="background: #ffffff"><pre style="line-height: 125%"><span></span><span style="color: #0000ff">package</span> main
<span style="color: #0000ff">import</span> (
<span style="color: #a31515">&quot;net/http&quot;</span>
<span style="color: #a31515">&quot;gopkg.in/webhelp.v1/whmux&quot;</span>
)
<span style="color: #0000ff">func</span> hello(w http.ResponseWriter, req *http.Request) {
w.Write([]byte(<span style="color: #a31515">&quot;hello, world!&quot;</span>))
}
<span style="color: #0000ff">func</span> main() {
mux := whmux.Dir{
<span style="color: #a31515">&quot;hello&quot;</span>: http.HandlerFunc(hello),
<span style="color: #a31515">&quot;static&quot;</span>: http.FileServer(http.Dir(<span style="color: #a31515">&quot;./static-assets&quot;</span>)),
}
http.ListenAndServe(<span style="color: #a31515">&quot;:8080&quot;</span>, mux)
}
</pre></div>
</p>
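<p>To make the idea concrete without depending on webhelp, here&rsquo;s a minimal
sketch of a directory-style mux using only the standard library. This is
<em>not</em> how <code>whmux.Dir</code> is actually implemented; <code>dirMux</code> and <code>shift</code> are
hypothetical names, and <code>(*http.Request).Clone</code> needs Go 1.13 or newer.</p>

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// dirMux is a hypothetical stand-in that routes on the first path element.
type dirMux map[string]http.Handler

// shift splits "/a/b/c" into "a" and "/b/c".
func shift(path string) (head, rest string) {
	path = strings.TrimPrefix(path, "/")
	if i := strings.Index(path, "/"); i >= 0 {
		return path[:i], path[i:]
	}
	return path, "/"
}

func (d dirMux) ServeHTTP(w http.ResponseWriter, req *http.Request) {
	head, rest := shift(req.URL.Path)
	h, ok := d[head]
	if !ok {
		http.NotFound(w, req)
		return
	}
	// Clone so the caller's request is untouched, then hand the child
	// handler only the part of the path it is responsible for.
	r2 := req.Clone(req.Context())
	r2.URL.Path = rest
	h.ServeHTTP(w, r2)
}

// echoPath stands in for a real handler and shows the path it receives.
func echoPath(w http.ResponseWriter, req *http.Request) {
	fmt.Fprint(w, req.URL.Path)
}

func newApp() http.Handler {
	return dirMux{"static": http.HandlerFunc(echoPath)}
}

// fetch sends one in-memory GET and returns the status code and body.
func fetch(h http.Handler, target string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest("GET", target, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	fmt.Println(fetch(newApp(), "/static/my-file"))
}
```

<p>Because each level only consumes its own path element, a <code>dirMux</code> can be
nested inside another <code>dirMux</code> without either one knowing its absolute URL.</p>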
<p>There are other useful mux implementations inside the
<a href="https://godoc.org/gopkg.in/webhelp.v1/whmux">whmux</a> package that demultiplex
on various aspects of the request path, request method, request host, or
pull arguments out of the request and place them into the context, such as a
<a href="https://godoc.org/gopkg.in/webhelp.v1/whmux#IntArg">whmux.IntArg</a> or
<a href="https://godoc.org/gopkg.in/webhelp.v1/whmux#StringArg">whmux.StringArg</a>. This
brings us to <a href="https://golang.org/pkg/context/">contexts</a>.</p>
<h2 id="contexts:f9c06ffbaee33bff858b5f3a6f296479">Contexts</h2>
<p>Request contexts are new to <code>net/http</code> as of Go 1.7, but
the idea of
<a href="https://blog.golang.org/context">contexts has been around since mid-2014</a>.
Go 1.7 moved the package into the standard library
(<a href="https://golang.org/pkg/context/">&ldquo;context&rdquo;</a>); for older Go
releases it remains available at its original location
(<a href="https://godoc.org/golang.org/x/net/context">&ldquo;golang.org/x/net/context&rdquo;</a>).</p>
<p>First, here&rsquo;s the definition of the <code>context.Context</code> type that
<code>(*http.Request).Context()</code> returns:</p>
<p><div class="highlight" style="background: #ffffff"><pre style="line-height: 125%"><span></span><span style="color: #0000ff">type</span> Context <span style="color: #0000ff">interface</span> {
Done() &lt;-<span style="color: #0000ff">chan</span> <span style="color: #0000ff">struct</span>{}
Err() <span style="color: #2b91af">error</span>
Deadline() (deadline time.Time, ok <span style="color: #2b91af">bool</span>)
Value(key <span style="color: #0000ff">interface</span>{}) <span style="color: #0000ff">interface</span>{}
}
</pre></div>
</p>
<p><code>Done()</code>, <code>Err()</code>, and <code>Deadline()</code> are enough material for an entirely
different blog post, so I&rsquo;m going to ignore them at least for now and focus on
<code>Value(interface{})</code>.</p>
<p>As a motivating problem, let&rsquo;s say that the <code>GetUser(*http.Request)</code> function
we assumed earlier is expensive, and we only want to call it once per request.
We certainly don&rsquo;t want to call it once to check that a user is logged in, and
then again when we actually need the <code>*User</code> value. With
<code>(*http.Request).WithContext</code> and <code>context.WithValue</code>, we can pass the <code>*User</code>
down to the next middleware precomputed!</p>
<p>Here&rsquo;s the new middleware:</p>
<p><div class="highlight" style="background: #ffffff"><pre style="line-height: 125%"><span></span><span style="color: #0000ff">type</span> userKey <span style="color: #2b91af">int</span>
<span style="color: #0000ff">func</span> RequireUser(h http.Handler) http.Handler {
<span style="color: #0000ff">return</span> http.HandlerFunc(<span style="color: #0000ff">func</span>(w http.ResponseWriter, req *http.Request) {
user, err := GetUser(req)
<span style="color: #0000ff">if</span> err != <span style="color: #0000ff">nil</span> {
http.Error(w, err.Error(), http.StatusInternalServerError)
<span style="color: #0000ff">return</span>
}
<span style="color: #0000ff">if</span> user == <span style="color: #0000ff">nil</span> {
http.Error(w, <span style="color: #a31515">&quot;unauthorized&quot;</span>, http.StatusUnauthorized)
<span style="color: #0000ff">return</span>
}
ctx := req.Context()
ctx = context.WithValue(ctx, userKey(0), user)
h.ServeHTTP(w, req.WithContext(ctx))
})
}
</pre></div>
</p>
<p>Now, handlers that are protected by this <code>RequireUser</code> handler can load the
previously computed <code>*User</code> value like this:</p>
<p><div class="highlight" style="background: #ffffff"><pre style="line-height: 125%"><span></span><span style="color: #0000ff">if</span> user, ok := req.Context().Value(userKey(0)).(*User); ok {
<span style="color: #008000">// there&#39;s a valid user!</span>
}
</pre></div>
</p>
<p>Contexts allow us to pass optional values to handlers down the chain in a way
that is relatively type-safe and flexible. None of the above context logic
requires anything outside of the standard library.</p>
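<p>One way to keep the type assertion in a single place is to hide the key
behind a pair of small helpers. This is a sketch under assumptions: the
<code>User</code> struct is a placeholder for whatever <code>GetUser</code> returns, and
<code>WithUser</code>, <code>UserFrom</code>, and <code>demo</code> are names invented for the example.</p>

```go
package main

import (
	"context"
	"fmt"
)

// User is a placeholder for the application's real user type.
type User struct{ Name string }

// userKey is unexported, so only this package can build the key.
type userKey int

// WithUser returns a copy of ctx that carries u.
func WithUser(ctx context.Context, u *User) context.Context {
	return context.WithValue(ctx, userKey(0), u)
}

// UserFrom returns the user stored by WithUser, if there is one.
func UserFrom(ctx context.Context) (*User, bool) {
	u, ok := ctx.Value(userKey(0)).(*User)
	return u, ok
}

// demo stores a user, reads it back, and also probes an empty context.
func demo() (name string, foundInEmpty bool) {
	ctx := WithUser(context.Background(), &User{Name: "gopher"})
	if u, ok := UserFrom(ctx); ok {
		name = u.Name
	}
	_, foundInEmpty = UserFrom(context.Background())
	return name, foundInEmpty
}

func main() {
	fmt.Println(demo())
}
```

<p>In the middleware, <code>req.WithContext(WithUser(req.Context(), user))</code> would
replace the direct <code>context.WithValue</code> call, and handlers would call
<code>UserFrom(req.Context())</code>.</p>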
<h3 id="aside-about-context-keys:f9c06ffbaee33bff858b5f3a6f296479">Aside about context keys</h3>
<p>There was a curious piece of code in the above example. At the top, we
defined a <code>type userKey int</code>, and then always used it as <code>userKey(0)</code>.</p>
<p>One of the possible problems with contexts is that the <code>Value()</code> interface lends
itself to a global namespace where you can stomp on other context users and
use conflicting key names. Above, we used <code>type userKey</code> because it&rsquo;s an
unexported type in our package. A <code>userKey</code> value will never compare equal
(without an explicit conversion) to a value of any other type, including
<code>int</code>. This gives us a way to namespace keys to our package, even though the
<code>Value()</code> method is still a sort of global namespace.</p>
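<p>If that sounds too good to be true, here&rsquo;s a tiny sketch that checks it. <code>otherKey</code> is a made-up type standing in for a key some other package might define; both types live in one file here only to keep the example short.</p>

```go
package main

import (
	"context"
	"fmt"
)

type userKey int  // imagine this is unexported in package A
type otherKey int // imagine this is unexported in package B

// lookups stores a value under userKey(0) and then looks it up three ways.
func lookups() (viaUserKey, viaOtherKey, viaInt interface{}) {
	ctx := context.WithValue(context.Background(), userKey(0), "alice")
	return ctx.Value(userKey(0)), ctx.Value(otherKey(0)), ctx.Value(0)
}

func main() {
	a, b, c := lookups()
	fmt.Println(a, b, c) // prints "alice <nil> <nil>"
}
```

<p>Only the lookup that uses the exact same key type finds the value, even though all three keys are the number zero underneath.</p>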
<p>Because the need for this is so common, the <code>webhelp</code> package defines a
<a href="https://godoc.org/gopkg.in/webhelp.v1#GenSym">GenSym()</a> helper that will
create a brand new, never-before-seen, unique value for use as a context key.</p>
<p>If we used <a href="https://godoc.org/gopkg.in/webhelp.v1#GenSym">GenSym()</a>, then
<code>type userKey int</code> would become <code>var userKey = webhelp.GenSym()</code> and
<code>userKey(0)</code> would simply become <code>userKey</code>.</p>
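<p>I won&rsquo;t walk through the real implementation here, but one way a helper like this could work (an assumption for illustration, not necessarily what <code>webhelp</code> does) is to hand back a pointer to a freshly allocated value. <code>genSym</code> and <code>symbol</code> below are hypothetical names.</p>

```go
package main

import (
	"context"
	"fmt"
)

// symbol is deliberately not zero-sized: the Go spec says pointers to
// distinct zero-size variables may or may not be equal.
type symbol struct{ _ byte }

// genSym is a hypothetical stand-in for webhelp.GenSym. Every call returns
// a pointer to a new allocation, so no two results compare equal.
func genSym() interface{} { return new(symbol) }

var (
	userKey  = genSym()
	themeKey = genSym()
)

// demo stores a value under userKey and reports what each key finds.
func demo() (user, theme interface{}, distinct bool) {
	ctx := context.WithValue(context.Background(), userKey, "alice")
	return ctx.Value(userKey), ctx.Value(themeKey), userKey != themeKey
}

func main() {
	fmt.Println(demo()) // prints "alice <nil> true"
}
```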
<h3 id="back-to-whmux-stringarg:f9c06ffbaee33bff858b5f3a6f296479">Back to whmux.StringArg</h3>
<p>Armed with this new context behavior, we can now present a <code>whmux.StringArg</code>
example:</p>
<p><div class="highlight" style="background: #ffffff"><pre style="line-height: 125%"><span></span><span style="color: #0000ff">package</span> main
<span style="color: #0000ff">import</span> (
<span style="color: #a31515">&quot;fmt&quot;</span>
<span style="color: #a31515">&quot;net/http&quot;</span>
<span style="color: #a31515">&quot;gopkg.in/webhelp.v1/whmux&quot;</span>
)
<span style="color: #0000ff">var</span> (
pageName = whmux.NewStringArg()
)
<span style="color: #0000ff">func</span> page(w http.ResponseWriter, req *http.Request) {
name := pageName.Get(req.Context())
fmt.Fprintf(w, <span style="color: #a31515">&quot;Welcome to %s&quot;</span>, name)
}
<span style="color: #0000ff">func</span> main() {
<span style="color: #008000">// pageName.Shift pulls the next /-delimited string out of the request&#39;s</span>
<span style="color: #008000">// URL.Path and puts it into the context instead.</span>
pageHandler := pageName.Shift(http.HandlerFunc(page))
http.ListenAndServe(<span style="color: #a31515">&quot;:8080&quot;</span>, whmux.Dir{
<span style="color: #a31515">&quot;wiki&quot;</span>: pageHandler,
})
}
</pre></div>
</p>
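<p>If you&rsquo;re curious what a <code>Shift</code> like that might be doing, here&rsquo;s a rough standard-library sketch of the idea. The <code>stringArg</code> type and the <code>shift</code> helper are my own simplified stand-ins, not the actual <code>whmux</code> code, and they skip details a real mux would care about.</p>

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// shift splits "/Home/edit" into ("Home", "/edit").
func shift(path string) (head, rest string) {
	path = strings.TrimPrefix(path, "/")
	if i := strings.Index(path, "/"); i >= 0 {
		return path[:i], path[i:]
	}
	return path, ""
}

// stringArg is a hypothetical stand-in for whmux.StringArg. Its own address
// doubles as the context key, so every stringArg gets a distinct slot.
type stringArg struct{ _ byte }

// Shift moves the next path element out of URL.Path and into the context.
func (a *stringArg) Shift(h http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		head, rest := shift(req.URL.Path)
		// WithContext returns a copy of the request, so the caller's
		// request is left alone when we swap in the shortened URL.
		req = req.WithContext(context.WithValue(req.Context(), a, head))
		u := *req.URL
		u.Path = rest
		req.URL = &u
		h.ServeHTTP(w, req)
	})
}

// Get returns the shifted value, or "" if Shift never ran.
func (a *stringArg) Get(ctx context.Context) string {
	s, _ := ctx.Value(a).(string)
	return s
}

var pageName = new(stringArg)

func page(w http.ResponseWriter, req *http.Request) {
	fmt.Fprintf(w, "page=%s rest=%s", pageName.Get(req.Context()), req.URL.Path)
}

// get sends one request through the shifted handler and returns the body.
func get(path string) string {
	rec := httptest.NewRecorder()
	h := pageName.Shift(http.HandlerFunc(page))
	h.ServeHTTP(rec, httptest.NewRequest("GET", path, nil))
	return rec.Body.String()
}

func main() {
	fmt.Println(get("/Home/edit")) // prints "page=Home rest=/edit"
	fmt.Println(get("/Home"))      // prints "page=Home rest="
}
```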
<h2 id="pre-go-1-7-support:f9c06ffbaee33bff858b5f3a6f296479">Pre-Go-1.7 support</h2>
<p>Contexts let you do some pretty cool things. But let&rsquo;s say you&rsquo;re stuck with
something before Go 1.7 (for instance, App Engine is currently Go 1.6).</p>
<p>That&rsquo;s okay! I&rsquo;ve backported all of the neat new context features to Go 1.6 and
earlier in a forwards compatible way!</p>
<p>With the <a href="https://godoc.org/gopkg.in/webhelp.v1/whcompat">whcompat</a> package,
<code>req.Context()</code> becomes <code>whcompat.Context(req)</code>, and
<code>req.WithContext(ctx)</code> becomes <code>whcompat.WithContext(req, ctx)</code>. The <code>whcompat</code>
versions work with all releases of Go. Yay!</p>
<p>There&rsquo;s a bit of unpleasantness behind the scenes to make this happen.
Specifically, for pre-1.7 builds, a global map indexed by <code>req.URL</code> is kept,
and a finalizer is installed on <code>req</code> to clean up. So don&rsquo;t change what
<code>req.URL</code> points to and this will work fine. In practice it&rsquo;s not a problem.</p>
<p><code>whcompat</code> also provides a few more backwards-compatibility helpers. In Go 1.7 and on,
the context&rsquo;s <code>Done()</code> channel is closed (and <code>Err()</code> is set) whenever the
request is done processing. If you want this behavior in Go 1.6 and earlier,
just use the
<a href="https://godoc.org/gopkg.in/webhelp.v1/whcompat#DoneNotify">whcompat.DoneNotify</a>
middleware.</p>
<p>In Go 1.8 and on, the context&rsquo;s <code>Done()</code> channel is closed when the client goes
away, even if the request hasn&rsquo;t completed. If you want this behavior in Go 1.7
and earlier, just use the
<a href="https://godoc.org/gopkg.in/webhelp.v1/whcompat#CloseNotify">whcompat.CloseNotify</a>
middleware, though beware that it costs an extra goroutine.</p>
<h2 id="error-handling:f9c06ffbaee33bff858b5f3a6f296479">Error handling</h2>
<p>How you handle errors can be another cross-cutting concern, but with good
application of context and middleware, it too can be beautifully cleaned up so
that the responsibilities lie in the correct place.</p>
<p>Problem statement: your <code>RequireUser</code> middleware needs to handle an
authentication error differently between your HTML endpoints and your JSON API
endpoints. You want to use <code>RequireUser</code> for both types of endpoints, but with
your HTML endpoints you want to return a user-friendly error page, and with
your JSON API endpoints you want to return an appropriate JSON error state.</p>
<p>In my opinion, the right thing to do is to have contextual error handlers, and
luckily, we have a context for contextual information!</p>
<p>First, we need an error handler interface.</p>
<p><div class="highlight" style="background: #ffffff"><pre style="line-height: 125%"><span></span><span style="color: #0000ff">type</span> ErrHandler <span style="color: #0000ff">interface</span> {
HandleError(w http.ResponseWriter, req *http.Request, err <span style="color: #2b91af">error</span>)
}
</pre></div>
</p>
<p>Next, let&rsquo;s make a middleware that registers the error handler in the context:</p>
<p><div class="highlight" style="background: #ffffff"><pre style="line-height: 125%"><span></span><span style="color: #0000ff">var</span> errHandler = webhelp.GenSym() <span style="color: #008000">// see the aside about context keys</span>
<span style="color: #0000ff">func</span> HandleErrWith(eh ErrHandler, h http.Handler) http.Handler {
<span style="color: #0000ff">return</span> http.HandlerFunc(<span style="color: #0000ff">func</span>(w http.ResponseWriter, req *http.Request) {
ctx := context.WithValue(whcompat.Context(req), errHandler, eh)
h.ServeHTTP(w, whcompat.WithContext(req, ctx))
})
}
</pre></div>
</p>
<p>Last, let&rsquo;s make a function that will use the registered error handler for
errors:</p>
<p><div class="highlight" style="background: #ffffff"><pre style="line-height: 125%"><span></span><span style="color: #0000ff">func</span> HandleErr(w http.ResponseWriter, req *http.Request, err <span style="color: #2b91af">error</span>) {
<span style="color: #0000ff">if</span> handler, ok := whcompat.Context(req).Value(errHandler).(ErrHandler); ok {
handler.HandleError(w, req, err)
<span style="color: #0000ff">return</span>
}
log.Printf(<span style="color: #a31515">&quot;error: %v&quot;</span>, err)
http.Error(w, <span style="color: #a31515">&quot;internal server error&quot;</span>, http.StatusInternalServerError)
}
</pre></div>
</p>
<p>Now, as long as everything uses <code>HandleErr</code> to handle errors, our JSON API
can handle errors with JSON responses, and our HTML endpoints can handle errors
with HTML responses.</p>
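<p>Here&rsquo;s the same idea stitched together into something runnable, using only the standard library (so <code>req.Context()</code> instead of <code>whcompat</code>, and an unexported key type instead of <code>GenSym</code>). <code>ErrHandlerFunc</code>, <code>jsonErrors</code>, and the always-failing handler are names I added for the sketch, and I dropped the logging from the fallback path to keep it short.</p>

```go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

type ErrHandler interface {
	HandleError(w http.ResponseWriter, req *http.Request, err error)
}

// ErrHandlerFunc adapts a plain function to ErrHandler (added for this sketch).
type ErrHandlerFunc func(w http.ResponseWriter, req *http.Request, err error)

func (f ErrHandlerFunc) HandleError(w http.ResponseWriter, req *http.Request, err error) {
	f(w, req, err)
}

// errHandlerKey is unexported, so no other package can collide with it.
type errHandlerKey struct{}

// HandleErrWith registers eh in the context for everything below h.
func HandleErrWith(eh ErrHandler, h http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		ctx := context.WithValue(req.Context(), errHandlerKey{}, eh)
		h.ServeHTTP(w, req.WithContext(ctx))
	})
}

// HandleErr uses the registered handler, or a plain-text 500 if there is none.
func HandleErr(w http.ResponseWriter, req *http.Request, err error) {
	if eh, ok := req.Context().Value(errHandlerKey{}).(ErrHandler); ok {
		eh.HandleError(w, req, err)
		return
	}
	http.Error(w, "internal server error", http.StatusInternalServerError)
}

// jsonErrors is a hypothetical JSON-flavored error handler.
var jsonErrors = ErrHandlerFunc(func(w http.ResponseWriter, req *http.Request, err error) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(http.StatusInternalServerError)
	json.NewEncoder(w).Encode(map[string]string{"err": err.Error()})
})

// failing always reports an error through HandleErr.
var failing = http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
	HandleErr(w, req, errors.New("boom"))
})

// body serves one request with h and returns the response body.
func body(h http.Handler) string {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest("GET", "/", nil))
	return rec.Body.String()
}

func main() {
	fmt.Print(body(HandleErrWith(jsonErrors, failing))) // JSON error body
	fmt.Print(body(failing))                            // plain-text fallback
}
```

<p>The failing handler is identical in both cases; only the middleware wrapped around it decides what the error looks like on the wire.</p>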
<p>Of course, the <a href="https://godoc.org/gopkg.in/webhelp.v1/wherr">wherr</a> package
implements this all for you, and the
<a href="https://godoc.org/gopkg.in/webhelp.v1/whjson">whjson</a> package even implements
a friendly JSON API error handler.</p>
<p>Here&rsquo;s how you might use it:</p>
<p><div class="highlight" style="background: #ffffff"><pre style="line-height: 125%"><span></span><span style="color: #0000ff">var</span> userKey = webhelp.GenSym()
<span style="color: #0000ff">func</span> RequireUser(h http.Handler) http.Handler {
<span style="color: #0000ff">return</span> http.HandlerFunc(<span style="color: #0000ff">func</span>(w http.ResponseWriter, req *http.Request) {
user, err := GetUser(req)
<span style="color: #0000ff">if</span> err != <span style="color: #0000ff">nil</span> {
wherr.Handle(w, req, wherr.InternalServerError.New(<span style="color: #a31515">&quot;failed to get user&quot;</span>))
<span style="color: #0000ff">return</span>
}
<span style="color: #0000ff">if</span> user == <span style="color: #0000ff">nil</span> {
wherr.Handle(w, req, wherr.Unauthorized.New(<span style="color: #a31515">&quot;no user found&quot;</span>))
<span style="color: #0000ff">return</span>
}
ctx := req.Context()
ctx = context.WithValue(ctx, userKey, user)
h.ServeHTTP(w, req.WithContext(ctx))
})
}
<span style="color: #0000ff">func</span> userpage(w http.ResponseWriter, req *http.Request) {
user := req.Context().Value(userKey).(*User)
w.Header().Set(<span style="color: #a31515">&quot;Content-Type&quot;</span>, <span style="color: #a31515">&quot;text/html&quot;</span>)
userpageTmpl.Execute(w, user)
}
<span style="color: #0000ff">func</span> username(w http.ResponseWriter, req *http.Request) {
user := req.Context().Value(userKey).(*User)
w.Header().Set(<span style="color: #a31515">&quot;Content-Type&quot;</span>, <span style="color: #a31515">&quot;application/json&quot;</span>)
json.NewEncoder(w).Encode(<span style="color: #0000ff">map</span>[<span style="color: #2b91af">string</span>]<span style="color: #0000ff">interface</span>{}{<span style="color: #a31515">&quot;user&quot;</span>: user})
}
<span style="color: #0000ff">func</span> main() {
http.ListenAndServe(<span style="color: #a31515">&quot;:8080&quot;</span>, whmux.Dir{
<span style="color: #a31515">&quot;api&quot;</span>: wherr.HandleWith(whjson.ErrHandler,
RequireUser(whmux.Dir{
<span style="color: #a31515">&quot;username&quot;</span>: http.HandlerFunc(username),
})),
<span style="color: #a31515">&quot;user&quot;</span>: RequireUser(http.HandlerFunc(userpage)),
})
}
</pre></div>
</p>
<h3 id="aside-about-the-spacemonkeygo-errors-package:f9c06ffbaee33bff858b5f3a6f296479">Aside about the spacemonkeygo/errors package</h3>
<p>The default <a href="https://godoc.org/gopkg.in/webhelp.v1/wherr#Handle">wherr.Handle</a>
implementation understands all of the <a href="https://godoc.org/gopkg.in/webhelp.v1/wherr#pkg-variables">error classes defined in the
wherr top level
package</a>.</p>
<p>These error classes are implemented using the
<a href="https://godoc.org/github.com/spacemonkeygo/errors">spacemonkeygo/errors</a>
library and the
<a href="https://godoc.org/github.com/spacemonkeygo/errors/errhttp">spacemonkeygo/errors/errhttp</a>
extensions. You don&rsquo;t have to use this library or these errors, but the benefit
is that your error instances can be extended to include HTTP status code
messages and information, which once again, provides for a nice elimination of
cross-cutting concerns in your error handling logic.</p>
<p>See the <a href="https://godoc.org/github.com/spacemonkeygo/errors">spacemonkeygo/errors</a>
package for more details.</p>
<p><em><strong>Update 2018-04-19:</strong> After a few years of use, my friend condensed some
lessons we learned and the best parts of <code>spacemonkeygo/errors</code> into a new,
more concise, better library, over at
<a href="https://github.com/zeebo/errs">github.com/zeebo/errs</a>. Consider using that
instead!</em></p>
<h2 id="sessions:f9c06ffbaee33bff858b5f3a6f296479">Sessions</h2>
<p>Go&rsquo;s standard library has great support for cookies, but cookies by themselves
aren&rsquo;t usually what a developer thinks of when she thinks about sessions.
Cookies are unencrypted, unauthenticated, and readable by the user, and perhaps
you don&rsquo;t want that with your session data.</p>
<p>Further, sessions can be stored in cookies, but could also be stored in a
database to provide features like session revocation and querying. There are lots
of potential details about the implementation of sessions.</p>
<p>Request handlers, however, probably don&rsquo;t care too much about the
implementation details of the session. Request handlers usually just want a
bucket of keys and values they can store safely and securely.</p>
<p>The <a href="https://godoc.org/gopkg.in/webhelp.v1/whsess">whsess</a> package implements
middleware for registering an arbitrary session store (a default cookie-based
session store is provided), and implements helpers for retrieving and saving
new values into the session.</p>
<p>The default cookie-based session store implements encryption and authentication
via the excellent
<a href="https://godoc.org/golang.org/x/crypto/nacl/secretbox">nacl/secretbox</a> package.</p>
<p>Usage is like this:</p>
<p><div class="highlight" style="background: #ffffff"><pre style="line-height: 125%"><span></span><span style="color: #0000ff">func</span> handler(w http.ResponseWriter, req *http.Request) {
ctx := whcompat.Context(req)
sess, err := whsess.Load(ctx, <span style="color: #a31515">&quot;namespace&quot;</span>)
<span style="color: #0000ff">if</span> err != <span style="color: #0000ff">nil</span> {
wherr.Handle(w, req, err)
<span style="color: #0000ff">return</span>
}
<span style="color: #0000ff">if</span> loggedIn, _ := sess.Values[<span style="color: #a31515">&quot;logged_in&quot;</span>].(<span style="color: #2b91af">bool</span>); loggedIn {
views, _ := sess.Values[<span style="color: #a31515">&quot;views&quot;</span>].(<span style="color: #2b91af">int64</span>)
sess.Values[<span style="color: #a31515">&quot;views&quot;</span>] = views + 1
sess.Save(w)
}
}
<span style="color: #0000ff">func</span> main() {
http.ListenAndServe(<span style="color: #a31515">&quot;:8080&quot;</span>, whsess.HandlerWithStore(
whsess.NewCookieStore(secret), http.HandlerFunc(handler)))
}
</pre></div>
</p>
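<p>To make &ldquo;encrypted and authenticated&rdquo; a little more concrete, here&rsquo;s a sketch of sealing and opening a cookie value. Note the swap: this uses AES-GCM from the standard library rather than <code>nacl/secretbox</code>, purely so the example has no outside dependencies. The <code>seal</code>, <code>open</code>, and <code>roundTrip</code> helpers and the hard-coded key are all made up for the demo and are not how <code>whsess</code> is written.</p>

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/base64"
	"errors"
	"fmt"
)

func newAEAD(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key) // key must be 16, 24, or 32 bytes
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal encrypts and authenticates value, returning a cookie-safe string.
func seal(key, value []byte) (string, error) {
	aead, err := newAEAD(key)
	if err != nil {
		return "", err
	}
	nonce := make([]byte, aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return "", err
	}
	// The nonce is prepended to the ciphertext so open can recover it.
	return base64.RawURLEncoding.EncodeToString(aead.Seal(nonce, nonce, value, nil)), nil
}

// open reverses seal and fails if the cookie was modified in any way.
func open(key []byte, cookie string) ([]byte, error) {
	raw, err := base64.RawURLEncoding.DecodeString(cookie)
	if err != nil {
		return nil, err
	}
	aead, err := newAEAD(key)
	if err != nil {
		return nil, err
	}
	if len(raw) < aead.NonceSize() {
		return nil, errors.New("cookie too short")
	}
	return aead.Open(nil, raw[:aead.NonceSize()], raw[aead.NonceSize():], nil)
}

// roundTrip seals a value, opens it again, and then checks that a cookie
// with a single flipped bit is rejected.
func roundTrip(value string) (opened string, tamperRejected bool, err error) {
	key := []byte("0123456789abcdef0123456789abcdef") // demo key only
	cookie, err := seal(key, []byte(value))
	if err != nil {
		return "", false, err
	}
	plain, err := open(key, cookie)
	if err != nil {
		return "", false, err
	}
	raw, _ := base64.RawURLEncoding.DecodeString(cookie)
	raw[len(raw)-1] ^= 1
	_, tamperErr := open(key, base64.RawURLEncoding.EncodeToString(raw))
	return string(plain), tamperErr != nil, nil
}

func main() {
	fmt.Println(roundTrip("logged_in=true"))
}
```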
<h2 id="logging:f9c06ffbaee33bff858b5f3a6f296479">Logging</h2>
<p>The Go standard library by default doesn&rsquo;t log incoming requests, outgoing
responses, or even just what port the HTTP server is listening on.</p>
<p>The <a href="https://godoc.org/gopkg.in/webhelp.v1/whlog">whlog</a> package implements
all three.
The <a href="https://godoc.org/gopkg.in/webhelp.v1/whlog#LogRequests">whlog.LogRequests</a>
middleware will log requests as they start. The
<a href="https://godoc.org/gopkg.in/webhelp.v1/whlog#LogResponses">whlog.LogResponses</a>
middleware will log requests as they end, along with status code and timing
information.
<a href="https://godoc.org/gopkg.in/webhelp.v1/whlog#ListenAndServe">whlog.ListenAndServe</a>
will log the address the server ultimately listens on (if you specify &ldquo;:0&rdquo; as
your address, a port will be randomly chosen, and
<a href="https://godoc.org/gopkg.in/webhelp.v1/whlog#ListenAndServe">whlog.ListenAndServe</a>
will log it).</p>
<p><a href="https://godoc.org/gopkg.in/webhelp.v1/whlog#LogResponses">whlog.LogResponses</a>
deserves special mention for how it does what it does. It uses the
<a href="https://godoc.org/gopkg.in/webhelp.v1/whmon">whmon</a> package to instrument
the outgoing <code>http.ResponseWriter</code> to keep track of response information.</p>
<p>Usage is like this:</p>
<p><div class="highlight" style="background: #ffffff"><pre style="line-height: 125%"><span></span><span style="color: #0000ff">func</span> main() {
whlog.ListenAndServe(<span style="color: #a31515">&quot;:8080&quot;</span>, whlog.LogResponses(whlog.Default, handler))
}
</pre></div>
</p>
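<p>The instrumenting trick is worth a quick sketch. The general idea (my simplified version, not the actual <code>whmon</code> code) is to wrap the <code>http.ResponseWriter</code> in a type that remembers the status code and byte count on the way through. <code>statusWriter</code>, <code>responseInfo</code>, and <code>logResponses</code> are hypothetical names.</p>

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// statusWriter wraps a ResponseWriter and remembers what was sent.
// Caveat: embedding like this hides optional interfaces such as
// http.Flusher, which a production version would need to deal with.
type statusWriter struct {
	http.ResponseWriter
	status int
	size   int
}

func (s *statusWriter) WriteHeader(code int) {
	if s.status == 0 {
		s.status = code
	}
	s.ResponseWriter.WriteHeader(code)
}

func (s *statusWriter) Write(p []byte) (int, error) {
	if s.status == 0 {
		s.status = http.StatusOK // the first write implies a 200
	}
	n, err := s.ResponseWriter.Write(p)
	s.size += n
	return n, err
}

// responseInfo is what the middleware reports once a request finishes.
type responseInfo struct {
	Status  int
	Size    int
	Elapsed time.Duration
}

// logResponses calls report after h has handled each request.
func logResponses(report func(*http.Request, responseInfo), h http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		sw := &statusWriter{ResponseWriter: w}
		start := time.Now()
		h.ServeHTTP(sw, req)
		if sw.status == 0 {
			sw.status = http.StatusOK // the handler wrote nothing at all
		}
		report(req, responseInfo{sw.status, sw.size, time.Since(start)})
	})
}

func teapot(w http.ResponseWriter, req *http.Request) {
	http.Error(w, "nope", http.StatusTeapot)
}

// observe runs one request and returns what the wrapper recorded.
func observe() responseInfo {
	var seen responseInfo
	h := logResponses(func(req *http.Request, info responseInfo) {
		seen = info
	}, http.HandlerFunc(teapot))
	h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest("GET", "/tea", nil))
	return seen
}

func main() {
	info := observe()
	fmt.Printf("status %d, %d bytes\n", info.Status, info.Size)
}
```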
<h3 id="app-engine-logging:f9c06ffbaee33bff858b5f3a6f296479">App Engine logging</h3>
<p>App Engine logging is unconventional crazytown. The standard library logger
doesn&rsquo;t work by default on App Engine, because App Engine logs <em>require</em> the
request context. This is unfortunate for libraries that don&rsquo;t necessarily run
on App Engine all the time, as their logging information doesn&rsquo;t make it to the
App Engine request-specific logger.</p>
<p>Unbelievably, this is fixable with
<a href="https://godoc.org/gopkg.in/webhelp.v1/whgls">whgls</a>, which uses my terrible,
terrible (but recently improved)
<a href="https://godoc.org/github.com/jtolds/gls">Goroutine-local storage library</a> to
store the request context on the current stack, register a new log output, and
fix logging so standard library logging works with App Engine again.</p>
<h2 id="template-handling:f9c06ffbaee33bff858b5f3a6f296479">Template handling</h2>
<p>Go&rsquo;s standard library <a href="https://golang.org/pkg/html/template/">html/template</a>
package is excellent, but you&rsquo;ll be unsurprised to find there are a few tasks I
do with it so commonly that I&rsquo;ve written additional support code.</p>
<p>The <a href="https://godoc.org/gopkg.in/webhelp.v1/whtmpl">whtmpl</a> package really does
two things. First, it provides a number of useful helper methods for use within
templates, and second, it takes some friction out of managing a large number of
templates.</p>
<p>When writing templates, one thing you can do is call out to other registered
templates for small values. A good example might be some sort of list element.
You can have a template that renders the list element, and then your template
that renders your list can use the list element template in turn.</p>
<p>Use of another template within a template might look like this:</p>
<pre><code>&lt;ul&gt;
{{ range .List }}
{{ template &quot;list_element&quot; . }}
{{ end }}
&lt;/ul&gt;
</code></pre>
<p>You&rsquo;re now rendering the <code>list_element</code> template with the list element from
<code>.List</code>. But what if you want to also pass the current user <code>.User</code>?
Unfortunately, you can only pass one argument from one template to another.
If you have two arguments you want to pass to another template, with the
standard library, you&rsquo;re out of luck.</p>
<p>The <a href="https://godoc.org/gopkg.in/webhelp.v1/whtmpl">whtmpl</a> package adds three
helper functions to aid you here, <code>makepair</code>, <code>makemap</code>, and <code>makeslice</code> (more
docs under the
<a href="https://godoc.org/gopkg.in/webhelp.v1/whtmpl#Collection">whtmpl.Collection</a>
type). <code>makepair</code> is the simplest. It takes two arguments and constructs a
<a href="https://godoc.org/gopkg.in/webhelp.v1/whtmpl#Pair">whtmpl.Pair</a>. Fixing our
example above would look like this now:</p>
<pre><code>&lt;ul&gt;
{{ $user := .User }}
{{ range .List }}
{{ template &quot;list_element&quot; (makepair . $user) }}
{{ end }}
&lt;/ul&gt;
</code></pre>
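<p>For the curious, a helper like <code>makepair</code> can be wired up with a plain <code>template.FuncMap</code>. This is a sketch of the general technique rather than the <code>whtmpl</code> source; in particular, the <code>pair</code> type and its <code>First</code> and <code>Second</code> field names are my assumption, so check the <code>whtmpl.Pair</code> docs for the real ones.</p>

```go
package main

import (
	"bytes"
	"fmt"
	"html/template"
)

// pair is a hypothetical two-field struct. The real whtmpl.Pair may use
// different field names.
type pair struct{ First, Second interface{} }

var funcs = template.FuncMap{
	"makepair": func(a, b interface{}) pair { return pair{a, b} },
}

const src = `{{ define "list_element" }}<li>{{ .First }} (for {{ .Second }})</li>{{ end }}` +
	`{{ $user := .User }}<ul>{{ range .List }}{{ template "list_element" (makepair . $user) }}{{ end }}</ul>`

// render executes the list template with a user and two list items.
func render() (string, error) {
	t, err := template.New("list").Funcs(funcs).Parse(src)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	err = t.Execute(&buf, map[string]interface{}{
		"User": "alice",
		"List": []string{"a", "b"},
	})
	return buf.String(), err
}

func main() {
	out, err := render()
	fmt.Println(out, err)
}
```

<p>The important detail is that the function map has to be registered with <code>Funcs</code> before <code>Parse</code> runs, or the parser won&rsquo;t know what <code>makepair</code> is.</p>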
<p>The second thing <a href="https://godoc.org/gopkg.in/webhelp.v1/whtmpl">whtmpl</a> does
is make defining lots of templates easy, by optionally automatically naming
templates after the name of the file the template is defined in.</p>
<p>For example, say you have three files.</p>
<p>Here&rsquo;s <code>pkg.go</code>:</p>
<p><div class="highlight" style="background: #ffffff"><pre style="line-height: 125%"><span></span><span style="color: #0000ff">package</span> views
<span style="color: #0000ff">import</span> <span style="color: #a31515">&quot;gopkg.in/webhelp.v1/whtmpl&quot;</span>
<span style="color: #0000ff">var</span> Templates = whtmpl.NewCollection()
</pre></div>
</p>
<p>Here&rsquo;s <code>landing.go</code>:</p>
<p><div class="highlight" style="background: #ffffff"><pre style="line-height: 125%"><span></span><span style="color: #0000ff">package</span> views
<span style="color: #0000ff">var</span> _ = Templates.MustParse(<span style="color: #a31515">`{{ template &quot;header&quot; . }}</span>
<span style="color: #a31515"> &lt;h1&gt;Landing!&lt;/h1&gt;`</span>)
</pre></div>
</p>
<p>And here&rsquo;s <code>header.go</code>:</p>
<p><div class="highlight" style="background: #ffffff"><pre style="line-height: 125%"><span></span><span style="color: #0000ff">package</span> views
<span style="color: #0000ff">var</span> _ = Templates.MustParse(<span style="color: #a31515">`&lt;title&gt;My website!&lt;/title&gt;`</span>)
</pre></div>
</p>
<p>Now, you can import your new <code>views</code> package and render the <code>landing</code> template
this easily:</p>
<p><div class="highlight" style="background: #ffffff"><pre style="line-height: 125%"><span></span><span style="color: #0000ff">func</span> handler(w http.ResponseWriter, req *http.Request) {
views.Templates.Render(w, req, <span style="color: #a31515">&quot;landing&quot;</span>, <span style="color: #0000ff">map</span>[<span style="color: #2b91af">string</span>]<span style="color: #0000ff">interface</span>{}{})
}
</pre></div>
</p>
<h2 id="user-authentication:f9c06ffbaee33bff858b5f3a6f296479">User authentication</h2>
<p>I&rsquo;ve written two Webhelp-style authentication libraries that I end up using
frequently.</p>
<p>The first is an OAuth2 library,
<a href="https://godoc.org/gopkg.in/go-webhelp/whoauth2.v1">whoauth2</a>. I&rsquo;ve written
up <a href="https://github.com/go-webhelp/whoauth2/blob/v1/examples/group/main.go">an example application that authenticates with Google, Facebook, and
GitHub</a>.</p>
<p>The second, <a href="https://godoc.org/gopkg.in/go-webhelp/whgoth.v1">whgoth</a>, is a
wrapper around <a href="https://github.com/markbates/goth">markbates/goth</a>. My portion
isn&rsquo;t quite complete yet (some fixes are still necessary for optional App
Engine support), but will support more non-OAuth2 authentication sources
(like Twitter) when it is done.</p>
<h2 id="route-listing:f9c06ffbaee33bff858b5f3a6f296479">Route listing</h2>
<p>Surprise! If you&rsquo;ve used <a href="https://godoc.org/gopkg.in/webhelp.v1">webhelp</a>-based
handlers and middleware for your whole app, you automatically get route listing
for free, via the <a href="https://godoc.org/gopkg.in/webhelp.v1/whroute">whroute</a>
package.</p>
<p>My web serving code&rsquo;s <code>main</code> method often has a form like this:</p>
<p><div class="highlight" style="background: #ffffff"><pre style="line-height: 125%"><span></span><span style="color: #0000ff">switch</span> flag.Arg(0) {
<span style="color: #0000ff">case</span> <span style="color: #a31515">&quot;serve&quot;</span>:
panic(whlog.ListenAndServe(*listenAddr, routes))
<span style="color: #0000ff">case</span> <span style="color: #a31515">&quot;routes&quot;</span>:
whroute.PrintRoutes(os.Stdout, routes)
<span style="color: #0000ff">default</span>:
fmt.Printf(<span style="color: #a31515">&quot;Usage: %s &lt;serve|routes&gt;\n&quot;</span>, os.Args[0])
}
</pre></div>
</p>
<p>Here&rsquo;s some example output:</p>
<pre><code>GET /auth/_cb/
GET /auth/login/
GET /auth/logout/
GET /
GET /account/apikeys/
POST /account/apikeys/
GET /project/&lt;int&gt;/
GET /project/&lt;int&gt;/control/&lt;int&gt;/
POST /project/&lt;int&gt;/control/&lt;int&gt;/sample/
GET /project/&lt;int&gt;/control/
Redirect: f(req)
POST /project/&lt;int&gt;/control/
POST /project/&lt;int&gt;/control_named/&lt;string&gt;/sample/
GET /project/&lt;int&gt;/control_named/
Redirect: f(req)
GET /project/&lt;int&gt;/sample/&lt;int&gt;/
GET /project/&lt;int&gt;/sample/&lt;int&gt;/similar[/&lt;*&gt;]
GET /project/&lt;int&gt;/sample/
Redirect: f(req)
POST /project/&lt;int&gt;/search/
GET /project/
Redirect: /
POST /project/
</code></pre>
<h2 id="other-little-things:f9c06ffbaee33bff858b5f3a6f296479">Other little things</h2>
<p><a href="https://godoc.org/gopkg.in/webhelp.v1">webhelp</a> has a number of other
subpackages:</p>
<ul>
<li><a href="https://godoc.org/gopkg.in/webhelp.v1/whparse">whparse</a> assists in parsing
optional request arguments.</li>
<li><a href="https://godoc.org/gopkg.in/webhelp.v1/whredir">whredir</a> provides some
handlers and helper methods for doing redirects in various cases.</li>
<li><a href="https://godoc.org/gopkg.in/webhelp.v1/whcache">whcache</a> creates
request-specific mutable storage for caching various computations and
database loaded data. Mutability helps helper functions that aren&rsquo;t
used as middleware share data.</li>
<li><a href="https://godoc.org/gopkg.in/webhelp.v1/whfatal">whfatal</a> uses panics to
simplify early request handling termination. Probably avoid this package
unless you want to anger other Go developers.</li>
</ul>
<h2 id="summary:f9c06ffbaee33bff858b5f3a6f296479">Summary</h2>
<p>Designing your web project as a collection of composable middlewares goes quite
a long way to simplify your code design, eliminate cross-cutting concerns, and
create a more flexible development environment. Use my
<a href="https://godoc.org/gopkg.in/webhelp.v1">webhelp</a> package if it helps you.</p>
<p>Or don&rsquo;t! Whatever! It&rsquo;s still a free country last I checked.</p>
<h3 id="update:f9c06ffbaee33bff858b5f3a6f296479">Update</h3>
<p>Peter Kieltyka points me to his
<a href="https://github.com/pressly/chi">Chi framework</a>, which actually does seem to do
the right things with respect to middleware, handlers, and contexts - certainly
much more so than all the other frameworks I&rsquo;ve seen. So, shoutout to Peter and
the team at Pressly!</p>
Monkit: metrics and tracing library for Gohttps://www.jtolio.com/2016/06/monkit-metrics-and-tracing-library-for-go
Fri, 24 Jun 2016 00:33:00 -0600hello@jtolio.com (JT Olio)https://www.jtolio.com/2016/06/monkit-metrics-and-tracing-library-for-go<p>I recently spent a while working on a blog post to talk about a new open source
library Space Monkey just released. I then realized that it was all great
documentation and should probably just be with the project.</p>
<p>So, <a href="https://github.com/spacemonkeygo/monkit/blob/master/README.md">here is the monkit README.md</a>
that I wrote. That is all.</p>
Go channels are bad and you should feel badhttps://www.jtolio.com/2016/03/go-channels-are-bad-and-you-should-feel-bad
Wed, 02 Mar 2016 08:38:00 -0700hello@jtolio.com (JT Olio)https://www.jtolio.com/2016/03/go-channels-are-bad-and-you-should-feel-bad
<p><em>Update: If you&rsquo;re coming to this blog post from a compendium titled &ldquo;Go is
not good,&rdquo; I want to make it clear that I am ashamed to be on such a list. Go
is absolutely the least worst programming language I&rsquo;ve ever used. At the
time I wrote this, I wanted to curb a trend I was seeing, namely, overuse of
one of the more warty parts of Go. I still think channels could be much better,
but overall, Go is wonderful. It&rsquo;s like if your favorite toolbox had
<a href="https://blog.codinghorror.com/content/images/uploads/2012/06/6a0120a85dcdae970b017742d249d5970d-800wi.jpg">this</a>
in it; the tool can have uses (even if it could have had more uses), and it
can still be your favorite toolbox!</em></p>
<p>I&rsquo;ve been using Google&rsquo;s <a href="http://golang.org/">Go programming language</a> on and
off since mid-to-late 2010, and I&rsquo;ve had legitimate product code written in Go
for <a href="http://www.spacemonkey.com/">Space Monkey</a> since January 2012 (before Go
1.0!). My initial experience with Go was back when I was researching Hoare&rsquo;s
<a href="https://en.wikipedia.org/wiki/Communicating_sequential_processes">Communicating Sequential Processes</a>
model of concurrency and the <a href="https://en.wikipedia.org/wiki/%CE%A0-calculus">π-calculus</a>
under <a href="http://matt.might.net">Matt Might</a>&rsquo;s
<a href="http://www.ucombinator.org/">UCombinator research group</a> as part of my
(<a href="https://www.jtolio.com/writing/2015/11/research-log-cell-states-and-microarrays/">now redirected</a>)
PhD work to better enable multicore development. Go was announced right then
(how serendipitous!) and I immediately started kicking tires.</p>
<p>It quickly became a core part of Space Monkey development. Our production
systems at Space Monkey currently account for over 425k lines of pure
Go (<em>not</em> counting all of our vendored libraries, which would make it just shy
of 1.5 million lines), so not the most Go you&rsquo;ll ever
see, but for the relatively young language we&rsquo;re heavy users. We&rsquo;ve
<a href="https://www.jtolio.com/writing/2014/04/go-space-monkey/">written about our Go usage</a>
before. We&rsquo;ve open-sourced some fairly heavily used libraries; many people seem
to be fans of our
<a href="https://godoc.org/github.com/spacemonkeygo/openssl">OpenSSL bindings</a>
(which are faster than <a href="https://golang.org/pkg/crypto/tls/">crypto/tls</a>, but
please keep openssl itself up-to-date!), our
<a href="https://godoc.org/github.com/spacemonkeygo/errors">error handling library</a>,
<a href="https://godoc.org/github.com/spacemonkeygo/spacelog">logging library</a>, and
<a href="https://godoc.org/gopkg.in/spacemonkeygo/monitor.v1">metric collection library/zipkin client</a>.
We use Go, we love Go, we think it&rsquo;s the least bad programming language for our
needs we&rsquo;ve used so far.</p>
<p>Although I don&rsquo;t think I can talk myself out of mentioning my widely avoided
<a href="https://github.com/jtolds/gls">goroutine-local-storage library</a> here either
(which even though it&rsquo;s a hack that you shouldn&rsquo;t use, it&rsquo;s a beautiful hack),
hopefully my other experience will suffice as valid credentials that I kind of
know what I&rsquo;m talking about before I explain my deliberately inflammatory post
title.</p>
<p><div class="float-left">
<p><figure>
<img src="https://www.jtolio.com/images/wat/darth-helmet.jpg" alt="Darth Helmet" onmouseover="this.src='\/images\/wat\/darth-helmet.gif';" onclick="this.src='\/images\/wat\/darth-helmet.gif'; this.onmouseout=null;" onmouseout="this.src='\/images\/wat\/darth-helmet.jpg';" /></p>
<p></figure></p>
</div>
</p>
<h2 id="wait-what:7de476e370ca6a780a51fd680c5a51dd">Wait, what?</h2>
<p>If you ask the proverbial programmer on the street what&rsquo;s so special
about Go, she&rsquo;ll most likely tell you that Go is most known for channels and
goroutines. Go&rsquo;s theoretical underpinnings are heavily based in Hoare&rsquo;s CSP
model, which is itself incredibly fascinating and interesting and I firmly
believe has much more to yield than we&rsquo;ve appropriated so far.</p>
<p>CSP (and the π-calculus) both use communication as the core synchronization
primitive, so it makes sense Go would have channels. Rob Pike has been
fascinated with CSP (with good reason) for a
<a href="https://en.wikipedia.org/wiki/Newsqueak">considerable</a>
<a href="https://en.wikipedia.org/wiki/Alef_%28programming_language%29">while</a>
<a href="https://en.wikipedia.org/wiki/Limbo_%28programming_language%29">now</a>.</p>
<p>But from a pragmatic perspective (which Go prides itself on), Go got channels
wrong. Channels as implemented are pretty much a solid anti-pattern in my book
at this point. Why? Dear reader, let me count the ways.</p>
<p><div class="clear-both"></div>
</p>
<h3 id="you-probably-won-t-end-up-using-just-channels:7de476e370ca6a780a51fd680c5a51dd">You probably won&rsquo;t end up using just channels.</h3>
<p>Hoare&rsquo;s Communicating Sequential Processes is a computational model where
essentially the only synchronization primitive is sending or receiving on a
channel. As soon as you use a mutex, semaphore, or condition variable, bam,
you&rsquo;re no longer in pure CSP land. Go programmers often tout this model and
philosophy through the chanting of the
<a href="http://lesswrong.com/lw/k5/cached_thoughts/">cached thought</a>
&ldquo;<a href="https://blog.golang.org/share-memory-by-communicating">share memory by communicating</a>.&rdquo;</p>
<p>So let&rsquo;s try and write a small program using just CSP in Go! Let&rsquo;s make a
high score receiver. All we will do is keep track of the largest high score
value we&rsquo;ve seen. That&rsquo;s it.</p>
<p>First, we&rsquo;ll make a <code>Game</code> struct.</p>
<p><div class="highlight" style="background: #ffffff"><pre style="line-height: 125%"><span></span><span style="color: #0000ff">type</span> Game <span style="color: #0000ff">struct</span> {
bestScore <span style="color: #2b91af">int</span>
scores <span style="color: #0000ff">chan</span> <span style="color: #2b91af">int</span>
}
</pre></div>
</p>
<p><code>bestScore</code> isn&rsquo;t going to be protected by a mutex! That&rsquo;s fine, because we&rsquo;ll
simply have one goroutine manage its state and receive new scores over a
channel.</p>
<p><div class="highlight" style="background: #ffffff"><pre style="line-height: 125%"><span></span><span style="color: #0000ff">func</span> (g *Game) run() {
<span style="color: #0000ff">for</span> score := <span style="color: #0000ff">range</span> g.scores {
<span style="color: #0000ff">if</span> g.bestScore &lt; score {
g.bestScore = score
}
}
}
</pre></div>
</p>
<p>Okay, now we&rsquo;ll make a helpful constructor to start a game.</p>
<p><div class="highlight" style="background: #ffffff"><pre style="line-height: 125%"><span></span><span style="color: #0000ff">func</span> NewGame() (g *Game) {
  g = &amp;Game{
    bestScore: 0,
    scores:    make(<span style="color: #0000ff">chan</span> <span style="color: #2b91af">int</span>),
  }
  <span style="color: #0000ff">go</span> g.run()
  <span style="color: #0000ff">return</span> g
}
</pre></div>
</p>
<p>Next, let&rsquo;s assume someone has given us a <code>Player</code> that can return scores.
It might also return an error, because hey, maybe the incoming TCP stream can die
or something, or the player quits.</p>
<p><div class="highlight" style="background: #ffffff"><pre style="line-height: 125%"><span></span><span style="color: #0000ff">type</span> Player <span style="color: #0000ff">interface</span> {
  NextScore() (score <span style="color: #2b91af">int</span>, err <span style="color: #2b91af">error</span>)
}
</pre></div>
</p>
<p>To handle the player, we&rsquo;ll assume all errors are fatal and pass received
scores down the channel.</p>
<p><div class="highlight" style="background: #ffffff"><pre style="line-height: 125%"><span></span><span style="color: #0000ff">func</span> (g *Game) HandlePlayer(p Player) <span style="color: #2b91af">error</span> {
  <span style="color: #0000ff">for</span> {
    score, err := p.NextScore()
    <span style="color: #0000ff">if</span> err != <span style="color: #0000ff">nil</span> {
      <span style="color: #0000ff">return</span> err
    }
    g.scores &lt;- score
  }
}
</pre></div>
</p>
<p>Yay! Okay, we have a <code>Game</code> type that can keep track of the highest score a
<code>Player</code> receives in a thread-safe way.</p>
<p>You wrap up your development and you&rsquo;re on your way to having customers. You
make this game server public and you&rsquo;re incredibly successful! Lots of games
are being created with your game server.</p>
<p>Soon, you discover people sometimes leave your game. Lots of games no longer
have any players playing, but nothing stopped the game loop. You are getting
overwhelmed by dead <code>(*Game).run</code> goroutines.</p>
<p><strong>Challenge:</strong> fix the goroutine leak above without mutexes or panics.
For real, scroll up to the above code and come up with a plan for fixing this
problem using just channels.</p>
<p><br/>
<br/>
I&rsquo;ll wait.
<br/>
<br/>
<br/>
<br/></p>
<p>For what it&rsquo;s worth, it totally can be done with channels only, but observe the
simplicity of the following solution which doesn&rsquo;t even have this problem:</p>
<p><div class="highlight" style="background: #ffffff"><pre style="line-height: 125%"><span></span><span style="color: #0000ff">type</span> Game <span style="color: #0000ff">struct</span> {
  mtx       sync.Mutex
  bestScore <span style="color: #2b91af">int</span>
}

<span style="color: #0000ff">func</span> NewGame() *Game {
  <span style="color: #0000ff">return</span> &amp;Game{}
}

<span style="color: #0000ff">func</span> (g *Game) HandlePlayer(p Player) <span style="color: #2b91af">error</span> {
  <span style="color: #0000ff">for</span> {
    score, err := p.NextScore()
    <span style="color: #0000ff">if</span> err != <span style="color: #0000ff">nil</span> {
      <span style="color: #0000ff">return</span> err
    }
    g.mtx.Lock()
    <span style="color: #0000ff">if</span> g.bestScore &lt; score {
      g.bestScore = score
    }
    g.mtx.Unlock()
  }
}
</pre></div>
</p>
<p>Which one would you rather work on? Don&rsquo;t be deceived into thinking that the
channel solution somehow makes this more readable and understandable in more
complex cases. Teardown is very hard. This sort of teardown is just a piece of
cake with a mutex, but the hardest thing to work out with Go-specific channels
only. Also, if anyone replies that channels sending channels is easier to
reason about here it will cause me an immediate head-to-desk motion.</p>
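<p>For the curious, here is a minimal sketch of one channel-only teardown. It is
not the only way to do it and not necessarily the one I had in mind above. The
<code>join</code>, <code>leave</code>, <code>done</code>, and <code>final</code> channels, the
<code>ErrGameOver</code> error, and the <code>scriptedPlayer</code> test double are all
hypothetical names invented for this example. The idea is that the <code>run</code>
goroutine counts players itself and exits when the last one leaves.</p>

```go
package main

import (
	"errors"
	"fmt"
)

// Player is the interface from the article.
type Player interface {
	NextScore() (score int, err error)
}

// Game tracks the best score using only channels. Every field except
// scores is a hypothetical addition for this sketch.
type Game struct {
	scores chan int
	join   chan struct{}
	leave  chan struct{}
	done   chan struct{} // closed by run once the last player has left
	final  chan int      // receives the best score when run exits
}

// ErrGameOver is returned to players that arrive after the game ended.
var ErrGameOver = errors.New("game is over")

func NewGame() *Game {
	g := &Game{
		scores: make(chan int),
		join:   make(chan struct{}),
		leave:  make(chan struct{}),
		done:   make(chan struct{}),
		final:  make(chan int, 1),
	}
	go g.run()
	return g
}

// run owns all state. It exits when the player count drops back to zero.
func (g *Game) run() {
	best, players := 0, 0
	for {
		select {
		case <-g.join:
			players++
		case <-g.leave:
			players--
			if players == 0 {
				g.final <- best // buffered, so this never blocks
				close(g.done)
				return
			}
		case score := <-g.scores:
			if best < score {
				best = score
			}
		}
	}
}

func (g *Game) HandlePlayer(p Player) error {
	select {
	case g.join <- struct{}{}:
	case <-g.done:
		return ErrGameOver
	}
	for {
		score, err := p.NextScore()
		if err != nil {
			// run cannot have exited yet: this player is still counted.
			g.leave <- struct{}{}
			return err
		}
		g.scores <- score
	}
}

// scriptedPlayer replays a fixed list of scores, then quits.
type scriptedPlayer struct{ scores []int }

func (p *scriptedPlayer) NextScore() (int, error) {
	if len(p.scores) == 0 {
		return 0, errors.New("player quit")
	}
	s := p.scores[0]
	p.scores = p.scores[1:]
	return s, nil
}

func main() {
	g := NewGame()
	err := g.HandlePlayer(&scriptedPlayer{scores: []int{3, 9, 4}})
	fmt.Println(err, <-g.final) // player quit 9
}
```

<p>Note what it took: three extra channels, a reference count, and a new error,
and a game that never gets a single player still leaks its goroutine. Compare
that with the mutex version, which has nothing to tear down.</p>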
<p>Importantly, this particular case might actually be <em>easily</em> solved <em>with
channels</em> with some runtime assistance Go doesn&rsquo;t provide! Unfortunately, as
it stands, there are simply a surprising number of problems that are solved
better with traditional synchronization primitives than with Go&rsquo;s version of
CSP. We&rsquo;ll talk about what Go could have done to make this case easier later.</p>
<p><strong>Exercise:</strong> Still skeptical? Try making both solutions above (channel-only
vs. mutex-only) stop asking for scores from <code>Players</code> once <code>bestScore</code> is 100
or greater. Go ahead and open your text editor. This is a small, toy problem.</p>
<p>The summary here is that you will be using traditional synchronization
primitives in addition to channels if you want to do anything real.</p>
<h3 id="channels-are-slower-than-implementing-it-yourself:7de476e370ca6a780a51fd680c5a51dd">Channels are slower than implementing it yourself</h3>
<p>One of the things I assumed about Go being so heavily based in CSP theory is
that there should be some pretty killer scheduler optimizations the runtime
can make with channels. Perhaps channels aren&rsquo;t always the most straightforward
primitive, but surely they&rsquo;re efficient and fast, right?</p>
<p><div class="float-right">
<p><figure>
<img src="https://www.jtolio.com/images/wat/jon-stewart.jpg" alt="Jon Stewart" onmouseover="this.src='\/images\/wat\/jon-stewart.gif';" onclick="this.src='\/images\/wat\/jon-stewart.gif'; this.onmouseout=null;" onmouseout="this.src='\/images\/wat\/jon-stewart.jpg';" /></p>
<p></figure></p>
</div>
</p>
<p>As <a href="https://twitter.com/HiattDustin">Dustin Hiatt</a> points out on
<a href="http://bravenewgeek.com/go-is-unapologetically-flawed-heres-why-we-use-it/">Tyler Treat&rsquo;s post about Go</a>,</p>
<blockquote>
<p>Behind the scenes, channels are using locks to serialize access and provide
threadsafety. So by using channels to synchronize access to memory, you are,
in fact, using locks; locks wrapped in a threadsafe queue. So how do Go’s
fancy locks compare to just using mutex’s from their standard library <code>sync</code>
package? The following numbers were obtained by using Go’s builtin
benchmarking functionality to serially call Put on a single set of their
respective types.</p>
<pre><code>BenchmarkSimpleSet-8 3000000 391 ns/op
BenchmarkSimpleChannelSet-8 1000000 1699 ns/op
</code></pre>
</blockquote>
<p>It&rsquo;s a similar story with unbuffered channels, or even the same test under
contention instead of run serially.</p>
<p>Perhaps the Go scheduler will improve, but in the meantime, good old mutexes
and condition variables are very good, efficient, and fast. If you want
performance, you use the tried and true methods.</p>
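<p>If you&rsquo;d like to measure this yourself, here is a rough sketch of that kind of
comparison. This is my own reconstruction, not the code behind the numbers
quoted above: <code>mutexSet</code>, <code>channelSet</code>, and the benchmark function
names are all made up for this example, and the results will vary by machine
and Go version.</p>

```go
package main

import (
	"fmt"
	"sync"
	"testing"
)

// mutexSet guards a map with a plain mutex.
type mutexSet struct {
	mtx   sync.Mutex
	items map[int]struct{}
}

func (s *mutexSet) Put(v int) {
	s.mtx.Lock()
	s.items[v] = struct{}{}
	s.mtx.Unlock()
}

// channelSet hands every Put to a single goroutine that owns the map.
type channelSet struct {
	puts chan int
	acks chan struct{}
}

func newChannelSet() *channelSet {
	s := &channelSet{puts: make(chan int), acks: make(chan struct{})}
	go func() {
		items := map[int]struct{}{}
		for v := range s.puts {
			items[v] = struct{}{}
			s.acks <- struct{}{}
		}
	}()
	return s
}

// Put returns only after the owning goroutine has stored the value.
func (s *channelSet) Put(v int) {
	s.puts <- v
	<-s.acks
}

// Close stops the owning goroutine.
func (s *channelSet) Close() { close(s.puts) }

func benchMutex(b *testing.B) {
	s := &mutexSet{items: map[int]struct{}{}}
	for i := 0; i < b.N; i++ {
		s.Put(i % 1024)
	}
}

func benchChannel(b *testing.B) {
	s := newChannelSet()
	defer s.Close()
	for i := 0; i < b.N; i++ {
		s.Put(i % 1024)
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside of "go test".
	fmt.Println("mutex:  ", testing.Benchmark(benchMutex))
	fmt.Println("channel:", testing.Benchmark(benchChannel))
}
```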
<p><div class="clear-both"></div>
</p>
<h3 id="channels-don-t-compose-well-with-other-concurrency-primitives:7de476e370ca6a780a51fd680c5a51dd">Channels don&rsquo;t compose well with other concurrency primitives</h3>
<p>Alright, so hopefully I have convinced you that you&rsquo;ll at least be interacting
with primitives besides channels sometimes. The standard library certainly
seems to prefer traditional synchronization primitives over channels.</p>
<p>Well guess what, it&rsquo;s actually somewhat challenging to use channels alongside
mutexes and condition variables correctly!</p>
<p>One of the interesting things about channels that makes a lot of sense coming
from CSP is that channel sends are synchronous. A channel send and channel
receive are intended to be synchronization barriers, and the send and receive
should happen at the same virtual time. That&rsquo;s wonderful if you&rsquo;re in
well-executed CSP-land.</p>
<p><div class="float-left">
<p><figure>
<img src="https://www.jtolio.com/images/wat/obama.jpg" alt="Barack Obama" onmouseover="this.src='\/images\/wat\/obama.gif';" onclick="this.src='\/images\/wat\/obama.gif'; this.onmouseout=null;" onmouseout="this.src='\/images\/wat\/obama.jpg';" /></p>
<p></figure></p>
</div>
</p>
<p>Pragmatically, Go channels also come in a buffered variety. You can allocate a
fixed amount of space to account for possible buffering so that sends and
receives are disparate events, but the buffer size is capped. Go doesn&rsquo;t
provide a way to have arbitrarily sized buffers - you have to allocate the
buffer size in advance. <em>This is fine</em>, I&rsquo;ve seen people argue on the mailing
list, <em>because memory is bounded anyway.</em></p>
<p>Wat.</p>
<p>This is a bad answer. There&rsquo;s all sorts of reasons to use an arbitrarily
buffered channel. If we knew everything up front, why even have <code>malloc</code>?</p>
<p><div class="clear-both"></div>
</p>
<p>Not having arbitrarily buffered channels means that a naive send on <em>any</em>
channel could block at any time. You want to send on a channel and update some
other bookkeeping under a mutex? Careful! Your channel send might block!</p>
<p><div class="highlight" style="background: #ffffff"><pre style="line-height: 125%"><span></span> <span style="color: #008000">// ...</span>
 s.mtx.Lock()
 <span style="color: #008000">// ...</span>
 s.ch &lt;- val <span style="color: #008000">// might block!</span>
 s.mtx.Unlock()
 <span style="color: #008000">// ...</span>
</pre></div>
</p>
<p>This is a recipe for dining philosopher dinner fights. If you take a lock, you
should quickly update state and release it and not do anything blocking under
the lock if possible.</p>
<p>There is a way to do a non-blocking send on a channel in Go, but it&rsquo;s not the
default behavior. Assume we have a channel <code>ch := make(chan int)</code> and we want
to send the value <code>1</code> on it without blocking. Here is the minimum amount of
typing you have to do to send without blocking:</p>
<p><div class="highlight" style="background: #ffffff"><pre style="line-height: 125%"><span></span> <span style="color: #0000ff">select</span> {
 <span style="color: #0000ff">case</span> ch &lt;- 1: <span style="color: #008000">// it sent</span>
 <span style="color: #0000ff">default</span>: <span style="color: #008000">// it didn&#39;t</span>
 }
</pre></div>
</p>
<p>This isn&rsquo;t what naturally leaps to mind for beginning Go programmers.</p>
<p>The summary is that because many operations on channels block, it takes careful
reasoning about philosophers and their dining to successfully use channel
operations alongside and under mutex protection, without causing deadlocks.</p>
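<p>To make that concrete, here is a small sketch that combines the two snippets
above: bookkeeping under a mutex plus a non-blocking send, so the lock is never
held across a blocking operation. The <code>notifier</code> type and its fields are
hypothetical, and dropping the value when the channel is full is just one
possible policy.</p>

```go
package main

import (
	"fmt"
	"sync"
)

// notifier is a hypothetical type: it counts sends under a mutex and
// forwards values on ch without ever blocking while the lock is held.
type notifier struct {
	mtx     sync.Mutex
	sent    int
	dropped int
	ch      chan int
}

// notify reports whether val was sent. If nobody can take the value right
// now, it is dropped instead of stalling every other user of the mutex.
func (n *notifier) notify(val int) bool {
	n.mtx.Lock()
	defer n.mtx.Unlock()
	select {
	case n.ch <- val:
		n.sent++
		return true
	default:
		n.dropped++
		return false
	}
}

func main() {
	n := &notifier{ch: make(chan int, 1)}
	fmt.Println(n.notify(1))       // true: the buffer has room
	fmt.Println(n.notify(2))       // false: buffer full, nobody receiving
	fmt.Println(n.sent, n.dropped) // 1 1
}
```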
<h3 id="callbacks-are-strictly-more-powerful-and-don-t-require-unnecessary-goroutines:7de476e370ca6a780a51fd680c5a51dd">Callbacks are strictly more powerful and don&rsquo;t require unnecessary goroutines.</h3>
<p><div class="float-right">
<p><figure>
<img src="https://www.jtolio.com/images/wat/yael-grobglas.jpg" alt="Yael Grobglas" onmouseover="this.src='\/images\/wat\/yael-grobglas.gif';" onclick="this.src='\/images\/wat\/yael-grobglas.gif'; this.onmouseout=null;" onmouseout="this.src='\/images\/wat\/yael-grobglas.jpg';" /></p>
<p></figure></p>
</div>
</p>
<p>Whenever an API uses a channel, or whenever I point out that a channel makes
something hard, someone invariably points out that I should just spin up
a goroutine to read off the channel and make whatever translation or fix I need
as it reads off the channel.</p>
<p>Um, no. What if my code is in a hotpath? There are very few instances that
require a channel, and if your API could have been designed with mutexes,
semaphores, and callbacks and no additional goroutines (because all event edges
are triggered by API events), then using a channel forces me to add another
stack of memory allocation to my resource usage. Goroutines are much lighter
weight than threads, yes, but lighter weight doesn&rsquo;t mean the lightest weight
possible.</p>
<p>As I&rsquo;ve formerly <a href="http://www.informit.com/articles/article.aspx?p=2359758#comment-2061767464">argued in the comments on an article about using channels</a> (lol the internet),
your API can <em>always</em> be more general, <em>always</em> more flexible, and take
drastically fewer resources if you use callbacks instead of channels.
&ldquo;Always&rdquo; is a scary word, but I mean it here. There&rsquo;s proof-level stuff going
on.</p>
<p>If someone provides a callback-based API to you and you need a channel, you can
provide a callback that sends on a channel with little overhead and full
flexibility.</p>
<p>If, on the other hand, someone provides a channel-based API to you and you need
a callback, you have to spin up a goroutine to read off the channel <em>and</em> you
have to hope that no one tries to send more on the channel when you&rsquo;re done
reading, so that you don&rsquo;t cause blocked goroutine leaks.</p>
<p>For a super simple real-world example, check out the
<a href="https://godoc.org/golang.org/x/net/context">context interface</a> (which
incidentally is an incredibly useful package and what you should be using
instead of <a href="https://github.com/jtolds/gls">goroutine-local storage</a>):</p>
<p><div class="highlight" style="background: #ffffff"><pre style="line-height: 125%"><span></span><span style="color: #0000ff">type</span> Context <span style="color: #0000ff">interface</span> {
  ...
  <span style="color: #008000">// Done returns a channel that closes when this work unit should be canceled.</span>
  Done() &lt;-<span style="color: #0000ff">chan</span> <span style="color: #0000ff">struct</span>{}
  <span style="color: #008000">// Err returns a non-nil error when the Done channel is closed</span>
  Err() <span style="color: #2b91af">error</span>
  ...
}
</pre></div>
</p>
<p>Imagine all you want to do is log the corresponding error when the <code>Done()</code>
channel fires. What do you have to do? If you don&rsquo;t have a good place you&rsquo;re
already selecting on a channel, you have to spin up a goroutine to deal with
it:</p>
<p><div class="highlight" style="background: #ffffff"><pre style="line-height: 125%"><span></span> <span style="color: #0000ff">go</span> <span style="color: #0000ff">func</span>() {
   &lt;-ctx.Done()
   logger.Errorf(<span style="color: #a31515">&quot;canceled: %v&quot;</span>, ctx.Err())
 }()
</pre></div>
</p>
<p>What if <code>ctx</code> gets garbage collected without closing the channel <code>Done()</code>
returned? Whoops! Just leaked a goroutine!</p>
<p>Now imagine we changed <code>Done</code>&rsquo;s signature:</p>
<p><div class="highlight" style="background: #ffffff"><pre style="line-height: 125%"><span></span> <span style="color: #008000">// Done calls cb when this work unit should be canceled.</span>
 Done(cb <span style="color: #0000ff">func</span>())
</pre></div>
</p>
<p>First off, logging is so easy now. Check it out:
<code>ctx.Done(func() { logger.Errorf(&quot;canceled: %v&quot;, ctx.Err()) })</code>.
But let&rsquo;s say you really do need some select behavior. You can just call it like
this:</p>
<p><div class="highlight" style="background: #ffffff"><pre style="line-height: 125%"><span></span> ch := make(<span style="color: #0000ff">chan</span> <span style="color: #0000ff">struct</span>{})
 ctx.Done(<span style="color: #0000ff">func</span>() { close(ch) })
</pre></div>
</p>
<p>Voila! No expressiveness lost by using a callback instead. <code>ch</code> works like
the channel <code>Done()</code> used to return, and in the logging case we didn&rsquo;t need to
spin up a whole new stack. I got to keep my stack traces (if our log package
is inclined to use them); I got to avoid another stack allocation and another
goroutine to give to the scheduler.</p>
<p>Next time you use a channel, ask yourself if there&rsquo;s some goroutines you could
eliminate if you used mutexes and condition variables instead. If the answer is
yes, your code will be more efficient if you change it. And if you&rsquo;re trying to
use channels just to be able to use the <code>range</code> keyword over a collection, I&rsquo;m
going to have to ask you to put your keyboard away or just go back to writing
Python books.</p>
<p><div class="float-left">
<p><figure>
<img src="https://www.jtolio.com/images/wat/zooey-deschanel.jpg" alt="Zooey Deschanel" onmouseover="this.src='\/images\/wat\/zooey-deschanel.gif';" onclick="this.src='\/images\/wat\/zooey-deschanel.gif'; this.onmouseout=null;" onmouseout="this.src='\/images\/wat\/zooey-deschanel.jpg';" /></p>
<p><figcaption>
<p>more like Zooey De-channel, amirite</p>
</figcaption></p>
<p></figure></p>
</div>
</p>
<h3 id="the-channel-api-is-inconsistent-and-just-cray-cray:7de476e370ca6a780a51fd680c5a51dd">The channel API is inconsistent and just cray-cray</h3>
<p>Closing or sending on a closed channel panics! Why? If you want to close a
channel, you need to either synchronize its closed state externally (with
mutexes and so forth that don&rsquo;t compose well!) so that other writers don&rsquo;t
write to or close a closed channel, or just charge forward and close or write
to closed channels and expect you&rsquo;ll have to recover any raised panics.</p>
<p>This is such bizarre behavior. Almost every other operation in Go has a way to
avoid a panic (type assertions have the <code>, ok =</code> pattern, for example), but
with channels you just get to deal with it.</p>
<p>Okay, so when a send will fail, channels panic. I guess that makes some kind
of sense. But unlike almost everything else with nil values, sending to a nil
channel won&rsquo;t panic. Instead, it will block forever! That&rsquo;s pretty
counter-intuitive. That might be useful behavior, just like having a can-opener
attached to your weed-whacker might be useful (and found in Skymall), but it&rsquo;s
certainly unexpected. Unlike interacting with nil maps (which do implicit
pointer dereferences), nil interfaces (implicit pointer dereferences),
unchecked type assertions, and all sorts of other things, nil channels exhibit
actual channel behavior, as if a brand new channel was just instantiated for
this operation.</p>
<p>Receives are slightly nicer. What happens when you receive on a closed channel?
Well, that works - you get a zero value. Okay that makes sense I guess. Bonus!
Receives allow you to do a <code>, ok =</code>-style check if the channel was open when
you received your value. Thank heavens we get <code>, ok =</code> here.</p>
<p>But what happens if you receive from a nil channel? <em>Also blocks forever!</em> Yay!
Don&rsquo;t try and use the fact that your channel is nil to keep track of if you
closed it!</p>
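<p>If you want to see all of these behaviors in one place, here is a small
program that pokes at each of them. The <code>panics</code> helper is just something
written for this demo; the blocking cases are shown with a <code>select</code> that
has a <code>default</code> so the program doesn&rsquo;t hang.</p>

```go
package main

import "fmt"

// panics reports whether calling f panicked.
func panics(f func()) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	f()
	return false
}

func main() {
	closed := make(chan int)
	close(closed)

	fmt.Println(panics(func() { closed <- 1 }))   // true: send on closed channel
	fmt.Println(panics(func() { close(closed) })) // true: close of closed channel

	// Receiving from a closed channel works and yields the zero value.
	v, ok := <-closed
	fmt.Println(v, ok) // 0 false

	// A nil channel is never ready to send or receive, so without the
	// default case this select would block forever.
	var nilCh chan int
	select {
	case nilCh <- 1:
		fmt.Println("sent")
	case <-nilCh:
		fmt.Println("received")
	default:
		fmt.Println("nil channel: neither send nor receive is ready")
	}
}
```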
<p><div class="clear-both"></div>
</p>
<h2 id="what-are-channels-good-for:7de476e370ca6a780a51fd680c5a51dd">What are channels good for?</h2>
<p>Of course channels are good for some things (they are a generic container
after all), and there are certain things you can only do with them (<code>select</code>).</p>
<h3 id="they-are-another-special-cased-generic-datastructure:7de476e370ca6a780a51fd680c5a51dd">They are another special-cased generic datastructure</h3>
<p>Go programmers are so used to arguments about generics that I can feel the PTSD
coming on just by bringing up the word. I&rsquo;m not here to talk about it so wipe
the sweat off your brow and let&rsquo;s keep moving.</p>
<p>Whatever your opinion of generics is, Go&rsquo;s maps, slices, and channels are data
structures that support generic element types, because they&rsquo;ve been
special-cased into the language.</p>
<p>In a language that doesn&rsquo;t allow you to write your own generic containers,
<em>anything</em> that allows you to better manage collections of things is valuable.
Here, channels are a thread-safe datastructure that supports arbitrary value
types.</p>
<p>So that&rsquo;s useful! That can save some boilerplate I suppose.</p>
<p>I&rsquo;m having trouble counting this as a win for channels.</p>
<h3 id="select:7de476e370ca6a780a51fd680c5a51dd">Select</h3>
<p>The main thing you can do with channels is the <code>select</code> statement. Here you
can wait on a fixed number of inputs for events. It&rsquo;s kind of like epoll, but
you have to know upfront how many sockets you&rsquo;re going to be waiting on.</p>
<p>This is truly a useful language feature. Channels would be a complete wash if
not for <code>select</code>. But holy smokes, let me tell you about the first time you
decide you might need to select on multiple things but you don&rsquo;t know how many
and you have to use <code>reflect.Select</code>.</p>
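<p>For reference, this is roughly what that looks like. The <code>recvAny</code>
helper is a name made up for this sketch; <code>reflect.Select</code> and
<code>reflect.SelectCase</code> are the real standard library pieces. Notice how
much ceremony it takes compared to a plain <code>select</code>.</p>

```go
package main

import (
	"fmt"
	"reflect"
)

// recvAny waits until any channel in chans has a value (or is closed) and
// returns which one fired. ok is false if that channel was closed.
func recvAny(chans []chan int) (index int, value int, ok bool) {
	cases := make([]reflect.SelectCase, len(chans))
	for i, ch := range chans {
		cases[i] = reflect.SelectCase{
			Dir:  reflect.SelectRecv,
			Chan: reflect.ValueOf(ch),
		}
	}
	chosen, recv, recvOK := reflect.Select(cases)
	if !recvOK {
		return chosen, 0, false
	}
	return chosen, int(recv.Int()), true
}

func main() {
	// The number of channels is only known at runtime.
	chans := []chan int{make(chan int, 1), make(chan int, 1), make(chan int, 1)}
	chans[2] <- 42
	fmt.Println(recvAny(chans)) // 2 42 true
}
```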
<h2 id="how-could-channels-be-better:7de476e370ca6a780a51fd680c5a51dd">How could channels be better?</h2>
<p>It&rsquo;s really tough to say what the most tactical thing the Go language team
could do for Go 2.0 is (the Go 1.0 compatibility guarantee is good but
hand-tying), but that won&rsquo;t stop me from making some suggestions.</p>
<h3 id="select-on-condition-variables:7de476e370ca6a780a51fd680c5a51dd">Select on condition variables!</h3>
<p>We could just obviate the need for channels! This is where I propose we get
rid of some sacred cows, but let me ask you this, how great would it be if you
could select on any custom synchronization primitive? (A: So great.) If we had
that, we wouldn&rsquo;t need channels at all.</p>
<h3 id="gc-could-help-us:7de476e370ca6a780a51fd680c5a51dd">GC could help us?</h3>
<p>In the very first example, we could easily solve the high score server cleanup
with channels if we were able to use directionally-typed channel garbage
collection to help us clean up.</p>
<p><div class="float-right">
<p><figure>
<img src="https://www.jtolio.com/images/wat/joel-mchale.jpg" alt="Joel McHale" onmouseover="this.src='\/images\/wat\/joel-mchale.gif';" onclick="this.src='\/images\/wat\/joel-mchale.gif'; this.onmouseout=null;" onmouseout="this.src='\/images\/wat\/joel-mchale.jpg';" /></p>
<p></figure></p>
</div>
</p>
<p>As you know, Go has directionally-typed channels. You can have a channel type
that only supports reading (<code>&lt;-chan</code>) and a channel type that only supports
writing (<code>chan&lt;-</code>). Great!</p>
<p>Go also has garbage collection. It&rsquo;s clear that certain kinds of bookkeeping
are just too onerous and we shouldn&rsquo;t make the programmer deal with
them. We clean up unused memory! Garbage collection is useful and neat.</p>
<p>So why not help clean up unused or deadlocked channel reads? Instead of having
<code>make(chan Whatever)</code> return one bidirectional channel, have it return two
single-direction channels (<code>chanReader, chanWriter := make(chan Type)</code>).</p>
<p><div class="clear-both"></div>
</p>
<p>Let&rsquo;s reconsider the original example:</p>
<p><div class="highlight" style="background: #ffffff"><pre style="line-height: 125%"><span></span><span style="color: #0000ff">type</span> Game <span style="color: #0000ff">struct</span> {
  bestScore <span style="color: #2b91af">int</span>
  scores    <span style="color: #0000ff">chan</span>&lt;- <span style="color: #2b91af">int</span>
}

<span style="color: #0000ff">func</span> run(bestScore *<span style="color: #2b91af">int</span>, scores &lt;-<span style="color: #0000ff">chan</span> <span style="color: #2b91af">int</span>) {
  <span style="color: #008000">// we don&#39;t keep a reference to a *Game directly because then we&#39;d be holding</span>
  <span style="color: #008000">// onto the send side of the channel.</span>
  <span style="color: #0000ff">for</span> score := <span style="color: #0000ff">range</span> scores {
    <span style="color: #0000ff">if</span> *bestScore &lt; score {
      *bestScore = score
    }
  }
}

<span style="color: #0000ff">func</span> NewGame() (g *Game) {
  <span style="color: #008000">// this make(chan) return style is a proposal!</span>
  scoreReader, scoreWriter := make(<span style="color: #0000ff">chan</span> <span style="color: #2b91af">int</span>)
  g = &amp;Game{
    bestScore: 0,
    scores:    scoreWriter,
  }
  <span style="color: #0000ff">go</span> run(&amp;g.bestScore, scoreReader)
  <span style="color: #0000ff">return</span> g
}

<span style="color: #0000ff">func</span> (g *Game) HandlePlayer(p Player) <span style="color: #2b91af">error</span> {
  <span style="color: #0000ff">for</span> {
    score, err := p.NextScore()
    <span style="color: #0000ff">if</span> err != <span style="color: #0000ff">nil</span> {
      <span style="color: #0000ff">return</span> err
    }
    g.scores &lt;- score
  }
}
</pre></div>
</p>
<p>If garbage collection closed a channel when we could prove no more values are
ever coming down it, this solution is completely fixed. Yes yes, the comment
in <code>run</code> is indicative of the existence of a rather large gun aimed at your
foot, but at least the problem is easily solvable now, whereas it really
wasn&rsquo;t before. Furthermore, a smart compiler could probably make appropriate
proofs to reduce the damage from said foot-gun.</p>
<p><div class="clear-both"></div>
</p>
<h3 id="other-smaller-issues:7de476e370ca6a780a51fd680c5a51dd">Other smaller issues</h3>
<ul>
<li><strong>Dup channels?</strong> - If we could use an equivalent of the <code>dup</code> syscall on
channels, then we could also solve the multiple producer problem quite
easily. Each producer could close their own <code>dup</code>-ed channel without ruining
the other producers.</li>
<li><strong>Fix the channel API!</strong> - Close isn&rsquo;t idempotent? Send on closed
channel panics with no way to avoid it? Ugh!</li>
<li><strong>Arbitrarily buffered channels</strong> - If we could make buffered channels with
no fixed buffer size limit, then we could make channels that don&rsquo;t block.</li>
</ul>
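<p>On that last point, the usual workaround today is to build it yourself. Here
is a minimal sketch, assuming <code>int</code> elements and a hypothetical
<code>unbounded</code> helper: one goroutine sits between an input and an output
channel and queues values in a slice. Sends still hand off to that goroutine,
but they never wait on a slow consumer. And yes, it costs an extra goroutine
and leans on the nil-channel behavior complained about earlier.</p>

```go
package main

import "fmt"

// unbounded returns the two ends of a queue that buffers without a fixed
// capacity. The name and the int element type are assumptions for this
// sketch. Close the send side to flush and close the receive side.
func unbounded() (chan<- int, <-chan int) {
	in := make(chan int)
	out := make(chan int)
	go func() {
		defer close(out)
		recv := in // set to nil once the sender closes in
		var queue []int
		for recv != nil || len(queue) > 0 {
			// Only offer a value when the queue is non-empty. A nil send
			// channel blocks forever, which disables that select case.
			var send chan int
			var next int
			if len(queue) > 0 {
				send, next = out, queue[0]
			}
			select {
			case v, ok := <-recv:
				if !ok {
					recv = nil
					continue
				}
				queue = append(queue, v)
			case send <- next:
				queue = queue[1:]
			}
		}
	}()
	return in, out
}

func main() {
	in, out := unbounded()
	for i := 0; i < 5; i++ {
		in <- i // nobody is receiving yet, but these sends still complete
	}
	close(in)
	for v := range out {
		fmt.Println(v) // 0 through 4, in order
	}
}
```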
<h2 id="what-do-we-tell-people-about-go-then:7de476e370ca6a780a51fd680c5a51dd">What do we tell people about Go then?</h2>
<p>If you haven&rsquo;t yet, please go take a look at my current favorite programming
post: <a href="http://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/">What Color is Your Function</a>. Without being about Go specifically, this blog post much more
eloquently than I could lays out exactly why goroutines are Go&rsquo;s best feature
(and incidentally one of the ways Go is better than Rust for some
applications).</p>
<p>If you&rsquo;re still writing code in a programming language that forces keywords
like <code>yield</code> on you to get high performance, concurrency, or an event-driven
model, you are living in the past, whether or not you or anyone else knows it.
Go is so far one of the best entrants I&rsquo;ve seen of languages that implement an
M:N threading model that&rsquo;s not 1:1, and dang that&rsquo;s powerful.</p>
<p>So, tell folks about goroutines.</p>
<p>If I had to pick one other leading feature of Go, it&rsquo;s interfaces.
Statically-typed <a href="https://en.wikipedia.org/wiki/Duck_typing">duck typing</a> makes
extending and working with your own or someone else&rsquo;s project so fun and
amazing it&rsquo;s probably worth me writing an entirely different set of words about
it some other time.</p>
<h2 id="so:7de476e370ca6a780a51fd680c5a51dd">So&hellip;</h2>
<p>I keep seeing people charge in to Go, eager to use channels to their full
potential. Here&rsquo;s my advice to you.</p>
<p><strong>JUST STAHP IT</strong></p>
<p>When you&rsquo;re writing APIs and interfaces, as bad as the advice &ldquo;never&rdquo; can be,
I&rsquo;m pretty sure there&rsquo;s never a time where channels are better, and every Go
API I&rsquo;ve used that used channels I&rsquo;ve ended up having to fight. I&rsquo;ve never
thought &ldquo;oh good, there&rsquo;s a channel here;&rdquo; it&rsquo;s always instead been some
variant of <em><strong>WHAT FRESH HELL IS THIS?</strong></em></p>
<p>So, <em>please, please use channels where appropriate and only
where appropriate.</em></p>
<p>In all of the Go code I work with, I can count on one hand the number of times
channels were really the best choice. Sometimes they are. That&rsquo;s great! Use
them then. But otherwise just stop.</p>
<p><div class="float-center">
<p><figure>
<img src="https://www.jtolio.com/images/wat/michael-cera.jpg" alt="Michael Cera" onmouseover="this.src='\/images\/wat\/michael-cera.gif';" onclick="this.src='\/images\/wat\/michael-cera.gif'; this.onmouseout=null;" onmouseout="this.src='\/images\/wat\/michael-cera.jpg';" /></p>
<p></figure></p>
</div>
</p>
<p><em>Special thanks for the valuable feedback provided by my proof readers
Jeff Wendling, <a href="https://github.com/azdagron">Andrew Harding</a>,
<a href="https://twitter.com/taterbase">George Shank</a>, and
<a href="http://bravenewgeek.com">Tyler Treat</a>.</em></p>
<p>If you want to work on Go with us at Space Monkey, please
<a href="https://www.jtolio.com/contact/">hit me up</a>!</p>
Research log: gene signatures and connectivity maphttps://www.jtolio.com/2015/11/research-log-gene-signatures-and-connectivity-map
Thu, 26 Nov 2015 10:36:00 -0700hello@jtolio.com (JT Olio)https://www.jtolio.com/2015/11/research-log-gene-signatures-and-connectivity-map
<p>Happy Thanksgiving everyone!</p>
<h2 id="context:b64928754d3b76d12514858ca6ca1286">Context</h2>
<p>This is the third post in my continuing series on my attempts at research.
Previously we talked about:</p>
<ul>
<li><a href="https://www.jtolio.com/writing/2015/11/research-log-cell-states-and-microarrays/">what I&rsquo;m doing, cell states, and microarrays</a></li>
<li>and then <a href="https://www.jtolio.com/writing/2015/11/research-log-r-and-more-microarrays/">more about microarrays and R</a>.</li>
</ul>
<p>By the end of last week we had discussed how to get a table of normalized
gene expression intensities that looks like this:</p>
<pre><code>ENSG00000280099_at 0.15484421
ENSG00000280109_at 0.16881395
ENSG00000280178_at -0.19621641
ENSG00000280316_at 0.08622216
ENSG00000280401_at 0.15966256
ENSG00000281205_at -0.02085352
...
</code></pre>
<p>The reason for doing this is to figure out which genes are related, and
perhaps more importantly, what a cell is even doing.</p>
<p><em>Summary:</em> new post, also, I&rsquo;m bringing back the short section summaries.</p>
<h2 id="cell-lines:b64928754d3b76d12514858ca6ca1286">Cell lines</h2>
<p>The first thing to do when trying to figure out what cells are doing is
to choose a cell. There&rsquo;s all sorts of cells. Healthy brain cells,
cancerous blood cells, bruised skin cells, etc.</p>
<p>For any experiment, you&rsquo;ll need a control to eliminate noise and
apply statistical tests for validity. If you don&rsquo;t use a control, the
effect you&rsquo;re seeing may not even exist, and so for any experiment
with cells, you will need a control cell.</p>
<p>Cells often divide, which means that a cell, once chosen,
will duplicate itself for you in the presence of the appropriate
resources. Not all cells divide ad nauseam which provides some
challenges, but many cells under study luckily do.</p>
<p>So, a <em>cell line</em> is simply a set of cells that have all replicated
from a specific chosen initial cell. Any set of cells from a cell
line will be as identical as possible (unless you screwed up! geez).
They will be the same type of cell with the same traits and behaviors,
at least, as much as possible.</p>
<p><em>Summary:</em> a cell line is a large number of cells that are as close
to being the same as possible.</p>
<h2 id="perturbagens:b64928754d3b76d12514858ca6ca1286">Perturbagens</h2>
<p>There are many things that might affect what a cell is doing. Drugs,
agitation, temperature, disease, cancer, gene splicing, small molecules
(maybe you give a cell more iron or calcium or something), hormones,
light, Jello, ennui, etc. Given any particular cell line, giving a
cell from that cell line one of these <em>perturbagens</em>, or, perturbing
the cell in a specific way, when compared to a control will say what
that cell does differently in the face of that perturbagen.</p>
<p>If you&rsquo;d like to find out what exactly a certain type of cell does
when you give it lemon lime soda, then you choose the right cell line,
leave out some control cells and give the rest of the cells soda.</p>
<p>Then, you measure gene expression intensities for both the control
cells and the perturbed cells. The <em>differential expression</em> of
genes between the perturbed cells and the control cells is likely
due to the introduction of the lemon lime soda.</p>
<p>Genes that end up getting expressed <em>more</em> in the presence of the
soda are considered <em>up-regulated</em>, whereas genes that end up
getting expressed <em>less</em> are considered <em>down-regulated</em>. The
degree to which a gene is up or down regulated constitutes how
much of an effect the soda may have had on that gene.</p>
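<p>As a toy illustration of the idea, here is a sketch that subtracts control
intensities from perturbed intensities and sorts genes into up- and
down-regulated buckets. The gene names, the numbers, and the 0.25 threshold are
all made up for this example, and a real analysis would use replicates and
proper statistics rather than a bare cutoff.</p>

```go
package main

import (
	"fmt"
	"sort"
)

// differential returns perturbed minus control for every gene that was
// measured in both samples.
func differential(control, perturbed map[string]float64) map[string]float64 {
	diff := make(map[string]float64)
	for gene, c := range control {
		if p, ok := perturbed[gene]; ok {
			diff[gene] = p - c
		}
	}
	return diff
}

// regulated splits genes by whether their change exceeds threshold in
// either direction. Genes inside the band are treated as unchanged.
func regulated(diff map[string]float64, threshold float64) (up, down []string) {
	for gene, d := range diff {
		switch {
		case d > threshold:
			up = append(up, gene)
		case d < -threshold:
			down = append(down, gene)
		}
	}
	sort.Strings(up)
	sort.Strings(down)
	return up, down
}

func main() {
	// Entirely invented intensities, for illustration only.
	control := map[string]float64{"geneA": 0.10, "geneB": 0.50, "geneC": -0.20}
	perturbed := map[string]float64{"geneA": 0.90, "geneB": 0.45, "geneC": -0.80}

	up, down := regulated(differential(control, perturbed), 0.25)
	fmt.Println(up, down) // [geneA] [geneC]
}
```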
<p>Of course, all of this has such a significant amount of
experimental noise that you could find pretty much anything.
You&rsquo;ll need to replicate your experiment independently a few times
before you publish that lemon lime soda causes increased
expression in the <a href="https://en.wikipedia.org/wiki/Sonic_hedgehog">Sonic hedgehog gene</a>.</p>
<p><em>Summary:</em> A perturbagen is something you introduce/do to a cell to
change its behavior, such as drugs or throwing it at a wall or
something. The wall perturbagen.</p>
<h2 id="gene-signature:b64928754d3b76d12514858ca6ca1286">Gene signature</h2>
<p>For a given change or perturbagen to a cell, we now have enough
to compute lists of up-regulated and down-regulated genes and the
magnitude of the change in expression for each gene.</p>
<p>This gene expression pattern for some subset of important genes
(perhaps the most changed in expression) is called a
<em>gene signature</em>, and gene signatures are very useful. By
comparing signatures, you can:</p>
<ul>
<li>identify or compare cell states</li>
<li>find sets of positively or negatively correlated genes</li>
<li>find similar disease signatures</li>
<li>find similar drug signatures</li>
<li>find drug signatures that might counteract opposite disease
signatures.</li>
</ul>
<p>(That last bullet point is essentially where I&rsquo;m headed with
my research.)</p>
<p><em>Summary:</em> a gene signature is a short summary of the most
important gene expression differences a perturbagen causes in a
cell.</p>
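<p>As a rough sketch of what building a signature could look like, here&rsquo;s one way
to keep only the most-changed genes in R. The fold changes are made up, and
<code>make_signature</code> is a hypothetical helper of my own, not a function from any
package; picking the top <code>n</code> by absolute change is just one reasonable choice.</p>
<pre><code># Hypothetical log2 fold changes for six genes (made-up numbers).
lfc &lt;- c(g1 = 2.5, g2 = -0.1, g3 = -3.0, g4 = 0.4, g5 = 1.2, g6 = -1.8)

# Keep the n genes with the largest absolute change, whatever the direction.
make_signature &lt;- function(lfc, n) {
  lfc[order(abs(lfc), decreasing = TRUE)][seq_len(n)]
}

sig  &lt;- make_signature(lfc, 3)
up   &lt;- names(sig)[sig &gt; 0]  # up-regulated genes in the signature
down &lt;- names(sig)[sig &lt; 0]  # down-regulated genes in the signature
print(sig)
</code></pre>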
<h2 id="drugs:b64928754d3b76d12514858ca6ca1286">Drugs!</h2>
<p>The pharmaceutical industry is constantly on the lookout for
new breakthrough drugs that might represent huge windfalls in
cash, and drugs don&rsquo;t always work as planned. Many drugs spend
years in research and development, only to ultimately find
poor efficacy or adoption. Sometimes drugs even become known
<a href="https://en.wikipedia.org/wiki/Sildenafil#History">much more for their side-effects than their originally intended
therapy</a>.</p>
<p>The practical upshot is that there&rsquo;s countless FDA-approved
drugs that represent decades of work that are simply underused
or even unused entirely. These drugs have already cleared many
challenging regulatory hurdles, but are simply and quite
literally cures looking for a disease.</p>
<p>If even just one of these drugs can be given a new lease
on life for some yet-to-be-cured disease, then perhaps we can
give some people new leases on life!</p>
<p><em>Summary:</em> instead of developing new drugs, there&rsquo;s already
lots of drugs that aren&rsquo;t being used. Maybe we can find matching
diseases!</p>
<h2 id="the-connectivity-map-project:b64928754d3b76d12514858ca6ca1286">The Connectivity Map project</h2>
<p>The <a href="https://www.broadinstitute.org/cmap/">Broad Institute&rsquo;s Connectivity Map project</a> isn&rsquo;t particularly new
anymore, but it represents a groundbreaking and promising idea -
we can dump a bunch of signatures into a database and construct
all sorts of new hypotheses we might not even have thought to
check before.</p>
<p>To prove out the usefulness of this idea, the Connectivity Map
(or cmap) project chose 5 different cell lines (all cancer cells,
which are easy to get to replicate!) and a library of FDA-approved
drugs, and then gave some cells these drugs.</p>
<p>They then constructed a database of all of the signatures they
computed for each possible perturbagen they measured.
Finally, they constructed a web interface where a user can
upload a gene signature and get a result list back of all of
the signatures they collected, ordered by the most to least
similar. You can totally go sign up and
<a href="https://www.broadinstitute.org/cmap/">try it out</a>.</p>
<p>This simple tool is surprisingly powerful. It allows you to
find similar drugs to a drug you know, but it also allows
you to find drugs that might counteract a disease you&rsquo;ve
created a signature for.</p>
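<p>Here&rsquo;s a toy version of that kind of query in R. To be clear, this is not
how cmap itself scores things; I&rsquo;m using a plain Spearman correlation as a
stand-in similarity measure, and the drug names, gene names, and numbers are
all invented.</p>
<pre><code># Hypothetical database: rows are genes, columns are drug signatures
# (log2 fold changes). All numbers are made up.
db &lt;- cbind(
  drugA = c( 2.0, -1.5,  0.5, -2.0),
  drugB = c(-2.1,  1.4, -0.4,  1.9),
  drugC = c( 0.1,  0.2, -0.1,  0.3)
)
rownames(db) &lt;- c(&quot;g1&quot;, &quot;g2&quot;, &quot;g3&quot;, &quot;g4&quot;)

# Hypothetical disease signature over the same genes, in the same order.
disease &lt;- c(g1 = 1.8, g2 = -1.2, g3 = 0.6, g4 = -2.2)

# Spearman correlation of the query against every signature in the database.
scores &lt;- apply(db, 2, function(sig) cor(sig, disease, method = &quot;spearman&quot;))

# Most similar first; the tail holds candidates that might counteract.
ranked &lt;- sort(scores, decreasing = TRUE)
print(ranked)
</code></pre>
<p>The top of the list is the &ldquo;drugs like this one&rdquo; use case, and the bottom of
the list (strongly negative scores) is the &ldquo;drugs that push the other way&rdquo; use
case.</p>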
<p>Ultimately, the project led to <a href="https://www.broadinstitute.org/cmap/publications.jsp">a number of successful
applications</a>.
So useful was it that the Broad Institute has doubled down
and created the much larger and more comprehensive <a href="http://www.lincscloud.org/">LINCS
Project</a> that targets an order
of magnitude more cell lines (77) and more perturbagens
(42,532, compared to cmap&rsquo;s 6100). You can sign up and use
that one too!</p>
<p><em>Summary</em>: building a system that supports querying
signature connections has already proved to be super useful.</p>
<h2 id="whew:b64928754d3b76d12514858ca6ca1286">Whew</h2>
<p>Alright, I wrote most of this on a plane yesterday but since
I should now be spending time with family I&rsquo;m going to cut
it short here.</p>
<p>Stay tuned for next week!</p>
Research log: R and more microarrayshttps://www.jtolio.com/2015/11/research-log-r-and-more-microarrays
Thu, 19 Nov 2015 18:33:00 -0600hello@jtolio.com (JT Olio)https://www.jtolio.com/2015/11/research-log-r-and-more-microarrays
<h2 id="context:e512c96984c68e012cdc5b3a25d9c0d9">Context</h2>
<p>Welcome back! If you&rsquo;re just checking in for the first time, this is post two in
my attempts to start blogging about my attempts at research. <a href="https://www.jtolio.com/writing/2015/11/research-log-cell-states-and-microarrays/">Go read the first
one to catch up!</a></p>
<h2 id="r:e512c96984c68e012cdc5b3a25d9c0d9">R</h2>
<p>Before we get started, I just need to say something to those of you that might
primarily be software developers. We&rsquo;re going to be dealing with a lot of
data. Stop fighting. Give up. Just use R.</p>
<p>The <a href="https://www.r-project.org/">R programming language</a> is the GNU
implementation of an older language for statistics called
<a href="https://www.jtolio.com/images/the-s-is-for-sucks.png">S</a>. I guess the way I&rsquo;d describe R is that
it&rsquo;s not the sort of language whose history is filled with emphasis on
software engineering. That&rsquo;s actually kind of a compliment in some ways
because it certainly doesn&rsquo;t have the documentation standards (or lack
thereof) of normal software projects. R has an entire built-in citation
system so your publications can properly cite the rich documentation it
has. That&rsquo;s the sort of commitment to documentation everyone else needs!</p>
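<p>If you want to see the citation system for yourself, these two calls work in
a stock R install (the <code>stats</code> package ships with R, so nothing extra is
needed):</p>
<pre><code># Print how to cite R itself.
print(citation())

# Print how to cite a specific installed package.
print(citation(&quot;stats&quot;))
</code></pre>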
<p>R (and S before it) has a strong emphasis on statistical support, which is
certainly nice and convenient when you want to do stats. Though
the language around the stats features is, uh, quirky,
scientists who have needed statistical support have flocked to it, and at
this point, a dazzling array of niche packages have sprung up supporting
all sorts of things.</p>
<p>The most inspiring ecosystem inside of R is probably
<a href="http://bioconductor.org/">Bioconductor</a>, which is essentially a custom
package manager for everything biological you might need a computer for.
I&rsquo;m not totally sure how to do much new biology with other data science
biologists without Bioconductor. I mean, I&rsquo;m sure it happens, but the
network effects here are strong.</p>
<p>You may be tempted to try and
<a href="http://rpy.sourceforge.net/">use R from a better language</a>, but if your
goal is to avoid R then this is really a dead end. You <em>can</em> use R from
another language but it doesn&rsquo;t save you from having to learn R. As long
as Bioconductor is as dominant as it currently is, you need to understand
and know R.</p>
<p>My only other advice is that you should use R 3.2 or newer and always use
<code>https</code> in your <code>source()</code> calls, despite what all the documentation out
there says. R 3.1 doesn&rsquo;t really support <code>https</code>, and
basically you can&rsquo;t do anything with Bioconductor without indiscriminately
running source code straight off the internet. Recent Bioconductor releases
will actually even prefer <code>https</code> if it&rsquo;s supported. Score!</p>
<p>Honestly you should probably buy a separate computer for running R and
keep it quarantined. Actually, you should quarantine all your computers.
In fact, we need to give up on all of this and move into caves.</p>
<p>If you want to follow along, install and start R and run the following
installation steps:</p>
<pre><code>&gt; source(&quot;https://bioconductor.org/biocLite.R&quot;)
&gt; biocLite(&quot;affy&quot;)
&gt; biocLite(&quot;SCAN.UPC&quot;)
</code></pre>
<h2 id="microarrays-part-2:e512c96984c68e012cdc5b3a25d9c0d9">Microarrays part 2</h2>
<p>Last week, our discussion on microarrays touched largely on how it is that
microarray technology can allow us to measure RNA expression levels, and,
by extension, gene expression levels. Each well in a microarray has some
specific sequence probe bonded (or <em>hybridized</em>) to the bottom, and then
we measure the amount of a given RNA sequence in a sample by checking how
much RNA binds to the hybridized probes. The intensity of our measurement
says how much of that probe&rsquo;s RNA sequence was in the sample.</p>
<p>Except that&rsquo;s not the full story, and my discoveries here are the cause
of the tardiness of this post. It turns out there&rsquo;s a full layer of
complexity I completely missed the first time around, and I&rsquo;ve spent the
last few days figuring it out.</p>
<p>Each well does indeed have a specific type of probe (with millions of
copies). Each probe is attempting to measure a specific nucleotide
sequence in the available RNA. But to compensate
for experimental noise, in some microarrays, especially early Affymetrix
brand microarrays, probes come in pairs. In these cases, each probe pair
contains a well with probes that are a perfect match to a specific nucleotide
sequence <em>and</em> a well with probes that are called <em>mismatch</em> probes. The
mismatch probes only differ from the perfect match probes by one
nucleotide in the very center of the sequence. This allows
experimenters to try and eliminate some noise from the signal by
subtracting the intensity of the mismatch probes from the perfect
match probes, assuming that only the RNA in question will have trouble
binding to the mismatch probes but bind fine to the perfect match probes.</p>
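<p>A naive version of that subtraction, for a single probe set, might look like
this in R. The intensities are made up, and summarizing with a median is my
own simplification; real algorithms are considerably more careful about it.</p>
<pre><code># Hypothetical intensities for one probe set with five probe pairs.
pm &lt;- c(1200, 950, 1430, 1100, 880)  # perfect match wells
mm &lt;- c( 300, 410,  280, 1250, 190)  # mismatch wells

# Naive correction: subtract each mismatch from its perfect match partner.
signal &lt;- pm - mm
print(signal)

# One pair went negative (mismatch brighter than perfect match), so summarize
# with a median, which is less sensitive to a single odd pair than a mean.
print(median(signal))
</code></pre>
<p>That negative value in the fourth pair is exactly the sort of thing that
makes the simple subtraction less tidy than it sounds.</p>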
<p>Newer Affymetrix arrays don&rsquo;t have mismatch probes because of advances
in other normalization techniques.</p>
<p>Either way, to further deal with the noise, each probe pair is replicated
11-20 times, so for each type of probe, there&rsquo;s 22-40 different wells in
the microarray helping to measure the signal of that probe in as robust
of a way as possible. You can <a href="http://www.vsni.co.uk/software/genstat/htmlhelp/marray/AffymetrixChips.htm">read more about this here</a>.</p>
<p>When a microarray measurement is run, the result is often a
<a href="http://media.affymetrix.com/support/developer/powertools/changelog/gcos-agcc/cel.html">CEL file</a>.
I started writing a parser for CEL files, but that was a fool&rsquo;s errand,
especially because I already don&rsquo;t-know-what-I&rsquo;m-doing enough. There are
a number of excellent CEL file parser libraries on Bioconductor, though!</p>
<p>If you&rsquo;re following along, you can go download a sample CEL file from
Affymetrix here:
<a href="http://www.affymetrix.com/support/developer/downloads/DemoData/HG-U133-DemoData.zip">http://www.affymetrix.com/support/developer/downloads/DemoData/HG-U133-DemoData.zip</a></p>
<p>Since the data in a CEL file is essentially a big matrix of numbers
representing the intensities of all the wells in the microarray, we
can use the <code>affy</code> package to first load the CEL file and then visualize it
like this:</p>
<pre><code>&gt; library(affy)
&gt; data &lt;- ReadAffy(&quot;HG-U133A-1-121502.CEL&quot;)
&gt; image(data)
</code></pre>
<p>and you might get an image like this:</p>
<p><div style="text-align: center;">
<figure >
<img src="https://www.jtolio.com/images/bio/microarray.png" />
</figure>
</div>
</p>
<p>This data is a large matrix (in this case, a 712 by 712 matrix) of well
intensities in the microarray. (The image isn&rsquo;t square because I stretched it
purely for better website looks.)</p>
<p>Here&rsquo;s the first six probe ids:</p>
<pre><code>&gt; head(unique(probeNames(data)))
[1] &quot;1007_s_at&quot; &quot;1053_at&quot; &quot;117_at&quot; &quot;121_at&quot; &quot;1255_g_at&quot; &quot;1294_at&quot;
</code></pre>
<p>Here&rsquo;s the first six well measurements:</p>
<pre><code>&gt; head(exprs(data))
HG-U133A-1-121502.CEL
1 155.3
2 11016.5
3 153.3
4 11435.8
5 114.5
6 161.3
</code></pre>
<p>Let&rsquo;s also <a href="https://www.youtube.com/watch?v=sIlNIVXpIns">make some graphs</a> about
the distribution of well measurements:</p>
<pre><code>&gt; hist(exprs(data))
</code></pre>
<p><div style="text-align: center;">
<figure >
<img src="https://www.jtolio.com/images/bio/microarray-hist.png" />
</figure>
</div>
</p>
<p>Yep, that&rsquo;s not useful, let&rsquo;s try again.</p>
<pre><code>&gt; hist(log(exprs(data)))
</code></pre>
<p><div style="text-align: center;">
<figure >
<img src="https://www.jtolio.com/images/bio/microarray-loghist.png" />
</figure>
</div>
</p>
<p>Okay, log-intensity seems more useful.</p>
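<p>If you don&rsquo;t have the CEL file handy, you can see the same effect with
simulated data. This sketch assumes intensities are roughly log-normal, which
is my assumption for illustration and not something measured from the array
above.</p>
<pre><code>set.seed(1)

# Simulated skewed intensities: most wells dim, a few very bright.
intensity &lt;- rlnorm(10000, meanlog = 5, sdlog = 1.5)

# On the raw scale the mean sits far above the median (a long right tail).
print(summary(intensity))

# On the log scale the mean and median are close (roughly symmetric).
print(summary(log(intensity)))
</code></pre>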
<p>So this all is <a href="https://www.youtube.com/watch?v=Hm3JodBR-vs">pretty neat</a>,
but what we have is a matrix of all sorts of different intensity measurements
when what we want is normalized expression values for each probe id
and, ultimately, each gene.</p>
<h2 id="normalization-first-pass:e512c96984c68e012cdc5b3a25d9c0d9">Normalization, first pass!</h2>
<p><a href="https://www.jtolio.com/writing/2015/11/research-log-cell-states-and-microarrays/">As I mentioned last time</a>, there&rsquo;s
all sorts of reasons why the collected dataset is full of noise,
including but not limited to:</p>
<ul>
<li>bonding affected by temperature</li>
<li>small molecules or other impurities in the RNA measurement
medium</li>
<li>variations in quality control of the probes, the microarray,
the fluorescent dye, or the sample being measured</li>
<li>tighter binding between C and G base pairs than A and T</li>
<li>man who even knows what else, this isn&rsquo;t digital</li>
<li>in fact holy cow what if your grad student doing the measurement
is mostly thinking about the latest episode of Empire and
isn&rsquo;t paying attention? Whoops you&rsquo;re actually measuring saliva
now.</li>
<li>moving into caves is sounding pretty good, we could grow
mushrooms and stuff come join me it&rsquo;d be sweet.</li>
<li>Mmmm sautéed mushrooms.
<ul>
<li>butter</li>
<li>mushrooms</li>
<li>melt butter in a large skillet where was I?</li>
</ul></li>
</ul>
<p>To deal with all this noise, there&rsquo;s a number of different strategies
used for turning the data into something useful. A lot of smart people
have thought about this way longer than I have and come up with some
really strong strategies. In fact, the default Affymetrix software
that comes with that manufacturer&rsquo;s microarrays takes care of the
first pass of some of this normalization for you.</p>
<p>I don&rsquo;t have a copy of Affymetrix software, but the authors of the
<code>affy</code> library have attempted to duplicate one of their
normalization techniques for mismatch-based microarrays. We&rsquo;ll run
it now:</p>
<pre><code>&gt; expressionset &lt;- mas5(data)
background correction: mas
PM/MM correction : mas
expression values: mas
background correcting...done.
22283 ids to be processed
| |
|####################|
&gt;
</code></pre>
<p>And the first six probe id expressions:</p>
<pre><code>&gt; head(exprs(expressionset))
HG-U133A-1-121502.CEL
1007_s_at 999.06146
1053_at 949.14361
117_at 49.40164
121_at 1016.47743
1255_g_at 28.89920
1294_at 148.71909
&gt;
</code></pre>
<p>Yay! Except we&rsquo;re not going to use this. <code>mas5</code> is fine, but
the following excerpt from the <code>affy</code> documentation is admittedly
worrisome:</p>
<blockquote>
<p>The methods used by this function were implemented based
upon available documentation. In particular a useful
reference is Statistical Algorithms Description Document
by Affymetrix. Our implementation is based on what is
written in the documentation and, as you might appreciate,
there are places where the documentation is less than
clear. This function does not give exactly the same results.
All source code of our implementation is available. You are
free to read it and suggest fixes.</p>
</blockquote>
<p>Oh, don&rsquo;t worry, author of the <code>affy</code> library, I definitely
appreciate that!</p>
<p>Instead of using this technique, researchers often use one
of a number of other techniques. One of the more common
approaches to normalization is what&rsquo;s called Robust
Multi-array Average, or RMA. Essentially, RMA throws out
mismatch probes (which is nice for arrays that don&rsquo;t have
them), and instead turns all of the intensity
values into quantiles. It considers these quantiles
across multiple microarray experiments to better come up
with a statistically valid comparison. This is a pretty
strong approach, but a downside with it is that if you then
want to start comparing yet another microarray dataset,
you will need to re-run your RMA normalization.</p>
<p>Tradeoffs about how to clean your data: The Data Science Story.</p>
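<p>To make the quantile idea a little more concrete, here&rsquo;s a minimal sketch of
quantile normalization in base R. This only illustrates that one step; full RMA
also does background correction and summarizes probes into probe sets, and in
practice you&rsquo;d call the <code>rma</code> function in <code>affy</code> rather than
rolling your own. The matrix and the <code>quantile_normalize</code> helper are made up
for illustration.</p>
<pre><code># Hypothetical intensities: rows are probes, columns are arrays (made up).
x &lt;- cbind(
  array1 = c(5, 2, 3, 4),
  array2 = c(4, 1, 4.5, 2),
  array3 = c(3, 4, 6, 8)
)

quantile_normalize &lt;- function(x) {
  # Rank of each value within its own array (column).
  ranks &lt;- apply(x, 2, rank, ties.method = &quot;first&quot;)
  # Mean of the k-th smallest values across arrays, for each k.
  reference &lt;- rowMeans(apply(x, 2, sort))
  # Replace every value with the reference value at its rank.
  normalized &lt;- apply(ranks, 2, function(r) reference[r])
  dimnames(normalized) &lt;- dimnames(x)
  normalized
}

qn &lt;- quantile_normalize(x)
print(qn)
</code></pre>
<p>Afterwards every array has exactly the same distribution of values, and only
the ordering of probes within each array differs. It also shows where the
downside comes from: the reference distribution is an average over all of the
arrays, so adding a new array changes it for everyone.</p>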
<h2 id="probes-to-genes:e512c96984c68e012cdc5b3a25d9c0d9">Probes to genes</h2>
<p>Genes, like pretty much everything else in biology, are variable
on a number of fronts - one of which is length. Some genes are
short and others are long. Microarray probes, on the other hand,
are all the same length.</p>
<p>Because of this, probes are chosen to find genes in such a way that
there might be multiple probes for any given gene. If all the probes
are activated, then the gene is probably there. As a result, in
addition to all this normalization, a further data cleaning step
will usually involve mapping probe ids to gene ids.</p>
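<p>In its simplest form, that mapping step is a group-by. Here&rsquo;s a sketch in
base R with an invented probe-to-gene table; real packages do this with curated
annotation and more careful statistics than a plain median.</p>
<pre><code># Hypothetical normalized values per probe, plus a made-up probe-to-gene map.
probe_values &lt;- c(p1 = 7.9, p2 = 8.3, p3 = 2.1, p4 = 5.0, p5 = 5.4, p6 = 4.9)
probe_to_gene &lt;- c(p1 = &quot;GENE1&quot;, p2 = &quot;GENE1&quot;, p3 = &quot;GENE2&quot;,
                   p4 = &quot;GENE3&quot;, p5 = &quot;GENE3&quot;, p6 = &quot;GENE3&quot;)

# Collapse to one value per gene by taking the median of its probes.
gene_values &lt;- tapply(probe_values, probe_to_gene[names(probe_values)], median)
print(gene_values)
</code></pre>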
<p>Luckily, <a href="http://piccolo.byu.edu/">Dr. Stephen Piccolo</a> (along with
his colleagues including <a href="http://www.bioscience.utah.edu/faculty/molecular-biology-faculty/bild/bild.php">my lab&rsquo;s PI</a>)
has come up with a package that will not only read CEL files,
account for mismatch probes if they exist, normalize the data in such
a way that it is comparable to other, yet-unknown CEL files, but also
map it to genes. Yay!</p>
<p>So let&rsquo;s use <a href="https://www.bioconductor.org/packages/release/bioc/html/SCAN.UPC.html">SCAN.UPC</a>,
the package in question, to load our CEL file. We&rsquo;ll also have to
tell it to download a gene list mapping.</p>
<pre><code>&gt; library(SCAN.UPC)
&gt; pkgname &lt;- InstallBrainArrayPackage(&quot;HG-U133A-1-121502.CEL&quot;, &quot;20.0.0&quot;, &quot;hs&quot;, &quot;ensg&quot;)
&gt; data &lt;- SCAN(&quot;HG-U133A-1-121502.CEL&quot;, probeSummaryPackage=pkgname)
</code></pre>
<p><em>(Note: as of this writing, <code>InstallBrainArrayPackage</code> does not use <code>https</code>!
Hope your computer is quarantined!)</em></p>
<p>Let&rsquo;s check out the data!</p>
<pre><code>&gt; head(exprs(data))
HG-U133A-1-121502.CEL
AFFX-BioB-3_at 0.2173211
AFFX-BioB-5_at 0.2150827
AFFX-BioB-M_at 0.6611921
AFFX-BioC-3_at 0.7828146
AFFX-BioC-5_at 0.7416731
AFFX-BioDn-3_at 2.1812713
</code></pre>
<p>It has data! Unfortunately those <code>AFFX</code> genes are control probe sets.
Let&rsquo;s look at the other end of the data.</p>
<pre><code>&gt; tail(exprs(data))
HG-U133A-1-121502.CEL
ENSG00000280099_at 0.15484421
ENSG00000280109_at 0.16881395
ENSG00000280178_at -0.19621641
ENSG00000280316_at 0.08622216
ENSG00000280401_at 0.15966256
ENSG00000281205_at -0.02085352
</code></pre>
<p>Alright! Those are <a href="https://www.ebi.ac.uk/training/online/glossary/term/66">Ensembl genes</a>,
and we have normalized measurements for their intensity. <a href="https://www.jtolio.com/images/success.jpg">Success!</a></p>
<pre><code>&gt; hist(exprs(data))
</code></pre>
<p><div style="text-align: center;">
<figure >
<img src="https://www.jtolio.com/images/bio/scan-hist.png" />
</figure>
</div>
</p>
<h2 id="whew:e512c96984c68e012cdc5b3a25d9c0d9">Whew</h2>
<p>Alright, stay tuned for where this all gets us, and more importantly,
all the corrections I&rsquo;m going to have to post for this entry!</p>
<p><strong>Update:</strong> first blood! Mismatch probes are only on early microarrays.
Later Affymetrix microarrays don&rsquo;t have mismatch probes at all. My
descriptions have been updated to reflect this. I also want to sincerely
thank Dr. Piccolo for his thorough help in understanding
this stuff, for his excellent SCAN.UPC library, and for proofreading
this post!</p>
Equity vs Equality, Meritocracies, Social Justice, and Codes of Conducthttps://www.jtolio.com/2015/11/equity-vs-equality
Sat, 14 Nov 2015 14:31:00 -0600hello@jtolio.com (JT Olio)https://www.jtolio.com/2015/11/equity-vs-equality
<p>Eric Raymond is at it again and so I guess I am too.</p>
<p>His <a href="http://esr.ibiblio.org/?p=6918">most recent post</a> delves into
why <em>Social Justice Warriors</em> (which I guess is meant to be a
disparaging term) should be expunged from hacker communities.</p>
<p>Arguments on the internet tend to <a href="http://www.smbc-comics.com/?id=2939">end up having some vocal
subgroups who espouse idiotic things</a>, but this one especially rubs
me the wrong way, so I think it&rsquo;s worth rebutting.</p>
<p>While it&rsquo;s certainly easy to denounce things ESR says about social
justice through tangents about his less savory beliefs (PUA, etc),
it might distract from the substance of this discussion, so instead
I&rsquo;m mostly just going to talk about <em>meritocracy</em>, which has
become a hot-button issue even in its own right.</p>
<p>I do want to point out that every camp has bad apples, so please
refrain from defining your opinion of a group (social justice or
meritocrats or whatever else) by the worst ideas you&rsquo;ve heard from
them. This goes both ways. Everyone&rsquo;s just trying their best at
something.</p>
<h2 id="meritocracy-is-good-right:2dbc9e13ac8650efbc32bd07746b2397">Meritocracy is good, right?</h2>
<p>The tech industry has often applauded itself for its focus on
meritocracy. ESR says that he doesn&rsquo;t</p>
<blockquote>
<p>care whether my fellow contributors were white, black, male, female,
straight, gay, or from the planet Mars, only whether their code was
good.</p>
</blockquote>
<p>I believe him! To try and achieve excellence at anything, it&rsquo;s
important to evaluate ideas and solutions independently of anything
else. We should definitely not &ldquo;lower standards&rdquo; to try and be more
inclusive. This is a common argument from people who praise meritocratic
leanings.</p>
<h2 id="equality-vs-equity:2dbc9e13ac8650efbc32bd07746b2397">Equality vs Equity</h2>
<p>I&rsquo;m here to tell you things are never so simple.</p>
<p>If you&rsquo;re in a foot race, equality is when the rules are the same for
everyone. This is reasonable! Everyone wants to run the same race.</p>
<p>To bring this metaphor into a GitHub pull request is much trickier.
Evaluating if the finish line is the same is easy. Let&rsquo;s just look at the
code and ignore everything else.</p>
<p>But evaluating if the starting line is the same for everyone is
much, much harder. The only way a race to the finish line is fair is
if everyone gets to start at the same place, and that is simply not
true of the tech industry.</p>
<p><div style="text-align: center;">
<figure >
<img src="https://www.jtolio.com/images/equality-vs-equity.jpg" />
<figcaption>
<p>
<a href="http://offshegoes2013.blogspot.com/2014/06/equity-vs-equality.html">
Credit: Mary @ Off She Goes
</a>
</p>
</figcaption>
</figure>
</div>
</p>
<p>There are all kinds of structurally oppressive tendencies in our society
that, at scale, have strong filtering effects on who was able to participate
in the early tech industry and who is able to participate now.</p>
<p>Poverty may prevent you from getting access to a computer.</p>
<p>Stereotypes may prevent you from feeling allowed to contribute.</p>
<p>Feeling like you don&rsquo;t belong is a powerful effect, and while any individual might
overcome it, the stochastic effects of this at scale are significant.</p>
<h2 id="meritocrats-vs-social-justice:2dbc9e13ac8650efbc32bd07746b2397">Meritocrats vs Social Justice</h2>
<p>Social justice advocates aren&rsquo;t opposed to excellent output and excellent
work, nor are they for the long-term lowering of standards. In fact,
social justice advocates are attempting to raise standards and improve
output by increasing the diversity of thought, voice, and opinion! So
in this way, meritocrats and social justice supporters have a lot in
common.</p>
<p>Where I think the difference lies is that meritocrats typically aren&rsquo;t the
first to admit we currently have problems. The system is working for the
meritocrats, who don&rsquo;t feel shut out.</p>
<p>Social justice in my view is the attempt to not only recognize there is a
problem but find solutions to getting everyone to the same starting line.</p>
<h2 id="what-equity-looks-like:2dbc9e13ac8650efbc32bd07746b2397">What Equity looks like</h2>
<p>I&rsquo;ve written about <a href="https://www.jtolio.com/writing/2015/03/what-riding-a-unicycle-can-teach-us-about-microaggressions">microaggressions</a>
before, and the key thing to point out is that large numbers of small issues
seriously add up.</p>
<p>In the <a href="https://archive.is/dgilk">situation ESR describes</a>,
<em>djangoconcardiff</em> might be a troll I guess, but it doesn&rsquo;t matter. Also,
Roberto Rosario seems like a nice guy! And Roberto&rsquo;s point that a code of
conduct wouldn&rsquo;t really affect the outcome of things <em>from his end</em> is a
reasonable point. It doesn&rsquo;t make that much of a difference to Roberto, or
to the code, because Roberto will keep on worrying about keeping the
metaphorical finish line of code excellence the same.</p>
<p>However, the argument that adding a code of conduct shouldn&rsquo;t matter <em>cuts
both ways</em>. If it doesn&rsquo;t matter to add one, it doesn&rsquo;t matter to not add
one, either. And even though it might not affect the finish line, it could
very definitely affect the starting line.</p>
<p><a href="http://www.catehuston.com/blog/2015/09/02/code-of-conducts-and-worthless-manfeelings/">Good codes of conduct make people feel safe.</a>
If you don&rsquo;t feel safe and you don&rsquo;t feel welcome, why would you attempt
to contribute?</p>
<p>To achieve true equality, we must first achieve equity. And to achieve
equity, we <strong>must</strong> address the things that make minorities in our industry
feel unwelcome or unable to start.</p>
<p>Diversity of opinion leads to better analysis of problems and possible
solutions. Our industry can&rsquo;t afford to overlook such an obvious increase
to our combined merit.</p>
Research log: cell states and microarrayshttps://www.jtolio.com/2015/11/research-log-cell-states-and-microarrays
Thu, 12 Nov 2015 18:20:00 -0600hello@jtolio.com (JT Olio)https://www.jtolio.com/2015/11/research-log-cell-states-and-microarrays
<h2 id="context:e60b10ce491ce725910b27ca6630b6f9">Context</h2>
<p>I guess I should lay out some context since I never update this anymore.</p>
<p>The news about my life is that I&rsquo;m a graduate student at the University of Utah again,
working on a PhD, albeit part time. I&rsquo;ve been on leave from the grad program for so long
and my leave request justifications were getting so tortured that I decided I should
probably just go back before I get kicked out. We&rsquo;re out of our minds over at
<a href="https://www.spacemonkey.com/">Space Monkey</a> and I&rsquo;m staying on there part time as
Director of Engineering too!</p>
<p>Anyways, two days a week I am up in a cancer research lab. Surprise! Turns out when you
have <a href="http://matt.might.net/">cool advisors</a> who start crossing into new fields you
might get lucky and get to follow them around. My due progress forms now say that I will
attempt to find a novel and scalable way to discover targeted therapeutics for genetic
disorders instead of <a href="http://www.pants-lang.org/">whatever they used to say</a>. I&rsquo;m not
sure the forms actually say <em>attempt</em> but that specific word choice is key.</p>
<p>To say my statement of focus another way, using lots of data about cell behavior, I&rsquo;m
going to try and predict if the effects or side-effects of already FDA-approved drugs
can potentially help people with rare diseases, who in and of themselves don&rsquo;t
constitute a pharmaceutical market. Maybe we can help some people! I&rsquo;m in a cancer
research lab because cancer research has a lot of similar problems (and solutions!).</p>
<p>Instead of filling up @utah.edu inboxes with the current situation going on in my research
folder, I&rsquo;ve decided to write about it here. In addition, I&rsquo;m a firm believer
that explaining things helps you better understand them, and after noodling all day
Tuesday on a problem, I decided I really ought to
<a href="https://en.wikipedia.org/wiki/Rubber_duck_debugging">rubber duck</a> to you, dear reader.</p>
<p>So that&rsquo;s what&rsquo;s going on.</p>
<p>Just because I don&rsquo;t want to start all the way from the bottom, I&rsquo;m going to assume
you&rsquo;ve taken some Biology 101 class and understand what I think is hilariously still
referred to as <a href="https://en.wikipedia.org/wiki/Central_dogma_of_molecular_biology">the central dogma of molecular biology</a>. I&rsquo;ll try to briefly
recap the important points. (<em>Aside:</em> It&rsquo;s so encouraging when science icons do dumb
things like <a href="https://en.wikipedia.org/wiki/Central_dogma_of_molecular_biology#Use_of_the_term_.22dogma.22">assume the wrong definition of a word</a>
and then no one fixes it. <em>There&rsquo;s hope for me yet</em> is what I&rsquo;m saying.)</p>
<p>If you found this page by Googling or don&rsquo;t know me or something, this is a pretty
large shift in focus for me. I am a computer science/software development guy, and
have had exactly zero training in biology or any other like-science. Prior to this,
I read some books once and did pretty alright in high school.</p>
<p>Basically what I&rsquo;m saying is that I have no idea what I&rsquo;m doing, caveat emptor, etc.,
etc.</p>
<p>For this entry I&rsquo;m going to cover cell states and microarrays as background for what
I&rsquo;m actually doing, which will come in a subsequent post.</p>
<p>End context.</p>
<h2 id="cell-states:e60b10ce491ce725910b27ca6630b6f9">Cell states</h2>
<p>Cells, of course, have <em>DNA</em>, which is basically the cell construction blueprint. DNA
is made up of long strands of <em>base pairs</em> (A, T, G, or C). Every cell in an organism
has the same DNA and thus the same blueprints, but obviously not every cell behaves
the same, looks the same, does the same things, etc. Elbow cells are usually pretty
different from brain cells, though if they aren&rsquo;t you might want to get that checked
out.</p>
<p><em>Summary:</em> all cells have the same DNA but behave differently!</p>
<p>The thing that&rsquo;s different between your elbow cell and your brain cell is what <em>genes</em>
are being <em>expressed</em>. The word <em>gene</em> itself doesn&rsquo;t seem to actually have a
consistently used concrete definition (ask a biologist if a gene is defined pre- or
post- intron splicing!), but in general, a gene is roughly the atomic unit of a trait,
or, what causes your elbow cells to be different than your brain cells.
The part of a DNA strand that represents a specific gene, when activated, will be turned
into <em>RNA</em> which is then turned into proteins. So when I say <em>expressed</em> or
<em>gene expression</em>, I mean that a specific subsequence of DNA is actively getting turned
into RNA and then into proteins.</p>
<p><em>Summary:</em> cells behave differently because only some genes are being made into proteins
in each cell!</p>
<p>It turns out that because all of this evolved over long time periods from molecules just
bouncing around, the inner-workings of any cell are a haphazard nightmare of chaos. Long,
complicated Rube Goldberg-esque contraptions all operate in harmony to allow you to
even wonder if life really has any meaning. Often, all a gene will do is generate a
protein that will activate another section of DNA. As a result, cells have these
cascading feedback loops of proteins that activate genes that activate other proteins
that interact with a hormone that activates a protein that activates genes that activate
other proteins that then finally kick a domino over or make more
<a href="https://en.wikipedia.org/wiki/KRAS">KRAS</a> or something.</p>
<p><em>Summary:</em> genes get turned off and on via complex but haphazard processes!</p>
<p>These long chains of gene regulatory protein interactions are called <em>pathways</em>, and
as you might imagine, being able to know which gene expression pathways are active in
a cell is a very useful way to figure out what a cell is doing or what state that
cell is in. For example: is this cell a brain cell? If it has a lot of the same RNA
being expressed as an elbow cell and not a lot of the same RNA being expressed as a
brain cell then probably not!</p>
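<p>To make the brain-versus-elbow comparison a little more concrete, here&rsquo;s a minimal sketch. It assumes you already have RNA levels for the same handful of genes in a mystery cell and in a couple of known cell types. The numbers, the gene ordering, and the function names are all made up for illustration, and Pearson correlation is just one reasonable way to say &ldquo;a lot of the same RNA.&rdquo;</p>

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def closest_cell_type(sample, references):
    """Return the name of the reference profile most correlated with sample."""
    return max(references, key=lambda name: pearson(sample, references[name]))

# Made-up RNA levels for four hypothetical genes, in the same order everywhere.
references = {
    "brain": [9.0, 1.0, 7.0, 0.5],
    "elbow": [1.0, 8.0, 0.5, 6.0],
}
mystery_cell = [8.0, 2.0, 6.5, 1.0]
print(closest_cell_type(mystery_cell, references))
```

<p>With these invented numbers the mystery cell lines up with the brain profile, since the same genes are high and low in both.</p>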
<p>For a concrete example of a pathway, here&rsquo;s a diagram illustrating the interaction
between the <em>Wnt</em> and <em>insulin</em> signaling pathways, courtesy of
<a href="http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0008509">Yoon JC, Ng A, Kim BH, Bianco A, Xavier RJ, and Elledge SJ</a>,
<a href="https://creativecommons.org/licenses/by-sa/3.0/">CC BY-SA 3.0</a>:</p>
<p><img src="https://www.jtolio.com/images/bio/wnt_pathway.png" alt="Wnt and insulin signaling pathways" />
</p>
<p><em>Summary:</em> figuring out what genes are turned on could help identify what a cell is
doing!</p>
<p>A brief comment about DNA sequencing, or the
<a href="https://en.wikipedia.org/wiki/Human_Genome_Project">Human Genome Project</a>, or similar
efforts: for all the popular interest and excitement that DNA sequencing (that is,
determining which base pairs the DNA is made up of) gets, we still don&rsquo;t
know enough to decipher what that DNA actually means. Imagine you and your friend Craig
Venter are racing to be the first to get the latest installment of
<em>A Song of Ice and Fire</em>, but the bearded author wrote it in Tagalog and you don&rsquo;t speak
Tagalog and you&rsquo;re not even sure if Austronesia is even a place. What good does this
book you just bought do you if you can&rsquo;t understand it? Sequencing a cell&rsquo;s DNA will
not tell you how to understand what that cell is currently doing, especially when the
DNA is the same for all of that organism&rsquo;s cells.</p>
<p>But, measuring the amount of different kinds of RNA (or proteins) will easily give
you clues as to what the cell is doing relative to other cells, even if you <a href="https://www.jtolio.com/images/i-dont-know-what-im-doing-dog.jpg">don&rsquo;t know
what the RNA means</a>! Measuring which
specific proteins exist is a hard problem, but luckily, measuring different amounts of
RNA is one of the things molecular biologists are best at! Yay!</p>
<p><em>Summary:</em> Measuring RNA amounts in a cell tells us what genes are active!</p>
<h2 id="microarrays:e60b10ce491ce725910b27ca6630b6f9">Microarrays</h2>
<p>Most of the data I&rsquo;ve been working with has been generated from <em>microarray</em> technology.</p>
<p>When RNA is transcribed from DNA, the RNA bases bond to the DNA strand and are then
essentially glued together, so that when the RNA is unzipped off the DNA, the RNA
is in the right complementary order. What this means is that any specific RNA strand
will bond with specific complementary strands of DNA. This is the key microarray
observation.</p>
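<p>Here&rsquo;s a rough sketch of that complementary-bonding idea in code. It is idealized: it assumes perfect Watson-Crick pairing with no mismatches (real strands are sloppier), and the function names and example sequences are hypothetical.</p>

```python
# Which DNA base each RNA base bonds to.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}

def complementary_dna(rna):
    """Return the DNA strand that would bond along the full length of this RNA."""
    # Paired strands run in opposite directions, so the complement is reversed.
    return "".join(RNA_TO_DNA[base] for base in reversed(rna))

def binds(rna, probe):
    """True if some stretch of the RNA is a perfect complement of the DNA probe."""
    return probe in complementary_dna(rna)

rna = "AUGGCCAUU"
print(complementary_dna(rna))
print(binds(rna, "GGCCAT"))
print(binds(rna, "TTTTTT"))
```

<p>The first probe matches a stretch of the RNA&rsquo;s complement and the second doesn&rsquo;t, which is the whole trick a microarray relies on.</p>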
<p>Without going into the gory details, it turns out that we can do two useful things.
First, we can actually make arbitrary synthetic DNA strands in the lab, and second,
we can exponentially increase the amount of DNA (or RNA, once it has been copied back into DNA) through <a href="https://en.wikipedia.org/wiki/Polymerase_chain_reaction">PCR</a>.</p>
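<p>For a feel of what &ldquo;exponentially&rdquo; buys you, here&rsquo;s a tiny sketch of idealized PCR growth. It assumes every cycle multiplies the copy count by the same factor for every sequence, which real reactions only approximate. The function name and the <code>efficiency</code> parameter are my own invention for illustration.</p>

```python
def copies_after_pcr(initial_copies, cycles, efficiency=1.0):
    """Idealized copy count after a number of PCR cycles.

    efficiency=1.0 means every strand is duplicated each cycle (perfect doubling).
    """
    return initial_copies * (1.0 + efficiency) ** cycles

# Two hypothetical sequences that start out at a 3:1 ratio.
abundant = copies_after_pcr(300, cycles=20)
rare = copies_after_pcr(100, cycles=20)
print(abundant, rare, abundant / rare)
```

<p>Under that equal-efficiency assumption, both counts grow about a million-fold while the 3:1 ratio between them stays put.</p>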
<p>A microarray is simply a large number of wells where some small, synthetic strands
of DNA are chemically bonded to the bottom of the well. These synthetic strands are
called <em>probes</em>. At the bottom of each well is a very specific, chosen probe. If you
use PCR to proportionally increase the amount of RNA you have from a cell, you
can pour all that RNA into all the wells, let it sit for a bit, and then wash it all
out. What happens is the RNA that was most expressed is most likely to have bonded to
its matching probes at the bottom of the wells and stayed through the wash cycle.</p>
<p>Great, now all we need is a way to measure how much RNA bonded to probes in each
well. It turns out, if you use a fluorescent dye or duplicate your RNA using
radioactive base pair isotopes, you can now measure just how much RNA bonded in each
well based on the light emitted from the well, and therefore, how much of a particular
RNA sequence was proportionally in a cell&rsquo;s expression!
<a href="https://www.youtube.com/watch?v=hJdF8DJ70Dc">Whapah!</a></p>
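<p>As a sketch of the last step, here&rsquo;s one simple way to turn raw light readings into relative amounts. This is not how any particular microarray software does it. The flat <code>background</code> subtraction, the well names, and the readings are all assumptions made up for this example.</p>

```python
def relative_expression(intensities, background=0.0):
    """Turn raw per-well light readings into fractions of the total signal."""
    # Subtract a flat background glow, never letting a well go negative.
    corrected = {
        well: max(value - background, 0.0)
        for well, value in intensities.items()
    }
    total = sum(corrected.values())
    if total == 0:
        raise ValueError("no signal left after background subtraction")
    return {well: value / total for well, value in corrected.items()}

# Hypothetical readings from three wells, in arbitrary brightness units.
readings = {"probe_a": 1200.0, "probe_b": 300.0, "probe_c": 1500.0}
for well, fraction in relative_expression(readings).items():
    print(well, round(fraction, 3))
```

<p>Dividing by the total is what makes the result proportional: it says how bright each well was relative to the others, not how many RNA molecules were actually there.</p>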
<p>As cool as this is, there&rsquo;s lots of room here for experimental error. This is no way
to collect exact data. The data you get back will be very messy, have lots of noise,
weird batch effects (like, the current temperature affects bonding!), etc. In fact,
since the C-G bond is stronger than the A-T/U bond (it has 3 hydrogen bonds
instead of 2), you&rsquo;ll see more signal from probes that have lots of C/G base pairs in them.</p>
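<p>If you wanted to eyeball which probes are likely to be extra sticky, a quick sketch is to count the G and C bases in each one. The function names and probe sequences below are hypothetical, and counting hydrogen bonds like this is only a crude stand-in for real binding strength.</p>

```python
def gc_fraction(probe):
    """Fraction of bases in a DNA probe that are G or C."""
    probe = probe.upper()
    return sum(base in "GC" for base in probe) / len(probe)

def hydrogen_bonds(probe):
    """Hydrogen bonds when the probe is fully paired: 3 per G/C, 2 per A/T."""
    return sum(3 if base in "GC" else 2 for base in probe.upper())

for probe in ["GGCCAT", "ATATAT"]:
    print(probe, round(gc_fraction(probe), 2), hydrogen_bonds(probe))
```

<p>The G/C-heavy probe ends up with more hydrogen bonds for the same length, which is the bias described above.</p>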
<p>Better technology exists. <a href="https://en.wikipedia.org/wiki/RNA-Seq">RNA-Seq</a> is almost
universally better in every way, except for perhaps cost to some degree. Nonetheless,
the older microarray technology is the currently dominant source of this kind of data.</p>
<p><em>Summary:</em> microarrays let us measure how much RNA there is in a cell!</p>
<h2 id="whew:e60b10ce491ce725910b27ca6630b6f9">Whew</h2>
<p>Okay, so next time we&rsquo;ll talk about where this all gets me in terms of finding
drugs for people with rare genetic diseases.</p>
<p>Please let me know what questions you have and I promise to give you a good show
while I stumble around trying to answer them!</p>
<p><em><strong>Update:</strong> a previous version of this entry claimed that &ldquo;epigenetics&rdquo;
was the name of the study of all forms of gene expression regulation,
whereas in actuality it is only the study of external or environmental
factors for gene expression regulation. I attempted to fix rather than
remove the comment about epigenetics, but it then seemed to distract
from the point I was trying to make, so for better or worse the tangent
will have to wait for perhaps a later entry. Sorry to have misled!</em></p>