The language of choice

Propositional logic was discovered by Stoics around 300 B.C., only to
be abandoned in later antiquity and rebuilt in the 19th century by
George Boole’s successors. One of them, Charles Peirce, saw its
significance for what we now call logic circuits, yet that discovery
too was forgotten until the 1930s. In the ’50s John McCarthy invented
conditional expressions, casting the logic into the form we’ll study
here; then in 1986 Randal Bryant repeated one of McCarthy’s
constructions with a crucial tweak that made his report “for many
years the most cited paper in all of computer science, because it
revolutionized the data structures used to represent Boolean
functions” (Knuth).1 Let’s explore and code up some of this heritage of
millennia, and bring it to bear on a suitable challenge: playing
tic-tac-toe.

Then we’ll tackle a task that’s a little more practical: verifying a
carry-lookahead adder circuit. Supposedly logic gets used all the
time for all kinds of serious work, but for such you’ll have to
consult the serious authors; what I can say myself, from working out the
code to follow, is that the subject offers a fun playground plus the
most primitive form of the pun between meaning and mechanism.

You’re encouraged to read with this article’s
code cloned and ready
to run, to try varying the examples, tracing their
execution, or messing with the code and seeing how it breaks. Literate code ought
to afford a more active style of reading: we’re not limited to the
affordances of paper.2

Decision trees and nets

We’re going to make a programming language– but a tiny one with
nothing but constants, variables, and if/else expressions. The
tininess will help us to analyze and transform its expressions more
aggressively than we’re used to doing for code in, for instance,
Python.

Code in such a language– let’s call it the choice language– might
look like

BifAelse0

but instead we’ll write this same meaning as A(0, B), for reasons:
it’s shorter, it generalizes to multiway decisions like A(X, Y, Z)
(selecting among three options for A’s value instead of two)3,
and it’ll spare us from writing a parser, because A(0, B) is a Python
expression that can be made to build an object representing the
choice-language expression.4

Informally, p(q, r) will mean “Look at p and use its
value to choose between q and r: for a value of 0, return
q’s value; but for 1, return r’s value instead.” Thus A(0,
B) will be 0 when A is 0, or else whatever B is, when A is 1.
(Note that Python’s B if A else 0 reverses the order of the arguments:
it has the ‘truthy’ case first.)

Building and evaluating expressions

Let’s code this. To be even lazier about not parsing any input, we’ll
make the user write const0 for 0:

(So the value of A(const0, B), in a context where A is 1, is B’s
value.)

classNode(object):"A decision-tree node."defevaluate(self,env):"My value when variables are set according to env, a dictionary."abstractdef__call__(self,if0,if1):"""Return an expression whose value is if0's or if1's,
according as self's is 0 or 1."""returnChoice(self,if0,if1)classConstantNode(Node):def__init__(self,value):self.value=valuedef__repr__(self):returnrepr(self.value)defevaluate(self,env):returnself.valueclassVariableNode(Node):def__init__(self,name):self.name=namedef__repr__(self):returnself.namedefevaluate(self,env):returnenv[self]classChoiceNode(Node):def__init__(self,index,if0,if1):self.index,self.if0,self.if1=index,if0,if1def__repr__(self):return'%r(%r, %r)'%(self.index,self.if0,self.if1)defevaluate(self,env):branch=(self.if0,self.if1)[self.index.evaluate(env)]returnbranch.evaluate(env)

and later redefine them to introduce optimizations. The first
optimization appears here already for Constant: memoizing it,
meaning that if you call Constant(0) twice you’ll get the same
object both times. This saves us from having to override equality and
hashing on ConstantNodes. The memoize function is overkill for
this purpose, but we’re going to use it for more later.5

defmemoize(f):"""Return a function like f but caching its results. Its arguments
must be hashable."""defmemoized(*args):try:returnmemoized._memos[args]exceptKeyError:result=memoized._memos[args]=f(*args)returnresultmemoized._memos={}returnmemoized

From table to tree

We can re-express a data table in our new language. For instance,

A B Answer
----+-------
0 0 | 0
0 1 | 0
1 0 | 0
1 1 | 1

can be encoded as

>>>Answer=A(const0,B)

Let’s automate the translation. We might express the table like “the
first row’s answerifthe variables match the first rowelseexpress the rest of the table”– in this case, 0 if A==0 and B==0
else (the remaining three lines). Build the equality test like so:

defmatch(variables,values):"""Return an expression that evaluates to 1 just when every variable
has the corresponding given value; 0 otherwise."""do_they_match=const1forvariable,valueinzip(variables,values):ifvalue==0:do_they_match=variable(do_they_match,const0)elifvalue==1:do_they_match=variable(const0,do_they_match)returndo_they_match

But we’d get a much larger expression than the one we wrote by
hand. Just to match the first row, it’s:

>>>match((A,B),(0,0))B(A(1,0),0)

(In words, “if B is 1, then 0; else if A is 1, then 0; else 1 for a match”.)

The whole table would need more than four times the size of this
expression. Adding more variables makes it exponentially worse–
here’s a three-variable table with eight entries:

One approach is called peephole optimization. Certain simple
expressions could be simpler, such as A(0, 1): it has the same value
as just A. Likewise, A(B, B) is equivalent to B. We could walk
through all the subexpressions of an expression, looking for these
patterns; it’s easier, though, to notice them at the time we build the
tree:

After this change, our table would translate into the same simple
expression we started with– but mostly by luck. majority still
becomes monstrous.

Splitting in two: the Boole expansion

In 1847 George Boole introduced divide-and-conquer: turn the table
into “iffirst variablethensubtable where that variable is 1elsesubtable where it’s 0”.67

defexpress(variables,table):"""Return an expression e such that, for each key in table,
e.evaluate(dict(variables, key)) == table[key]."""ifnottable:returnconst0# Or 1, doesn't matter.eliflen(table)==1:returnConstant(table.values()[0])else:first_var,rest_vars=variables[0],variables[1:]returnfirst_var(express(rest_vars,subst_for_first_var(table,0)),express(rest_vars,subst_for_first_var(table,1)))defsubst_for_first_var(table,first_var_value):"""Return a subtable of table, eliminating the first variable and
supposing it'll take the given value."""return{keys[1:]:outputforkeys,outputintable.items()ifkeys[0]==first_var_value}

Now, by aiming for a balanced tree and perhaps by giving the peephole optimization more to
work with, when we ask for majority we get back respectable code:

>>>C=Variable('C')>>>express((A,B,C),majority)A(B(0,C),B(C,1))

To be grand, you could say we’ve started to learn or synthesize
programs from sample inputs and outputs. In fact this method is
popular in machine
learning using
fancier alternatives for return const0 and for picking
first_var. But we’ll take a different tack to improvement:
eliminating common subexpressions.

Sharing structure: the BDD

The parity function– which is 1 just when the input has an odd
number of ones, so it really should’ve been named the oddity
function– cannot be expressed as a small tree. It’s easy to make a
big one:

Here D(1, 0) appears 4 times. Also appearing twice are C(D, D(1,
0)) and the same with reversed arguments. If we wrote down the parity
function for more variables, the table and the tree would both grow
exponentially, and so would the opportunities for sharing.

We already saw how to coalesce duplicates when we memoized
ConstantNode. Memoizing ChoiceNode turns our tree into a directed acyclic
graph (DAG).

The same expression now is built of only 6 unique Choice nodes. In
general, the parity function over n variables needs a number of
nodes linear in n, as a DAG, but exponential as a tree. (Claude
Shannon proved a generalization of this result in his master’s thesis,
planting seeds of the techniques we’ll look at later.)

Graphs like these are called binary decision diagrams (BDDs). The term
can mean any graph of choice nodes that branch only on variables, but
usually people restrict it to graphs that could have been built by our
express function: ones that only test variables in a consistent order
(such as A, B, C, then D); have no nodes like A(0, 1) or
A(B, B) as eliminated by Choice, above; and have no redundant
nodes as would be coalesced by memoize.

A tic-tac-toe machine

Daniel Hillis and Brian Silverman made a couple of different
tic-tac-toe players in Tinkertoys. Let’s try and design one too, down
to the level of choice gates, if not actual Tinkertoys. (When thinking
of the data structure reduced to hardware, we call a node a gate.)

tictactoe.py
is my player as a game-tree search function. We’ll apply it to
materialize the whole game tree– it’s pretty small!– into a table
mapping game boards to each one’s successor (meaning the board you get
after the chosen move); then we’ll crunch the table into a network of
choice gates.

We need to understand a few exports from tictactoe.py: a tic-tac-toe
board is called a grid, a game starts with empty_grid, successors(grid)
finds the possible next moves and performs them (generating the
resulting next grids), and max_play(grid) picks the ‘best’
successor for the player to move– so max_play is our game AI. A grid is
represented as a pair (p,q) of 9-bit integers, p for the player to
move and q for their opponent. Each bit tells whether a mark fills a
particular square on the board; bit 0 denotes a mark in the lower
right corner, bit 1 is for the square to the left of it, and so on:

(If you’re wondering about tuple(reversed(AI_to_move)), that serves as a
‘game over’ state that still shows the game board– it’d be impolite to
wipe the display before the human even sees their last move. The
reversal maintains the mapping from p and q to X and O.)

>>>len(all_responses)427

For a machine made out of binary gates, the most natural
representation is again 18 bits, 9 each for X’s and O’s. We can
represent the player’s move in the 9 X bits, and compute the next 9 O
bits from the current 18 X’s and O’s.

(for grid,(_,Os2) in all_responses.items() generates the grid
before each tabulated move, along with Os2: the configuration of O’s
after the O player moves in response. Then grid: (Os2>>square)&1
maps the input grid to the output 0 or 1 marking an O in a particular
tic-tac-toe square.)

These aren’t quite tables of the sort our express function eats,
yet: the keys are pairs of 9-bit ints, not the required tuples of
bits. We must convert them:

Remember that memoizing ChoiceNode, above, got us a BDD instead of a
tree. Assuming we cleared the memo table first (with
ChoiceNode._memos.clear()) we end up with 1236 choice gates (2985
without memoization):

>>>len(ChoiceNode._memos)1236

This can be improved by a heuristic: choices that ‘matter the most’
should get tested first. In tic-tac-toe that’s the center square most
of all, then the corners, then the edges. (This ordering of variables
in the BDD has nothing to do with the function computed by it: that
is, the moves the AI picks. An AI playing to lose would likely still
be more compact if encoded in a BDD that looked at the center square
first.) Interleave the X bits with the O bits to treat both center
bits first, then all corner bits, and so on:

squares=(4,0,2,6,8,7,1,3,5)defcombine(xs,os):returnsum(zip(xs,os),())

These tweaks bring us down to 956 gates. (The odd order 7,1,3,5 puts
the first edge next to the last corner. I can’t say why, so far as I’ve seen, other small
tweaks worsen this.)

Our express function works best on exhaustive tables, where every
possible key has a value. The O_tables are not exhaustive; tweaking
express to start to exploit don’t-cares gets us down to 556 gates:

defexpress(variables,table):ifnottable:returnconst0# (Can you be cleverer here?)eliflen(table)==1:returnConstant(table.values()[0])else:first_var,rest_vars=variables[0],variables[1:]# The new logic: if first_var takes on only one value# in the table...domain=set(row[0]forrowintable.keys())iflen(domain)==1:# ...then just assume it'll take that value in the input.value=next(iter(domain))returnexpress(rest_vars,subst_for_first_var(table,value))# Otherwise as before.returnfirst_var(express(rest_vars,subst_for_first_var(table,0)),express(rest_vars,subst_for_first_var(table,1)))

What if we could use three-way Choice gates? Encode a grid square as
0/1/2 for blank/X/O, and we’d need only half as many variables, and end up reducing the 556 gates to 379. Would 379 ternary gates cost less than 556 binary ones? It seems plausible, for instance as switches for falling-marble tracks. (What, you think a mechanical tic-tac-toe player implausible?)

In a 6th-grade science fair Steve Wozniak did much better: “about 100 transistors and about 100 diodes.” How? Beats me!

Tables, schmables

We’ve seen some ways for choice trees and choice graphs to compress
data tables. But when we start from tables, they can only grow so big
before exhausting our space and patience. What if we skip the table
building and go straight to the choice structures?

When the (non-)table in question is meant to express some logical claim,
like “if you’re outside and it’s dark, it’s night”, we can draw on
familiar language: NOT, AND, OR, 0/1 for false/true, and so on.

With these we can express functions in many variables, so many that a
tabulation would overflow the visible universe.

Exploring all possible worlds

We just evaluated a claim which turned out to be true in a particular world. We might ask: is it true no matter what? (That is, however the variables are set, provided they’re each 0 or 1.) Or likewise: find me a world where it’s true.

To find it given a table, we’d loop over table.items(). For a choice
tree claim, we might loop in the same way over all possible values
for its variables:

(This method is
called truth-table enumeration.) But we’re assuming now that this
virtual table would get impractically big; to do better we must
exploit the structure of the choice tree. Let’s try a goal-directed
walk, skipping those possibilities that we can rule out early:

defis_valid(claim):# A claim true in all worlds is false in none of them.returnsatisfy(claim,0)isNonedefsatisfy(tree,goal):"""Find a (partial) env that makes tree.evaluate(env) == goal,
else None."""returnnext(tree.find(goal,{}),None)classNode(object):# ...deffind(self,goal,env):"""Generate (partial) environments, extending `env`, in which
I evaluate to `goal`."""abstractclassConstantNode(Node):# ...deffind(self,goal,env):ifself.value==goal:yieldenvclassVariableNode(Node):# ...deffind(self,goal,env):ifselfnotinenv:yieldextend(env,self,goal)elifenv[self]==goal:yieldenvclassChoiceNode(Node):# ...deffind(self,goal,env):# Succeed where the index evaluates to 0, and if0 to the goal...forenv1inself.index.find(0,env):forenv2inself.if0.find(goal,env1):yieldenv2# ... or likewise for index 1 and if1.forenv1inself.index.find(1,env):forenv2inself.if1.find(goal,env1):yieldenv2defextend(env,var,value):"A copy of env but mapping var to value."result=dict(env)result[var]=valuereturnresult

This search may take time exponential in the depth of the choice
expression– which can be worse than enumerating the truth table
(exponential in the number of variables), but it can also be much better:
the expression may be shallower, or branches of the search may be cut off
early.

A pen-and-paper version of this algorithm, with special cases for AND,
OR, and the rest, is taught as the method of analytic
tableaux.
Since that form of the method leaves you a choice of which
subexpression to target next, a smart choice might streamline the
search. (For instance, a & b can be treated like either a(0, b)
or b(0, a); if a looks ‘less branchy’ than b–
it’s a smaller expression, say, or uses variables we’ve already set–
then the search is likely to go faster with a first.)8

An algebra of choice

We’ve gained a new way to check whether two claims are always equal:
is_valid(Equiv(claim1, claim2)).
Another way uses algebra, following John McCarthy’s rules:

Selection:

const0(a, b) == a
const1(a, b) == b

Case analysis and irrelevance:

p(const0, const1) == p
p(a, a) == a

The same p twice collapses into one (and it’s associative):

p(a, p(b, c)) == p(a, c)
p(p(a, b), c) == p(a, c)

Nesting:

q(p, r)(a, b) == q(p(a, b), r(a, b))

Swapping:

q(p(a, b), p(c, d)) == p(q(a, c), q(b, d))

McCarthy’s notation had (p -> r; q) for our p(q, r). An infix
version like Haskell’s, q `p` r, is rather pretty, more
like familiar algebra: so the associative law becomes

a `p` (b `p` c) = a `p` c = (a `p` b) `p` c

These laws together let you standardize choice trees: take any two
trees, rewrite each by the method below, and then the resulting trees
will be identical to each other just when the original trees are equal
in meaning.

How? Massage the tree into a mirror of the Boole expansion of the
corresponding table, like this: First eliminate any nested tests–
subexpressions like q(p, r) in q(p, r)(a, b)– using the nesting
rule. Then enforce a standard order for the variables, like the order
of variables passed to express() in examples above. Aim to move the
first-ordered variable to the root of the tree (lifting it using the
swapping law). Repeat for each subtree, using the other laws to
eliminate repeated variables, and also any constants from the test
position of any subtrees.

For example, suppose we’ve decided on alphabetical order for the
variables. b(a, c) is partly in standard form, having no nested
tests, but the variables are out of order. We can fix that:

b(a,c)=b(a(0,1),a(c,c))# By case analysis and irrelevance=a(b(0,c),b(1,c))# By swapping

This produces the same expression as if we’d run express((a,b,c), table)
on a tabulation of b(a, c).

For more practice in the algebra, try proving the distributive law:

a `p` (b `q` c) = (a `p` b) `q` (a `p` c)

Systematic succinctness

A BDD (the usual, restricted kind) is McCarthy’s standardized
tree, made into a DAG by memoizing. We’ve built some already from
tables, but we can do more: given expressions in this form, we can
compose them into more complex expressions in the same form,
efficiently, by applying McCarthy’s rewrite rules only so far as
they’re needed.

Let’s start our code over from scratch. The changes:

Since a choice node will now always be indexed by a variable
(since we eliminate nested tests), make ChoiceNode hold the
variable itself. We can drop VariableNode from the node types,
since one of them can always be represented by one of the new choice
nodes.

We’ll need an ordering of the variables, under the user’s control,
saying which variables go higher in the DAG than others. The
simplest way is to represent variables by numbers: smaller numbers
go higher in the DAG. Call this number the ‘rank’ of the variable,
and of the ChoiceNode. (If a user wants to name their variables,
they could always make up an array of names indexed by rank.)

The changes to the classes reflect the structural changes planned
above:

classConstantNode(Node):rank=float('Inf')# (Greater than any variable's rank.)def__init__(self,value):self.value=valuedefevaluate(self,env):returnself.valuedef__call__(self,*branches):returnbranches[self.value]Constant=memoize(ConstantNode)const0,const1=Constant(0),Constant(1)defVariable(rank):returnbuild_node(rank,const0,const1)classChoiceNode(Node):value=None# (Explained below.)def__init__(self,rank,if0,if1):assertrank<if0.rankandrank<if1.rankself.rank=rankself.if0=if0self.if1=if1defevaluate(self,env):branch=(self.if0,self.if1)[env[self.rank]]returnbranch.evaluate(env)def__call__(self,if0,if1):ifif0isif1:returnif0if(if0,if1)==(const0,const1):returnself# The above cases usually save work, but aren't needed for correctness.returnbuild_choice(self,if0,if1)build_node=memoize(ChoiceNode)

And there’s wholly new logic to keep the variables ordered, lower-rank first:

@memoizedefbuild_choice(node,if0,if1):"""Like Choice(node, if0, if1) in logic.py, but McCarthy-standardized,
presupposing the arguments are all McCarthy-standardized."""top=min(node.rank,if0.rank,if1.rank)cases=[subst(node,top,value)(subst(if0,top,value),subst(if1,top,value))forvaluein(0,1)]returnmake_node(top,*cases)defmake_node(rank,if0,if1):ifif0isif1:returnif0returnbuild_node(rank,if0,if1)defsubst(node,rank,value):"""Specialize node to the case where variable #rank takes the given value.
Again, node must be standardized."""ifrank<node.rank:returnnodeelifrank==node.rank:return(node.if0,node.if1)[value]else:returnmake_node(node.rank,subst(node.if0,rank,value),subst(node.if1,rank,value))

In our earlier code build_choice(node, if0, if1) would be just
Choice(node, if0, if1), but now we must make sure to branch first
on the lowest-ranked variable. (Imagine it as the lightest, floating to
the top of the downward-growing DAG.) We find that variable, call it
top, and branch on it with make_node(top.rank, *cases). The
cases are equivalent to Choice(node, if0, if1) with the top
variable set to each of its possible values.

Every ConstantNode has a dummy rank to keep special cases out of
this logic. (Similarly each ChoiceNode has a dummy value.)

As before, we memoize the node constructors (ConstantNode and
ChoiceNode) to make DAGs instead of trees. If those were all we
memoized, the code would be logically correct, but algorithmically
awful: it would run as slowly as McCarthy’s tree standardizer, with
its exponentially bigger trees. The memoize on build_choice cuts
off the repeated work before it starts.

Moving on to queries on these structures: satisfy and is_valid
could be adapted to DAGs with little change, but to reap the
efficiency gains we’d have to memoize satisfy too (and the
generators it returns). We’ll go for efficiency more directly by
making this version of satisfy produce just one result.

defis_valid(claim):returnsatisfy(claim,0)isNonedefsatisfy(node,goal):"""Return the lexicographically first env such that
node.evaluate(env) == goal, if there's any; else None."""env={}whileisinstance(node,ChoiceNode):ifnode.if0.valuein(None,goal):node,env[node.rank]=node.if0,0elifnode.if1.valuein(None,goal):node,env[node.rank]=node.if1,1else:returnNonereturnenvifnode.value==goalelseNone

As another fun problem, see if you can efficiently count how many
ways a claim can be satisfied; for a corollary, generate solutions
uniformly at random.

A more ambitious example

So let’s try our new tool on a problem too big for a table: adding
large numbers in binary.

defripple_carry_add(A,B,carry):"The simplest adder in logic gates."assertlen(A)==len(B)S=[]fora,binzip(A,B):sum,carry=add3(a,b,carry)S.append(sum)returntuple(S),carrydefadd3(a,b,c):"Compute the least- and most-significant bits of a+b+c."s1,c1=add2(a,b)s2,c2=add2(s1,c)returns2,c1|c2defadd2(a,b):"Compute the least- and most-significant bits of a+b."returna^b,a&b

A and B should be two lists of logic expressions (and carry, a
single expression) with the least significant bits
first. ripple_carry_add then returns logic expressions for the sum
bits and the carry out. They’re built following the most
straightforward algorithm for addition, the same one we’re taught in
school, but for binary numbers. A circuit following this algorithm may
take time linear in the length of the inputs: adding 111…1 to
111…1 makes the carries ripple all the way from the right end to the
left.

You can add faster in the worst case by speculating: compute the sum
for the left half of the inputs under two assumptions in parallel:
assuming the carry out from the right half will be either 0 or 1. By
the time that carry is available, so will be the speculative sums, and
we can select one or the other immediately. Repeating the speculation
recursively results in a bigger but shallower circuit.

defcarry_lookahead_add(A,B,carry):assertlen(A)==len(B)@memoize# (to keep the circuit from blowing up in size)deflookahead(lo,nbits,assume_cin):ifnbits==0:returnConstant(assume_cin)elifnbits==1:s,c=add2_constant(A[lo],B[lo],assume_cin)returncelse:m=preceding_power_of_2(nbits)returnlookahead(lo,m,assume_cin)(lookahead(lo+m,nbits-m,0),lookahead(lo+m,nbits-m,1))# The carry at each place.C=[carry(lookahead(0,hi,0),lookahead(0,hi,1))forhiinrange(len(A)+1)]returntuple(a^b^cfora,b,cinzip(A,B,C)),C[-1]defpreceding_power_of_2(n):importmathreturn2**int(math.log(n-1,2))defadd2_constant(a,b,bit):ifbit==0:returnadd2(a,b)else:returnEquiv(a,b),a|b

Do these different adders compute the same function? Let’s see if they do.

We just checked whether two different 64-bit adder circuits produce
the same outputs for all 2129 of the possible values of A, B,
and carry. It took only a moment in interpreted Python. Magic!

The magic has limits: a run with interleaved=False would fail on any
computer now existing. The BDDs get too big.

Not the end

We saw Bryant’s revolutionary algorithm coded as a pair of memoize
annotations on (a natural Python rendition of) a method published a
quarter century earlier by another Turing Award winner,
McCarthy. Memoization dates to the 1960’s as well. Stories like this
one, and the long gap between Peirce and Shannon in linking logic with
digital circuits, I take as encouragement to not only study the
textbooks, but to try to work out the basics of a subject for
yourself: simple ideas are still out there undiscovered or forgotten,
waiting to be put together.

Donald Knuth (2011), The Art of Computer Programming, Volume 4A
(Combinatorial Algorithms, Part 1). If you’d like to learn more about
the topics in this article, this seems the deepest and most fun source
to read next. I’ve only dipped into it so far. (He also surveys how
BDDs actually were developed, as opposed to my reimagining of how they
might have been.) His
website (at this
writing) offers a fascicle on SAT solving as well.

Benson Mates (1972), Elementary Logic. The last chapter summarizes
the history I condensed into a sentence. Mates’s 1953 Stoic Logic
covers and defends it.

Thanks

Thank you to: Kragen Sitaker for comments, encouragement, and
diagram-generating code; Brandon Moore for discussions about
tic-tac-toe circuit generation; the Recurse Center for distractions
from writing; and the editors for saintly forbearance.

See the References section above for this and other sources. I’ve chosen not to clutter the main text with these links. ↩

Only by, as it turned out, my failing to finish the intended JavaScript toys in time. ↩

The code repo includes this generalization; it’s left out of the article to keep the code more concrete. ↩

B if A else 0 would be a Python expression as well, but Python can only evaluate it to A or 0; we can’t override it to mean building a tree representing the choice-language expression we want. ↩

His bigger idea was to write if x then y else z as x*y + (1-x)*z, solve the resulting equations by ordinary algebra, and interpret the solutions as inferences. His system was not yet the one we call Boolean algebra today. ↩

The case if not table: may seem unnecessary, except we were never promised an exhaustive table. The tic-tac-toe tables we’ll see later will only include boards that can come up in a game, omitting the rows for some other combinations of variables. ↩