Math from scratch, part one

Computers are of course called “computers” because they’re good at math; there are lots of ways to do math in C#. We’ve got signed and unsigned integers and binary and decimal floating point types built into the language. And it is easy to use types in libraries such as BigInteger and BigRational[1. Which can be found in the Microsoft Solver Foundation library.] to represent arbitrary-sized integers and arbitrary-precision rationals.

From a practical perspective, for the vast majority of programmers who are not developing those libraries, this is a solved problem; you can use these tools without really caring very much how they work. Sure, there are lots of pitfalls with binary floats that we’ve talked about many times in this blog over the years, but these are pretty well understood.

So it’s a little bit quixotic of me to start this series, in which I’m going to set myself the challenge of implementing arbitrary-size integer[2. Maybe we’ll go further into rationals as well; we’ll see how long this takes!] arithmetic without using any built-in types other than object and bool.[3. As we’ll see, we’ll be forced to use int and string in one or two places, but they will all be away from the main line of the code. I will also be using the standard exception classes to indicate error conditions such as dividing by zero.]

My solution is not going to be pragmatic. It’s going to use immense amounts of memory, it is going to be orders of magnitude slower than BigInteger, and almost all my algorithms are going to be recursive. Since the C# compiler does not guarantee tail call elimination, this means that they’ll blow up with stack overflow exceptions when given numbers of only a few hundred bits. But hopefully we’ll learn something about both mathematics and recursive data structures along the way.

Before we get to the integers, though, we’re going to start with the naturals. The naturals are the numbers 0, 1, 2, 3, … and so on; they’re the “counting numbers”. We can loosely[4. A proper definition would also include restrictions like “zero is not the successor of any number” and “unequal numbers have unequal successors” and so on, but we don’t need to take a formal axiomatic approach for the purposes of this series.] define the natural numbers recursively:

0 is a natural number.

The successor of a natural number is a natural number.
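To make the two rules concrete, here is a minimal sketch of how they might be written down directly in C#. The class name `Peano` and its members are hypothetical names chosen for illustration; this is not the representation the series goes on to use, and `string` appears only for display.

```csharp
// Hypothetical sketch: a natural number is either zero or the successor of
// another natural number.
public sealed class Peano
{
    // Rule 1: 0 is a natural number. It is the only value with no predecessor.
    public static readonly Peano Zero = new Peano(null);

    private readonly Peano predecessor; // null only for Zero

    private Peano(Peano predecessor)
    {
        this.predecessor = predecessor;
    }

    public bool IsZero => predecessor == null;

    // Rule 2: the successor of a natural number is a natural number.
    public Peano Successor() => new Peano(this);

    // Renders the value as 0, S0, SS0, and so on.
    public override string ToString() =>
        IsZero ? "0" : "S" + predecessor.ToString();
}
```

With this sketch, `Peano.Zero.Successor().Successor().ToString()` produces `SS0`.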

There are lots of ways to represent natural numbers. In Gödel, Escher, Bach I seem to recall that Hofstadter uses 0, S0, SS0, SSS0, and so on, to represent the naturals. A standard way to represent the naturals in set theory is to say that a natural number is represented by the set of all natural numbers smaller than it. So zero, having no natural numbers smaller than it, is the empty set, {}. One has only zero smaller than it, so it is {{}}. Two has zero and one smaller than it, so it is {{},{{}}}. Three is {{}, {{}}, {{},{{}}}}. And so on. The Church numerals use elements of the lambda calculus to represent natural numbers. We of course typically use a complicated decimal convention whereby a string of digits compactly represents a number as a series of multiplications and additions. And computers use a similar but simpler convention in which a string of bits represents a number.
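As a small worked example of the “series of multiplications and additions” reading, take the number thirteen. The value thirteen is chosen here only for illustration. Its decimal digits and its bits expand in the same way, with the base ten replaced by the base two:

```latex
\begin{align*}
13_{10}  &= 1 \cdot 10 + 3 \\
1101_{2} &= ((1 \cdot 2 + 1) \cdot 2 + 0) \cdot 2 + 1 = 13
\end{align*}
```

Each step multiplies the running total by the base and adds the next digit, which is why the bit convention is the simpler one: every digit is either zero or one.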

We could choose any of these, or come up with yet another way to represent numbers. For the purposes of this series I’m going to stick with the final one: a natural number is represented by a series of bits.
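One way a series of bits could be modelled with nothing but object references and `bool` is as a linked list, least significant bit first. This is only a sketch under that assumption, not necessarily the design the series arrives at; `BitList`, `Push` and `Digits` are hypothetical names, and `string` is used only to display the result.

```csharp
// Hypothetical sketch: a natural number as an immutable linked list of bits.
public sealed class BitList
{
    // The empty list of bits stands for zero.
    public static readonly BitList Empty = new BitList(null, false);

    private readonly BitList higherBits; // null only for Empty
    private readonly bool lowestBit;

    private BitList(BitList higherBits, bool lowestBit)
    {
        this.higherBits = higherBits;
        this.lowestBit = lowestBit;
    }

    public bool IsEmpty => higherBits == null;

    // Adds a new lowest bit: the value becomes 2 * this + (bit ? 1 : 0).
    public BitList Push(bool bit) => new BitList(this, bit);

    // Renders most significant bit first; the empty list renders as "0".
    public override string ToString() => IsEmpty ? "0" : Digits();

    private string Digits() =>
        IsEmpty ? "" : higherBits.Digits() + (lowestBit ? "1" : "0");
}
```

For example, `BitList.Empty.Push(true).Push(true).Push(false).Push(true)` builds the bits 1101, which is thirteen. Note that this sketch does nothing to strip leading zero bits, so the same number can have more than one representation.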

39 thoughts on “Math from scratch, part one”

I’m looking forward to this! I just read “Good Math” by Mark C. Chu-Carroll in the Pragmatic Programmer’s series. It redefined “numbers” in almost every chapter. It provided a really great new way to think about the world of numbers. Even the simple ones are not so simple after all!

If you enjoy numbers and CS I can recommend “Purely Functional Data Structures” (ISBN-13: 978-0521663502) where the author observes that the operations on natural numbers and lists are similar. From that observation he constructs more and more advanced lists (to support immutability and fast cons, snoc and []). My favorite bit was the use of ambiguous number systems to build more optimized lists.

For reference, Mark’s blog Good Math, Bad Math is quite an excellent resource for more posts like Eric’s. Mark often goes into much more (I would argue) arcane information, but it is one of my favorite blogs on the internet. He doesn’t post as consistently as Eric, and he doesn’t really do anything in Microsoft-related languages/technologies, but it’s mostly theoretical math stuffs. Check it out at: http://scientopia.org/blogs/goodmath/ if you’re interested.

My solution is not going to be pragmatic. It’s going to use immense amounts of memory, it is going to be orders of magnitude slower than BigInteger, and almost all my algorithms are going to be recursive.

Damn straight! This is how a maths library should be!

Practical considerations interfere with the purity of the design. Once you’ve proved that your algorithm is computable, your job is done, and it’s up to the electronics industry to provide you with adequate hardware.

I can live without arrays, and using arrays requires integers to index them with anyways. I can’t live without bool because I need to make “if” statements. Now, if I wanted to be really hard core I could avoid even using bools by defining my own Boolean type and doing some clever tricks, but the point of this exercise is to explore integer arithmetic, not the nature of Boolean computation.

Could one define an array-ish thing as a sort of list, with the idea that each element has a successor element, a[0] refers to a “first” element, and for any element a[i], a[Si] refers to its successor?

You are describing an immutable version of IEnumerator. It is vexing to me that IEnumerator encourages a mutable style of programming; I would prefer a system where MoveNext returned a new “cursor” referring to the next object, rather than doing so inside the enumerator.
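A rough sketch of what such a cursor might look like follows. `ICursor<T>` and `ArrayCursor<T>` are hypothetical names invented for this illustration and are not part of the .NET Framework; the sample implementation assumes the caller never modifies the underlying array.

```csharp
// Hypothetical interface: moving forward returns a new cursor instead of
// mutating the existing one.
public interface ICursor<T>
{
    bool HasValue { get; }  // false once the sequence is exhausted
    T Current { get; }      // valid only when HasValue is true
    ICursor<T> MoveNext();  // a cursor for the following element
}

// Sample implementation over an array that is assumed never to change.
public sealed class ArrayCursor<T> : ICursor<T>
{
    private readonly T[] items;
    private readonly int index;

    public ArrayCursor(T[] items) : this(items, 0) { }

    private ArrayCursor(T[] items, int index)
    {
        this.items = items;
        this.index = index;
    }

    public bool HasValue => index < items.Length;

    public T Current => HasValue
        ? items[index]
        : throw new System.InvalidOperationException("Past the end.");

    // The receiver is left unchanged, so an earlier cursor stays usable.
    public ICursor<T> MoveNext() =>
        HasValue ? new ArrayCursor<T>(items, index + 1) : this;
}
```

Because `MoveNext` leaves the original cursor alone, code that holds on to an earlier cursor can read the same element again later, which is the “page number” rather than “bookmark” behaviour.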

Whether the calling code holds an immutable reference to a mutable enumerator, or a mutable reference to an immutable enumerator, *something* has to be mutable. While in some cases it might be slightly better to have mutability confined to the consumer, there are many cases when that would be inefficient or even unworkable. How, for example, should an enumerator that reads from a file be implemented using only immutable objects? Should every operation open the file, seek to the proper spot, read some data, and close it? If not, when should the file be closed?

A bigger complaint I have with IEnumerable is that there’s no means by which an instance implementing it can indicate whether the sequence encapsulated thereby is guaranteed immutable, nor is there even any Framework type which encapsulates a guaranteed-immutable sequence (other than `String`, which encapsulates an immutable sequence of characters). To be sure, nothing would prevent a mutable class from implementing an `IsImmutable` property that falsely returned `true`, but outside of security-related code the danger of a class doing so is really no worse than the possibility of objects violating any other contract. A lot of code uses `ToArray` or `ToList` on an IEnumerable to ensure that it’s working with a definite snapshot–necessary if the IEnumerable is mutable, but wasteful if it isn’t.

I take your point but I think the argument about files is a bit of a red herring. The high cost of file operations is mitigated by caching the file; hard disks now have caches on them for better performance, and of course all the mechanisms of buffering in a file object are also there for performance. I don’t think the performance angle is actually relevant to my point. What I would like is for an enumerator to be more like a page number — a token that allows you to obtain the contents at a particular point — and less like a bookmark that you move through the book.

What form would an “immutable cursor” for an iterator take? Could iterators have `finally` blocks? If iterators didn’t need `finally` blocks, I suppose one could have a shallowly-immutable cursor hold a reference to both a mutable iterator and a linked-list node that it could use to retrieve anything that had been enumerated under the control of some other cursor which was enumerating the same data source. I’m still not sure what all that would accomplish, though.

Immutability is only meaningful when multiple references exist to an object. In most common situations involving iterators, that is never the case. Making iterators immutable would slow down what is by far the most common usage pattern [the `foreach` loop], while offering no advantage when that pattern is used.

Further, I think the biggest problems with mutability stem from the fact that .NET presently makes no distinction between a reference that exists for the purpose of encapsulating an object’s state as part of its own, and one which exists for the purpose of encapsulating its identity. If you want to explore ways of avoiding mutability-related bugs, I would suggest allowing programmers to specify what storage locations encapsulate what combinations of mutable state that’s expected to change, “mutable” state that’s not supposed to change, immutable state, and identity. The C++ design was I think overcomplicated, and wouldn’t work well with the overall design of .NET, but I’d like to see exploration of storage location types that could encapsulate more than just a minimum base class constraint.

In my school time it never occurred to me that numbers are rigorously defined. When I found out about the axioms underlying the numbers that we intuitively use all the time I was fascinated. Their structure is very different from how we think about numbers. You can define addition as:

a+b=b+a
a+0=a
s(a)+s(s(b))=s(s(a))+s(b)

And that’s enough to derive everything about it. The last equation can be used to pull increments from b to a until we hit a+0=a and we are done. How come humans find stuff like this beautiful?
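The “pull increments from b to a” idea can be sketched as a recursive method. This is only an illustration: it assumes the simpler rule a + s(b) = s(a) + b rather than the third equation exactly as written above, and `Nat`, `Add` and `ToInt` are hypothetical names. The `int` conversion is there only so the result can be displayed.

```csharp
// Hypothetical sketch of successor-style naturals with recursive addition.
public sealed class Nat
{
    public static readonly Nat Zero = new Nat(null);

    private readonly Nat predecessor; // null only for Zero

    private Nat(Nat predecessor)
    {
        this.predecessor = predecessor;
    }

    public bool IsZero => predecessor == null;

    public Nat Successor() => new Nat(this);

    // a + 0    = a
    // a + s(b) = s(a) + b   (assumed rule: move one increment from b to a)
    public static Nat Add(Nat a, Nat b) =>
        b.IsZero ? a : Add(a.Successor(), b.predecessor);

    // Counts the successors; int is used only for display.
    public int ToInt() => IsZero ? 0 : 1 + predecessor.ToInt();
}
```

Adding two and three this way takes three recursive steps, one for each successor peeled off the right-hand operand, and yields a value whose `ToInt()` is 5.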

It may be worth noting that the specified axioms establish s0 as the multiplicative identity without requiring any explicit definition thereof, since for any n, n*s0 = n*0+n = 0+n = n. It is your last statement about multiplication which establishes s0 as the multiplicative identity. Are you going to explore what happens if something other than s0 is the multiplicative identity? Defining something like ss0 as the multiplicative identity would seem to be an easy way to expand the system beyond integers.

Thanks! I took some time off from my casting project last year to deal with changing jobs and will likely pick it up again in the autumn. I jammed my finger in a latch mechanism while opening the gate to the driveway. Ironically I was hurrying on my way to buy a power tool. First time I’ve injured myself before buying a tool.

Sounds unpleasant! I’ll be keeping an eye out for new blog posts about it. I have an interest in metalworking myself, but it has been a tough sell for my wife. I am trying to convince her to let our 8 yr old design a board game and let me cast the pieces he designs in pewter. Figure that is a good, fairly safe way to get my casting foot in the door! We’ll see, though. She is resistant because I am rather accident-prone…

Pewter — which is mostly tin — and pot metal — which is mostly zinc — are good ways to get started casting, since you can melt them at around an easily obtainable temperature like 400 F or thereabouts. My aluminum foundry furnace runs at 1600 F, which will boil zinc, and that can be dangerous as zinc oxide gas is toxic. But at stovetop temperatures it is quite safe. Or, as safe as molten metal ever is.

The letter “s” means “the successor of [whatever follows]”. Thus, s0 is the number after (successor of) zero (i.e. one), ss0 is the number after the number after zero (i.e. the number after one, i.e. two).