Provide a program that will determine, given a set of 1s and 0s, if
there are more 0s or more 1s.

So the first question that comes to my mind is about the
representation of that set of 1s and 0s. Is it a linked list, a double
linked list, an array, a tree, a string, or something else?

If it's a list (single linked or double linked actually doesn't matter
much), it's going to be pretty straightforward: just read the list and
use a counter. Yes, a single counter is enough, you can add one to the
counter when you meet a 1, and subtract one when you meet a 0. At the
end, if your counter has its initial value, then you have the same
number of 1s and 0s. If it's lesser, then it has more 0s, else it has
more 1s.

letcountl=letrecloopcounter=function|[]->counter|1::tl->loop(counter+1)tl|0::tl->loop(counter-1)tl|_::tl->loopcountertl(* you might want to debate whether you should stop
when you get something else than a 0 or a 1. *)inletr=loop0linifr=0thenEqualelseifr>0thenMore_oneselseMore_zeros[try]

Well, what's the risk with this program? The risk is that the integer
counter overflows, of course! If you have a very long list of 1s (or
0s) only, you may get things just wrong. However, in OCaml, in
reality, there's not really a chance that the integer overflows
because of the length of a single linked list, especially if you're
using a 64-bit architecture based OCaml, on which the greatest integer
is 4_611_686_018_427_387_903 (about 4⨉1018). There's really
a long way to have such a long list because basically you would need
to allocate more than (about 32⨉106 terabytes) at once,
since basically a linked list of integers is made of blocks that have
2 cells each (one for the integer, one for the address of the next
cell), each cell taking 64 bits (or 8 bytes).

But then, what if you don't have linked lists but some stream that
gives you a very large number of 0s and/or 1s? Well, to begin with,
counting from 0 to 4⨉1018 takes a really long time. If your
machine can count from 0 to 109 in a single second (that
would mean your machine is very fast), it would still take
4⨉109 seconds, which is about 4000000000/(606024*365)
years. Oh, that means about 126 years! So let's just assume that a 63
bit signed integer is enough for us. And if you really can't assume
that for some reason, you can always implement 128 bit signed integers
quite easily, and if you don't know how to do that or if you're too
lazy to do it, use the Big_int module.

But let's go back the representation of those 0s and 1s. I'd like to
make the computation as fast as possible and I'll put those 0s and 1s
in a very compact representation. Each 0 and 1 will now only take one
bit in the memory (with a possible constant overhead for the whole).
For that, let's use OCaml's strings, which are basically arrays of
bytes. The longest string I can have on my machine will bear
1_152_921_504_606_846_904 bits (I know that because I multiplied
Sys.max_string_length by 8), and that's a lot (more than
108).

Now say we want to know whether there are more 0s or 1s as fast as
possible. How do we do that?

We don't want to count all 0s and 1s bit by bit, because that would
have quite a high cost! Indeed, we would have to get each byte, and
for each byte we would have to read each of its 8 bits (that can be
done) one by one. We don't want to do that.

Instead, since we have bytes, we can conveniently allocate an array of
size 256. Each cell of that array will contain the right number to
add to the counter. This way, we can read a byte, get its number of 0s
and 1s in O(1).

(* this table is computed only once *)lettable=letnumber_for_a_byteb=letr=ref0infori=0to7doif(blsri)land1=0thendecrrelseincrrdone;!rinleta=Array.make2560infori=0to255doa.(i)<-number_for_a_byteidone;a[try]

Then let's abstract from the means to read new 0s and 1s, by assuming
we'll be provided a function f that given () will return 8 bits in
a value of type char, and will raise the exception End_of_file
when it has no more bits to give.

Note that int_of_char has a zero-cost (i.e., it "disappears" at
compile-time because int and char sort of share the same memory
representation). If you want better performance, you should inline
f, provided that you know what it is (you may want to check if the
compiler does the inlining itself first, just in case).

Also, you may want to use a table with a size larger than 256 if you
have a lot of memory but I'm not so sure you'd gain performance unless
you use nasty tricks to read larger integers from a string. Then
again, you may not end up using a string, in which case you have to
think with the whole problem in mind.

started on 2013-09-25 16:19:07+00:00, (re)generated on 2014-01-15 15:14:11+00:00