I wasn't sure how many New Year entries I've done, and I was rather
surprised to find that I've managed to post on the first of January every
year except for 2001 and 2002. And far be it from me to stop that
tradition this year.

Update on Tuesday, January 3rd, 2012

Via Michael
Duff are a
few home videos with commentary. It's quite funny if you watch DVD commentaries on a regular basis, as
the two brothers in question critique the home video as a “film” and not
as some boring home videos.

It looks like today is “Attack Day.” I run a program to show the
output from syslog in real time (it's part of my syslogintr
project) and (like right as I type this) I'm seeing a slew of bogus DNS queries:

And not just from that IP
address either—so far 87 different IPs have been sending bogus requests to my DNS server. I would also like to find
the program that does this, as every single request has come from
the same port. Different IP address, sure, but the source port is always the
same.

I'm also seriously tempted to write a program to send back a nice, custom
response to these, in the hopes that the program actually cares about the
response. The obvious thing to do is send back a response that contains an
infinitely long domain name—it's not hard to do, just the right two bytes
in the right location and you have an infinitely long name to parse (this is
exploiting the DNS message
compression scheme—spcdns has code to
protect against this, by the way). Or maybe not an infinitely long domain
name, but an insanely long one (again, easy to do by exploiting the message
compression scheme, and again, spcdns has protection against
this attack as well).

Perhaps better would be to return an answer to a question that was never
asked to begin with. “Oh, you want any record for isc.org?
Here, have the LOC record for
nsa.gov. Have a nice day.” Or perhaps just echo back the
original packet and really confuse the sending program.

But in doing some searching, this appears to be an old denial of service
attack against Internet
Systems Consortium (the makers of bind, quite possibly the
most widely used DNS server)
and as such, any bogus responses would probably not do anything to the
attacking software, which most likely ignores replies anyway.

I've done a bit more research and apparently my server is part of a DNS amplification attack, where
some machine (or machines) somewhere on the Internet is sending my server
(along with possibly other DNS servers) a forged DNS request, in the hopes that my DNS server will do the requested DNS lookup and return the result (in
this case, any DNS record for
isc.org, which is known for returning rather large DNS responses) to the forged IP
address, thus denying it service.

And even though my server won't do the actual DNS request, it still returns a packet saying as
much, so even though my server is not sending a large packet, it is
returning a packet, and thus participating in the DDoS attack, however
little.

So even if I did send back a bogus response, it wouldn't be directed at
the guilty party.

Sigh.

So I guess the thing to do is just filter those requests at the
firewall.

“Don't Tell My Mother I'm
in Iran” is an interesting look into Iran
and shows me (in my opinion) that it's not necessarily Iran that's
bad, but the Iranian government (I suppose one could say the same
of us—we're not bad, but our government is certainly questionable).

It's also hard to fathom there being twenty-five (25!) synagogues in Tehran. Who'da
thunk?

I'm going through the backlog of links I wanted to talk about when I come
across this lovely PHPism:

elseif, as its name suggests, is a combination of
if and else. Like else, it extends an if
statement to execute a different statement in case the original
if expression evaluates to FALSE. However,
unlike else, it will execute that alternative expression only
if the elseif conditional expression evaluates to
TRUE. …

There may be several elseifs within the same if
statement. The first elseif expression (if any) that
evaluates to TRUE would be executed. In PHP, you can
also write 'else if' (in two words) and the behavior would be
identical to the one of 'elseif' (in a single word). The syntactic
meaning is slightly different (if you're familiar with C, this is
the same behavior) but the bottom line is that both would result in
exactly the same behavior.

The elseif statement is only executed if the preceding
if expression and any preceding elseif expressions
evaluated to FALSE, and the current elseif
expression evaluated to TRUE.

Note: Note that elseif and else if will only
be considered exactly the same when using curly brackets as in the
above example. When using a colon to define your if/elseif
conditions, you must not separate else if into two words, or
PHP will fail with a parse error.

So what this insane bit of verbiage is saying is that “elseif” and
“else if” are the same, except when they're not, which has to do with
whether you separate code with braces or with colons (a bit of PHP syntax
I'm not familiar with). In effect, PHP supports both “elseif” and
“else if,” but with subtly different semantics
that could trip you up if you aren't careful.

The great horizontal killer applications are actually just fancy
data structures.

Spreadsheets are not just tools for doing
“what-if” analysis. They provide a specific data structure: a
table. Most Excel users never enter a formula. They use Excel when
they need a table. The gridlines are the most important feature of
Excel, not recalc.

Word processors are not just tools for writing books, reports,
and letters. They provide a specific data structure: lines of text
which automatically wrap and split into pages.

PowerPoint is not just a tool for making boring meetings. It
provides a specific data structure: an array of full-screen
images.

In the past, I've given Smirk grief over his use of Excel to make what I
called “glorified text files,” but I see he's not alone in using Excel for
tracking lists. In fact, I suspect that if the entire calculating engine of
Excel were excised, not many people outside the financial realm would even
notice (and the financial system would probably be better off too).

So it looks like Joel has a point—spreadsheets provide
a type of data structure, and people use them as such. Looks like I'll have
to cut Smirk some slack now. Sigh.

I normally don't upgrade software unless there's a compelling reason for
me to do so, and there are a few compelling reasons for me to upgrade the Linux kernel. It's not
features that I can't live without (for I'm doing so right now) but there
are some features, like signal
and timer delivery via file descriptors, that have intrigued me enough
to contemplate it.

Okay, in the late 90s I used to fairly regularly build custom Linux
kernels for my various computers. But that was in the
2.0–2.1 days, when 2.0.x was
the “stable” version, and 2.1.x was the “development”
version. These days, it's all development versions with the random
version, like 2.6.9 or 2.6.20, given the moniker
of “stable,” just because.

But really, how hard could it be?

Okay, I downloaded 3.1.8 (3.1? Already? I
thought 3.0 was just
released!), but it requires a later version of GCC than I
have. Okay, so I need a new version of GCC. Which probably requires the
latest binutils.
And because of new system calls since Linux 2.6.9 (which I'm
running), I need to upgrade glibc, and while I'm at it, a
few utilities like ps and lsof and …

Really? Is it this complicated? [Sean
goes off, reads the Linux From Scratch
Book and runs away screaming. Yup, it's that complicated.
—Editor].

For Christmas, Hoade gave me 99
Ways To Tell A Story: Exercises in Style, an interesting book in which
the same story (an eight panel cartoon about a guy walking to the
refrigerator and forgetting what he was going to look for) is told
ninety-nine different ways; a different style, a different genre, a
different number of panels, whatever. Ninety-nine different ways.

It got me to thinking. While the book was about different ways to
present a story, what about programming? Okay, other than sounding
completely insane, could a program be written ninety-nine different
ways?

An easy way is a different computer language for each version. Sure,
there's CPL, BCPL, B, C (four variations there—K&R, C89, C99, C11),
C++ (C++, C++9x, C++2x), Objective-C, D, Fortran (many versions over the
years), BASIC (just about every computer made between 1975 and 1985 came
with its own dialect of BASIC, along with the original Dartmouth version),
Algol (Algol 60, Algol 68), Pascal (Pascal, Turbo Pascal, Delphi), Assembly
(basically each CPU architecture has its own form, for instance the 6502,
6800, 6809, 68000 (which has variants), 8080, Z80, 8086 (all the way up to
the latest Pentium 4), MIPS (which has variants), SPARC (and variants), ARM
(and variants), PDP-1, PDP-7, PDP-8, PDP-10, PDP-11, VAX), Forth (just as
many dialects as BASIC), Modula (Modula and Modula-II), SNOBOL, ICON, Hope,
bash, sh, csh, ksh, VIth, Alice, Pilot, COBOL, Intercal, Perl (several major
variations), Piet, Python (Python 1, Python 2, Python 3), PHP (practically
every version ever released), awk, Ruby (nearly every version ever
released), Lua (several versions), Malbolge, Java (several major revisions),
Lisp (Lisp, Lisp 1.5, MACLISP, Common Lisp, Scheme (I know! I know! It's
not Lisp, even though it has the same syntax and pretty much the same
command set, it's a LISP1 and Common Lisp is a LISP2 (and if you have to
ask, you'll have to take a few graduate programming courses to
understand))), Erlang, Prolog, Haskell, ML, Oberon, LOLCODE, Befunge, Chef,
BrainXXXX and that alone will probably get us to 99
versions right there.

But I don't have access to a lot of these languages. Heck, most of them
are dead, obscure or esoteric and trying to even find examples would be
difficult. Especially since what I want to do is more than just a simple “Hello World”
program. I want to write a program that is actually useful, but not so long
as to make this insane project … um … insaner.

So I'm going to try just a few languages (which still leaves me with
plenty to choose from; my home system alone comes with C, Ruby 1.8, Perl
5.8, Python 2.3, Python 2.6, PHP 5.1, Lua 5.1, C++, sh, bash, awk, 68000
assembler, x86 assembler and probably a few I'm forgetting about). I might
not hit all of these, or maybe I will. We'll see.

And the program I selected for this silly treatment
is a small utility I wrote back in the early 90s when I first learned
C—it's a program that dumps data in hexadecimal:

One rule I've set for myself: the output of each program shall be the
same (if at all possible). And the baseline for the output is yesterday's version. It's also a
useful test—if the output doesn't match, there's a bug somewhere. Other
than that, anything goes.

The term “K&R” is still used to refer to a particular style of
writing C code (which I personally can't stand, but that's me)—the
placement of opening braces, the severe indentation, and oftentimes a vowel
impairment in names (which I didn't go for here).

But the term can also refer to code written before C was first
standardized in 1989 (the standard known as “ANSI C” or “C89”). While
you always had to declare your variables, function parameters only had to
be named and, unless otherwise declared, were assumed to be of type
int. The same goes for the function return value—unless
otherwise noted, all functions returned int.

We have function prototypes, and more appropriate typedefs for some of
the variables, but in the K&R style (ick). Lots of software is still
written using this style, like Linux, on the
grounds that if it was Good Enough™ for Kernighan and Ritchie, then it's Good
Enough™ for the rest of us, never mind that Kernighan and Ritchie wrote their
software on teletypes, which is near
enough to a manual typewriter hooked up to a computer that if I used one, I
would try to type as little as possible myself. But personally, I don't use
a teletype; I use a real keyboard and a
huge monitor with a small font, so I find little use for the K&R
style.

My first exposure to an IDE was in the mid-80s with Turbo Pascal 3, and I
hated it. Not the language per se
but the editor. By then, I was used to IBM's
PE (version 1.0—never found a bug, but there were a few limitations,
mostly due to it being able to run under MS-DOS 1.0) with
its true block copy, the ability to move anywhere on the screen and type
(and have it insert spaces, if required) and fairly mnemonic keybindings,
so I had some issues with how Borland thought an
editor should work.

I found it a nightmare.

And then when Turbo Pascal 4 came out, with an entirely new interface where
they tried (and in my opinion, failed) to do “windows” in text mode and
well … it took a bit over a decade for me to look at another IDE.

By now it's the late 90s, and I'm working on Brainstorm.
One of the first Java IDEs came out (and I have no idea what program it was
or even what became of it). I thought I'd give it a try as I was curious if
it would handle an existing project.

It didn't.

My code killed it. I suspect the programmers of that IDE never thought that
anyone would bother with writing their own layout manager, and I recall the
dialog went something like:

IDE

What … is your language?

Sean

Java.

IDE

What … is your quest?

Sean

To compile this Java code I wrote.

IDE

What layout manager are you using?

Sean

Really? I wrote my own.

IDE

Huh? I don't know that [falls over the Bridge of Death into the Gorge
of Eternal Peril] Auuuuuuuuuuuuuuuuuuuuuuuugh!

Scratch another IDE off my list. And a bit over a decade passes.

We're doing a lot of Java
programming at The Ft. Lauderdale Office of The Corporation and most of the
programmers are using this IDE called Eclipse (we're doing both backend
stuff and Android development).
I've heard of it. Nearly all Java programmers swear by it. I figured I'd
give it a go, if only as a source code/object viewer. I suck down the 300+
megabyte package that Ubuntu offers overnight and give it a
go.

And … yeah. I have no idea what I'm doing. Why does it want a
“workspace?” How do I load an existing project into the darned thing? Why
is the Android Eclipse
extension failing? Oh, the “stable” version that Ubuntu coughed up is
more than twenty minutes old, and therefore, an ancient and decrepit piece
of XXXX. I should know better by now.

So off I go to the Eclipse site, and I'm faced with a dozen different options for
Eclipse. Wait, there are three different versions for Java, and one for
C/C++? One for JavaScript? I thought Eclipse could work with a
bunch of different languages. Shouldn't these all be modules or extensions
or something? You mean I have to download a separate version for each
language I want? And what's with the three Java versions?

Auuuuuuuuuuuuuuugh!

And off I go the Bridge of Death into the Gorge of Eternal Peril.

Okay, so I pick one, download it, figure out I can just run the darned
thing and don't have to install it. Okay, the Android extension (another
umptillion bytes) installs fine, and I figure out that I can use the
existing project, but only if I build it from the command line first (um …
isn't that kind of defeating the purpose of an IDE?) and neither I nor J (office mate)
can figure out why I'm getting these two errors about overriding an
interface (which is the point of an interface—you override it). If I do
the so called “quick fix” that Eclipse suggests, it fails on the same line
with a different error.

Sigh.

The Android Emulator runs the code just fine … I guess … since I'm
supposed to test this code. But I can launch the compiled application
(compiled via command line) on the emulator, so the code works (and no
errors from the command line compiler there). It's just that Eclipse
doesn't like the code.

Par for the course. Of course.

I can still use it to browse the code, and follow the relationships of
all the objects. And one of the warnings that Eclipse barfed up
did indeed turn out to be a real bug (an unused variable that, it turns
out, should have been used). So that's good. But all the other warnings are
bogus, as “fixing” them causes other errors. So I have to pretty much
ignore all that, and just use Eclipse as a glorified version of
more, only one that automagically cross references
everything.

Oh, and it gets hopelessly confused when I checkout new versions from the
source repository and have to manually tell Eclipse to reload the changed
files, instead of having it just figure it out on its own.

It's comical, I tell you.

If that wasn't fun enough, I figured I'd try out the “C/C++ version” of
Eclipse, if only as a code browser (since we do have some C++ code, and the
call depth does make it rather difficult to follow using a more traditional,
but less flaky, text editor). So I download that version. I'm still not
quite sure what the “workspace” is, since when I point the “workspace”
to the top level directory of our existing C/C++ codebase, it does nothing.
No, I have to select a “new project” which is an “existing project,”
none of which exactly matches what we have, but I select the one that most
closely, but not exactly, matches what we have only to have Eclipse
immediately wet its pants and dump core, all over the place.

Now, I thought Eclipse was written in Java, a managed
language that produces not real machine code, but virtual code that
is then emulated by a runtime engine—the whole “write once,
debug everywhere” schtick. How does that dump
core? What's wrong, Eclipse? You can't deal with 2,100 source code
files?

Oh, I see you're still horribly confused from the previous 2,100 file
codebase. Okay, I delete everything you touched, re-extract from the
downloaded tarball and try again. Feel better? Should I lay out some
newspaper in case you barf again? No? Okay.

Hmm. I still don't fully understand this business with “workspaces”
but whatever. Here's the top level directory for SPCDNS. Oh, you can't
find anything. Start over. Here's the source directory for SPCDNS. Ah,
you like that. But you can't build, because the Makefile is
missing.

Seriously. Eclipse. You can't deal with a Makefile one
level up? Oh for crying out loud …

Start over. New project. Entirely new project. Oh look, one of the
options is for autoconf. I've
never bothered with that, but maybe Eclipse can show me a thing or two about
… oh never mind, that's right. My Ubuntu install is now forty minutes
old and the installed autoconf might as well be in Sumerian for
all you care, Eclipse.

Start over. New project. Makefile. GCC. New file. dns.h.
Load it up in another text editor, select all, copy. Paste into Eclipse.
Seriously, Eclipse? 600 errors? It's a XXXXXXX header file! You don't have to compile that!
Okay, let me continue with the C code. Load codec.c into a
text editor. Select all, copy, paste into a new file in Eclipse. Oh, now
it's 1,234 errors? Oh, you don't like the restrict keyword …
what? You don't understand C99? Don't worry, Mark doesn't care for C99
either, so you're in good company there, but … really?

Start over. New project. Pure C. Makefile. GCC. Check the options,
ah, find where I can specify C99 on the command line. Select, copy, paste
dns.h into Eclipse. 600 errors. Okay, okay, I'll include the
XXXXXXX headers you want. Happy? Okay, on to
codec.c. Two warnings this time, about two unused
functions.

Really? Those are unused? Okay, I'll remove one of them, and the
prototype and—

It's not much different than the C89 version. The main difference is the
ability to declare variables when needed instead of at the beginning of a
block of code. I don't particularly care for that feature, but I do like
the ability to declare variables inside the for() statement,
like I've done here.

To the untrained eye, it probably looks like every other version I've
presented here, yet there is a difference, subtle as it may be.
But even in the
book that inspired this series there were plenty of examples that
weren't all that much different.

Back in the K&R days, C code tended
to play rather loose with the rules. As a result, some pretty subtle bugs
would go undetected, such as passing the wrong number of parameters to a
function, the wrong type of parameters to a function, and ignoring the
results of a function. Because of these types of errors, a program called
lint was
developed that could detect them, as well as other commonly made mistakes.
In fact, lint was very fussy about the code it was given.

But it was a popular tool (I remember the ads for PC Lint that would show a
snippet of C code that had a subtle bug that PC Lint could detect. I got
good enough to spot the errors shown in the ads) and one could always tell
code that's been through lint because of code like:

(void)printf("hello world\n");

The standard these days seems to be a program called splint and man, is
it picky; just getting code to pass through splint is hard
enough, but then there's the -strict option:

-strict

Absurdly strict checking. All checking done by checks, plus
modifications and global variables used in unspecified functions,
strict standard library, and strict typing of C operators. A
special reward will be presented to the first person to produce a
real program that produces no errors with strict checking.

I'm actually surprised at just how few splint directives I
needed (they're those funny looking comments like
/*@-frobnitz@*/) to get this code through splint
-strict. The only hard part was the function prototype—it didn't
matter if I included the parameter names:

Splint 3.1.2 --- 07 Dec 2009
06.c:34:27: Declaration parameter has name: fpin
A parameter in a function prototype has a name. This is dangerous, since a
macro definition could be visible here. (Use either -protoparamname or
-namechecks to inhibit warning)
06.c:34:38: Declaration parameter has name: fpout
A parameter in a function prototype has a name. This is dangerous, since a
macro definition could be visible here. (Use either -protoparamname or
-namechecks to inhibit warning)
Finished checking --- 2 code warnings

splint bitched about the prototype. I could have rearranged
the code so the prototype was unnecessary, but I decided to shut that
particular error up with the /*@-protoparamname@*/ ...
/*@+protoparamname@*/ directives. But really, other than that and
one other minor bitch, the code passed splint -strict rather
easily.

Standardization of C brought with it a way to annotate a variable beyond
its type: how it is to be accessed. volatile
informs the compiler that the value cannot be cached and must
always be read when referenced, because some outside agent
(hardware, another process or thread) could have changed the contents since
the last read, and const marks a variable as “read-only,”
meaning the value can be heavily cached as it won't change
whatsoever.

So today's code is the base
version (which is C89), but with “const
correctness.”

C99 adds
restrict to the ways one can modify the access to a variable.
The rationale behind this is a bit esoteric—it tells the compiler that a
given pointer is the only pointer to a block of memory.

Yes, it does seem odd to have to add a keyword for that, but it does help
with code optimization. For instance, the following (silly) function:

int foo(int *p1,int *p2)
{
*p2 = *p1 * 17;
return *p1 * 17;
}

The problem here is that the compiler has to do the
multiplication twice, as p2 could be pointing to the same location as p1,
and thus the contents pointed to by p1 could be modified by the first
assignment. So the compiler is forced to read *p1 again and multiply a
second time.

Writing Solid Code is one of
only two programming books that really changed how I write code (the other
being Thinking Forth but that's
for another post), beginning with the liberal use of
assert() to, well, not validate input parameters, but
to enforce that they're valid.

Prior to reading this book, I wrote defensive code, and I would have
coded do_dump() as:

Not very much code (and in this code, useless as well), but in a larger
codebase it does add up. And it hides problems with the code. On the first
project where I liberally used assert(), I really went crazy with
it. The codebase implemented “window regions” on a text screen, and every
routine used assert() to not only check that I didn't slip in a
NULL pointer, but that every field of all the structures I
defined had reasonable values.

And doing so saved me a lot of debugging time in the corner cases, like,
what exactly does it mean to have a “window” that's only one character
wide? Or even a window that's one character wide by one line high? The
assert()s would trip up on all sorts of corner cases like this,
and given that I was programming the code under MS-DOS, an errant pointer
could not only crash the program, but the entire machine (at best—at
worst, it could corrupt memory that wouldn't be detected until some other
program ran).

“In a system of a million parts, if each part malfunctions only
one time out of a million, a breakdown is certain.”

—Stanislaw Lem

So it's Regression Test Time™ again (for “Project: Wolowizard”) at The Ft. Lauderdale Office
of the Corporation, only this time, with new, additional regression
tests!

Joy.

Okay, it's not too bad. It's a rather simple matter to add the cases to
a master list of test cases and expand the program that uses this list to
generate the data used for the regression test. That was probably about an
hour or so of work. Then a minor change to the actual test program to make
sure it fires off the messages under the right conditions (two different
messages, ten cases, a 100×100 matrix, but easy enough to code).

Then, generate all the data, copy it all out to the four servers required
to run the test, get the latest build of all the programs, move
them out to the test servers, make sure the configuration files are
up to date on all the servers, make sure The Protocol Stack From Hell™
won't puke, and fire up the regression test.

Only to have one component fail each test because it can't communicate
with another component.

Yesterday's problem? It turned
out to be a misconfiguration. Or rather, the configuration file format
changed enough to break the configuration files checked in for regression
testing.

Sometime since the last regression test, parameters that deal with time
can now take a suffix to denote the time unit being used (for example,
“9s” for 9 seconds, or “3d” for 3 days) and the base unit for
non-suffixed values changed (from “seconds” to “milliseconds,” I'm
guessing) so what was once configured to time out in 15 seconds would now
time out in 15 milliseconds, and thus one component would
think the other side timed out.

I saw the initial changes, but I neglected to update a few key parameters
properly. It's an easy thing to miss (as it took me two tries to change all
the affected parameters).

Sigh.

But that aside, the regression test finally ran (well, it's still
running—it takes hours for the thing to run).

I haven't talked much about SOPA
and PIPA but I've been aware of them for some time. In fact, on the
sites I normally visit, it was hard not to come across them. And
then today … everybody by now has heard of SOPA.

And yes, it's bad. But why? It could be that there are a lot of people
with money that want bits to have color, when bits have no
color:

Bits do not naturally have Colour. Colour, in this sense, is not
part of the natural universe. Most importantly, you cannot look at
bits and observe what Colour they are. I encountered an amusing
example of bit Colour recently: one of my friends was talking about
how he'd performed John Cage's famous silent musical composition
4′33″ for MP3. Okay, we said, (paraphrasing the
conversation here) so you took an appropriate-sized file of zeroes
out of /dev/zero and compressed that with an MP3 compressor? No,
no, he said. If I did that, it wouldn't really be
4′33″ because to perform the composition, you
have to make the silence in a certain way, according to the rules
laid down by the composer. It's not just four minutes and
thirty-three seconds of any old silence.

My friend had gone through an elaborate process that basically
amounted to performing some other piece of music four minutes and
thirty-three seconds long, with a software synthesizer and the
volume set to zero. The result was an appropriate-sized file of
zeroes—which he compressed with an MP3 compressor. The MP3 file
was bit-for-bit identical to one that would have been produced by
compressing /dev/zero … but this file was (he claimed)
legitimately a recording of 4′33″ and the other
one wouldn't have been. The difference was the Colour of the bits.
He was asserting that the bits in his copy of 433.mp3 had a
different Colour from those in a copy of 433.mp3 I might make by
means of the /dev/zero procedure, even though the two files would
contain exactly the same bits.

Now, the preceding paragraph is basically nonsense to computer
scientists or anyone with a mathematical background. (My friend is
one; he'd done this as a sort of elaborate joke.) Numbers are
numbers, right? If I add 39 plus 3 and get 42, and you do the same
thing, there is no way that “my” 42 can be said to be different
from “your” 42. Given two bit-for-bit identical MP3 files, there
is no meaningful (to a computer scientist) way to say that one is a
recording of the Cage composition and the other one isn't. There
would be no way to test one of the files and see which one it was,
because they are actually the same file. Having identical bits
means by definition that there can be no difference. Bits don't
have Colour; computer scientists, like computers, are Colour-blind.
That is not a mistake or deficiency on our part: rather, we have
worked hard to become so. Colour-blindness on the part of computer
scientists helps us understand the fact that computers are also
Colour-blind, and we need to be intimately familiar with that fact
in order to do our jobs.

The trouble is, human beings are not in general Colour-blind. The
law is not Colour-blind. It makes a difference not only what bits
you have, but where they came from. There's a very interesting Web
page illustrating the Coloured nature of bits in law on the US Naval
Observatory Web site. They provide information on that site
about when the Sun rises and sets and so on … but they also
provide it under a disclaimer saying that this information is not
suitable for use in court. If you need to know when the Sun rose or
set for use in a court case, then you need an expert
witness—because you don't actually just need the bits that say
when the Sun rose. You need those bits to be Coloured with the
Colour that allows them to be admissible in court, and the USNO
doesn't provide that. It's not just a question of accuracy - we all
know perfectly well that the USNO's numbers are good. It's a
question of where the numbers came from. It makes perfect sense to
a lawyer that where the information came from is important, in fact
maybe more important than the information itself. The law sees
Colour.

Or maybe it goes deeper than that—the Internet is such a disruptive
technology that it threatens all sorts of industries, not only because bits
have no color, but because it democratizes the means of global mass
production, and that may scare some people more than the colorless bits:

What's different now is that distribution costs have disappeared.
Suddenly, hobbyists have the same reach as businesses and are seen
as real competition. Unfortunately, hobbyists don't distribute for
the same reasons and don't play by the same rules. That's a
fundamental problem.

A business is run for money, even if it does creative things. It
has expenses and investments. It has a physical location and
distribution channels. A business has to play by the rules in order
to keep earning money, and because they are vulnerable—to
lawsuits, regulations, taxes and police.

A hobbyist is doing it for love, not money. He has almost no
expenses—just put your music up on YouTube and promote it online,
all for free. Since there is no monetary investment, no payroll, no
building, no sales channel, the hobbyist does not have a lot to
lose.

If a business breaks the law, it can be sued or a government can
close it down. There aren't that many businesses in a given field,
so it's relatively easy to police them. There are millions of
hobbyists and they require no money to do their thing. Even if you
sue them, you can't recover your costs because they have no money.
And there are too many to shut them down individually.

On top of that, the internet is global, so many of the people a
business wants to sue or arrest aren't even within its jurisdiction.
The internet didn't just drop distribution costs, it made it
possible to evade restrictive laws passed to protect publishers.

Viewing this as hobbyists vs. businesses makes a difference. The
current story from publishers is that everything was fine until the
internet came along and pirates started to steal all their products.
The reality is that it's not just about piracy.

Hobbyists have always been there, creating art, music, books,
comics, open source software, etc. The internet has just forced
these two worlds into collision. Even if all the piracy
disappeared, publishers would still be in trouble.

Whatever the case, SOPA/PIPA is bad, and should be rejected by the United
States government. It's bad enough that I have to keep buying all these
damned buggy whips when it's clear that the future is going to be these
horseless carriages I keep hearing about.

I'm running the regression tests for “Project: Wolowizard” and about halfway
through (around the two-hour mark or so) the tests start failing. Sometimes
expected results just aren't showing up. I'm freaking out a bit because of
all the issues we've had in running these tests, only for them to start
failing in yet a different way.

Now, a bit about how this all works—there are four computers involved;
one runs the tests, injecting messages towards a mini-cluster of two
machines, either of which (depending on which one gets the message) sends a
message to the fourth machine, which does a bunch of processing (which may
involve interaction with a simulated cell phone on the testing machine),
then responds back to the mini-cluster, which then responds back to the
testing machine.

Now, I can check the immediate results from the mini-cluster, but the
actual data I'm interested in is logged via syslog, so I have
that data forwarded to the testing machine and my code grovels through a log
file for the actual data I want. And it's that data (or part
thereof) that apparently isn't being logged, and thus, the tests are
failing.

Now, it just so happens that the part of the test that's failing is the
part dealing with the mini-cluster, and it looks like about half the tests
are failing (hmm …. ).

I log into each of the two computers comprising the mini-cluster and
check /etc/syslog.conf, on the off chance that it changed. Nope.
I then explain the problem to Bunny, standing (or rather, sitting) in as my
cardboard programmer,
when it hits me—I should check to see if the program is running.

Okay, just because syslogd is running doesn't necessarily
mean it's running correctly. So I run logger -p local1.info
FOO on each machine and yes, one of the machines is failing to forward
the logs to the testing machine.

Ahah!

I restart syslogd on that system, and lo! The log entries
are getting through now.

You know, I expect there to be issues with the stuff I'm testing; what I
don't expect is for the stuff we didn't write to have issues (the
Protocol Stack From Hell™ notwithstanding).

Okay, reset everything and start the regression test over again …

Update in the wee-hours of the morning, Friday, January 20th,
2012

A bit over half-way through the regression tests, and the log files
rotate. Aaaaaaaaaah! Okay, reset all the data, and start from the last
failed test. That's easy, since I can specify which cases to run. That's
hard, because I have to specify nearly 100 cases. That's easy, since I
can use the Unix command seq to list them. That's hard,
because the test cases aren't just numbers, but things like
“1.b.77” and “1.c.18”, and while the shell supports
command-line expansion of a program's output via backticks (à la
for i in `seq 34 77`; do echo 1.b.$i; done) I need to nest two
such operations (echo `for i in `seq 34 77`;do echo 1.b.$i;
done`) to specify the test cases from the command line, and
backticks don't nest. Okay, I can create a temporary
file that lists the test cases …

I decided to take a break with the C versions for a few days, given that
a) I'm hopelessly behind on posting, and b) the next few versions are
interesting and I want to make sure I have plenty of time for the write-ups.
So I went back to my programming roots, to the first computer I ever owned,
a Tandy Color
Computer 2, and figured, why not do a hex dump program for it? In its
version of BASIC.

One of the rules I set for
myself is that the output of each version should match if at all
possible, and I'm afraid that this version (and the next one) falls
under those weasel words—the output won't be exactly the same. It's not
hard to understand why though, when you realize that the text screen of the
Color Computer is 32×16. Yes, it's only sixteen lines of 32
characters each. Yeah, this:

Now, I don't know if the code actually works (unlike the
previous twelve versions). I still have my first computer, but I haven't
turned it on in years, and in fact, I would have to dig it out of storage,
find a TV that worked to hook it up to (for a video display), then type in
the program, debug it, then … um … type it in again, since I don't have
any easy way of transferring data off the Color Computer (that
would require more digging through piles of cables to find the right set of
cables and adaptors, going from a 4-pin DIN to USB
with some form of null-modem cable thrown
in). So in theory the code works, but in practice …

The main problem is that the DOS BASIC commands are geared towards
record-based files (both sequential and random access) and not the more
modern “stream of bytes” paradigm in use today. I'm reading what I hope are
256-byte binary records. That's where my biggest concern really lies.

Now, I could read in binary data directly; there is a command to do that,
but it reads in a raw sector from the disk; I would have to write code to
decode the actual file structure. It's not a complicated structure, but
it's a bit more effort than I want to go into right now, and would distract
from a “simple” program.

Yesterday's code was labeled
“COLOR COMPUTER BASIC, EASY” not because it was easy to write (it was
somewhat easy—seeing how I didn't have to run it, and going off reference
material for a language I haven't used in over twenty years) but because it
was relatively “easy” to read.

I'm being serious.

I never saw any published BASIC
code look that nice. No, it would usually be presented as this (but
probably without the blank lines):

Line 20 here covers lines 20 through 103 of yesterday's code, and I only
broke it there because of the IF statement, which ends at the end of a
numbered line. Otherwise, it could
have been longer, up to 255 characters in length. All due to memory
constraints—4,096 bytes, 16,384 bytes or 32,768 bytes of RAM to fit both the program and data
(and if you want high resolution graphics, you give up 6,144 bytes; 12,288
bytes if you want double-buffered high resolution graphics—and by “high
resolution graphics” I mean 256×192 pixels, two colors).

R stops by my desk and drops off a new toy unit to test. It's a
network device you can plug a POTS line into and make calls over the
Internet. I guess we're testing the toy unit to see
if our phone network features work with it.

It's a nice looking device and as R hands it to me, I see that it's still
on (yes, it comes with both an internal battery and a wall-wart). The tests
aren't complicated, but I do need to read the manual to figure out how to
run a few of them (involving conference calling, and forwarding phone
calls elsewhere). R also hands me the box the toy unit
came in.

As I search through the box for the manual, I come across a USB cable,
still in its plastic wrap.
Okay, the unit comes with a USB port; what doesn't these days? I then find
the manual and start flipping through it. There's the diagram of the toy
unit with a description of each port and button on it. I notice
the USB port has a note:

NOTE: Never place a USB-based device into the USB port of the XXXX XXXXX
XXXXXXX under any circumstances. Doing so may damage the
device and negate its warranty. The port was designed for
diagnostic purposes only; it is not intended for customer use.

So, not only is there no sticker over the USB port saying “removal of this sticker voids
warranty” but they give you a USB cable not to plug into it!

Methinks there is a disconnect between manufacturing and packaging at the
factory that makes these toy units.

#!/usr/bin/env lua
-- ***************************************************************
--
-- Copyright 2012 by Sean Conner. All Rights Reserved.
--
-- This program is free software: you can redistribute it and/or modify
-- it under the terms of the GNU General Public License as published by
-- the Free Software Foundation, either version 3 of the License, or
-- (at your option) any later version.
--
-- This program is distributed in the hope that it will be useful,
-- but WITHOUT ANY WARRANTY; without even the implied warranty of
-- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
-- GNU General Public License for more details.
--
-- You should have received a copy of the GNU General Public License
-- along with this program. If not, see <http://www.gnu.org/licenses/>.
--
-- Comments, questions and criticisms can be sent to: sean@conman.org
--
-- ********************************************************************
-- Style: Lua 5.1
function do_dump(fpin,fpout)
  local offset = 0

  while true do
    local line = fpin:read(16)
    if line == nil then return end

    fpout:write(
        string.format("%08X: ",offset),
        line:gsub(".",function(c) return string.format("%02X ",c:byte()) end),
        string.rep(" ",3 * (16 - line:len())),
        line:gsub("%c","."),
        "\n"
    )

    offset = offset + 16
  end
end

-- **********************************************************************

if #arg == 0 then
  print("-----stdin-----")
  do_dump(io.stdin,io.stdout)
else
  for i = 1 , #arg do
    local f = io.open(arg[i],"r")
    io.stdout:write("-----",arg[i],"-----","\n")
    do_dump(f,io.stdout)
    f:close()
  end
end

os.exit(0)

What I'm noticing (besides my text editor's horrible attempts at syntax
highlighting in this entry) is that the non-C versions are quite a bit
shorter than the C versions. I'm sure part of that reason is the high level
of abstraction obtained by not using C. For instance, in this version, the
code to dump the data is easily half the length of the shortest C version, thanks to the clever string.gsub()
routine in Lua.

#!/usr/bin/env lua
-- ***************************************************************
--
-- Copyright 2012 by Sean Conner. All Rights Reserved.
--
-- This program is free software: you can redistribute it and/or modify
-- it under the terms of the GNU General Public License as published by
-- the Free Software Foundation, either version 3 of the License, or
-- (at your option) any later version.
--
-- This program is distributed in the hope that it will be useful,
-- but WITHOUT ANY WARRANTY; without even the implied warranty of
-- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
-- GNU General Public License for more details.
--
-- You should have received a copy of the GNU General Public License
-- along with this program. If not, see <http://www.gnu.org/licenses/>.
--
-- Comments, questions and criticisms can be sent to: sean@conman.org
--
-- ********************************************************************
-- Style: Lua 5.1, recursion
function do_dump(fpin,fpout,offset)
  local line = fpin:read(16)
  if line == nil then return end

  fpout:write(
      string.format("%08X: ",offset),
      line:gsub(".",function(c) return string.format("%02X ",c:byte()) end),
      string.rep(" ",3 * (16 - line:len())),
      line:gsub("%c","."),
      "\n"
  )

  return do_dump(fpin,fpout,offset + 16)
end

-- **********************************************************************

if #arg == 0 then
  print("-----stdin-----")
  do_dump(io.stdin,io.stdout,0)
else
  for i = 1 , #arg do
    local f = io.open(arg[i],"r")
    io.stdout:write("-----",arg[i],"-----","\n")
    do_dump(f,io.stdout,0)
    f:close()
  end
end

os.exit(0)

Here, we have the do_dump() function calling itself for each
line's worth of data. If you don't have experience with recursion, this is
a common technique for solving certain programming problems: a function
calls itself with either a simpler case to solve or, as in this example,
with more data. And it just works.

If you are familiar with recursion, you might be horrified at
such a solution: with recursion, the program (behind the scenes) keeps
track of every call it has made, so a very large file could cause it to run
out of memory and crash.

But in this case, we don't have to worry. Lua takes advantage of what's
called “tail call optimization.”
In this case, you can think of the tail call as a form of goto,
but this type of goto can also goto other
functions, which is useful in implementing state machines. For example, a
pseudocode version of the TFTP protocol, in Lua:

I earlier said I wasn't a fan of
tail call optimization, but then, I didn't see a use for it. Here I do,
at least for state machines. For a hex dump program it isn't strictly
necessary, but it doesn't hurt either—it's still a loop in this case.

Since Lua is a
dynamically typed language (“values have types, not variables”) we can
check the type of a variable at runtime and behave accordingly. Before, we
were restricted to just dumping a file, but we could also dump strings
(which in Lua can contain pure binary data). So today's version checks what
type the input is; if it's a file, we read data from there; otherwise, if
the input is a string, we pull the next blob of data out of it.

Granted, we don't actually use that feature here, but we can
more easily reuse do_dump() elsewhere.

#!/usr/bin/env lua
-- ***************************************************************
--
-- Copyright 2010 by Sean Conner. All Rights Reserved.
--
-- This program is free software: you can redistribute it and/or modify
-- it under the terms of the GNU General Public License as published by
-- the Free Software Foundation, either version 3 of the License, or
-- (at your option) any later version.
--
-- This program is distributed in the hope that it will be useful,
-- but WITHOUT ANY WARRANTY; without even the implied warranty of
-- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
-- GNU General Public License for more details.
--
-- You should have received a copy of the GNU General Public License
-- along with this program. If not, see <http://www.gnu.org/licenses/>.
--
-- Comments, questions and criticisms can be sent to: sean@conman.org
--
-- ********************************************************************
-- Style: Lua 5.1, recursion, runtime type checking
function do_dump(fpin,fpout,offset)
  local line

  if type(fpin) == 'string' then
    if offset >= string.len(fpin) then return end
    line = fpin:sub(offset + 1,offset + 16)
  else
    line = fpin:read(16)
    if line == nil then return end
  end

  fpout:write(
      string.format("%08X: ",offset),
      line:gsub(".",function(c) return string.format("%02X ",c:byte()) end),
      string.rep(" ",3 * (16 - line:len())),
      line:gsub("%c","."),
      "\n"
  )

  return do_dump(fpin,fpout,offset + 16)
end

-- **********************************************************************

if #arg == 0 then
  print("-----stdin-----")
  do_dump(io.stdin,io.stdout,0)
else
  for i = 1 , #arg do
    local f = io.open(arg[i],"r")
    io.stdout:write("-----",arg[i],"-----","\n")
    do_dump(f,io.stdout,0)
    f:close()
  end
end

os.exit(0)

Once more into the breach, but I
remembered last time this happened,
and acted accordingly. But I needn't have worried—while we were swarmed with men
armed with huge chunks of roast critter, this time, the restaurant was way
more crowded and thus, we weren't swarmed quite as heavily.

Also, amusingly, on the far wall from where I was sitting was a large
wide screen television showing closeups of grass, of all things.
And it wasn't made up like a window either—the grasses would change every
so often. And yes, it was video of grass, not static images of
grass.

Yesterday's version checked the
input to see if it was a file or a string and acted accordingly. That's
fine, but perhaps a better way is to include a callback function and some
opaque datum for that callback to work on. That way, we can
operate on more than just strings or files. It's open-ended as to what we
can support.

#!/usr/bin/env lua
-- ***************************************************************
--
-- Copyright 2010 by Sean Conner. All Rights Reserved.
--
-- This program is free software: you can redistribute it and/or modify
-- it under the terms of the GNU General Public License as published by
-- the Free Software Foundation, either version 3 of the License, or
-- (at your option) any later version.
--
-- This program is distributed in the hope that it will be useful,
-- but WITHOUT ANY WARRANTY; without even the implied warranty of
-- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
-- GNU General Public License for more details.
--
-- You should have received a copy of the GNU General Public License
-- along with this program. If not, see <http://www.gnu.org/licenses/>.
--
-- Comments, questions and criticisms can be sent to: sean@conman.org
--
-- ********************************************************************
-- Style: Lua 5.1, recursion, callback
function do_dump(fpout,offset,callback,data)
  local line = callback(data,offset)
  if line == nil then return end

  fpout:write(
      string.format("%08X: ",offset),
      line:gsub(".",function(c) return string.format("%02X ",c:byte()) end),
      string.rep(" ",3 * (16 - line:len())),
      line:gsub("%c","."),
      "\n"
  )

  return do_dump(fpout,offset + 16,callback,data)
end

-- **********************************************************************

local function cb(data,offset)
  return data:read(16)
end

if #arg == 0 then
  print("-----stdin-----")
  do_dump(io.stdout,0,cb,io.stdin)
else
  for i = 1 , #arg do
    local f = io.open(arg[i],"r")
    io.stdout:write("-----",arg[i],"-----","\n")
    do_dump(io.stdout,0,cb,f)
    f:close()
  end
end

os.exit(0)

#!/usr/bin/env lua
-- ***************************************************************
--
-- Copyright 2010 by Sean Conner. All Rights Reserved.
--
-- This program is free software: you can redistribute it and/or modify
-- it under the terms of the GNU General Public License as published by
-- the Free Software Foundation, either version 3 of the License, or
-- (at your option) any later version.
--
-- This program is distributed in the hope that it will be useful,
-- but WITHOUT ANY WARRANTY; without even the implied warranty of
-- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
-- GNU General Public License for more details.
--
-- You should have received a copy of the GNU General Public License
-- along with this program. If not, see <http://www.gnu.org/licenses/>.
--
-- Comments, questions and criticisms can be sent to: sean@conman.org
--
-- ********************************************************************
-- Style: Lua 5.1, recursion, closure as callback
function do_dump(fpout,offset,callback)
  local line = callback(offset)
  if line == nil then return end

  fpout:write(
      string.format("%08X: ",offset),
      line:gsub(".",function(c) return string.format("%02X ",c:byte()) end),
      string.rep(" ",3 * (16 - line:len())),
      line:gsub("%c","."),
      "\n"
  )

  return do_dump(fpout,offset + 16,callback)
end

-- **********************************************************************

if #arg == 0 then
  print("-----stdin-----")
  do_dump(io.stdout,0,function(offset) return io.stdin:read(16) end)
else
  for i = 1 , #arg do
    local f = io.open(arg[i],"r")
    io.stdout:write("-----",arg[i],"-----","\n")
    do_dump(io.stdout,0,function(offset) return f:read(16) end)
    f:close()
  end
end

os.exit(0)

Here, our function (which is not named, as you don't really need to name
functions in Lua)
references our open file f, but in order to do so, Lua needs to
attach a reference to f to the function when said function is
passed to do_dump(). It does so by creating what's called a
“closure”—think of a closure as both a pointer (or reference) to a
function, plus a pointer (or reference) to data that is outside the normal
lexical
scope of the function.

And why do I pass in the offset when my unnamed (“anonymous”) function
doesn't use it? Because it might be useful in some contexts to know where
to pull the data (say from a block of memory).

When last we left the C
versions, we pretty much hit the limit of what we could do using the
standard C library to remain portable (well, we did use a GCC
extension). Not much else we can do, unless we want to leave the Land
of Portability™ and start hitting some system specific calls.

The major trick here is that I generate the output for each line
backwards! I do that because it's easier to generate the
hexadecimal output that way. Generating the hexadecimal output “forwards”
would mean I need to rotate the first four bits down into position (so with
a 32-bit value, I would need to shift the bits down 28 positions), then
generate the hex digit, then rotate the next four bits down 24 positions,
but by then, I'm doing repeated rotates and discarding all the work I did
previously for each digit. And if I only want to work with 8 bits, I have
to have another special function to handle that, or complicate one function
to handle varying numbers of bits.

But by going backwards, I start with the last four bits, which are
already in the “proper position” to generate a digit, then shift everything
down four bits, and keep repeating this until the specified number of
hexadecimal digits is produced.
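The backwards trick, sketched in C (my reconstruction, not the actual code
from the entry; it assumes ASCII, which comes up again later):

```c
#include <stddef.h>

/* Convert 'value' to 'digits' hexadecimal characters, filling the
   destination backwards: the low four bits are already in position
   for the last digit, so emit it, shift down four bits, and repeat. */
static void hex_backwards(char *dest, size_t digits, unsigned long value)
{
  while (digits--)
  {
    dest[digits] = (char)((value & 0x0F) + '0');
    if (dest[digits] > '9') dest[digits] += 7; /* 'A' - ('9' + 1) == 7, in ASCII */
    value >>= 4;
  }
}
```

The same routine handles two digits for a byte or eight for an offset, with
no special casing per width.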

So, while the amount of code goes up, it is faster than the more
portable version:

It's almost twice as fast, yet it spends a disturbingly large amount of
time (compared to the portable version) in the kernel. It's because of all
the calls to write() I do. That's a problem I'll attack in the
next version.

Yesterday's version was faster
than the portable version, but
spent nearly 100 times longer in the kernel than the portable version, and
that's because the portable version, using the standard C library,
buffers the output way more than my non-portable system
calling version did. Making a subroutine call into the kernel (a “system
call”) takes way more time than making a regular subroutine call.

So, we need to avoid making a ton of system calls, and to do that, we
need to buffer the output a bit more. In this version, we buffer an entire
line's worth of data before writing it out.

Not bad—one third the time overall, and one fifth the amount of time
spent in the kernel. And compared to the portable version, this only takes
one fifth the total time, although it's still spending over twenty
times as long in kernel space.
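The line-at-a-time buffering might look something like this (a sketch of
mine, not the entry's listing; hex digits are generated backwards as
described above, and ASCII is assumed):

```c
#include <stddef.h>

/* Generate 'digits' hex characters backwards into dest (ASCII assumed). */
static void hex8(char *dest, size_t digits, unsigned long value)
{
  while (digits--)
  {
    dest[digits] = (char)((value & 0x0F) + '0');
    if (dest[digits] > '9') dest[digits] += 7;
    value >>= 4;
  }
}

/* Build one full output line, "OOOOOOOO: XX XX ... ascii\n", into lbuf
   and return its length.  The caller then hands the whole line to a
   single write() instead of making a dozen small system calls. */
static size_t build_line(char *lbuf, unsigned long offset,
                         const unsigned char *data, size_t len)
{
  size_t p = 0;
  size_t i;

  hex8(&lbuf[p], 8, offset); p += 8;
  lbuf[p++] = ':'; lbuf[p++] = ' ';

  for (i = 0; i < 16; i++)
  {
    if (i < len) { hex8(&lbuf[p], 2, data[i]); p += 2; }
    else         { lbuf[p++] = ' '; lbuf[p++] = ' '; }
    lbuf[p++] = ' ';
  }

  for (i = 0; i < len; i++)
    lbuf[p++] = (data[i] >= ' ' && data[i] < 127) ? (char)data[i] : '.';

  lbuf[p++] = '\n';
  return p;   /* caller: write(fd,lbuf,p); -- one system call per line */
}
```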

So yesterday I presented a
non-portable version that was quite a bit faster than the portable version,
but I'm not quite done yet. That version just buffered a line at a
time—today's version buffers nearly 8k
worth of data (it's not exact, but it's close enough) between calls to
write().

And yes, that's the real output—1/10 the time of the portable version
with a similar amount of time in the kernel.
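The entry later mentions a mywrite() function; my guess at its general
shape (a sketch only, which assumes no single chunk exceeds the buffer,
true for hex-dump lines):

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Accumulate output in a roughly 8K buffer and only call write() when
   the buffer can't hold the next chunk, cutting system calls down from
   one per line to one per ~130 lines. */
#define BUFMAX 8192

static char   obuf[BUFMAX];
static size_t oidx;

static void myflush(int fd)
{
  if (oidx > 0)
  {
    write(fd, obuf, oidx);   /* error handling elided for brevity */
    oidx = 0;
  }
}

static void mywrite(int fd, const char *data, size_t len)
{
  if (oidx + len > BUFMAX)   /* no room: drain the buffer first */
    myflush(fd);
  memcpy(&obuf[oidx], data, len);   /* assumes len <= BUFMAX */
  oidx += len;
}
```

Call myflush() once at the end of the program so the final partial buffer
isn't lost.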

Frankly, I was a bit surprised at these results—not that the
non-portable version was faster (that's almost a given) but the magnitude of
the results. I didn't think the standard C library had that much overhead.
I was expecting a modest percentage increase in speed; even
twice as fast would have been unexpected. But ten times
faster?

Wow.

Increasing the size of the buffer past what I have probably won't help
all that much, and in fact, when I doubled the buffer size:

I checked the code GCC produced (all code was compiled
with -O3, a very high level of optimization) and well, I'm not
sure I could have done much better, and probably would have done worse—GCC
inlined everything into do_dump() (with the exception
of main() and mywrite()), something I would
not have done in assembly (and have any hope of code reuse for
another project). So I think we're done with making this code fast.

That's not to say I won't do an assembly version of this program, but it
probably won't be for the x86 line.

foo_t *p = NULL;

/* lots of code not touching p at all */

if (p) {
  /* lots of code that will never be executed because */
  /* p is always, *always* NULL at this point */
} else {
  /* this code will always *always* be executed */
  /* p is never touched otherwise */
}

/* p is still never used */

Yes. At one point p was probably used, then a code change sometime during
the Clinton
Administration (late first term most likely) removed the need for
p but later code still checked it, so in order to keep the code
from crashing (during the last year of the Clinton Administration, most
likely) the “easiest fix that would work with minimal code changes because
we want to avoid a five day regression test” is to just NULL
out the variable where declared and call it a day.

Odder yet is the code that generates a string, checks to see if the
generated string ends with two newline characters and then adds one or two
newline characters if required (and yes, it checks for the first newline
character, then the second) and further down in the code, it checks to see
if the line has two newline characters and carefully removes them, one at a
time.

Yes. The code adds two characters, only to remove them later on.

Again, I can see the requirements late in the Reagan
Administration bumping up against the requirements during the early Bush 43
Administration, and again, the easiest way to handle this is a local
change that disturbs as little code as possible.

Although, there is one bit that does smack of rabid howler
monkeys on crack taking a pass at the code, which I'll mention in passing.
It's basically the Poster Child™ for why certain C programmers should be
taken out back behind the shed and disemboweled with a grapefruit
spoon.

Back then, I was tasked with modifying some code to log the Protocol
Stack From Hell™ errors via syslog(), and all I had to
work with was a C source file:

(No, seriously, each function starts and ends with
MYSTERIOUS_something_CODE_WE_CANT_TOUCH()) and an object file,
which the C code is linked against to produce the final program.

Okay, nothing that out of the ordinary. Only we weren't getting
the proper error messages from the lrm_t … thingy … we were
given. Some back and forth with The Protocol Stack From Hell™
Technical Support® and we had the final solution, and if you can read C
code, prepare to be horrified:

For those not fluent in C, let me translate: “you will receive a block
of memory called prt, which has a particular layout we
laughingly call lrm_t. Ignore the data there, but instead,
what you actually want lies just past the block of memory you received, into
an area that Standard C calls “undefined behavior.” Abandon all hope ye who
program here. And have a nice day.”

Then, I was horrified. Now, I get to see the code from “the other
side” and “horrified” does not describe my reaction. “Running away,
screaming in sheer madness of having peered deep into the Abyss” would be a
bit closer, but still misses the mark. It goes something like this.

This describes the layout of the memory block we're given in
Stpd_Lrm_Fnctn(). It's not terribly big as structured memory
goes, maybe around 60 or 70 bytes, but it primarily contains useless
information, as I found out.

It's a slightly larger block of memory, but notice the last field,
reserved. A comment in the code says that this area is for
“internal use only” and is around 24 bytes in size.

Keep in mind—this field is only 24 bytes in size. The size of the
block of memory we're given in Stpd_Lrm_Fnctn() is around 60 or
70, which is larger than 24. 24 is smaller than 60 and
70. This is important.

“256 bytes should be enough for anyone, right?”

This, on a system with gigabytes of memory.

This copies data out of the reserved field into a block
of memory of type lrm_t. Refer back to note 2. Notice how we
want to copy 60 or 70 bytes of information, but the field we're copying from
is only 24 bytes. This, my friends, is known as “undefined behavior” in
C.

Only, this is air quotation marks okay air quotation
marks because msg_t is technically the air quotation
marks header air quotation marks of a larger message and thus,
we can air quotation marks safely air quotation marks copy
memory past the end of the header.
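To illustrate the pattern being described (with invented structure names,
since the real ones can't be shown): the defined-behavior way to carry data
“past the header” is to allocate header and payload as one block, which is
presumably what the vendor's code relies on.

```c
#include <stdlib.h>
#include <string.h>

/* Invented stand-ins for the real structures -- names are made up. */
typedef struct
{
  int  type;
  char reserved[24];   /* "internal use only" */
} msg_t;

typedef struct
{
  int  error;
  char detail[64];
} lrm_t;

/* Allocate a message big enough to carry an lrm_t after the header,
   then copy the payload into the space just past the msg_t.  This is
   well-defined because the allocation covers both regions. */
static msg_t *make_message(const lrm_t *payload)
{
  msg_t *msg = malloc(sizeof(msg_t) + sizeof(lrm_t));
  if (msg == NULL) return NULL;
  msg->type = 1;
  memcpy(msg + 1, payload, sizeof(lrm_t));  /* data lives past the header */
  return msg;
}
```

What the vendor's code does instead, copying sizeof(lrm_t) bytes out of the
24-byte reserved field itself, only “works” because the field happens to
sit at the end of the larger allocation.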

My thinking here—the error codes originally fit into the space set
aside by reserved but grew over time; by the time that was
found out, too much code relied on this situation, so they're stuck with
it.

Or, you know, rabid howler monkeys on crack.

I just hope that lrm_t doesn't ever exceed 256 bytes in
size.

I'm serious. I'm not making this up. The code actually
uses strcpy().

strcpy() is bad because there is no checking to see if you
have overrun the space set aside to receive the copied string.

The use of this function should cause modern C compilers to bitch
mightily and stop compilation right then and there, and send the programmer
to jail, do not pass Go, do not collect $200.00. Especially if the
programmers are rabid howler monkeys on crack.
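For comparison, a bounded copy is only a few lines (my sketch, not a fix
from the vendor): copy at most size-1 bytes and always NUL-terminate, so an
oversized source gets truncated instead of stomping adjacent memory.

```c
#include <stddef.h>
#include <string.h>

/* Copy src into dest, never writing more than 'size' bytes and always
   NUL-terminating.  Returns the number of bytes actually copied. */
static size_t bounded_copy(char *dest, size_t size, const char *src)
{
  size_t len;

  if (size == 0) return 0;       /* nowhere to put even the NUL */
  len = strlen(src);
  if (len >= size)
    len = size - 1;              /* truncate rather than overflow */
  memcpy(dest, src, len);
  dest[len] = '\0';
  return len;
}
```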

You're also in the land of ASCII specificness. Couldn't you make
that:

dest[size] = "0123456789ABCDEF"[value & 0x0f];

And then not be tied to ASCII? You could also then
switch out that array pointer if you wanted to get a mix of
uppercase, lower case depending on what you need.

-MYG
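Both forms produce the same digits on an ASCII system; a quick check (my
sketch, not MYG's actual code):

```c
/* MYG's table lookup: works under any character encoding. */
static char hex_table(unsigned v)
{
  return "0123456789ABCDEF"[v & 0x0F];
}

/* The arithmetic trick: relies on '0'..'9' and 'A'..'F' being where
   ASCII puts them ('A' is exactly 7 code points past '9'). */
static char hex_arith(unsigned v)
{
  char c = (char)((v & 0x0F) + '0');
  if (c > '9')
    c += 7;
  return c;
}
```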

I initially reject the idea of doing this. My reasoning? The code
itself is already non-portable, being restricted to a Posix-like system. So what's
one more non-portable item on the list? The sequence if (dest[size] >
'9') dest[size] += 7 is around six (for a lot of architectures that
aren't RISC
based) to twelve bytes (RISC systems) in size, and now you want to add an
additional 16 bytes? [He asks, working from a system
with a few gigabytes of RAM
—Editor] [Shut up! –Sean]. Also, in my nearly 30 years of
working with computers, I've yet to come across a non-ASCII based computer
system.

Yes, there are a few. Baudot code perhaps
being the oldest and perhaps, the oddest one. Then there are the 6-bit
character encoding schemes and Radix-50, which pack
multiple 6-bit characters per “word” of storage (where a “word” could be
16, 18, 32, 36, 60 or 66 bits in size) and varied from system to system.
And let's not forget EBCDIC, one of about six nearly identical, but
maddeningly different, encoding schemes developed by IBM. All of these were developed for
machines in the 60s, but ASCII won out in the end, being the most
widely used and at the core of Unicode.

So I asked on a mailing list of classic computer enthusiasts:

From

Sean Conner <spc@conman.org>

To

Classic Computer Talk <XXXXXXXXXXXXXXXXXXXXX>

Subject

C compilers and non-ASCII systems

Date

Tue, 31 Jan 2012 11:21:02 -0500

A friend recently raised an issue with some code I wrote (a hex
dump routine), saying it depended upon ASCII and thus
would break on non-ASCII based systems (he proposed
a solution, but that's beside the issue here). I wrote back, saying
the code in question was non-portable to begin with (since it
depended upon read() and write()—it was targeted at POSIX based
systems), and besides, I've never encountered a non-ASCII system in the nearly 30 years I've been
using computers.

So now I'm wondering—besides Baudot, 6-bit BCD and EBCDIC, is there any other encoding scheme used?
And of Baudot, 6-bit BCD and EBCDIC, are there any systems
using those encoding schemes AND have a C compiler
available?

-spc (Or can I safely assume ASCII and derivatives
these days?)

I figure if anyone knew the answer, these people would (many of them not
only use computers like the PDP-10,
but use them as heaters during the winter months).

The answers were fascinating.

From: "Shoppa, Tim" <XXXXXXXXXXXXXXXXX>
To: Classic Computer Talk <XXXXXXXXXXXXXXXXXXXXX>
Subject: Re: C compilers and non-ASCII systems
Date: Tue, 31 Jan 2012 13:18:55 -0500

IBM has a very handy page on C compatibility with EBCDIC system services:

Please consider other character codes. An EBCDIC port of GCC is alive and well on several of
the "legacy" operating systems (MVS, VM and Music) that run on the
Hercules IBM 360/370/XA/390/z emulator. And whilst zLinux runs in
ASCII (or whatever it uses to get more than
256 points in a code page) many zLinux sites also have the zVM
hypervisor, which includes an optional EBCDIC C compiler.
Having ported the BREXX interpreter to this environment I was stung
by the fact that the original author had made assumptions about
character ordering that are not true on an EBCDIC
platform.

I figure I'll try Mark's suggestion (several other people on
the mailing list suggested the same thing) and at least time the change to
see if it's worthwhile for such odd-looking, but legal, C code.
