Computational Complexity and other fun stuff in math and computer science from Lance Fortnow and Bill Gasarch

Thursday, June 23, 2011

Creating an Email System at Cornell

Email celebrates its fortieth anniversary, so let me tell the story of my job for three summers, and part-time during the academic year, while an undergrad at Cornell University: creating an email system from scratch.

In my sophomore year (1982) I took a computer structure course. I had a heavy set of final exams and papers, so I did the final program for this course early and turned it in on the last day of class to the instructor, Steve Worona. In that class you could scale assignments and tests from 0.75 to 1.5 to make them count more or less. When I turned it in, Worona asked me why, if I was turning it in a week early, I had scaled it at 0.75. "You never give me A+'s on the programs and I didn't want to lower my grade."

That was perhaps my most obnoxious moment, but it got me noticed and Worona, who worked for computer services, offered me a programming job. We would create a new email system for Cornell. Cornell had an email system written in some scripting language, slow and clunky. We wouldn't use any fancy high-level language; we would code directly in IBM 370 assembly language. To maximize efficiency we would do it all ourselves: user interface, database for storing messages, interactions with SMTP servers, etc. No small task, which is why it took me nearly three years.

IBM assembly language was quite bloated with instructions. There was a command called "Edit and Mark" that went through a range of data making modifications based on some other data. This was a single assembly language instruction. We used to joke that there was a single instruction to do your taxes.

Cornell at the time was a gateway between BITNET ("Because It's Time NETwork", connecting about 30 universities in the US and Europe) and a fledgling ARPANET, the precursor to the Internet. BITNET worked with files and ARPANET one line at a time, so there was a special file-based Batch SMTP to transmit email between the two. The fun I had working this all out.

As a test bed, my email system was used in only one building, Day Hall, which held the university administration: President, Provost etc. Great pressure to make sure there were no bugs.

One day a company that helps get people green cards sent an email to everyone on BITNET. My first piece of spam.

As a side project I helped write an ARPANET interface into CUINFO, an early electronic information system at Cornell. That was pretty simple, we just used the Telnet interface into a different port. This is basically what HTTP does now. I could have invented the Web!

In my senior year I told Steve Worona that I was planning to go to graduate school in theoretical computer science.

SNDMSG was not really e-mail and if you want to say that it was, there had already been e-mail before, it was just single-machine e-mail (even Ray Tomlinson notes that he was making improvements to the existing SNDMSG program when he writes about his work from 1971).

370 assembler, matched in COBOL and PL/I, had a surprising array of data types for numbers. I recall that they included the character format EBCDIC (IBM's extended binary-coded decimal encoding, from a time when ASCII wasn't the universal standard it is today), packed decimal, zoned decimal, etc.
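
For anyone who never met these formats, here is a rough sketch in Python (not 370 assembler) of how an integer is laid out in packed and zoned decimal. It assumes the common conventions: a sign nibble of C for plus and D for minus, and a zone nibble of F for the EBCDIC digits. The function names are made up for illustration, and real fields have a fixed declared length that this sketch ignores.

```python
def to_packed_decimal(n: int) -> bytes:
    """Two digits per byte; the final nibble holds the sign (C = plus, D = minus)."""
    sign = 0xC if n >= 0 else 0xD
    nibbles = [int(d) for d in str(abs(n))] + [sign]
    if len(nibbles) % 2:
        # Pad with a leading zero nibble so the nibbles fill whole bytes.
        nibbles.insert(0, 0)
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))


def to_zoned_decimal(n: int) -> bytes:
    """One digit per byte with zone nibble F; the last byte's zone carries the sign."""
    sign = 0xC if n >= 0 else 0xD
    digits = [int(d) for d in str(abs(n))]
    out = [0xF0 | d for d in digits]
    out[-1] = (sign << 4) | digits[-1]
    return bytes(out)


if __name__ == "__main__":
    for value in (1234, -123):
        print(value, to_packed_decimal(value).hex(), to_zoned_decimal(value).hex())
```

Packed decimal halves the storage of zoned decimal, which is why the hardware had dedicated instructions for converting between the two.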

COBOL had some other oddities (other than verbosity) and self-modifying code (which I don't recall seeing used) that made it painful indeed to debug. In addition to having PERFORM commands for a paragraph of statements, which were used like macros instead of subroutines and whose start was a line label and whose end was a period (used more when GOTOs were deprecated), COBOL had PERFORM THRU commands. A PERFORM THRU command was just as insidious as self-modifying code. It would start execution at some line label and keep executing until another specified line label was reached. This line label could be anywhere, even in the middle of a paragraph. At least with a GOTO statement you knew when you were leaving the current flow of execution. With a PERFORM THRU there is no indication in the terminating line that control will move to an earlier point in the code. I have seen it used in 100,000 line COBOL programs.
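
To make the complaint concrete, here is a toy model in Python of the control flow, assuming the simplest reading of PERFORM THRU: run every paragraph in source order from the start label through the end label, then return to the caller. The paragraph names and the helper functions are hypothetical, and this is only a sketch, not real COBOL semantics.

```python
def make_paragraph(name):
    """Build a stand-in paragraph body that just records that it ran."""
    def body(log):
        log.append(name)
    return body


# Paragraphs in source order. Note that nothing in WRITE-REC itself says
# that control may jump back to a caller once it finishes.
PROGRAM = [
    (name, make_paragraph(name))
    for name in ["INIT", "READ-REC", "VALIDATE", "WRITE-REC", "CLEANUP"]
]


def perform_thru(program, start, end):
    """Run paragraphs from `start` through `end` inclusive, in source order."""
    labels = [label for label, _ in program]
    first, last = labels.index(start), labels.index(end)
    log = []
    for _, body in program[first:last + 1]:
        body(log)
    return log


if __name__ == "__main__":
    print(perform_thru(PROGRAM, "READ-REC", "WRITE-REC"))
```

The range lives only at the call site, so someone reading WRITE-REC in isolation has no way to tell that it ends a performed range somewhere else in the program.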

Y2K as a threat might have been overblown BUT at least it allowed companies to wash away years of COBOL programs. I wonder how many people still have to maintain old COBOL programs.

The answer is "yes" to both of these questions, but for reasons other than speed. One can safely assume that most people out there cannot write assembly which will produce a binary that is faster than one produced by an optimizing compiler.

The real reason one would want to code in assembly (and machine code) would be to achieve access to the underlying hardware that a high-level language compiler would prohibit you from doing, or if you're trying to do something squirrelly (e.g., shell-coding).