Category: Programming

If you are a UNIX/Linux user of any kind, I highly recommend reading Eric S. Raymond’s The Art of UNIX Programming. While I was expecting his book to be fairly one-sided about the virtues of UNIX and the shortcomings of everything else, I found the entire book to be well-balanced, informative, and very readable. TAOUP could easily be used as a text for a programming class, but it’s really more of a “philosophy of programming” book. This book is filled with tons of quotable material, so I will resist the urge to make a quote-fest of this book review…

Raymond begins with a solid chapter on the philosophy behind UNIX. He provides a number of great guiding principles for developing smart, streamlined applications. Where he could be abstract and vague, he provides concrete, usable advice. One of the great things about the book is his ample use of quotations from other UNIX gurus. From Doug McIlroy, as quoted in Peter Salus’s A Quarter Century of Unix:

This is the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.

He also quotes Rob Pike on his 6 rules of C programming. I found the last two to be especially poignant:

Rule 5. Data dominates. If you’ve chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.

Rule 6. There is no Rule 6.

Raymond goes on to compare and contrast the major operating systems of the past 20 years, including VMS, MacOS, OS/2, Windows, BeOS, MVS, VM/CMS, and Linux. He provides a framework and vocabulary for discussing OSes. Concepts like multitasking, process-spawning, file formats (e.g. textual vs. binary), and internal system boundaries differ greatly from OS to OS and have profound effects on the end user experience. Again, Raymond covers fairly academic material, making rigorous arguments using carefully defined terms. But he presents the material in a very novel and personal way, making the book feel less like a textbook and more like a pleasure read.

Once we understand the types of development environments various OSes can offer us, Raymond begins to describe criteria which we can use to set quality software apart from average software, like compactness and orthogonality. Compact software has the property that it can fit conceptually inside a human’s head. Orthogonal software has the property that operations are atomic and have no unexpected side effects; they change one thing without affecting others. Often, the concepts for good software design are best illustrated by example, and Raymond provides many such examples to drive each point home. One of my favorite things about this material is that it has changed my thinking about design/architectural elements in my own programs.

Again, I’ll resist the urge to summarize every chapter, as much as I’d like to. There is something to be said for letting the next reader get as much out of the book as I did. There are, however, a couple of other sections of the book that I felt were compelling and deserve discussion.

One of the more enjoyable sections of the book for me was that on Unix Interface Design Patterns. Those of us who have taken a software engineering class may already have been exposed to some of the popular design patterns, so most of the vocabulary is familiar. Raymond describes patterns like The Filter Pattern, of which grep is a good example. Among other patterns, he also describes The Cantrip Pattern, The Sink Pattern, and The Compiler Pattern. The Cantrip Pattern is the simplest Unix design pattern as it has no inputs or outputs; its behavior is only affected by startup conditions. Classic examples of the Cantrip Pattern include rm and touch. Raymond reminds us to resist the temptation to write more interactive programs when a Cantrip will suffice because the program will lose scriptability.

In a further discussion of the software complexity idea, Raymond uses his “Tale of Five Editors” to explore the trade-offs between interface and implementation complexity. The five editors he compares and contrasts are: ed, vi, Sam, Emacs (my personal preference), and Wily, an editor I had neither used nor heard of. Raymond gives accurate, matter-of-fact descriptions of each of the editors. When ultimately arriving at the question of the Right Size for Software, Raymond leaves us with the Rule of Parsimony: “Write a big program only when it is clear by demonstration that nothing else will do.”

For every interesting part of the book that I mentioned in this review, there are 5 or 6 that are equally thought-provoking and affirming to those of us who embrace the *nix culture. I can’t recommend this book strongly enough to any hacker or fan of Unix.

I am definitely open to discussion on this topic as it’s one that’s close to my heart. If you’re a Perl advocate or would otherwise like to make a case against the use of Python, I strongly encourage you to send me an email. I am more than happy to try to defend it 🙂

This code no longer works with the current eBay API, but I’ll leave it posted for reference.

eBay has some very cool functionality that allows application programmers to make API calls on a user’s behalf without using their site credentials. I decided to investigate the eBay API a bit and wrote a function that allows web developers to print the items listed in an eBay buyer’s My eBay section on third party web pages. Feel free to check out the code.

The eBay Developer Program has a guide to help developers begin using the API. You can make all kinds of unauthenticated function calls to the API, like listing search results, without using eBay’s Auth and Auth (Authentication and Authorization) system. But to make calls on behalf of a particular user, you will need to generate an auth token. This can be done using eBay’s Authentication Token Tool.

Essentially, the process works like this: you enter an eBay user’s site credentials in the token generator, and eBay returns a cryptographic token. This token can then be passed along with the API function calls to authenticate the caller without including the user’s login and password.

The function I wrote is essentially a wrapper to the API call GetMyeBayBuying. Calling my function, printMyEbay, will print the auctions that a user is a) watching, b) bidding on, and c) has won.

PyRSA is a command line utility that allows users to digitally encrypt and sign messages using the public key encryption scheme, RSA. There are three basic functions that PyRSA performs: encryption, decryption, and key generation.

1. Generate a public and private key. In this example, we will specify a key of length 1024 bits. Allow several seconds of CPU time for the generation of the keys.

pyrsa.py -g 1024
Enter file identifier (i.e. first name): brandon

2. Now the files brandon_privateKey.txt and brandon_publicKey.txt are in the current directory. Next, place the text we want to encrypt in a text file.

echo "The sky above the port was the color of television, tuned
to a dead channel." > message.txt

3. Encrypt the message using the public key and redirect the output to a text file.

pyrsa.py -e message.txt -k brandon_publicKey.txt > ciphertext.txt

4. At this point the file ciphertext.txt contains the encrypted message. The file can safely be sent to a recipient, e.g. as an email attachment, and its contents are unreadable to anyone without the private key.

Overview

We have spent the last several weeks learning about encryption in my computer security class, so I thought I’d share what I’ve learned about public key cryptography.

There is a very good description of RSA on Wikipedia, so I don’t want to simply restate what they have. The focus here will be the generation of public and private keys as I feel many of the RSA tutorials on the web are lacking a bit in that department. Computing the multiplicative inverse to get d from e is a little tricky, but we will walk through it step-by-step.

First, a brief overview of RSA, for those not familiar with it already. A message M is encrypted by raising it to the power of e and then taking the result modulo some number N. To decrypt the message, you simply raise the value of the encrypted message C to the power of d and again mod by N. The beauty of RSA is that e and N can be published publicly. Together they, in fact, make up the public key. The private key, which is not to be published, is made up of d and N.

C = M^e mod N
M = C^d mod N
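The two formulas above can be sketched in a few lines of Python. This is only an illustration, not the PyRSA source: the function names encrypt and decrypt are my own, the key values are the toy key worked out in the Key Generation section of this post, and Python's built-in three-argument pow does the modular exponentiation.

```python
# Toy key from the Key Generation walkthrough in this post.
N = 1210537   # modulus
e = 1127      # public exponent
d = 438403    # private exponent


def encrypt(m, e, n):
    """Return C = M^e mod N (hypothetical helper name)."""
    return pow(m, e, n)


def decrypt(c, d, n):
    """Return M = C^d mod N (hypothetical helper name)."""
    return pow(c, d, n)


message = 247
ciphertext = encrypt(message, e, N)
recovered = decrypt(ciphertext, d, N)
print(recovered == message)
```

Note that this is "textbook" RSA on a single small integer; real implementations add padding and use far larger keys.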

If you’re like me, then you are astonished at 1) how simple this system is, and 2) that you can exponentiate messages twice (modulo some number) and leave the original message unaltered. The main question that my skeptical mind came up with when presented with this powerful encryption tool was, “wouldn’t it be easy to compute d if you have the values of e and N?” The answer is, of course, no. It turns out that it is very hard to do so. We shall see later that it is easy to compute d only when we have the factors of N. The larger we choose N, the longer factoring it takes. Currently, there are no known polynomial-time algorithms for classical computers which can perform this task. Factorization is in the class NP, since a proposed factor is easy to verify, but it is not known to be NP-complete, and nobody has proved that it is truly hard. So the security of RSA essentially rests on the presumed hardness of the factorization problem. If someone figures out a way to factor large numbers fast, then RSA is out of business.

Key Generation

As was mentioned above, RSA’s security is rooted in the fact that N is hard to factor. Therefore, we should choose N to be the product of two large primes, p and q. For clarity in this example, we will choose relatively small values for p and q, but later we will discuss the proper choices for these primes given a desired level of security.

For this example, let p = 647 and q = 1871. This means that the modulus is N = p × q = 1210537, and that φ(N) = (p - 1)(q - 1) = 646 × 1870 = 1208020. (Incidentally, factoring this value of N took 0.056 seconds on UCR’s mainframe.)
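To see why a modulus this small offers no real protection, here is a minimal trial-division sketch. It is not the program used for the timing quoted above (that one is not shown in this post), and the function name trial_division is my own.

```python
def trial_division(n):
    """Return (smallest prime factor, cofactor) of an integer n > 1."""
    if n % 2 == 0:
        return 2, n // 2
    f = 3
    # Only odd candidates up to sqrt(n) need to be tested.
    while f * f <= n:
        if n % f == 0:
            return f, n // f
        f += 2
    return n, 1  # no divisor found, so n itself is prime


p, q = trial_division(1210537)
print(p, q)  # 647 1871
```

The loop runs at most about sqrt(N) / 2 times, which is a few hundred iterations here but hopeless for a modulus hundreds of digits long.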

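Now we choose a number e which should be coprime to φ(N). An easy way to do this is to pick a prime number that does not divide φ(N), although any value that shares no factor with φ(N) will do. For this example, let e = 1127. (1127 = 7² × 23 is not itself prime, but it has no factor in common with φ(N) = 1208020 = 2² × 5 × 11 × 17² × 19.)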
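The coprimality requirement is easy to check mechanically. The following is a small sketch using math.gcd from the standard library; the variable names are mine and the numbers are the ones from this example.

```python
from math import gcd

p, q = 647, 1871
N = p * q                   # the modulus
phi = (p - 1) * (q - 1)     # Euler's totient of N for distinct primes p, q
e = 1127

# e is usable as a public exponent only if gcd(e, phi) is 1.
print(N, phi, gcd(e, phi))  # 1210537 1208020 1
```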

The next step is to compute d such that (d * e) mod φ(N) = 1. If this is confusing, that is okay. This property is important because it ensures that (M^e)^d mod N = M. It may help to have a look at Euler’s Theorem if you are still confused.

The best way to compute the multiplicative inverse d from e and φ(N) is to use the Extended Euclidean Algorithm. Here is Euclid’s algorithm for our example:

1127  1208020  (1, 0)           (0, 1)

We start with unit vectors (1, 0) and (0, 1) which correspond to the values of e and φ(N), respectively.

For each operation we perform on the left two columns, we perform the same operation on the right two columns.

For example, in the first step, 1127 goes into 1208020 1071 times and leaves a remainder of 1003. The corresponding operation in columns 3 and 4 is to subtract (1, 0) from (0, 1) 1071 times, yielding (-1071, 1).

The algorithm terminates when we have 1 and 0, not necessarily in that order, in the first two columns. The value for d is the first component of the vector in the column that corresponds to the 1 in the first two columns.

*Note: it is worth mentioning that it is possible for the extended Euclidean algorithm to yield a negative result for d. Obviously, this is not a suitable decryption exponent because raising an integer to a negative number results in a fraction. The simple fix here is to mod the negative value of d by φ(N), giving us a positive value of d between 0 and φ(N).

1127  1003     (1, 0)           (-1071, 1)
124   1003     (1072, -1)       (-1071, 1)
124   11       (1072, -1)       (-9647, 9)
3     11       (107189, -100)   (-9647, 9)
3     2        (107189, -100)   (-331214, 309)
1     2        (438403, -409)   (-331214, 309)
1     0        (438403, -409)   (-1208020, 1127)

From the above calculations we know that d = 438403. So we have both the public and private keys for this user:

public key = (1127, 1210537)
private key = (438403, 1210537)
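The hand calculation above can be reproduced with a short function. This is a sketch of the usual quotient-based form of the extended Euclidean algorithm rather than a line-for-line copy of the table (it tracks only the coefficient of e, which is the first component of each vector), and the name modular_inverse is my own. The final "% phi" is the fix for a negative result mentioned in the note above.

```python
def modular_inverse(e, phi):
    """Return d such that (d * e) % phi == 1."""
    # Invariant: r0 is congruent to x0 * e (mod phi), and likewise r1 and x1.
    r0, x0 = e, 1
    r1, x1 = phi, 0
    while r1 != 0:
        quotient = r0 // r1
        r0, r1 = r1, r0 - quotient * r1
        x0, x1 = x1, x0 - quotient * x1
    if r0 != 1:
        raise ValueError("e and phi are not coprime, so no inverse exists")
    return x0 % phi  # map a possibly negative coefficient into [0, phi)


d = modular_inverse(1127, 1208020)
print(d)  # 438403
```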

To prove that this system works, observe the following computations. Let our message M = 247. The first step is to compute C = 247^1127 mod 1210537.

A brief aside: This exponentiation can be computed easily because we are using relatively small values for e and d. However, real-world implementations of RSA often use 1024-bit keys, meaning the modulus, and typically the private exponent d as well, is about 1024 bits long. That is roughly a 300-decimal-digit number. Computing an exponent of that magnitude the conventional way, by multiplying the base by itself that many times, would be prohibitively expensive. Even if we could compute 1 billion multiplications per second, the computation would take longer than the current age of the universe. So it is useful to use an alternative method like exponentiation by squaring. Here is a script that computes large exponents fast. Another consideration is the storage of a very large number such as C^d. Rather than keeping the full value in main memory as we exponentiate, we can reduce modulo N after every multiplication, so the intermediate value never grows beyond N. And now back to our example…

247^1127 mod 1210537 = 611545. This number was easily obtained with the Python interpreter in a fraction of a second. Raising this number to the power of d, 438403, however, should not be done the conventional way. On the school’s mainframe this calculation took 11 minutes, 23.65 seconds. This is a situation where we can see the power of divide-and-conquer algorithms. Using our recursive exponentiation function we show that 611545^438403 mod 1210537 = 247. Voilà, out pops our original message. Additionally, the exponentiation took only 31.16 seconds on the same machine with the repeated squaring method. This can be vastly improved, too, once we develop a non-recursive function. That will be critical when we want to provide real security via RSA and we don’t want to wait 10 minutes to decrypt the message.
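For reference, here is what exponentiation by squaring might look like in both recursive and non-recursive form. This is a sketch under my own naming, not the script used for the timings above. Both versions reduce modulo N after every multiplication, so intermediate values stay small. Keep in mind that Python's default recursion limit is about 1000 frames, so the recursive version can run into it with exponents around 1000 bits long, while the iterative one cannot.

```python
def power_mod_recursive(base, exponent, modulus):
    """Compute (base ** exponent) % modulus by recursive squaring."""
    if exponent == 0:
        return 1 % modulus
    half = power_mod_recursive(base, exponent // 2, modulus)
    result = (half * half) % modulus       # reduce after every multiplication
    if exponent % 2 == 1:
        result = (result * base) % modulus
    return result


def power_mod_iterative(base, exponent, modulus):
    """Same result, scanning the exponent's bits from least significant."""
    result = 1 % modulus
    base %= modulus
    while exponent > 0:
        if exponent & 1:                   # current bit set: multiply it in
            result = (result * base) % modulus
        base = (base * base) % modulus     # square for the next bit
        exponent >>= 1
    return result


N, e, d = 1210537, 1127, 438403
ciphertext = power_mod_iterative(247, e, N)
print(power_mod_recursive(ciphertext, d, N))   # 247
print(power_mod_iterative(ciphertext, d, N))   # 247
```

Either function needs only about log2(exponent) squarings, roughly 19 for d = 438403, instead of hundreds of thousands of multiplications. Python's built-in pow(base, exponent, modulus) performs the same job.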