The installation went smoothly for me. Don’t be surprised if your hard drive is busy for about half an hour after you first boot up; that’s Leopard’s Spotlight search program indexing your hard drive.

I want to focus on some of the new features for developers, particularly the upgrade of the Objective-C programming language to version 2.0, and the upgrade of the XCode integrated development environment to version 3.0.

Objective-C is an object-oriented version of C, dating from the early 1980’s, which is a strict super-set of C; that means ordinary C programs will compile successfully under an Objective-C compiler. That makes Objective-C sound a lot like C++, but I like Objective-C a lot more than C++. C++ takes a “Swiss army chainsaw” approach, throwing many new features into the language, while Objective-C is much more minimalist, basically extending the language just to support objects with a Smalltalk-like syntax. Objective-C is also much more dynamic than C++; much more is decided at run-time rather than compile-time. Because of that, it feels a lot closer to programming with a nice scripting language like Python or Ruby. Here’s an excellent introduction to the Objective-C language by Apple; it’s a surprisingly literate piece of technical documentation.

Objective-C is almost never used by itself; instead, you use it in conjunction with a set of extensive libraries (the Cocoa libraries on Mac OS X or the GNUstep libraries on Linux or Windows). Cocoa and GNUstep derive from the NeXTStep and OpenStep libraries developed by the NeXT Computer company in the 1980’s and 1990’s. They add both fundamental features (e.g. string handling, hash tables, that type of thing) and GUI-creation features. These libraries have been under development for 20 years, so they are extraordinarily mature. And since Apple uses Cocoa and Objective-C to develop all of its applications, including Mac OS X itself, it is clear that if you want to develop desktop applications for the Macintosh, you need to learn about them.

I actually believe that using Objective-C and GNUstep is also a very reasonable choice on Linux (or Windows), for those types of applications where you would otherwise use C++, but few people actually make that choice. In fact, I have found that GNUstep and Cocoa are compatible enough that it is pretty easy to maintain code that works on all platforms if you need that.

Apple provides a very nice integrated development environment called XCode for free with Mac OS X. If you have not upgraded to Leopard, you’ll be limited to XCode 2.5, while Leopard gives you XCode 3.0. One of the nicest parts of XCode is the “Interface Builder,” which lets you build GUI’s using a GUI instead of by writing code.

With Leopard, Objective-C is being upgraded to version 2.0. Perhaps the most important new feature included is garbage collection. It’s an opt-in system, so legacy code will still work, and you can turn it off if you like allocating and releasing memory yourself, but for new code, most developers will obviously be very happy to have it. This removes one of the major warts of the Objective-C language.

To learn more about building desktop applications for Mac OS X, I highly recommend “Cocoa Programming for Mac OS X” 2nd edition, by Aaron Hillegass. I only wish that there was a third edition that covered the new features of XCode, Objective-C and Cocoa that have appeared in Tiger and Leopard. (UPDATE: Apparently, a 3rd edition is scheduled for Spring 2008.)

If you’re interested in programming Macintosh applications, but prefer to use Ruby, Python, or Haskell instead of Objective-C, you should know about RubyCocoa, PyObjC, and HOC, which let you call the Cocoa libraries from those languages. These tools are very nice (though I’ve actually only used PyObjC personally), but you’ll still need to have some familiarity with Objective-C to understand them.

“It is necessary to present the science in the language of mathematics. Unfortunately, when we teach science we use the language of mathematics in the same way that we use our natural language. We depend upon a vast amount of shared knowledge and culture, and we only sketch an idea using mathematical idioms.”

The solution proposed is to develop notation that can be understood by computers, which do not tolerate ambiguity:

“One way to become aware of the precision required to unambiguously communicate a mathematical idea is to program it for a computer. Rather than using canned programs purely as an aid to visualization or numerical computation, we use computer programming in a functional style to encourage clear thinking. Programming forces one to be precise and formal, without being excessively rigorous. The computer does not tolerate vague descriptions or incomplete constructions. Thus the act of programming makes one keenly aware of one’s errors of reasoning or unsupported conclusions.”

Sussman and Wisdom then focus on one highly illuminating example, the Lagrange equations. These equations can be derived from the fundamental principle of least action. This principle tells you that if you have a classical system that begins in a configuration C1 at time t1 and arrives at a configuration C2 at time t2, the path it traces out between t1 and t2 will be the one that is consistent with the initial and final configurations and minimizes the integral over time of the Lagrangian for the system, where the Lagrangian is given by the kinetic energy minus the potential energy.
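To make the principle concrete, here is a minimal numerical sketch in Python (it is not Sussman and Wisdom’s Scheme code). It assumes a harmonic oscillator with unit mass and unit spring constant, for which q(t) = sin(t) is a true path, and compares the action along that path with the action along nearby paths that share the same endpoints. The function names, the choice of system, and the time interval from 0 to 1 are all illustrative assumptions.

```python
import math


def lagrangian(q, v, m=1.0, k=1.0):
    """Kinetic energy minus potential energy for a harmonic oscillator."""
    return 0.5 * m * v * v - 0.5 * k * q * q


def action(path, t1, t2, n=2000, h=1e-6):
    """Approximate the time integral of the Lagrangian along `path`.

    Uses the midpoint rule for the integral and a central difference
    for the velocity, so `path` only needs to be a function of time.
    """
    dt = (t2 - t1) / n
    total = 0.0
    for i in range(n):
        t = t1 + (i + 0.5) * dt
        q = path(t)
        v = (path(t + h) - path(t - h)) / (2.0 * h)
        total += lagrangian(q, v) * dt
    return total


T1, T2 = 0.0, 1.0  # assumed start and end times


def true_path(t):
    """A solution of the oscillator's equation of motion (m = k = 1)."""
    return math.sin(t)


def varied_path(eps):
    """The true path plus a bump that vanishes at both endpoints."""
    def path(t):
        bump = math.sin(math.pi * (t - T1) / (T2 - T1))
        return true_path(t) + eps * bump
    return path


s_true = action(true_path, T1, T2)
s_up = action(varied_path(0.1), T1, T2)
s_down = action(varied_path(-0.1), T1, T2)
print(s_true, s_up, s_down)
```

Over this short time interval, both varied paths should give a larger action than the true path, whichever sign the variation has.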

Physics textbooks tell us that if we apply the calculus of variations to the integral of the Lagrangian (called the “action”) we can derive that the true path satisfies the Lagrange equations, which are traditionally written as:

$$\frac{d}{dt}\frac{\partial L}{\partial \dot{q}_i} - \frac{\partial L}{\partial q_i} = 0$$

Here $L$ is the Lagrangian, $t$ is the time, and $q_i$ are the coordinates of the system.

These equations (and many others like them) have confused and bewildered generations of physics students. What is the problem? Well, there are all sorts of fundamental problems in interpreting these equations, detailed in Sussman and Wisdom’s paper. As they point out, basic assumptions like whether a coordinate and its derivative are independent variables are not consistent within the same equation. And shouldn’t this equation refer to the path somewhere, since the Lagrange equations are only correct for the true path? I’ll let you read Sussman and Wisdom’s full laundry list of problems yourself. But let’s turn to the psychological effects of using such equations:

“Though such statements (and derivations that depend upon them) seem very strange to students, they are told that if they think about them harder they will understand. So the student must either come to the conclusion that he/she is dumb and just accepts it, or that the derivation is correct, with some appropriate internal rationalization. Students often learn to carry out these manipulations without really understanding what they are doing.”

Is this true? I believe it certainly is (my wife agrees: she gave up on mathematics, even though she always received excellent grades, because she never felt she truly understood). The students who learn to successfully rationalize such ambiguous equations, and forget about the equations that they can’t understand at all, are the ones who might go on to be successful physicists. Here’s an example, from the review of Sussman and Wisdom’s book by Piet Hut, a very well-regarded physicist who is now a professor at the Institute for Advanced Study:

“… I went through the library in search of books on the variational principle in classical mechanics. I found several heavy tomes, borrowed them all, and started on the one that looked most attractive. Alas, it didn’t take long for me to realize that there was quite a bit of hand-waving involved. There was no clear definition of the procedure used for computing path integrals, let alone for the operations of differentiating them in various ways, by using partial derivatives and/or using an ordinary derivative along a particular path. And when and why the end points of the various paths had to be considered fixed or open to variation also was unclear, contributing to the overall confusion.

Working through the canned exercises was not very difficult, and from an instrumental point of view, my book was quite clear, as long as the reader would stick to simple examples. But the ambiguity of the presentation frustrated me, and I started scanning through other, even more detailed books. Alas, nowhere did I find the clarity that I desired, and after a few months I simply gave up. Like generations of students before me, I reluctantly accepted the dictum that ‘you should not try to understand quantum mechanics, since that will lead you astray for doing physics’, and going even further, I also gave up trying to really understand classical mechanics! Psychological defense mechanisms turned my bitter sense of disappointment into a dull sense of disenchantment.”

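Sussman and Wisdom do show how the ambiguous conventional notation can be replaced with unambiguous notation that can even be used to program a computer. Because it’s new, it will feel alien at first; the Lagrange equations look like this:

$$D(\partial_2 L \circ \Gamma[q]) - \partial_1 L \circ \Gamma[q] = 0$$

Here $D$ is the derivative operator, $\partial_1$ and $\partial_2$ are partial derivatives with respect to the coordinate and velocity arguments of $L$, and $\Gamma[q]$ turns the path $q$ into the function of time $t \mapsto (t, q(t), Dq(t))$.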

It’s worth learning Sussman and Wisdom’s notation for the clarity it ultimately provides. It’s even more important to learn to always strive for clear understanding.

One final point: although mathematicians do often use notation that is superior to physicists’, they shouldn’t feel too smug; Sussman and Wisdom had similar things to say about differential geometry in this paper.

“Artificial Intelligence: A Modern Approach,” by Stuart Russell (professor of computer science at UC Berkeley) and Peter Norvig (head of research at Google) is the best-known and most-used textbook about artificial intelligence, and for good reason; it’s a great book! The first edition of this book was my guide to the field when I was switching over from physics research to computer science.

I feel almost embarrassed to recommend it, because I suspect nearly everybody interested in AI already knows about it. So I’m going to tell you about a couple related resources that are maybe not as well-known.

First, there is the online code repository for the algorithms in the book, in Java, Python, and Lisp. Many of the algorithms are useful beyond AI, so you may find for example that the search or optimization algorithm that you are interested in has already been written for you. I personally have used the Python code, and it’s really model code from which you can learn good programming style.

In my previous post about simulating the Ising model with the Metropolis algorithm in NetLogo, I said that I would return and give a walk-through of the amazingly succinct NetLogo code. Actually, I’m not going to do that; NetLogo code is sufficiently readable, and the documentation is sufficiently comprehensive, that there’s no real point.

Instead I want to discuss to what extent NetLogo can be considered a “real” programming language, suitable for work beyond its roots in education. The short answer is that it looks to me like quite a competitive language, which will make a particularly excellent choice for many scientific applications.

NetLogo is optimized for simulations of agents moving in a two-dimensional space. The moving agents are called “turtles” but you can think of them as objects endowed with a lot of built-in methods. It’s quite possible to use the turtles in the same way as objects in other object-based languages (although only a limited form of inheritance is available). For example, a turtle can contain other turtles as variables, and you can create new classes (called “breeds”) of turtles. The other basic objects, with many built-in methods, are the “patches” which tile the 2-d space, the “links” that you can set up between turtles, and the “observer.”

There are an impressive number of built-in primitive procedures, especially for anything that relates to simulations. You can also do all the basic things that you would expect a language to do: open and write to files, process strings, work with lists, etc.

The ability to quickly and easily build a GUI that will work on all platforms is very attractive. I have some experience building GUI’s for Mac OS X, using Cocoa and/or the Python bridge PyObjC (which is another worthwhile approach and something that I’ll post about at some point), and I can say that to build essentially the same simulation with the same GUI in NetLogo takes easily less than half as much work and code. It is also nice that it is so easy to construct applets and movies.

The syntax is similar to Lisp, but without parentheses, and with a great deal of syntactic sugar to make it look as close to English as possible. It is absolutely an optimal first language for the beginning programmer. My son was amazed that “everything worked, and when it didn’t I could understand the error messages.” He’s not really used to that from his experiences with other languages.

I looked for things that are missing. At first I thought hash tables weren’t there, but those are actually available through an extension. One thing that really is missing is the ability to treat a function as a value. You also can’t define your own special forms or macros, so I suppose that it’s not really Lisp. NetLogo also does not have, aside from in its core area of simulation, much in the way of libraries, but there is the ability to extend the language by writing functions using Java.

It’s easy to learn; but don’t neglect to look at the code examples section of the models library (you’ll need to download NetLogo first). You’ll see how easy it is to do things that take a lot more work in most languages.

I’ve already posted about NetLogo, but I want to return to it, because I’ve been very impressed with how ridiculously easy it is to construct sophisticated simulations of interesting models with it. My son Adam has been having a lot of fun with it, and he is able, in the space of an hour or two, to create simulations of interesting models of his own devising, complete with easy-to-use GUI’s.

To illustrate how easy it is to build models, I’m going to use a familiar model that comes in the NetLogo library: the Ising Model of ferromagnetism on a square lattice, simulated using the Metropolis algorithm. (I simplified the code very slightly for clarity.)

In this model, there are spins at each node of a square lattice, that can point “up” or “down.” Spins like to have their neighbors point in the same direction. To compute the energy of a configuration of spins, we look at all pairs of neighboring spins, and add an energy of -1 if the two spins point in the same direction, and an energy of +1 if the two spins point in opposite directions. Boltzmann’s law tells us that each configuration should have a probability proportional to the exp(-Energy[configuration] / T), where T is the temperature.

The Metropolis algorithm is an algorithm for dynamically generating configurations of the model with the correct probability. One starts at some configuration, picks a spin at random, and considers flipping it. If flipping it would reduce the energy or leave the energy unchanged, one makes the flip. Otherwise, one flips the spin anyway with a probability of exp(-Ediff/T), where Ediff is the amount that the energy will be increased by making the flip. Then pick another spin at random and continue.
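For readers who think more easily in code, here is a minimal Python sketch of the energy rule and the Metropolis update just described. It is not the NetLogo model; the function names, the use of +1 and -1 for “up” and “down,” and the periodic (wrap-around) boundary conditions are my own assumptions for illustration.

```python
import math
import random


def make_lattice(size, rng):
    """Return a size x size lattice of random spins, each +1 or -1."""
    return [[rng.choice((-1, 1)) for _ in range(size)] for _ in range(size)]


def total_energy(spins):
    """Sum -1 for each aligned neighboring pair and +1 for each opposed pair.

    Each pair is counted once by looking only at the right and down
    neighbors; indices wrap around (periodic boundaries, an assumption).
    """
    size = len(spins)
    energy = 0
    for i in range(size):
        for j in range(size):
            s = spins[i][j]
            energy -= s * spins[i][(j + 1) % size]
            energy -= s * spins[(i + 1) % size][j]
    return energy


def flip_cost(spins, i, j):
    """Energy change (Ediff) that flipping the spin at (i, j) would cause."""
    size = len(spins)
    neighbors = (spins[(i - 1) % size][j] + spins[(i + 1) % size][j]
                 + spins[i][(j - 1) % size] + spins[i][(j + 1) % size])
    return 2 * spins[i][j] * neighbors


def metropolis_step(spins, temperature, rng):
    """Pick one spin at random and flip it according to the Metropolis rule."""
    size = len(spins)
    i, j = rng.randrange(size), rng.randrange(size)
    ediff = flip_cost(spins, i, j)
    # Always accept moves that do not raise the energy; otherwise
    # accept with probability exp(-Ediff / T).
    if ediff <= 0 or rng.random() < math.exp(-ediff / temperature):
        spins[i][j] = -spins[i][j]


rng = random.Random(0)  # fixed seed so runs are repeatable
lattice = make_lattice(8, rng)
for _ in range(5000):
    metropolis_step(lattice, 1.5, rng)
print(total_energy(lattice))
```

Computing only the local energy change in flip_cost, rather than recomputing the energy of the whole lattice, is what keeps each update cheap.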

It is not immediately obvious that the Metropolis algorithm generates configurations with probability given by Boltzmann’s Law, but one can prove that it does. The Metropolis algorithm is not the most efficient algorithm for generating samples from the correct probability distribution; much more efficient algorithms from that point of view are discussed in Werner Krauth’s book, which I previously reviewed.

There are many things to say about the Ising Model, but first let’s look at the NetLogo code for this model. It’s extremely short; the code (including the code controlling the GUI) is about the same length as the copyright notice that I’m appending because I only made small modifications to Uri Wilensky’s code:

In a future post (Edit: it’s here), I’ll step through this code and show you how it works (you do also need to click on a few buttons to set up the GUI). For now, though, I want to show you what an applet running this code looks like, and make some more comments about the Ising model that are best illustrated by running the applet. So please (assuming you have Java installed; and if you’re on Mac OS X Leopard, use Safari, there’s some problem when using Firefox) CONTINUE HERE.

Along with using version control, another absolute key to developing reliable software is to systematically test your code as you write it. After all, source code needs to be bug-free to function properly, but all human beings generate bugs at a very high rate when writing code.

Fortunately, Python makes testing remarkably easy and convenient with its doctest module, which lets you put your tests right into your doc strings.
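As a minimal sketch of what this looks like, here is a small module with doctests embedded in a doc string. The function mean is a hypothetical example chosen for illustration, and _test is just a conventional name for a helper that runs the module’s doctests.

```python
import doctest


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers.

    values -- a non-empty list or tuple of numbers
    Returns a float. No side effects.

    >>> mean([1, 2, 3, 4])
    2.5
    >>> mean([10])
    10.0
    """
    return sum(values) / len(values)


def _test():
    """Run every doctest found in this module's doc strings."""
    return doctest.testmod()


if __name__ == "__main__":
    _test()
```

When the file is run as a script, doctest executes each line that starts with >>> and compares the result with the text on the following line; it prints nothing unless an example fails.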

An advantage of using doctest is that your doc strings will serve as examples, as well as tests, of the functions that they document. Examples are often the best kind of documentation for a function.

In fact, I find that if a function doc string explains the inputs to the function, the variable(s) returned by the function, and any side effects, along with the doctest examples, then there is rarely any need for other comments.

My favorite way to develop Python code is actually within Emacs. I write a test for a function, then write the function itself, and then type Control-C Control-C in Emacs. Control-C Control-C will execute the Python code. If your code is set up to run a _test() function like the code above, then Emacs will open up another buffer which will contain any doctest failures. When all the tests pass, I finish up the documentation of the inputs, outputs, and side effects. That way you can systematically build up your software, one reliable and documented function at a time, while never leaving Emacs.

CPU’s stopped getting faster about five years ago. Since then, Intel and AMD started introducing multi-core processors, and the trend for the foreseeable future seems to be more and more processors per computer. That will be great, so long as the software industry is able to take advantage of the increased number of processors to deliver bigger and faster applications. However, that means parallel programming is in our future, and parallel programming is hard.

What’s more, the current dominant paradigm to take advantage of multiprocessor systems is threads, and threads appear to have very serious problems. Berkeley professor Edward Lee argues in no uncertain terms that the inherent non-determinism of threads programming dooms any software testing approach to failure, and will lead to buggy unreliable programs:

“A folk definition of insanity is to do the same thing over and over again and to expect the results to be different. By this definition, we in fact require that programmers of multithreaded systems be insane. Were they sane, they could not understand their programs.”

“These same computer vendors are advocating more multi-threaded programming, so that there is concurrency that can exploit the parallelism they would like to sell us. Intel, for example, has embarked on an active campaign to get leading computer science academic programs to put more emphasis on multi-threaded programming. If they are successful, and the next generation of programmers makes more intensive use of multithreading, then the next generation of computers will become nearly unusable.”

Having read Lee’s paper, I was very interested when I learned of Joe Armstrong’s new book “Programming Erlang: Software for a Concurrent World.” Armstrong actually identifies a problem with threads that is related to but slightly different than non-determinism: the fact that different threads can access the same memory. Why is shared memory a big problem? Briefly, because a thread or process that needs to access shared memory must lock it, and if it crashes while the memory is locked, you’re in trouble. For more, see this post.

So what is Erlang, and what is it like? Well, it’s open-source, and has been developed and used in telecom companies for a long time, so there already exist extensive libraries. It is a general-purpose language designed for concurrent, distributed, and fault-tolerant applications. It adheres strongly to the functional programming paradigm. It is a dynamic language, comes with a shell, and uses pattern-matching extensively. In Erlang, it is easy to spawn very large numbers of very-lightweight processes. Processes communicate using messages, and do not share any memory. All in all, it’s a very funky language.

So far, I have only read a small portion of Armstrong’s book (through chapter 8, where concurrent programs are introduced), but it is already clear to me that this is a significant piece of work, well worth the time spent with it. Starting from scratch, Armstrong works his way up to explaining complex distributed and fault-tolerant applications, such as a streaming media server, that require surprisingly little code. I plan to say more in future blog posts, as I progress through the book. In the meantime, here’s a list of beginner Erlang links.

(By the way, in his paper quoted above, Edward Lee discusses Erlang briefly, and says that he believes that its unfamiliar syntax will continue to block its wide-spread adoption. I’m not so sure–perhaps Erlang has simply not broken through until now because there just wasn’t much need for a language suited for multi-processor programming.)

This set of videos, of Gerald Jay Sussman and Hal Abelson teaching their course on the “Structure and Interpretation of Computer Programs” in July 1986 for Hewlett-Packard employees, is one of the treasures of the internet. The clothes are out of style, but the material presented is still completely relevant.

The SICP web-site has lots of useful additional information, including the complete text of the 2nd edition of their classic textbook.

But I will add one comment: what I love most about these lectures is the point (see the picture above of Sussman in his wizard hat) that a computer programmer is like a wizard–he creates something real out of ideas. Computers let us be wizards; and I believe we have only scratched the surface of the possible “spells” that we can learn to cast.

If you want a nice implementation of Scheme, the beautiful dialect of Lisp that was invented by Sussman with his student Guy Steele, and used in this textbook, I highly recommend DrScheme.

Much of my own work is at the intersection of statistical mechanics and algorithms, in particular understanding and developing new algorithms using ideas originating in statistical mechanics. Werner Krauth also works at the intersection of the two fields, but coming from a very different angle: he is a leading expert on the development and application of algorithms to compute and understand the properties of physical systems.

In his recently published book, “Statistical Mechanics: Algorithms and Computations,” targeted at advanced undergraduates or graduate students, he covers a very wide range of interesting algorithms. To give you an idea of the coverage, I’ll just list the chapters: “Monte Carlo methods,” “Hard disks and spheres,” “Density matrices and path integrals,” “Bosons,” “Order and disorder in spin systems,” “Entropic forces,” and “Dynamic Monte Carlo methods.”

Krauth’s presentation is leavened by his humor, and he often uses the results obtained using his algorithms to make surprising points about physics that would otherwise be hard to convey.

I am often asked by researchers in computer science or electrical engineering for good introductions to physics, and particularly statistical mechanics, and I’m now happy to be able to recommend this book.

“Computational algorithms are used to communicate precisely some of the methods used in the analysis of dynamical phenomena. Expressing the methods of variational mechanics in a computer language forces them to be unambiguous and computationally effective. Computation requires us to be precise about the representation of mechanical and geometric notions as computational objects and permits us to represent explicitly the algorithms for manipulating these objects. Also, once formalized as a procedure, a mathematical idea becomes a tool that can be used directly to compute results.”

But while Sussman and Wisdom’s book focuses in great detail on classical mechanics, Krauth’s book covers more broadly subjects in classical mechanics, statistical mechanics, quantum mechanics, and even quantum statistical mechanics. Another difference is that Sussman and Wisdom specify their algorithms in executable Scheme code, while Krauth uses pseudo-code. Of course, both choices have their advantages, just as both of these books are worth your time.

If we want to construct artificial machines that rival the capabilities of biological organisms, we should try to understand the principles by which complex natural “machines” such as plants and animals are created.

It is generally agreed, at least by scientists, that all natural organisms have been “designed” by the completely blind and random process of evolution. Through evolution, a population of organisms tends to become progressively better adapted to its environment via the mutation of genomes of individuals in the population, and the selection and more rapid reproduction of the fittest organisms in that population.

My own interest is not so much in analyzing evolution, but in exploiting it. If we understand evolution so well, shouldn’t we be able to use it to design useful machines?

Of course, humans have already for many centuries exploited evolution, using artificial selection to breed domesticated animals or cultivate useful plants.

But I am looking for something else: the design of artificial machines through artificial selection. Although it has never been a mainstream idea, computer scientists have pursued such dreams since the 1950’s. When I was in graduate school in the 1980’s, I loved reading John Holland’s seminal 1975 book “Adaptation in Natural and Artificial Systems.”

Holland and his students were deeply influential in popularizing the whole field of genetic algorithms.

Another important figure in the field is John Koza, who has advocated for many years one of the most important variants of genetic algorithms, which he calls “genetic programming.” In genetic programming, computer programs, typically written in Lisp, are evolved through a process that involves mutating the programs by altering or swapping branches of the computation tree representing the program.
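To show what mutating and swapping branches of a computation tree means, here is a minimal sketch in Python rather than Lisp. It is not Koza’s system: the tuple representation of programs, the tiny set of operators, and all of the function names are assumptions made for illustration, and there is no fitness function or selection step.

```python
import random

# A program is either the variable "x", an integer constant,
# or a tuple (operator, left_subtree, right_subtree).
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def random_tree(rng, depth):
    """Grow a random program tree of at most the given depth."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(["x", 1, 2])
    op = rng.choice(sorted(OPS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def evaluate(tree, x):
    """Run the program represented by `tree` on the input x."""
    if tree == "x":
        return x
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return tree  # an integer constant


def subtree_paths(tree, path=()):
    """Yield the path (a tuple of child indices) to every node."""
    yield path
    if isinstance(tree, tuple):
        yield from subtree_paths(tree[1], path + (1,))
        yield from subtree_paths(tree[2], path + (2,))


def get_subtree(tree, path):
    """Return the branch found by following `path` from the root."""
    for index in path:
        tree = tree[index]
    return tree


def replace_subtree(tree, path, new):
    """Return a copy of `tree` with the branch at `path` replaced."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace_subtree(tree[path[0]], path[1:], new)
    return tuple(parts)


def mutate(tree, rng):
    """Alter a program by replacing a random branch with a fresh one."""
    path = rng.choice(list(subtree_paths(tree)))
    return replace_subtree(tree, path, random_tree(rng, 2))


def crossover(a, b, rng):
    """Swap a random branch of program a with a random branch of b."""
    pa = rng.choice(list(subtree_paths(a)))
    pb = rng.choice(list(subtree_paths(b)))
    return (replace_subtree(a, pa, get_subtree(b, pb)),
            replace_subtree(b, pb, get_subtree(a, pa)))


rng = random.Random(0)
parent_a = ("+", "x", ("*", 2, "x"))   # x + 2*x
parent_b = ("-", ("*", "x", "x"), 1)   # x*x - 1
child_a, child_b = crossover(parent_a, parent_b, rng)
print(child_a, child_b, mutate(parent_a, rng))
```

A full genetic programming run would wrap these operations in a loop that scores each program against a target behavior and preferentially keeps and breeds the better-scoring ones.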

Genetic programming, and genetic algorithms more generally, have had considerable success creating interesting and useful systems and programs. Nevertheless, I think it is fair to say that these ideas are still considered “fringe” ideas in the scientific and engineering community, and they have not widely replaced more conventional software and hardware design strategies.