There's probably a better way to do this, but I have been having a difficult time trying to, from the lisp side of things, track down the cause of errors signaled from java code.

It turns out that we can use lisp's normal error handling facilities to work with java errors. The following snippet triggers a java NullPointerException and if we just evaluate this in SLIME we don't actually see the java backtrace (or at least I don't see it -- of course it would be nice if there were a way to do so).
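The original snippet isn't reproduced here, but a sketch in the same spirit looks like this (hedged: the exact way the NullPointerException gets triggered, and the variable names, are my reconstruction; the `java:java-exception` condition and `java:java-exception-cause` accessor are from ABCL's JAVA package):

```lisp
;; Trigger a java.lang.NullPointerException by calling a method on a
;; java null reference, and catch it with lisp's normal condition
;; handling machinery.
(handler-case
    (java:jcall "toString" (java:make-immediate-object nil :ref))
  (java:java-exception (e)
    ;; the underlying java.lang.Throwable is available here:
    (format t "caught: ~a~%" (java:java-exception-cause e))))
```

From the handler one can get at the Throwable and, in principle, ask it for its stack trace, even if SLIME doesn't show it by default.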

Fri, 17 Jan 2014 00:32:37 GMT

Well, it had been a while since hunchentoot-cgi had seen any attention. It turns out that the initial releases of hunchentoot-cgi had a pretty major limitation -- it didn't work at all with POST request methods, or at least it didn't pass any of the request's data along to the CGI process. This has now been fixed, along with a bunch of other bugs in setting up the CGI script's environment variables.

Sun, 05 Jan 2014 04:05:19 GMT

Tales of Woe

So... in an attempt to use preexisting wheels, rather than reinvent my own at every turn, I've been trying to get a decent Common Lisp environment working with the CDK (Chemistry Development Kit). My abcl-cdk adventures actually went reasonably well and I was able, eventually, to get ABCL talking nicely to CDK. Of course I wanted more than just that, I wanted interoperability between the CDK and my half-round wheel, chemicl, a cheminformatics package I started writing in Common Lisp. This is where the train began to fall off the tracks.

ABCL and cxml-stp

A while back, in an earlier, aborted attempt to get some of my chem/bioinformatics (https://github.com/slyrus/cl-bio) stuff working with ABCL, I noticed that plexippus-xpath couldn't be loaded into ABCL. This was fixed, so I was encouraged that things might work with ABCL. (While I'm on a rant, the ABCL trac issue tracker is really slow...). However, cxml-stp seems to break ABCL.

Hopefully this is a fixable bug and some future version of ABCL will work with cxml-stp.

In the meantime...

SBCL and Java

So, I figured I'd try some other approaches to getting Java and a Common Lisp implementation to play nice. I know, you're thinking "why doesn't the dude just use clojure? After all, that's what clojure was designed for!" Well, that's a good question. I did use clojure for some earlier explorations with CDK and, while the java integration generally works well, I have a bunch of existing Common Lisp code I'd like to use and, at the time at least, it seemed like all of the clojure wrappers were thin wrappers around ugly Java libraries. I've grown to know and love many Common Lisp libraries, many of which are nicely available in Quicklisp, and I'd like to be able to use those (things like cxml-stp, plexippus-xpath, opticl, etc...).

Anyway, I tried to get some sort of SBCL Java interoperability working. Three possibilities appeared: 1) jfli, 2) foil and 3) cl+j. It turns out jfli is (was?) Rich Hickey's pre-clojure Common Lisp interface to Java. I'm guessing that the challenges in getting jfli to work with any of the usual Common Lisp implementations were part of the motivation behind clojure. In any event, it doesn't seem that jfli works under SBCL.

Next, I looked at foil, which appears to use sockets to communicate with another process running a JVM. This sounded suboptimal but, presumably, workable. Turns out foil looks like some sort of Windows-only beast with a bunch of C# files. Not for me.

Finally, I looked at cl+j and it turns out there are some scary warning messages about how cl+j can't possibly work with SBCL's handling of foreign threads. Bummer. This seems somewhat unreasonable on SBCL's part. Surely some amount of engineering should make it possible to have both a JVM and SBCL's runtime running in the same process. Unfortunately, I'm too out of practice with SBCL internals to give this much of a go at this point. Bummer again.

CCL and Java

Ok, next approach. How about cl+j and Clozure Common Lisp (CCL)? Seemed reasonable but, unfortunately, it hung just like SBCL did. Presumably this is more of a MacOS issue than a CCL issue, as cl+j is supposed to work with CCL, but maybe just on other, non-Mac platforms.

Now what?

So, it seems I'm stuck without a viable approach to using the common lisp libraries I want and the java libraries I want in the same process. Perhaps the ABCL bug will get fixed. Perhaps JVM integration would make a good summer project for the next SBCL Summer of Code.

Tue, 31 Dec 2013 22:57:46 GMT

ticagrelor

The drug ticagrelor (marketed as Brilinta by AstraZeneca) is an inhibitor of platelet activation and aggregation that has been shown to reduce the frequency of cardiovascular events in patients with acute coronary syndrome.

An update on using the Chemistry Development Kit (CDK) with ABCL, Part 2

Rendering Stereochemical Molecules

You may recall that in my original blog post on using CDK with ABCL I had an example for reading a description of a molecule (a SMILES string) and rendering a picture of the 2-d structure of the molecule. Let's take another look at this process and see where things went awry and how they have gotten better.

The following line reads in a description of the amino acid valine and returns a new AtomContainer object:
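The original line isn't shown here; a sketch of the kind of call involved (class names from CDK 1.5.x, variable names mine) would be:

```lisp
;; Build a SmilesParser from the default builder, then parse valine's
;; (achiral) SMILES string into an IAtomContainer.
(defparameter *smiles-parser*
  (java:jnew "org.openscience.cdk.smiles.SmilesParser"
             (java:jstatic "getInstance"
                           "org.openscience.cdk.DefaultChemObjectBuilder")))

(defparameter *valine*
  (#"parseSmiles" *smiles-parser* "CC(C)C(N)C(=O)O"))
```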

So far so good. But the problem is that valine actually comes in two forms that are mirror images of each other. Think of a left-handed version, l-valine, and a right-handed version, d-valine. The central carbon atom in valine has four neighbors, two carbons (which are functionally distinct as they themselves have distinct neighbors), a nitrogen, and a hydrogen. These four neighbors are arranged in a tetrahedral configuration and can be arranged in two distinct non-superimposable configurations, giving rise to a tetrahedral chiral center. A given chiral molecule and its mirror image are known as enantiomers.

Let's assume that we're really interested in the biologically important enantiomer, l-valine. Fortunately the SMILES spec has support for representing this information and we can write (and read) l-valine as:
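Something along these lines (hedged: double-check the stereodescriptor against your own reference; `*smiles-parser*` is assumed to be a SmilesParser instance as above):

```lisp
;; The [C@@H] annotation records the tetrahedral configuration at the
;; central carbon, giving us l-valine rather than an unspecified mix.
(defparameter *l-valine*
  (#"parseSmiles" *smiles-parser* "CC(C)[C@@H](N)C(=O)O"))
```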

Notice that the bond connecting the carbon in the middle of the molecule and the nitrogen is now a solid wedged bond (indicating that the bond is going up and that the nitrogen should be considered as being above the plane created by the bonds carbon-carbon bonds.

Explicit configurations around double bonds

In addition to the tetrahedral chiral centers mentioned, another important class of stereochemistry is the configurations around double bonds. For a simple example, let's consider the molecule 2-butene, or as it is known by its IUPAC name, but-2-ene.

Notice that the two single bonds are shown as going in opposite directions from the atoms involved in the double bond in the middle. But this is really just an accident. We didn't explicitly specify the stereochemical configuration. The convention for describing configurations around double bonds is known as the E/Z notation. If we want to ensure that the two terminal carbons are on the same side of the double bond (represented by Z (short for zusammen, which supposedly means together in German)), we can read an appropriate so-called chiral SMILES string (I say so-called because we're actually describing the stereochemistry of explicit configuration around a double bond, not a chiral center, but the SMILES folks play fast and loose with the nomenclature):
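The string itself isn't reproduced above; the Z isomer reads something like this (hedged sketch, again assuming a `*smiles-parser*` instance):

```lisp
;; The / and \ bond symbols pin down the configuration around the
;; double bond; this spelling gives (Z)-but-2-ene.
(defparameter *z-butene*
  (#"parseSmiles" *smiles-parser* "C/C=C\\C"))
```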

Now we see that the two terminal carbons are indeed on the same side of the double bond between the two internal carbons, and that when we draw an explicit configuration around a double bond the otherwise implicit hydrogens are shown in their proper position. Another hooray for CDK 1.5.4!

While we're at it, notice that we have explicitly provided width and height arguments to abcl-cdk:mol-to-svg in the previous two examples. The CDK 2-d rendering code requires some dimension arguments that seem to affect the size of things like bonds and atom symbols. It's not entirely clear what the best way to figure out what parameters should be used to display a given molecule at a given size, so we'll use some combination of (hopefully) lucky guesses and trial and error. 128x128 seems to look good for small molecules like the various flavors of butene.
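A hypothetical invocation, to make the dimension arguments concrete (the exact lambda list of abcl-cdk:mol-to-svg may differ, and `*z-butene*` is assumed to hold a parsed molecule):

```lisp
;; 128x128 seems to work well for small molecules like the butenes.
(abcl-cdk:mol-to-svg *z-butene* "but-2-ene.svg" :width 128 :height 128)
```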

Support for tetrahedral chiral centers and explicit stereochemical configuration around double bonds is a big win for CDK. Many thanks to John May and the rest of the CDK team for including this in the latest release. We'll look at some more complicated examples and additional features of abcl-cdk in the next installment.

An update on using the Chemistry Development Kit (CDK) with ABCL

Last year I explored using the CDK with ABCL. It was nice to see that ABCL could call out to the CDK and that I could use a Common Lisp environment for dealing with various kinds of chemistry data, molecules, atoms, bonds, etc...

The seemingly straightforward use-case I had in mind was to be able to read and write descriptions of molecules and to render these as 2-d drawings in various ways. This sort of worked, but when I tried to work with more complex molecules, particularly molecules with explicit stereochemistry such as tetrahedral chiral centers or explicit configurations around double bonds, things broke down. I'm pleased to report that things have gotten much better in the past year or so!

First, the preliminaries. The canonical home for the cdk source code has for some time been somewhat difficult to track down, or, rather, I should say it's hard to know which particular version of the source code is the canonical version at any given time. But it does seem like https://github.com/cdk/cdk is the current canonical location. Unfortunately, the good folks at Cloudera seem to have grabbed the top-ranking google spot for CDK with the Cloudera Development Kit. As awesome as the Cloudera folks are, that's not what we're after. And the second hit on google is for Egon Willighagen's personal CDK repository, which is pretty damn close to the canonical repository these days, but I think https://github.com/cdk/cdk is actually the preferred place to grab the source at any given point in time.

So, now we're good to go with either the 1.5.4 release or, at least for the moment, the current HEAD of the master branch which will presumably one day become CDK 1.5.5.

Getting started with CDK

git clone http://github.com/cdk/cdk.git
cd cdk

If we want to use version 1.5.4 we can either hunt it down from some maven repository, which I generally hate doing, or build our own:

git checkout cdk-1.5.4
ant dist-large

Note that we need to make sure that ant builds the dist-large target as we want all of the CDK files to be rolled into one jar. We could use the individual jars but that would be a lot more work.

Now that we have the jar, I'm going to hold my nose and suggest that we use maven for the installation of the jar and then rely on ABCL's ASDF extensions that interact with maven to access the required jar files. Certainly other approaches could work too, but this one seems simple enough. In order to install the CDK jar using maven we can do the following:
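The maven invocation itself isn't reproduced here; a command along these lines does the local install (hedged: the jar filename and version are what the build above should produce, but check your dist/jar directory):

```shell
# Install the locally built CDK jar into the local maven repository
# (~/.m2) so that ABCL's abcl-asdf/:mvn machinery can find it.
mvn install:install-file -Dfile=dist/jar/cdk-1.5.5.git.jar \
    -DgroupId=org.openscience.cdk -DartifactId=cdk-git \
    -Dversion=1.5.5 -Dpackaging=jar
```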

Note that we change the name of the artifact to cdk-git here. We do this because (recent versions of) ASDF only accepts dotted integers for versions, so we can't request :version "1.5.5-git". Therefore we change the name of the artifact and use cdk-git for development versions and cdk for release versions.

So now if we want to use the work-in-progress 1.5.5 git HEAD version we have to change the line in the ASDF system definition to:

(:mvn "org.openscience.cdk/cdk-git" :version "1.5.5")

Both versions should suffice for the following examples. I'm going to assume we're using the 1.5.5-git version from here on out.

Sat, 31 Mar 2012 20:20:22 GMT

So in the last installment, we saw a few problems with ABCL, maven and libraries to be supplied by maven. I've tracked a few of these things down, learned a few things, and released a trivial new library.

Maven 3.0.3 vs 3.0.4

It turns out I had maven 3.0.3 installed. I'm not sure where this came from. XCode perhaps? In any event, the ABCL maven stuff requires version 3.0.3 or later, so I was OK there, but it depends on some features that are only found in 3.0.4 (some HttpWagon or something or other).

Removing the 3.0.3 maven and installing homebrew's maven 3.0.4 fixes this problem. If there's an easy way to make the ABCL maven-embedder stuff work with both 3.0.3 and 3.0.4, that would be nice.

Other Remote Repositories

I'm still relying on the freehep 2d graphics libraries and these aren't in maven central, but rather in the freehep maven repo. How can we tell the ABCL maven stuff to search this repository? There may be a way, but if so I haven't found it yet.

Using Sharpsign-quote

It turns out one can do (#"foo" ...) instead of (java:jcall "foo" ...), so I've switched over my code to this style.
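For example (a sketch; the #"..." reader syntax comes from ABCL's JSS contrib, so a (require :jss) or equivalent is assumed):

```lisp
;; These two calls are equivalent ways of invoking a Java method:
(java:jcall "toUpperCase" "hello")   ; => "HELLO"
(#"toUpperCase" "hello")             ; => "HELLO"
```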

ABCL-CDK

It's a fairly trivial package at this point, but I've released abcl-cdk which provides some examples of calling the CDK from ABCL.

in which I attempt to write some Common Lisp code to be run in a Common Lisp environment that runs inside a virtual machine designed to support a C-like language that incorporated a few lispy features, so that I can use a library written in said C-like language with my Common Lisp code, or something like that.

Ok, it's time to see if I can get the CDK and ABCL playing nicely together.

CDK

The CDK (Chemistry Development Kit) is a Java library for dealing with various types of chemistry data, elements, atoms, bonds, molecules, etc... and various computed or measured properties thereof. I should point out that the CDK isn't really just one library, but rather a family of various related libraries. We'll come back to building an appropriate version of CDK in a moment, but, for now, let's move on.

ABCL

ABCL is an implementation of the Common Lisp programming language that runs on the JVM. Besides running (in theory) on any platform that supports the JVM, ABCL provides for relatively smooth interoperability with other code (such as Java libraries) that run on the JVM.

Building CDK

First, we need the CDK. Some of the main things I want to do with the CDK are to instantiate a molecule from a SMILES string, get a 2D representation of the molecule, and compute various properties (molecular weight, charge, etc...) of the molecule. The only problem with that is that the main CDK doesn't actually support 2D rendering. Before we get into how to get a CDK that does 2D rendering, I should take this opportunity to gripe about the various versions of the CDK for a moment.

Sourceforge's CDKs

One of the things that bothers me about sourceforge-hosted projects is that there are often too many "home pages" for a project. For the CDK we have two:

JChemPaint

Of course none of these (at least on first glance) contain the 2D rendering code we want. It turns out that's not part of the core CDK, but rather part of the JChemPaint code. The JChemPaint project is another effort, closely related to CDK, that has applets/applications for interactive 2D molecule editing, 2D structure rendering code, etc... So, on the JChemPaint page we see links to various downloads where we have CDK, JChemPaint, CDK-JChemPaint, etc...

Wait, what? CDK-JChemPaint? Hang on a second! We'll come back to that in a moment. First we see that the CDK code is moving ahead rapidly but that the JChemPaint release is from September 2011 and the JChemPaint (development) release is from November 2010! Hmm...

So, near as I can tell, JChemPaint was a separate, but related-to-CDK project and at some point somebody cribbed some of the reusable bits from JChemPaint and put them into CDK-JChemPaint.

But then it seems like maintaining a separate CDK-JChemPaint seemed a bit silly and egonw (?) has been maintaining a branch of the CDK with some of the JChemPaint (or is it CDK-JChemPaint?) functionality incorporated: https://github.com/egonw/cdk/tree/13-unsorted-patches. This is what I originally used for the 2D rendering code. It turns out that there is a newer, better (?) version of the CDK with the appropriate JChemPaint bits added, the 381-14x-renderextra branch.

Back to building CDK...

First we get the code

git clone git://github.com/cdk/cdk.git

Then we need to pull from egonw's branch (I suppose we could have just cloned this first):

git remote add egonw git://github.com/egonw/cdk.git
git pull egonw

And now let's checkout the branch we want:

git checkout 381-14x-renderextra

Ok, now we've got the code. We build it with ant:

ant

Assuming we have java properly setup, things should build fine. Now we have a brazillion jar files in cdk/dist/jar. Wait, that's not what we want. We want a single CDK jar that we can (presumaly) point our CLASSPATH to, or at least do whatever the ABCL equivalent is. Turns out there's a "dist-large" target in the CDK build.xml file so we can build that with:

ant dist-large

Assuming we have java properly set up, things should build fine. Now we have a brazillion jar files in cdk/dist/jar. Wait, that's not what we want. We want a single CDK jar that we can (presumably) point our CLASSPATH to, or at least do whatever the ABCL equivalent is. Turns out there's a "dist-large" target in the CDK build.xml file, which we built above, and now we have dist/jar/cdk-1.4.8.git.jar.

Installing CDK

So what are we supposed to do with that? Well, it appears that some folks in the Java world use this thing called maven for both remote and local package fetching/deployment/whatever-you-call-it-in-the-java-world.

So, assuming we have maven around, we can install a CDK which we can later, hopefully, use with ABCL with the following:
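The exact command isn't reproduced here; something like the following matches the description in the next paragraph (hedged reconstruction):

```shell
# Install the locally built jar (versioned 1.4.8.git by the CDK build)
# into the local maven repository as version 1.4.8-SNAPSHOT.
mvn install:install-file -Dfile=dist/jar/cdk-1.4.8.git.jar \
    -DgroupId=org.openscience.cdk -DartifactId=cdk \
    -Dversion=1.4.8-SNAPSHOT -Dpackaging=jar
```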

Notice that we need two distinct version identifiers as maven wants nice clean version numbers (and doesn't really like the 1.4.8.git version) and most maven-ized projects seem to use the SNAPSHOT suffix for in-progress releases. On the other hand, the CDK build.props file sets the version to 1.4.8.git. We use the two identifiers here so that cdk-1.4.8.git.jar gets installed as org.openscience.cdk/cdk version 1.4.8-SNAPSHOT.

(Note: I think there's some built-in functionality in ABCL to handle this next task -- but I couldn't get it to work!)

Fortunately, the clojure folks, who occasionally drink a little too much Java toolchain (tooling?) Kool-aid for my taste, but at least have enough taste to want a lisp-ish language, have gotten here first and the standard tool for these kinds of jobs seems to be Phil Hagelberg's leiningen. I'm going to assume for the moment that you actually have leiningen lying around, or that you're smart enough to figure out some other way to get these dependencies installed if not.

So, to trick leiningen into doing some dirty work for us, we make a project.clj that looks as follows:
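The original project.clj isn't shown; a minimal sketch might be (the project name and the freehep coordinates are illustrative, not the originals):

```clojure
(defproject cdk-hacking "0.0.1"
  ;; the locally installed CDK plus the freehep 2d graphics bits
  :dependencies [[org.openscience.cdk/cdk "1.4.8-SNAPSHOT"]
                 [org.freehep/freehep-graphics2d "2.1.1"]])
```

Running `lein deps` in that directory then pulls everything down.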

which will install the dependencies for us somewhere in ~/.m2 (let's forget about system-wide installs for the moment).

Using Java Dependency Libraries

Ok, we should be ready to figure out how to make ABCL talk to CDK now. First we just have to figure out how to make ABCL talk to CDK. Wait, wasn't that what I just said? Yes, but, how do we do it? Fortunately, the ABCL guys anticipated this problem and added what they call abcl-asdf. By doing a (require 'abcl-asdf) (oh wait, and a (require 'abcl-contrib) before that, I think), we can tell our ASDF system how to tell ABCL to tell the JVM where to find the jars we need put on the CLASSPATH, or something like that.

We can add :mvn components to our ASDF system and the abcl-asdf machinery will add the maven artifact (?) or jar file or whatever to the CLASSPATH, or at least somehow make it so the classes are available to the JVM.
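A sketch of what such a system definition might look like (the :mvn component syntax is from abcl-asdf; the file components are hypothetical):

```lisp
(asdf:defsystem #:abcl-cdk-hacking
  :components ((:mvn "org.openscience.cdk/cdk" :version "1.4.8-SNAPSHOT")
               (:file "package")
               (:file "abcl-cdk-hacking")))
```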

Well, that's the theory anyway. In practice this doesn't work with a stock ABCL because of the following bug: http://trac.common-lisp.net/armedbear/ticket/204. Once this is fixed (via the patch attached to the bug report), and ABCL rebuilt, a simple:

(asdf:load-system 'abcl-cdk-hacking)

will load the dependencies into the JVM and we should be off and running, finally.

Calling Static Java Methods

Ok, now we need to do some Java interop stuff with CDK. First thing we want to do is call a static Java method.

We're going to need an instance of the org.openscience.cdk.DefaultChemObjectBuilder class. We can get this via the static getInstance method as follows:
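The call isn't shown above; with ABCL's java:jstatic (method name, then class, then any arguments) it looks something like this (the variable name is mine):

```lisp
;; Invoke the static getInstance method and hang on to the builder.
(defparameter *builder*
  (java:jstatic "getInstance"
                "org.openscience.cdk.DefaultChemObjectBuilder"))
```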

Calling Methods on Java Objects
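For instance methods the analogous operator is java:jcall, which takes the method name, the receiver, and then any arguments. A trivial sketch (assuming `*builder*` holds the builder instance from the previous section):

```lisp
;; Call an instance method on a Java object.
(java:jcall "toString" *builder*)
```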

Java Class Identifiers

Well, the clojure folks have figured out that some people, at least, hate typing long java class names all over the place, and the ABCL java interop stuff seems to require lots of typing of long java names. In a perhaps misguided attempt to relieve this burden and provide something more like clojure's syntax, I present the jimport macro:
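A hedged reconstruction of the macro (not the original definition): intern a symbol named after the last component of the class name and bind it to the full class-name string.

```lisp
(defmacro jimport (class-name &optional (package *package*))
  "Define a variable named after the last dotted component of
CLASS-NAME whose value is the fully qualified class-name string."
  (let* ((name (string class-name))
         (short (subseq name (1+ (position #\. name :from-end t)))))
    `(defparameter ,(intern short package) ,name)))

(jimport |org.openscience.cdk.DefaultChemObjectBuilder|)
```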

which then defines the value of the |DefaultChemObjectBuilder| symbol (in the current package, at least if no package is specified in the jimport call) to be "org.openscience.cdk.DefaultChemObjectBuilder", so now we can do:
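Something along these lines (a reconstruction), using the imported symbol in place of the full class-name string:

```lisp
(java:jstatic "getInstance" |DefaultChemObjectBuilder|)
```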

Not a huge win, but it does let the compiler ensure that we're using defined symbols, rather than potentially bogus strings, for Java class names.

Java List<Foo>'s

One of the CDK classes, org.openscience.cdk.renderer.AtomContainerRenderer, has a constructor that expects a List<IGenerator<IAtomContainer>> as one of its arguments. How do we invoke the constructor with one of those? Well, it turns out we can't just use a lisp list as the argument. We have to make a java List of some sort. It turns out there's some infrastructure provided by ABCL to help with this, although nothing I can find that does exactly what I need. The extensible-sequence stuff allows us to make a lisp sequence that is actually some sort of instance of the java.util.List interface. I use a java.util.Vector and provide a helper function called jlist as follows:
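A hedged sketch of the jlist helper (the original definition isn't shown; this is the obvious java.util.Vector-based version):

```lisp
(defun jlist (&rest elements)
  "Return a java.util.Vector (which implements java.util.List)
containing ELEMENTS."
  (let ((v (java:jnew "java.util.Vector")))
    (dolist (e elements v)
      (java:jcall "add" v e))))
```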

We can use lisp streams for most things, but then use the corresponding java streams where we need java streams. In particular the freehep SVG and PDF libraries want java streams for files. It turns out there's a function to get the java output stream associated with a lisp stream, getWrappedOutputStream. We use that to get the java.io.OutputStream or whatever and we're good to go.

Ever since taking Stuart Russell's Knowledge Representation and Reasoning class (Holy smokes, that was my first real introduction to Common Lisp and that was eleven years ago! How time flies...), I've always felt like I don't really have my data unless I have it in some machine readable form -- and not just in an unstructured text document or in some proprietary GUI application, but rather in a place where I can reasonably query, update, search, etc... the data.

So, one kind of data that I haven't really had is my address book. For a while I kept things locally in Apple's Address Book application, but that really only worked for a single machine, at least at first. At some point I discovered the awesomeness of DAViCal, which is a CALDAV/CARDDAV server for serving up calendar and contact/address book information. Great! OK, so, I jumped through the requisite hoops to get things working between DAViCal and Address Book (and iCal) and all was good -- except for the fact that I still didn't have a nice convenient API for searching/querying/retrieving/updating the data in DAViCal.

So, that motivated me to write some code for working with my data from Address Book and from DAViCal, which leads me to this blog post. It turns out that there is an IETF standard for exchanging contact information, the VCARD format. Apparently I can export and import data from Address Book in this form, and this is how the CardDAV data is stored in DAViCal, I think. Great. Now all I need to do is to parse the VCARD data and I'm good to go. Fortunately, the spec lays out the file format nicely and this shouldn't be too terribly hard to parse.

Data Model?

But, that leads to the next question of, having parsed the data, what am I going to do with the data? Or, put another way, how am I going to represent/model the data contained in the VCARD file? Just slurping the VCARD bits into a buffer of characters doesn't help me find, for instance, the email address of a particular person. One approach would be to define a data model with CLOS classes and generic functions that operate on those classes to allow for reading/writing/querying the data. Another approach would be to use a more generic data structure like lists or, probably better yet, nested hash-tables that would allow traversal of a graph of data objects via key/value relationships. The VCARD format itself could be thought of as a (somewhat unwieldy) data model.

The VCARD Specification

The VCARD 3.0 specification formally describes what VCARD data should look like. The problem is that transferring back and forth either between CLOS objects or hash-tables and VCARD data sounds like a cumbersome and error-prone process, especially as either the objects or hash-tables, on the one hand, or the VCARD format itself evolves over time. Surely there's a better approach.

The other problem with the VCARD specification is that the specification isn't machine readable. Yes, there are bits of ABNF grammars in there, but the spec, as a whole, isn't easily parsed and queried. If one was going to rely on a specification to define the data model, it sure would be nice if the specification itself could be read, interrogated, and used by our programs.

xCard: VCARD XML Representation

Enter the xCard specification. The xCard specification does two things: it provides a model for reading and writing vCard data (that is, the data that can be represented in VCARDs, not the actual VCARD syntax) and, more importantly, it provides a machine-readable representation of the specification in the form of a RELAX NG schema.

RELAX NG

RELAX NG is a schema language for describing XML documents. In addition to its native XML syntax, it provides a human-friendly but machine-readable format for specifying schemas, the RELAX NG Compact Syntax.

Back to xCard

So, we find the compact syntax for the xCard schema at the end of the standard. For whatever reason, I can't locate a canonical version of the schema outside of the spec, but it's simple enough to cut and paste the bits out of the spec and place it in an RNG file. Great. Now we've got data in VCARD format and a nice machine-readable (and human-friendly) description of the data model in the RELAX NG compact syntax. How does this solve the problem of representing/querying/modifying the data that I mentioned above? Well, one approach would be to automagically generate CLOS classes and interfaces that match the RELAX NG schema. This seemed like an interesting approach, but an awful lot of work. Since we've got a schema that defines the vCard semantics, as represented by an XML document, perhaps we can just use an in-memory representation of the XML data itself as our "data model" for reading/writing/querying/etc... the address book data. This is the approach I've taken with cl-vcard, and we'll come back to it momentarily.

Parsing VCARDs with parser-combinators

For the moment, before we get into what we are transforming the data to, we need to consider what we're transforming the data from and how to do so. A simple, hand-coded recursive descent parser would probably be the most straightforward way to go, but I'm exceedingly lazy and wanted someone else to do the bulk of the heavy lifting of parsing for me. Enter Jakub Higersberger's awesome parser-combinators library, inspired by Haskell's Parsec monadic parser-combinator library.

While it may be overkill, parser-combinators provides a nice, clean API for writing parsers. The core of the parsing routine is shown below:

Parser combinators are designed to work with functional data structures, as they may backtrack, and destructively modifying data structures as one goes can lead to problems. Therefore, I use fset, a functional collections library, for building the parsed representation of the data as I go. I could probably get away without doing this, but it seems better to use the parser functionally, as intended, than to rely on the fact that we happen never to backtrack and mutate our data structures along the way as we do the parse.

So we have a big, hairy document tree that (hopefully) has the vcard data we want in it in a form that will, eventually, prove to be easy for us to work with. But, before we get into actually doing anything with the data, let's take a digression into XML data representation and validation.

STP, CXML, RNG, CXML-RNG

Notice that we're using stp:serialize. CXML-STP is a document-based interface to XML data, written by David Lichteblau, and somewhat like the DOM, only with a much nicer interface (in my subjective opinion, of course), inspired by XOM. A comparison between the DOM and STP interfaces can be found here.

In the guts of the parser we use functions like stp:make-element and stp:append-child to construct the document tree. These, and the rest of the STP API, sit on top of Gilbert Baumann's Closure XML (or CXML) library.

What's the damn phone number?

OK, so far this has all been a lot of work and we haven't even gotten to access any of our data. We're almost there. We just need one more XML library (of course...). We could work with the STP document directly:
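One way to do it by hand would be something like this (a sketch; the element name and the use of `*baba*` as the document, as in the session transcript later in the post, are assumptions):

```lisp
;; Walk the STP tree looking for the first tel element and return
;; its text content.
(stp:do-recursively (node *baba*)
  (when (and (typep node 'stp:element)
             (string= (stp:local-name node) "tel"))
    (return (stp:string-value node))))
```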

Plexippus XPath is another excellent library in the CXML family, written by Ivan Shvedunov. Plexippus provides an implementation of the XPath spec that works with CXML documents. Using plexippus instead of walking the tree by hand as above, we can do:
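A sketch consistent with the session transcript later in the post (the XPath expression is illustrative):

```lisp
(xpath:with-namespaces ((nil cl-vcard::*vcard-namespace*))
  (xpath:evaluate "string(/vcards/vcard/tel/*/text())" *baba*))
```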

Ignoring the with-namespaces macro invocation for a moment, we see a single line of code that gets us the information we want. Win! This is a very simple example, but the XPath language allows us to write much more interesting queries. There are two housekeeping matters we need to take care of first.

First, let's make a macro to handle the namespace stuff. Plexippus is (rightfully) rather picky about making sure that XML element and attribute names are properly qualified. We can set the default namespace as above, but we'll do this in a macro in case we want to change this later:
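The macro name below is my own guess, but the shape is straightforward:

```lisp
;; Wrap BODY in the default vcard namespace so XPath expressions
;; don't need explicit prefixes.
(defmacro with-vcard-namespaces (&body body)
  `(xpath:with-namespaces ((nil cl-vcard::*vcard-namespace*))
     ,@body))
```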

Second, we'll make a little function for converting XPath query results to text. We'll probably use something else in an real application using this stuff, but it's nice for playing around with the API and for interactive development.
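A hypothetical version of that helper (name mine; xpath:string-value is from Plexippus):

```lisp
;; Evaluate an XPath query and return the result as a string --
;; handy for poking at documents from the REPL.
(defun xpath-text (query document)
  (xpath:string-value (xpath:evaluate query document)))
```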

There's lots more one can do, but that should give you a flavor of how XPath can be used to effectively walk the document tree.

Validation

As mentioned earlier, the xCard specification gives us a nice Relax NG (RNG) schema for data about individuals and other entities (as the spec somewhat vacuously says). The nice thing about this is that we can use the schema to validate our in memory representation of the xCard data -- even if there's never an xCard file per se:

(stp:serialize *baba* (cxml-rng:make-validator *vcard-rng-schema*))

Once we've got the document in place we can validate it against the Relax NG schema. The VCARD -> xCard transformation may not be complete (which it isn't yet), but at least we know that the (so far tested) output is valid XML that complies with the Relax NG schema.

Really putting it all together

Here's another simple example of getting some data out of the xCard document:

CL-VCARD-EXAMPLE> (xpath:with-namespaces ((nil cl-vcard::*vcard-namespace*))
                    (format nil "~A is a ~A who works at ~A and can be reached via e-mail at ~A"
                            (xpath:evaluate "string(/vcards/vcard/fn/*/text())" *baba*)
                            (xpath:evaluate "string(/vcards/vcard/title/*/text())" *baba*)
                            (xpath:evaluate "string(/vcards/vcard/org/*/text())" *baba*)
                            (xpath:evaluate "string(/vcards/vcard/email/*/text())" *baba*)))
"Baba O'Riley is a Field Worker who works at Polydor Records and can be reached via e-mail at thewho@example.com"

Of course we can write more interesting queries, make a proper front end to the data, write it back out, talk to an address book server, etc... but those exercises are left for the reader.

Wait a minute, did I say talk to a server? DRAKMA would be perfect for that, but there's one problem. In the next blog post, I'll go into how one can talk to a CardDAV server and what one needs to change in DRAKMA to make this work. Next time...

Sun, 27 Mar 2011 19:13:46 GMT

Well, the previous attempts at the pixel setf-expander got most of the way there, but there are a couple of important changes since the last blog post that I figured I should document for posterity's sake, lest someone run across the old post and attempt to base some future setf-expander on the almost-but-not-quite-fully-working version contained therein.

First of all, Utz-Uwe Haus provided a number of fixes to get the fast path setf-expander working on Allegro. The first step was to get %get-image-dimensions working via a cltl2-signature-compatible version of variable-information. The second step was to look for types of the form (integer 0 255) instead of (unsigned-byte 8), which is how Allegro apparently reports (unsigned-byte 8)'s. Finally, it turns out that Allegro is finicky about needing things at compile-time in slightly different ways than SBCL is, and it needs +max-image-channels+ defined at compile-time, which sounds like the right thing to do in any case.
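One portable way to cope with implementations reporting equivalent types in different spellings is to compare with subtypep in both directions rather than matching the type specifier textually. A minimal sketch (not necessarily what opticl does internally):

```lisp
;; Sketch: (integer 0 255) and (unsigned-byte 8) denote the same type,
;; so a two-way SUBTYPEP test recognizes either spelling.
(defun unsigned-byte-8-type-p (type-specifier)
  (and (subtypep type-specifier '(unsigned-byte 8))
       (subtypep '(unsigned-byte 8) type-specifier)))
```

Both (unsigned-byte-8-type-p '(integer 0 255)) and (unsigned-byte-8-type-p '(unsigned-byte 8)) return true, while a narrower type like (integer 0 127) does not.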

Ok, enough of the Allegro fixes. Now on to the pixel setf-expander itself. There were a couple of problems here. First, we weren't expanding image-var itself. This meant things would break if we tried to do:

It turns out that we need to expand image-var itself with get-setf-expansion and deal with the 5 return values as appropriate. I think that I can ignore the storing form, since I'm not actually changing the value referred to by image-var, and that I can just use the accessing form in the (setf (aref ...)) calls in the expander. If any language lawyers have any input here, it would be appreciated. Also, it's important to keep in mind that we need to return the temporary variables and their value forms from the get-setf-expansion. Ugh... This is all kind of a mess, but the end product is pretty neat! A non-consing, idiomatic way to set pixel values, assuming we've declared the type of the image -- but at least we can do so using the language's own (declare ...) mechanism rather than resorting to some sort of (with-fast-pixels ...) macro around all of the pixel/setf pixel calls.

Finally, if you've gotten this far and you want to see opticl in action, check out spectacle, a CLIM application for viewing images that uses opticl for the image loading, representation, etc... On SBCL, and presumably Allegro, it has nice responsive scrolling/zooming/rotating/etc..., but if the pixel stuff conses (as it seems to do on CCL), it can be a bit sluggish.

from a suitable lisp with quicklisp installed. Opticl has been mostly developed on SBCL, but should work on any Common Lisp, and has seen some limited testing on CCL and ABCL. Patches to more fully support other lisps would be most welcome, should they be needed.

Opticl picks up many of the ideas and concepts from my earlier ch-image image processing library and Matthieu Villeneuve's IMAGO library, but offers some advantages over both packages, such as the direct use of common lisp arrays for images and efficient access for both getting and setting pixel values using multiple values, a setf-expander and, where available, CLtL2-style variable information to provide hints to the compiler to generate efficient code using standard lisp type declaration expressions.

Some of the core features of opticl are:

representation of various types of 2-d images in common lisp arrays and routines for making the appropriate arrays

routines for efficiently performing affine transformations of images providing for operations such as resizing, scaling, rotating and skewing images

support for discrete convolution with arbitrary kernels, with built-in kernels for blurring and sharpening images

support for morphological operations with arbitrary kernels, with built-in kernels for dilating and eroding images

simple drawing primitives

performing gamma computations on images

I/O routines to read and write from various file formats; currently supported filetypes are JPEG, PNG, TIFF, PBM, PGM, PPM and GIF.

routines for converting between various image types

k-means clustering of pixels in images

More details about opticl can be found in the README, and in the opticl-test and opticl-examples packages. Note that these packages have been broken out into their own repositories in order to keep the size of a core opticl installation to a minimum. Currently opticl weighs in at around 3,500 lines of lisp code, and the code compiles to approximately 900k of fasl files on SBCL/x86-64.

Fri, 11 Mar 2011 22:19:38 GMT

Just wanted to announce that retrospectiff now supports both reading and writing TIFF files, using gigamonkey's binary-data library. Unlike the old retrospectiff, both big-endian and little-endian formats are properly supported for both reading and writing.

opticl uses retrospectiff for reading and writing TIFF files and now supports writing images as TIFF files (along with PNG, JPEG and PNM).

Efficient Access to Pixel Information in Images

We want a way to efficiently (using few processor cycles and minimally consing) access information about individual pixels in images. Multiple values allow for a non-consing way to get and set more than one value at a time using the lisp implementation's argument passing and value returning facilities, without having to explicitly place values in or retrieve values from a list.

This handles both single-channel (grayscale) and multi-channel (RGB and RGBA) pixels, returning the number of values as appropriate.

Setting pixels, on the other hand, is a bit trickier. We want a form that allows us to (setf (pixel img y x) ...) and take the number of values as appropriate for the particular image, but we also want this setting to be non-consing and efficient. CL has a define-setf-expander that can be used for just such a thing. Turns out it's fairly tricky to get this right, so I have included my intermediate attempts, followed by the final version.
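The intermediate attempts aren't reproduced here, but for a flavor of what the end-result API looks like (assuming opticl's make-8-bit-rgb-image and pixel entry points):

```lisp
;; Assumes opticl is loaded. PIXEL and (SETF PIXEL) traffic in
;; multiple values, so no list is consed for the channel values.
(let ((img (opticl:make-8-bit-rgb-image 4 4)))
  ;; setting a pixel passes the channel values as multiple values
  (setf (opticl:pixel img 0 0) (values 255 128 0))
  ;; reading it back likewise returns multiple values
  (multiple-value-bind (r g b)
      (opticl:pixel img 0 0)
    (list r g b)))
```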

An improved setf-expander

It would be nice if we could use standard CL declaration forms to yield this information. It turns out that CLtL2 has a facility that we can use to do this, the variable-information facility. Using this we can use the following function to grab information about the declared type of an image (if present):
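The function itself isn't shown above; a rough sketch of the idea, using SBCL's spelling of the CLtL2 interface (the helper name here is made up, and other lisps put variable-information in different packages):

```lisp
;; Sketch using SBCL's CLtL2 interface. VARIABLE-INFORMATION returns
;; the binding kind, whether it's local, and an alist of declarations;
;; the (TYPE . typespec) entry carries any declared type.
(require :sb-cltl2)

(defun declared-type-of (var env)
  "Return the declared type of VAR in ENV, or NIL if none."
  (multiple-value-bind (binding localp declarations)
      (sb-cltl2:variable-information var env)
    (declare (ignore binding localp))
    (cdr (assoc 'type declarations))))
```

A setf-expander or compiler macro can call this with its &environment argument to decide whether the fast, type-declared path applies.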

Questions:

Should grayscale images be 3-dimensional arrays with a third dimension of 1, instead of 2-d arrays? It would simplify some code, in that we would know that there would always be three indices for arrays -- but I think we can get away with variable rank.

Should we use the with-image macro for establishing compile-time information about arrays? I think cltl2:variable-information is a better way to go, but we use the ugly fallback mechanism on non-(SBCL or CCL) platforms. What about ABCL, CMUCL, clisp, ECL and Allegro? At least some of these should support cltl2.

Fri, 25 Feb 2011 04:53:08 GMT

Well, I was hoping that moving my blog to a more mainstream OS (linux) might address some of the periodic crashing issues. Alas, that doesn't seem to be the case. Perhaps upgrading to a new version of hunchentoot will motivate me to spend some more time trying to track down the source(s) of the problem.

After a ridiculously long hiatus, nuclblog is finally getting a little love. It now uses cl-markdown, and has hooks for extending posts such that things like disqus comments can be added without hacking up the core nuclblog code.

If anyone wants to see the source on github, let me know.

Oh, and if you want to see the disqus comments in action, click on the link for a blog entry to bring it up on its own page.

A couple of weeks ago, I wanted the ability to parse MP4 media files, and I couldn't find a parser in Common Lisp, so I wrote iso-media. iso-media, like the new version of retrospectiff, uses Peter Seibel's binary-data library. iso-media can be used to read and write MP4 audio and video files.

For example, the following code snippet loads an MP4 song into the special variable *cc*:

Tue, 15 Feb 2011 05:44:37 GMT

The upside of this is that there is now much better support for reading both big-endian and little-endian TIFF files. The downside is that writing TIFF files is not yet supported in the new scheme of things. Hopefully this will change in the next few weeks.

In the meantime, if you notice any problems with reading TIFF images, please let me know.

I made a lot of bad decisions early in my lisp coding days. Lately I've been trying to undo some of the most egregious mistakes. One of the big mistakes was that I had way too many dependencies between my various libraries. Everything I wrote depended on ch-util and ch-asdf. There was no need for this.

I've released new versions of clem and ch-image that don't depend on ch-util and ch-asdf. There may still be some references in the doc building stuff, but the core libraries and their associated tests no longer depend on these.

To mark the milestone of finally bringing hunchentoot-auth, hunchentoot-vhost, hunchentoot-cgi and nuclblog into the present such that they work properly with the hunchentoot-1.0 release, I've rolled up the following packages:

Of course these are probably best gotten from the git repos, but for those of you who like released versions, I figured I'd roll new ones since it had been quite some time (almost two years in some cases!).

Well, after months of instability with my hunchentoot-based webserver, I finally, once again, got around to trying to figure out what the source of the instability was. I had come to blame SBCL's sb-ext:run-program functionality, as I was able to fairly reliably crash the server using apachebench. I was also seeing sporadic crashes somewhat randomly after the server had been up for a week or so. So, this was a pretty strong hint that it had something to do with sb-ext:run-program. Folks who are much more knowledgeable than I about the SBCL internals, including Francois-Rene Rideau and Gabor Melis, looked at cleaning up possible sources of race conditions and generally robustifying sb-ext:run-program, but none of the fixes seemed to make the situation better. Compounding my difficulties was the fact that I was running the server on FreeBSD, which doesn't see quite the level of SBCL testing/hacking that, say, linux does, so I thought it possible that there might be a bug either in the way SBCL handles signals on FreeBSD or in FreeBSD itself.

Finally, I got around to replicating, roughly, my setup on another computer -- in this case a MacOS box which, when subjected to the same stressful conditions, gave me a helpful error message that said something about being unable to open a pipe, or perhaps that there were too many open pipes. This got me thinking "wait a minute, I'm just calling the program via sb-ext:run-program and getting a stream to read data back from the program; who's closing the stream and getting rid of the process?" Then it dawned on me that perhaps nobody was, and perhaps these processes were sticking around, consuming scarce resources, like pipes, and, eventually, causing the server to crash. Sure enough, waiting for the process to finish and then closing the process cleared up my problem.

I should point out that SBCL's sb-ext:run-program has an argument that seems relevant here: the :wait argument. One can specify :wait t, which will wait until the process has finished. This seemed to work in some cases, but fail in others. Eventually, it occurred to me that it was failing in the cases where the output was larger than in the cases where it was succeeding. I think what was going on was that the external program was writing data to the stream, which would fill up some buffer, which then blocked waiting for data to be read, which wasn't going to happen until after the process returned. There could be something else going on here, but it seems to me that :wait t, while somewhat in spirit what I want, isn't going to do it for me. In this case, I'm just launching a process and expecting to get some data back from it; this isn't, say, a window manager that's going to live on for the life of the SBCL process, or beyond. But since :wait t didn't seem to do what I need either, I was back to :wait nil. Now that I'd figured out I needed to close the process, I came up with:

(defmacro with-input-from-program ((stream program program-args environment)
                                   &body body)
  "Creates a new process running PROGRAM with PROGRAM-ARGS as the list
of arguments to the program. Binds the STREAM variable to an input
stream from which the output of the process can be read and executes
BODY as an implicit progn."
  #+sbcl
  (let ((process (gensym)))
    `(let ((,process (sb-ext:run-program ,program
                                         ,program-args
                                         :output :stream
                                         :environment ,environment
                                         :wait nil)))
       (when ,process
         (unwind-protect
              (let ((,stream (sb-ext:process-output ,process)))
                ,@body)
           ;; make sure the process is reaped and its pipe closed,
           ;; whether or not BODY exits normally
           (sb-ext:process-wait ,process)
           (sb-ext:process-close ,process)))))
  #-sbcl
  `(error "Not implemented yet!"))

which I can use a la with-input-from-string to read the data from the external process:
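The usage snippet isn't shown above, but a hypothetical call (program path and arguments invented for illustration) would look something like:

```lisp
;; Hypothetical usage: collect the lines of output from an external
;; program, reading from the stream bound by the macro.
(with-input-from-program (stream "/bin/ls" (list "-l") nil)
  (loop for line = (read-line stream nil)
        while line
        collect line))
```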

Some of you may have noticed by now that this site is quite often down. I just wanted to give a brief explanation for this. This site certainly could be more robust, but, at least for the moment, I'm running a development instance of hunchentoot on one of the generally less-well supported SBCL platforms, FreeBSD, and am using things like my hunchentoot-cgi interface and nuclblog which certainly haven't been well tested, and probably aren't used by anyone else either. All this adds up to a lot of moving parts that occasionally fail in less-than-graceful ways.

In fact, just today the site crashed because SBCL ended up in LDB. I try to avoid this and try to reset the thing soon after this happens, but occasionally a good chunk of time passes. Until recently, I was seeing very frequent failures in which SBCL would run out of processes and the server would die. It turns out this was caused by hunchentoot-cgi and the way it was calling sb-ext:run-program. By calling sb-ext:run-program with :wait nil, the problem seemed to go away and all was right with the world for a couple of weeks. I'm not sure what's up with the latest failure and will keep an eye on things to hopefully minimize the downtime if the server ends up back in LDB again.

Why am I using hunchentoot-cgi, you ask? Well, that seemed like the easiest way to use gitweb on my current setup. Hopefully it won't cause too many other problems. Thanks for your patience, those of you who actually might want to access the site.

(some years pass...) So, thanks to some gentle prodding and patches from Daniel Herring, I've finally gotten around to making clem and ch-image run on lisps besides SBCL. In particular, they both work on ccl and clisp, with ecl support hopefully not too far behind.

A combination of bogus things on my part, the occasional bug in the lisp implementation itself, and idiosyncrasies of the spec meant that various things didn't work right. The most-common offender was that I was assuming that things like double-float and fixnum defined classes, and that one could use these as method specializers. Apparently, the lisp spec says otherwise.

Also, there were a few bugs in ccl, especially in the 32-bit x86 version, that the ccl team fixed straightaway that allow trunk builds of ccl to load and run clem and ch-image. Thanks to R. Matthew Emerson for the prompt fixes and to Gary Byers for the helpful discussion about whether or not (setf (find-class 'foo) (find-class 'bar)) should create a new type for foo.

I should point out that these changes haven't found their way into official releases, per se, so if you want to use this stuff on ccl, clisp, ecl, etc... make sure you get the latest source from the git repo.

Fri, 31 Oct 2008 04:45:36 GMT

The TIFF image file format has been around a long time, and lisp even longer. Yet I couldn't find any common lisp libraries for reading TIFF images. Perhaps there's one out there I missed, but the only one I could find was my previous attempt, tiff-ffi, which consists of some wrapper functions around FFI calls to libtiff. I wanted a native common lisp TIFF library that wouldn't require the libtiff library, so, at Robert Strandh's urging, I wrote retrospectiff.

Currently, there is no support for writing TIFF files, and only a fraction of the TIFF image formats are supported, but RGB and ARGB images, both uncompressed and with LZW compression, can be read. Grayscale support should come next and, hopefully, support for writing TIFF files before too long.

Ok, after a couple of years... I've finally gotten around to incorporating (some of) my changes to the cl-jpeg library into the upstream sources. There's a new version (1.023) up on the common-lisp.net cl-jpeg site.

To go along with this, the latest git sources of ch-image now use cl-jpeg.asd instead of my hacked up jpeg.asd. My jpeg project will now disappear and should be replaced with the upstream sources.

Sat, 15 Mar 2008 21:57:00 GMT

I know, I know... Why on earth would you want to run CGIs from hunchentoot? Well, while it may seem counterintuitive, I've found a need for this as I want to run the gitweb cgi interface behind hunchentoot and don't feel like setting up apache as the front-end to an otherwise happy hunchentoot site (this one), so I rigged up a little CGI interface for hunchentoot.

To use, check out the hunchentoot-cgi::create-cgi-dispatcher-and-handler function.

Wed, 13 Feb 2008 09:13:59 GMT

Ok, so I've finally gotten around to moving (at least some of) my projects over to git. The good news, besides having the code in a modern version control system, is that the repos are now publicly accessible. The list of projects can be found at:

Ok, following closely on the heels of the last release, here are new releases of nuclblog, hunchentoot-vhost and hunchentoot-auth.

New features include:

nuclblog now uses esc instead of str for textareas for editing blog entries so that, e.g., &gt; doesn't get converted into > (thanks to Timothy Ritchey for the patch).

the hunchentoot-vhost dispatch mechanism was simplified to remove the &optional vhost argument from dispatch functions, which means that standard hunchentoot dispatch functions can be used, allowing for the removal of create-virtual-host-folder-dispatcher-and-handler. Added the virtual-host special variable to replace the &optional vhost arg.

Added some preliminary test code to hunchentoot-auth and exported some more symbols

Tue, 02 Oct 2007 05:07:07 GMT

Ok, a number of bugs and design flaws have been fixed. One can now be logged into multiple blogs on the same server with different user names, and can log in and out of one without affecting the status of the other blogs. Also, some internal API cleanup for the blog handler functions. Oh, also the realm stuff in hunchentoot-auth was simplified and nuclblog now does a better job of keeping track of the information regarding which ports to use.

Tue, 11 Sep 2007 23:28:35 GMT

After I released 0.4.5, it was pointed out to me that there is a new cl-who that uses downcase-tokens-p instead of downcase-tags-p. nuclblog 0.4.6 has been changed to address this and, as a consequence, requires cl-who 0.11.0 or later.

Also, as a bonus, there's a new release of ch-util (0.3.3) that cleans up the build system a bit, removing some hacks I thought were neat when I was first learning ASDF and restoring buildability on (at least some) non-SBCL lisps. Thanks to Michele Pasin for the bug reports.

Last time it was hunchentoot-vhost (of which there is a new release as well); this time it's hunchentoot-auth. Along the way, there's a new nuclblog release that supports the latest and greatest virtual hosting and user authentication facilities in hunchentoot-vhost and hunchentoot-auth, respectively.

I've updated hunchentoot-vhost to include Edi's copyright, as a couple of the functions are blatant cut-and-paste-and-edit versions of hunchentoot functions. But, more importantly, I've also updated nuclblog to work with hunchentoot-vhost so that now you can 1) have multiple blogs per host, and 2) have multiple hosts going, each potentially with multiple blogs.

Thu, 10 May 2007 17:25:11 GMT

Thanks to Timothy Ritchey for pointing out that I had broken the default startup on port 4242 (for http) and 4243 (for https). Now this works out of the box. Also, thanks to Brian Mastenbrook for pointing out a couple of places where I needed some locks.

Thu, 10 May 2007 07:40:52 GMT

I've finally gotten around to packaging up a release of smarkup with cl-typesetting support. Unfortunately, there is no documentation and there are no examples for this yet. Hopefully those will appear in a future release.

Oh, I should point out that in addition to ch-asdf, nuclblog also requires a number of other packages to run, including hunchentoot, cl-who, puri, cl-store and probably a bunch of other packages I'm forgetting at the moment.

Tue, 24 Apr 2007 07:37:06 GMT

yay. nuclblog now supports adding new blog entries and editing old ones. Hunchentoot is a joy to work with. I think all of the features of cl-blog that I used are in nuclblog now. There's no trackback, which I had disabled because of all the spam anyway. Also, it would be really nice to have some sort of comments facility, but that will have to wait for another day.

If anyone's interested in checking out nuclblog, let me know and I'll post a release of the code.

Sun, 22 Apr 2007 02:34:14 GMT

Ok, I've finally gotten around to, at least, starting to port cl-blog to hunchentoot. Right now it's pretty bare bones, but it has all my old blog posts and should see more features in the coming days. Let me know if things don't work as expected.

Moving on with the new releases... CLEM 0.4 has a whole slew of changes. The biggest change is largely under-the-covers and allows for n-dimensional matrices. Most stuff still expects 2-d matrices, but doing things like adding two 4-d matrices should work (multiplying them not so much, of course).

There are also a number of miscellaneous API cleanup issues. In particular, the mat-scale!/mat-add!/etc... variants have gone away, being replaced by (mat-add m n :in-place t). I'm not 100% sure the :in-place keyword is the way to go, but I got tired of maintaining the ! version of every matrix operation, so I'm giving this a go. Feedback, as always, appreciated.

[well, that was fast... fixed a bug in the tiff-ffi-gen dependencies. now we're at version 0.2.2]

Some folks have complained about needing to install gcc-xml in order to get the libtiff FFI bindings working. Well, I've finally gotten around to separating out the generation of the xml file and sb-alien file from the C header files from the loading of a pre-existing sb-alien file. Now you can just download tiff-ffi and asdf:oos 'asdf:load-op it directly, without needing gcc-xml-ffi. If you want to rerun the gcc-xml-ffi auto-generation stuff, you can just do:

In the spirit of trying new things, I've decided to try using vox to blog TED. It's my first time at TED and it's an interesting event so far. While there is at least one other computational biologist here, I get the feeling I'm the only lisp hacker in the audience, BICBW.

After some more help from Juho Snellman in tracking down some nasty bugs, including one in the debugging code down in print.c, I was able to get sbcl/x86-64/darwin up and running without the sb-ext:evaluator-mode hacks. This experimental version has been checked into the SBCL tree as version 1.0.3.16 and most of the tests pass. There are still test failures in float.pure.lisp, debug.impure.lisp, foreign-stack-alignment.impure.lisp and run-program.impure.lisp.

Test reports on x86-64/darwin (and other platforms to make sure I didn't break anything) are most welcome.

Well, after many months I finally decided to dust off the x86-64/macos SBCL port. After a couple days and some invaluable telepathic debugging help from Juho Snellman, I was finally able to get through make-target-2 and get a full core up and running. But, of course, there are still some problems. First, in order to compile make-target-2.lisp, I had to set sb-ext:evaluator-mode to :interpret. Second, there are still some rather major bugs that cause the system to drop into LDB far too often. But, at least there's some progress. I'll post a patch in the next few days and, with any luck, get these bugs fixed and the code into the tree before too long.

Well, it's been far too long since I've updated this page to reflect the status of threads and what not on SBCL/MacOS. In the meantime, the lutex stuff has landed on the trunk and the threads stuff has been cleaned up a bit, but is still somewhat unstable and doesn't pass the threads tests without catching an illegal instruction error. To address this, I, along with some major help from Alastair Bridgewater, have added an experimental feature for SBCL to use mach exception handling instead of (just) BSD-style signals.

The good news, irrespective of threads, is that this fixes the long-standing "CrashReporter" problem that many have complained about, and it makes it so that one can use GDB with SBCL. Previously, GDB choked on SBCL's strategy of using mprotect for protecting memory in non-exceptional cases, because GDB cannot step across the resulting EXC_BAD_ACCESS (the mach exception equivalent of a SIGBUS or SIGSEGV); using mach exception handlers gets around this. Anyway, this has been checked in to the SBCL trunk but, at least for the moment, should be considered experimental and must be enabled by building with the :mach-exception-handler feature. Oh, and there's no PPC port for this yet.

As for threads, they are still not quite there, but certainly seem better and I have been using them for development work for some time. Hopefully the added debuggability will help in tracking down the remaining issues.

Well, it's getting closer. The lutex branch no longer kernel panics, thanks to mutex locks around the i386_set_ldt calls, and the garbage collection-caused memory corruption seems to be fixed. So it builds, builds itself, and all tests pass, usually. Unfortunately, it's the "usually" that is the problem. About 10% of the time, the threads test hangs with a thread waiting for a mutex lock that it's never going to get. I'm not sure if the problem is a subtle race condition in the code or if there are problems with MacOS' pthreads mutex/condition variable implementation, but it happens often enough that there is definitely some sort of problem somewhere. Hopefully this will get merged onto the head before too long, after the 0.9.13 release. Oh, and slam.sh still doesn't work.

A while back I asked something to the effect of "how do you frob the EIP in a mach exception handler?" Well, the answer is that you get the MACHINE_THREAD_STATE (i386_THREAD_STATE in this case) via thread_get_state, adjust the EIP in the thread_state_t, and then call thread_set_state to get the changes to take effect. I wasn't calling thread_set_state before and assumed that this would behave like a Unix signal handler where you can just adjust the machine context and everything happens automatically. In the case of mach exception handlers, you need to explicitly set the state.

SBCL makes extensive use of POSIX signals for such tasks as garbage collection, error handling, and ensuring that atomic operations are executed atomically. MacOS X supports POSIX-style signals, but there are some problems with Apple's BSD-style, as they call it, signalling mechanism. The main problem, vis-a-vis SBCL, is that MacOS X's signal implementation and GDB interact in such a way that renders GDB basically useless for debugging SBCL. SBCL's strategy of protecting memory pages with mprotect and then using a signal (usually SIGBUS or SIGSEGV, and SIGBUS in the case of MacOS X) handler to either adjust the memory protection mode and take appropriate action or to signal an error causes MacOS to issue a mach exception (EXC_BAD_ACCESS) which is then caught by the kernel and causes a SIGBUS to be sent to the offending process. Unfortunately, GDB can't be used to continue past the offending mach exception, so the process just continues to send the exception and never issues the signal to the listening process or moves the program counter past the offending instruction.

In addition to the SIGBUS debugging problems, the signalling mechanism of MacOS X on Intel poses other problems in that MacOS' delivery of SIGTRAP signals is unreliable. It generally works, but only about 95% of the time. This is unacceptable for SBCL, which uses SIGTRAP as a mechanism for signalling when operations that are supposed to be atomic have been interrupted and that appropriate action needs to be taken. We have worked around the SIGTRAP problems by using the UD2 instruction to cause a SIGILL signal to be delivered to the SBCL process. This works reliably, but causes the MacOS X on Intel code to differ from other Intel-based platforms.

Finally, Apple's crash reporter doesn't realize that we might be using memory protections and the associated SIGBUS messages for non-crashing, expected behavior, and so it generates a crash log message or, depending on the Crash Reporter preferences, causes a dialog to appear on the screen.

These issues are enough to motivate me to consider using Mach exceptions instead of or, more likely, in addition to POSIX/BSD-style signals.

So, now that SBCL works on MacOS X/Intel, and given that MacOS is proving to be rather recalcitrant when it comes to running a threaded version of SBCL, I thought I would share my list of the top 10 reasons why the life of an SBCL developer is made needlessly difficult by the current state of MacOS X.

No OS Source. The source to Darwin used to be available. Apparently that is no longer the case, at least for the Intel version. This is rather unfortunate. When developing for Linux one can dive down into the source to see if, for instance, user-provided thread stacks are available to be freed after a pthread_join or not. (Note: in this case, the source to the pthread libraries is in fact available, but, given the design of MacOS' Mach kernel, this is mostly glue around calls down to the Mach thread layer, the sources to which are not available.)

GDB can't step across an EXC_BAD_ACCESS/SIGSEGV. SBCL makes rather extensive use of signals (or, in Mach parlance, exceptions) in the "this is a somewhat unusual, but also expected, event and should be appropriately dealt with" sense, not the "this is an error, most likely caused by programmer error or system failure, maybe you should print an error message before you quit" sense. One critical example of this is the use of memory protection to trigger a SIGSEGV (or SIGBUS, depending on the architecture) to inform the system when bits of memory are being written to. This is a normal event, and it triggers a Mach exception (EXC_BAD_ACCESS) that GDB cannot step across; attempting to do so just causes the event to be refired. Setting gdb to "handle pass noprint" for this type of exception just causes GDB to hang there. This makes GDB basically unusable, except for certain cases where you can attach to a running core, which will then proceed to work until an EXC_BAD_ACCESS is triggered again.

INT3 traps are not reliably delivered. Another example of where a signalling mechanism is used to handle slightly unusual, but totally expected, cases is the use of the INT3 trapping mechanism. SBCL uses this for error handling and, especially, as part of the mechanism for achieving fast atomic operations without having to go to the kernel for a lock. INT3 traps basically work on MacOS, except when they don't, which is about 2-5% of the time. Basically the trap signal is just lost and is not reliably delivered to the signal handler. This causes all sorts of problems for a system that expects to get these traps and was the source of major headaches in the MacOS X/Intel porting effort.

sem_init is not implemented. sem_open is implemented, but this takes a pathname and is a much more expensive call than creating an anonymous semaphore. The ironic thing is that the underlying Carbon and Mach APIs do support anonymous semaphores and the machinery to associate file system path names with Mach semaphores (one presumes) for use with sem_open. It would seem trivial to support sem_init.

The Mach semaphores are a private API. Here's a useful API for doing semaphore stuff, but for some reason it's private. One can use Carbon semaphores, and can link in the Carbon framework, but this seems rather unnecessary.

Problems freeing a thread's stack(?). If I provide a stack for a thread to use with pthread_attr_setstack or pthread_attr_setstackaddr, I see major problems with the threads test suite if I free the stack. If I don't free the stack, the test suite is happier, until the kernel panics. See below.

Kernel panics. I have seen quite a number of kernel panics caused by (one presumes) use or misuse of the pthreads and semaphore APIs. I could understand it if this were a KEXT or even a root process, but these are all user processes. No user process should be able to so easily cause kernel panics.

No futexes. MacOS has a whole bunch of different locking APIs, none of which seem as nice as futexes. It would be great if the kernel (or a KEXT?) could provide futex support for MacOS. I'm assuming that it's not just SBCL and that other language environments, databases, and other highly-concurrent applications will take advantage of futexes and their efficient locking properties on Linux. It will be a shame if those applications are relegated to only using pthread condition variables and mutexes on MacOS X.

No POSIX RT signals. The POSIX RT signalling stuff provides for much more reasonable behavior of the delivery of signals than the original POSIX stuff. We have had to jump through hoops to get the threaded version of SBCL as far as it is without RT signals. It would be great if future versions of MacOS X supported RT signals (I would be happy to trade Spotlight for RT signals, although I have a hard time seeing a guy in jeans and a black turtleneck running around a stage talking about how great POSIX RT signals are :) ).

LDTs are not reused even after being freed. This is a rather obscure bug, and certainly one can manage one's own set of LDTs, but the LDT API provides a way to return them to the OS. However, they are not recycled, and one runs out after roughly 0x2000 LDTs have been set up.

Anyway, perhaps this degree of systems inadequacy is present on all operating systems, and perhaps there is a certain amount of pain in making things work properly in any new OS/architecture environment, but when one compares the UNIX guts of OS X to the guts of, say, Linux or Solaris, OS X appears lacking and makes life difficult for the UNIX application programmer. I'm sure there are many benefits to Apple's microkernel architecture, but there's still a long way to go before it catches up to the other modern UNIXes, at least inasmuch as one treats MacOS as a UNIX, which is what is needed for the low-level guts of a sophisticated, cross-platform language environment such as Lisp in general and SBCL in particular. If these areas were addressed, it would make it easy for SBCL developers (those who use SBCL, not just those who develop and maintain SBCL) both to develop cross-platform lisp tools and to use MacOS X's sophisticated features, such as Cocoa, QuickTime, Aqua, Bonjour, etc... to develop first-class MacOS applications using SBCL.

On the positive side, this MBP is blazingly fast for SBCL development. It compiles all of SBCL in a hair longer than it takes my desktop x86-64 box on linux, and in half the time of a 2x2GHz G5 desktop.

Ok, an unofficial SBCL 0.9.10.37 binary for x86/darwin is available. Hopefully this will be the last binary released before the official 0.9.11 release. The last release had two problems. Some tests were failing due to the way they were being run and SLIME couldn't connect to SBCL due to some changes to cl:listen. Those issues have both been fixed in this release. This should work well enough to tide folks over until the 0.9.11 release, both for running this directly and for building new versions.

Hopefully there will be an official 0.9.11 binary for darwin/x86 on the SBCL home page at the beginning of next month. Until then, this should allow one to build from current sources on x86/darwin without needing to do a cross-compile. Enjoy!

Ok, I think I've got a fix for the stability problems I was seeing with x86/darwin SBCL. At first everything looked great, then I noticed some sporadic failures. Doing things like running SBCL while the system load was high seemed to exacerbate the problem and increase the likelihood of failing with a SIGSEGV. After much debugging, telepathic and otherwise, and thanks to the help of Juho Snellman, Alastair Bridgewater and the rest of the #lisp crew, it became apparent that the problem was that the SIGTRAP handler wasn't reliably being called. I made some test cases that showed this to be the case, independent of SBCL, and that also demonstrated that the problem exists with Mach exception handlers as well. So, now what?

Well, thankfully Andrew Pinski and Alastair Bridgewater both suggested using x86 instructions that would generate a SIGILL instead and using that instead of SIGTRAP. Sure enough, that seems to ensure that the signal handler is reliably called. This means that SBCL on x86/darwin finally seems to work, for real this time. Knock on wood... I'll commit the changes and roll a binary for public consumption sometime in the next day or so.

Thanks to everyone who helped me debug and work around this problem. It would be great if Apple would consider making sure that SIGTRAP is reliably delivered when a 0xCC instruction is encountered on x86. I've got test cases if you want them.

Also, if anyone knows how to get at and modify the EIP inside of a mach exception handler, let me know. I suppose digging through the GDB sources should provide some insight.

Experimental support for x86/darwin has been added to the SBCL source tree. No need to use my patch anymore, just grab the latest from source. Of course one has to cross-build at this point. I'll put up a binary release shortly that will enable folks to grab the release and the source and build it themselves without resorting to a cross-build.

Ok, this patch seems to work pretty well. Feedback and more testing welcome.

There's one remaining issue, which is that we should consider using a sigaltstack so that signal handlers get a stack that is properly aligned (to 16 bytes, per the ABI). Currently we don't do so, and this might leave the door open for bad things to happen.

Ok, everything seems to be working now. I'll make a patch later today or tomorrow and hopefully this will hit the tree sometime in the next week. Thanks to Alastair Bridgewater, Juho Snellman, Reaper, Christophe Rhodes, Andreas Fuchs, et al., for remedial x86 assembly instruction, telepathic debugging help, and moral support.

Well, it took quite some work, but the generational garbage collector now works on PPC for both MacOS and Linux. The latest and greatest patch can be found here. The patch is to 0.9.9.29, but it should work on any of the very recent CVS versions.

I thought this was basically done a few days ago, but there was a really nasty and hard-to-find bug in the PPC assembly language routines where we were loading a 32-bit constant into a register with LIS and ADDI. The problem is that the ADDI instruction sign-extends its argument if the high bit of the low word is set. The solution is to use ORI, which does not sign-extend. The problem manifested itself in a hosed LRA register, which, thankfully, Christophe Rhodes was able to hunt down.

This was not an easy task, but it was a great way to learn a lot about the internals of SBCL in a hurry. Thanks to everyone who listened to my griping and who chipped in with ideas and debugging help, especially Raymond Toy, Christophe Rhodes, and Juho Snellman. And thanks to KingNato and Raymond Toy for getting the ball rolling with the initial SBCL work and the CMUCL port, respectively.

So it turns out ASDF is very useful for packaging all kinds of documents in a form that can be easily distributed. With the help of some code that walks the ASDF system definition, I can automatically make tarball releases with all of the source code and other files, configuration files, shell scripts, documentation, images, etc... and have these be available to the user who downloads this distribution without worrying about where the package gets installed, relative paths, environment variables, etc...

This has been great, but I find myself writing a lot of code that looks like this:

This is a URI of scheme asdf with host nil and the path (:absolute "ch-imageio-test" "test" "images" "sunset-lzw"). Notice that this isn't a path in the filesystem, but rather a path in asdf space. The first element of the path (after the :absolute) corresponds to the asdf system and the other elements correspond to ASDF components, so we can find them with the following code:

Raymond Toy has recently added a generational garbage collector to CMUCL/ppc. Unfortunately, CLEM breaks it. Here's a release of the latest CLEM which breaks the gengc. I really need to get anonymous SVN working one of these days.

CLEM

Work continues on CLEM (Common-Lisp Egregious Matrix), my matrix math package for common-lisp. CLEM now has a bunch of macros which generate type-specific methods for various matrix operations so that I can do fast, non-consing matrix math operations like addition, multiplication, etc... CLEM uses the MOP to define a standard-matrix-class which serves as the meta-class for matrix classes. This allows one to define typed matrix classes as subclasses of matrix, with class attributes stored in the instance of the metaclass. So now we have matrices for {u,s}b-{8,16,32}, fixnum, integer, single-float, double-float, float, real, complex, number, and even t, although this last one is probably suspect.

In addition to the matrix types, the standard matrix operations (add, subtr, scalar multiply, matrix multiply, hadamard product, abs, log, etc...), and aggregate operations (min, max, variance, sum), CLEM supports discrete convolution, affine transformation and a couple different types of interpolation, morphological operations (dilate and erode), and thresholding. Most of these are relatively non-consing, although there are probably a few cases that need to be rewritten in the new macro scheme.

Overall, it seems to work, but it is rather slow to compile and results in large fasl files. This shouldn't be too big of a problem, as I'd rather have a fast matrix package that takes a while to compile than a slow package that compiles quickly.

Speeds are decent. I approach naive C algorithms and get to within a factor of 10 for highly-optimized matrix multiplication hand-coded in assembly. More benchmarking and further attention to things like the size of the blocks for the blocked matrix multiply would probably be a good thing.

ch-image

Using CLEM, I've developed a trivial image representation/manipulation package. I should probably follow the lisp tradition of calling this trivial-image except that this isn't really trivial, as it requires CLEM. I've thought about trying to abstract away the core matrix stuff from CLEM into a package that works on arrays, and then trivial-image could use that (trivial-matrix?), but I'm getting sidetracked. ch-image supports images of a few types including ub8, rgb8 and argb8. Adding additional types should be straightforward and I hope to work on this soon.

ch-imageio

ch-image is nice, but rather useless if you can't get images into and out of it. This is where ch-imageio comes in. ch-imageio does the conversion between either files in various formats, or from the results of other file reading packages that load the image files into their own format. Currently, ch-imageio supports the following: 1) reading and writing JPEG via the cl-jpeg library, 2) reading and writing TIFF files via libtiff using interfaces defined by gcc-xml-ffi, and 3) writing PNGs using Zach Beane's SALZA library.

gcc-xml-ffi

gcc-xml-ffi generates FFI definitions from C (and, sort of, C++) code using gcc-xml. Currently it spits out sb-alien definitions, although the original incarnation did UFFI. In theory, other backends, like CMUCL's alien interface and CFFI, should be fairly trivial, but I haven't gotten around to them yet. UFFI isn't really adequate here as it doesn't support callbacks.

ch-asdf

To facilitate generating xml files from gcc-xml and generating common-lisp FFI declaration files from gcc-xml-ffi, I've made some extensions to asdf. I've also aped the SBCL asdf extensions for dealing with unix-dsos. This is a straightforward thing that everybody seems to do, but the less obvious part, for me, was how to package them up in such a way that these extensions can be used by other asdf systems. ch-asdf was my attempt at solving this; it allows me to add :unix-dso components without having to redefine what a unix-dso is and what its load and compile methods are in every asd file. Also, I can declare a component as a gcc-xml-c-source-file and the right things happen. Dependencies are sort of tracked, but there are a couple places where things break down. In any event, it makes writing asdf systems for packages that use gcc-xml-ffi and unix-dsos much easier.

tiff-ffi

tiff-ffi uses gcc-xml-ffi (and ch-asdf) to wrap the libtiff library. There are also some rather trivial glue functions that help facilitate things a bit. ch-imageio uses this to read and write TIFF files.

carbon-ffi

carbon-ffi allows one, in theory, to develop native Mac OS applications using SBCL. This sort of works and I have some screenshots of simple apps. I've even built "package"-style apps that allow for double-click launching. The negatives are that this is Carbon only (no Cocoa) and that in order to make this work one needs to use some undocumented Apple API functions. This proved to be a huge pain, as getting it to work properly exposed some issues with SBCL's stack alignment: we were not properly aligning the stack on a 16-byte boundary, which made AltiVec rather unhappy.

quicktime-ffi

Similar to carbon-ffi, this is an FFI wrapper for QuickTime. This also suffered from the stack alignment issue, but that is now fixed. Now one can use the QuickTime API from SBCL directly to make and read movies, etc... I haven't tried the GUI stuff (e.g., the QT movie-playing functionality), but it should work.

congeal

I have implemented a version of the congealing algorithm in common-lisp. This uses clem and the various image stuff to learn a set of transforms that bring a stack of images into registration and can learn the shape of the item represented in the stack of images.

clsr

I have begun to connect SBCL up with R so that I can evaluate R expressions from common-lisp using the R C API and can get the results back to lisp. Next steps are to have a representation of lisp data objects in R and vice versa so that I can, for instance, call lisp functions from R and to connect up the R plotting stuff so that I can make nice plots from SBCL. This is an area where a clean common-lisp API that wraps the R graphing APIs might be a nice thing.

ch-photo

ch-photo is a library based around FFI definitions to the dcraw package to read NEF and DNG files (come to think of it, this should be rolled into ch-imageio, but that hasn't happened yet). In addition, ch-photo contains some scripts for importing RAW files and for organizing them into file system hierarchies based on date of import and file type.

fftw-ffi

fftw-ffi is a wrapper for the fftw (Fastest Fourier Transform in the West) library. In addition to the FFI stuff, there are routines to translate data between ch-image and fftw compatible representations so that one can use fftw to do ffts of images in ch-image.

SBCL stack alignment issues

In order to get carbon-ffi and quicktime-ffi working properly, I had to fix some bugs with stack alignment on ppc. Thanks to Gary Byers for pointing out the bug after I was at wits' end with bizarre results coming back from QuickTime due to misaligned stack data being munged by AltiVec (without complaints). These have now been fixed and everything seems to be OK here.

callback fixes

The SBCL developers have been working on adding callbacks to SBCL. The initial ppc port had a number of bugs that caused problems with non-32-bit arguments, long longs, mixing arguments of different sizes, etc... Raymond Toy fixed a bunch of these problems for CMUCL and I ported these over to SBCL. After the initial round of fixes, there were some more problems with how the arguments got pulled off of the stack in the lisp trampoline, but these have been fixed as well. These patches have not yet hit the tree, but hopefully this will happen after the 0.9.8 release.

SBCL sb-alien field alignment issues

Finally, in order to make the carbon-ffi and quicktime-ffi packages work properly, I had to deal with the fact that some of the MacOS toolbox data structures use a bizarre alignment scheme that is a holdover from the m68k days. The bad news is that a lot of these are core structures for things like graphics and IO. In order to support these weird alignments, Apple's hacked-up version of GCC has some pragma directives that take care of the alignment issue. Fortunately for me, someone else must have been having similar problems, as the CVS versions of gcc-xml started dumping out the offsets of structure members. Now that the quasi-compiler was giving me this information, I had to hack up SBCL's alien interface to allow me to specify non-standard alignment of struct elements. This works, and now we can properly use these funky MacOS structures.

Ok, that's the overview. One of the main missing pieces is documentation. I need to go back in and document all of this stuff soon. Perhaps that will be my first New Year's resolution: I will document as much or more code than I write this year.

For both of the other people using callbacks in SBCL on PPC, I've fixed some more callback bugs. The latest patches sent to sbcl-devel fix the following:

Arguments of mixed sizes. Before, we were calculating the offsets of the arguments on the stack incorrectly. Basically, we were moving up the stack by the size of the next argument, not the size of the current argument, so that if you passed in a double followed by a float, or vice versa, you would get the wrong results.

Callbacks with lots of arguments. Before, when we ran out of registers, we would throw an error. This is a mistake. What we needed to do was to copy the register arguments onto the stack so that the alien-callback-lisp-lambda-wrapper (or whatever it's called) could get them off of the stack. For the arguments that come in after we run out of registers, we don't need to do anything, as the value is already on the stack.

Ported Raymond Toy's CMUCL fixes for calling vararg functions. This isn't a callback fix but rather an FFI fix, but it's in roughly the same area. The problem was that we need to put the arguments on the stack as well as in the registers when calling vararg functions.

Speaking of rtoym, there's still an outstanding issue in SBCL that he fixed in CMUCL. One of the ffi param registers is being used as reg_FDEFN. This should be moved down one, I think. I'll look into this after the callback changes hit the tree.

Thanks to rtoy for the cmucl fixes. I think the new fixes are going to affect CMUCL as well, so hopefully he'll find time to add those in, assuming he's not too busy with the gengc port! :-) Go rtoy, go! Thanks to luis and the cffi crew for motivating test cases and to lemonodor for complaining about the too many args thing.

Once again, here are some more new releases: ch-util and clem today. clem, in particular, has some MOP fixes that allow for inheritance of meta-class options from superclasses, which makes it easier to define matrix subclasses.

Thanks to Gary Byers for pointing out that he had seen similar problems with funky lines in Mac OS windows when the stack was misaligned. For some reason SBCL's stack is misaligned when we try to call a C routine. This patch ensures 16-byte alignment and makes Mac OS windows (and QuickTime movies) look much nicer:

Ok, after a lengthy hiatus, I've been trying to get SBCL and Carbon to play nicely together again. Unfortunately, I seem to be at a roadblock. Here's what I've got:

As you can see, the window has some funky lines on it and the menu bar colors are hosed. This doesn't happen when I run essentially the same code from C. So something horrible must be happening somewhere, like in Carbon's initialization routines or something. I've used the undocumented hacks to get foreground operation and I've tried faking a bundle a la OpenMCL, but I can't seem to get things to look any better than this. Suggestions greatly appreciated.

Here are some new releases of my matrix and image packages. Notable new features include (fastish) affine transformations and discrete convolution, various code cleanups, a bug fix in gcc-xml-ffi where I wasn't allocating space for the NULL at the end of C strings, etc... Enjoy.

(Note that the imageio download is rather large, as it now contains some sample images.)

(Furthermore, note that I just bumped the version # for clem to fix an asdf problem and some code that was inadvertently commented out.)

Ok, it's time for me to bundle up some of this stuff and ship it out, in case anyone is interested. What is it, you ask? Why, some nascent libraries for image processing, matrix math, image i/o, interfacing with C libraries via gcc-xml, etc...

Some of this was released yesterday, but there are brand new versions of all of this stuff. Comments/suggestions appreciated. Oh yeah, all of these pretty much require SBCL at the moment. I'd like to resurrect OpenMCL support and support CMUCL in the future.

For those of you who have run into the 128M dynamic space limit in SBCL on PPC, the following patch provides for a 768M heap. It's rather trivial, and I imagine most folks who have run into this limit can come up with something like this or better, but nevertheless, here are the parameters I've been using successfully for the last few weeks.

I'd like to see something like this rolled into the main SBCL repository at some point, but until then, enjoy, or at least let me know if you have a better approach. On a related note, porting the gencgc to ppc (or migrating rtoy(?)'s CMUCL version of it to SBCL) would be a great project, but unfortunately I'm not sure I have the skills and I definitely don't have the time at the moment.

After spending a couple of days trying to get things back to normal after upgrading to Tiger (trying to sort out compiler versions, fink packages, R and bioconductor, etc...), it occurred to me that the upgrade process for SBCL and OpenMCL has been a breeze. I haven't upgraded to the latest bleeding-edge OpenMCL as I've been using SBCL for most of my work these days, but I understand it runs and OpenMCL 0.14.3 seems to still be running OK. SBCL builds and runs great with Tiger. Thanks to Gary Byers, Brian Mastenbrook and the rest of the OpenMCL and SBCL development teams who made sure that the Tiger upgrade would be a snap, at least from the (or should I say my) open-source lisp point of view.

Now, back to figuring out which combination of the fortran and C compilers (and which libraries) will actually compile R and successfully install Bioconductor...

LTU has a piece on a special issue of JStatSoft on the demise of LispStat. I haven't read through all of the articles yet, but it does seem clear that lisp-stat isn't much of a player these days and that R is the lingua franca of the statistics world. This issue is timely for me, as I have been working in R on and off for the past few months, and while I have come to appreciate some of the features of R, I definitely miss many, many nice features of common lisp when using R. S and R are shining examples of Greenspun's Tenth Rule in practice. But at least the R guys were smart enough to build R in C by implementing a lisp-like intermediate language. Nevertheless, there are a ton of things in common lisp that are either just starting to make their way into R or are way off on the horizon that the R community will need to grapple with eventually. Then there's the whole issue of completeness, standardization, maturity of the language, etc... But, in any event, it does seem to be the case that R has taken over the statistical computing landscape, academic at least.

But I think it may be too early to write lisp off for statistics just yet. I'm just delving into the land of lisp-stat, but it doesn't seem to me that lisp-stat is the end-all, be-all of statistics in common lisp. Just as the R and S guys no doubt learned from lisp-stat, I'm sure there are lessons from R that could be applied to a lisp-based statistical package. Witness matlisp as an example of a lisp-based package inspired by matlab. Speaking of matrices and linear algebra (and granted, I need to catch up with the past 15 years of progress in lisp-stat), it seems that lisp-stat wasn't designed with what I view as modern statistics in mind. When I think of modern statistics, I think of big matrices and methods to operate on them efficiently. The main focus of lisp-stat seems to be on dynamic graphing tools, and matrices and linear algebra get 5 or 6 pages in the original book. I'm sure this may have changed, but in R, as in matlab, the focus seems to be on efficient matrix math from the get-go. As for the dynamic graphing stuff, yes, I can see this being nice, but I see it as more of an "add-on" thing than a core feature. A core feature (or a damn important library) of any statistics package, on the other hand, needs to be high-quality print graphics. It's great to be able to spin and zoom and all that, but ultimately scientists want to present their data for publication. R has 1) a good interface to BLAS/LAPACK (with decent notation), 2) great graphing facilities (even if support for png, jpeg, etc... is a bit janky and requires either X or ghostscript), and 3) a ton of in-depth (if not complete) library packages for doing a little bit of everything.

I think one of the major things holding lisp-stat back was the lack of good freely available common lisp implementations. The lisp-stat book claims that (at least in 1990) common lisps were expensive and that the free (subset of) common lisp available didn't have a compiler. Obviously things have changed a great deal since then. I haven't been around long enough to know the history well enough to do all the previous implementations justice, but I can say that first CMUCL and later SBCL have done an amazing job of bringing a fantastic common lisp implementation to the masses. All that aside, I still think that lisp-based systems have a deployment problem. If you look at successful lisp-based systems, I'd argue that most of them include their own (perhaps half-assed or limited) lisp implementations, emacs and autocad being cases in point. One of the advantages of emacs and R is that they both just build and run out of the box. There isn't this whole build the language, get the libraries, build them, get the application, build it, get the app libraries, build them, etc... rigmarole. Granted, R libraries aren't necessarily the cleanest thing in the world, but the whole R CMD INSTALL thing is easier, for some reason, for new users to grasp than the asdf install process.

Nevertheless, I'm still hopeful that the emergence of high quality, freely available lisp systems, and the success of at-least-partially-lisp-inspired (at least in the guts) stats packages such as R suggest that there's still hope for doing linear algebra and statistics in lisp. I'm looking forward to trying to do some of what I currently do in R in common lisp in the future.

So in my continued adventures in ffi-land, I decided it would be an interesting exercise to make a MOP metaclass that provided for transparent access to FFI structs. I took a first stab at this a while back using OpenMCL and it kinda worked. As part of my recent efforts to switch over to SBCL, I decided to pick this up again and see if I could make this work with UFFI. Sure enough, it works! And I've cleaned up the code a bit to create a package called meta-ffi which provides the MOP infrastructure and another package called vimage which uses the GCC-XML stuff and my GCC-XML-UFFI declarations to automatically parse the vImage (a MacOS framework for doing HW accelerated image manipulation) headers, generate an XML file of the appropriate declarations, generate a CL file with UFFI declarations and finally define a CL class which provides for transparent access to underlying structs. It's an awful lot of machinery, but it should be reusable in other contexts and might help write CL code that uses foreign libraries. Clearly this can be done without all of this machinery, but the point was to try to automate as much as possible. Now I can do something like this:

The MOP stuff under the covers takes care of the foreign struct allocation and makes sure that slot-value and (setf slot-value) do the right thing.

I think this is kinda cool, but I can't decide if it's really useful or just Brucio-cool. Time to do some more hacking to see if this really helps. I think it could prove useful for the development of Carbon apps in SBCL.

I'm sure there are more pressing matters in getting SBCL to 0.9, 1.0 and beyond, but I'd like to suggest that the SBCL_HOME/INSTALL_ROOT thing is too fragile and should be fixed. Requiring that INSTALL_ROOT be set at build, install, and run time seems overkill. I would prefer to see SBCL_HOME/INSTALL_ROOT set at build and install time, leaving an sbcl executable that doesn't require setting SBCL_HOME. I think this would help make SBCL usable for new users. I understand that 1) this is a relatively trivial bug, 2) there are a number of simple (including some currently documented) workarounds, and 3) there are more important things to work on, but, nevertheless, I think it would help adoption of SBCL if this were fixed.

Ok, the DSL outage has been fixed. Now I can get back to work. I always think I'll be more productive without a net connection, but it's never the case.

In any event, the gcc-xml work continues. One problem is 64-bit support. C long long should be usable as args and return values, but UFFI doesn't support this. I'm not quite sure what to do here. Perhaps using SB-ALIEN and other native FFI interfaces might be appropriate for the 64-bit stuff. It doesn't really matter if everything goes through UFFI or not, as long as it works, and it's probably easier to stick the 64-bit support in here directly rather than trying to convince KMR to support 64-bit types in UFFI, which seems like a losing battle. Might need some special C glue support too, which probably has no reason to be in UFFI.

A second problem is dealing with the huge number of declarations generated. Clearly, processing the XML file and generating the lisp decls every time is not a good idea (nor is generating the .xml from the .h/.c every time), and compiling the lisp forms every time is probably not a good idea either. Loading a compiled fasl helps, but still yields a 30-60 second delay. Saving a lisp core might work, but I understand there are some issues with lisp cores and foreign libraries, and I haven't worked through these yet. As a step in this direction, I tried save-lisp-and-die (SBCL's dump-a-custom-lisp-core command) and things failed with a "65536 is not an (UNSIGNED-BYTE 16)" error. Changing the element type of the compact-info-entries-index (or something like that) to 32 bits fixed the problem, but I have no idea if this is correct, much less wise, although it gets me one step closer to trying it out. More in a bit...

Work is continuing on generating UFFI-based lisp bindings to foreign libraries via gcc-xml. Now I can successfully parse all the GCC-XML declarations from Carbon.h. This (and some apparently undocumented (at least not officially) MacOS hackery) enabled me to do this:

So I've been working on automatically generating UFFI declarations from GCC-XML and it's almost there. The idea is that one feeds a C file (.c or .h) to GCC-XML, which outputs an XML file containing XML elements for the declarations in the file, including function definitions, typedefs, structs, enums, etc. I feed this file into some code which parses the XML and spits out a file of UFFI declarations, which is then eval'ed to produce the appropriate UFFI types and whatnot. This should make it easier to port code that uses OpenMCL's interface database to SBCL. Don't get me wrong, the interface database is very cool, but it is OpenMCL-specific and it would be very nice to be able to develop Carbon applications on, e.g., SBCL. And there's nothing OS X-specific about this; it should work on any platform with GCC-XML and a modern Lisp implementation supported by UFFI.
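To make the shape of the output concrete, here's the sort of form the generator might emit. The function name and types below are made up for illustration; this is not actual generator output:

```lisp
;; For a C declaration like:
;;   extern int GetWidgetBounds(void *widget, void *bounds);
;; the generator might emit a UFFI form along these lines:
(uffi:def-function ("GetWidgetBounds" |GetWidgetBounds|)
    ((widget :pointer-void)
     (bounds :pointer-void))
  :returning :int)
```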

There are some potential issues:

Names: How should lisp names for functions, etc. be constructed? Should we attempt to map things like SuperCoolOpenGLBuffer_Fast to super-cool-open-gl-buffer-fast? Or should we just intern the C name, yielding a function call like (|EnableOpenGLMonkeyButter| ...) or whatever? I'm leaning towards the mixed-case symbol approach, which means that the symbols have to be enclosed in vertical bars, but I'd argue that this is a feature, as it makes it more obvious which symbols refer to external declarations.

Packages: To which package should the C declarations go?

Callbacks: For full functionality of many C APIs we're going to need callbacks. My understanding is that SBCL still doesn't support them. Hopefully this will get fixed in the near future.
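On the Names question above, if one did want the lisp-case mapping, the mangler could be sketched like this (one possible policy, untested against real Carbon identifiers, so runs of capitals and digits would need more thought):

```lisp
;; Sketch: map a CamelCase/underscore C name to a lisp-case string,
;; inserting a hyphen at each lower-to-upper transition and turning
;; underscores into hyphens.
(defun c-name->lisp-name (name)
  (with-output-to-string (out)
    (loop for prev = #\A then ch       ; uppercase seed: no leading hyphen
          for ch across name
          do (cond ((char= ch #\_)
                    (write-char #\- out))
                   ((and (upper-case-p ch) (lower-case-p prev))
                    (write-char #\- out)
                    (write-char (char-downcase ch) out))
                   (t
                    (write-char (char-downcase ch) out))))))

;; (c-name->lisp-name "SuperCoolBuffer_Fast") => "super-cool-buffer-fast"
```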

Anyway, hopefully I will have a release of this up in the next week or so. Or send me email if you're interested in seeing this sooner rather than later.

Dan Barlow has an interesting piece on walking (reading and writing) nested data structures, motivated by the ease with which such things are done in Perl. Being a good (great?) lisp hacker, dan_b wants to make this kind of thing easy to do in lisp and suggests a nice setf expander that works with plists to allow setting values in nested data structures as follows:

(setf (ref l :foo :ban :barry) 17)

This is very cool, but dan_b is partial to plists and so his implementation works on property lists. This is fine for small lists (even deeply nested ones, as long as there aren't too many keys at any given level), but once there are a bunch of key/value pairs at one level, it will be slow to search.

One of dan_b's motivating factors for using plists is that hash-tables aren't read-printable, which is true, as hash-tables contain things that are difficult to externalize, such as the test function. I'd suggest that to externalize a set of nested hash-tables, one could just make a plist from it as follows:
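The original snippet isn't reproduced here; a minimal sketch of such a conversion might look like:

```lisp
;; Sketch: recursively turn nested hash-tables into a plist that can
;; be printed readably.  Note that key order is not preserved.
(defun hash-table->plist (table)
  (let (plist)
    (maphash (lambda (key value)
               (push (if (hash-table-p value)
                         (hash-table->plist value)
                         value)
                     plist)
               (push key plist))
             table)
    plist))
```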

And do the inverse of this on the way back. Of course, one has to make some limiting assumptions about the nature of the hash-table, such as that the test has to be EQUAL if we want this to work with string keys (not just keywords), as most things that could use this sort of machinery would want string keys.

Of course, the setf expander has to be rewritten to work with hash-tables, but if one were to do this and pair it with a way to externalize the data as a plist, I think you could get the performance of hash-tables (this is both good and bad, as there is most likely a performance penalty for small hash-tables, but a big win for large ones) and at least some externalizability that is no more limited than what you would get with the plain plist approach. Think of this as hash-tables under the covers, with the canonical external representation still being plists (with the further caveat that the order will most likely get munged, but who cares; the assumption is that this is for unordered data sets).

The tricky part is writing the setf expander. Well, it turns out it's not so tricky after all, as danb has already done all the hard work for us! Here's the modification of danb's setf expander to work for hash-ref:
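danb's expander (and my modification of it) isn't shown here; as a rough sketch of the idea, a version using a setf function rather than a setf expander might look like:

```lisp
;; Sketch: nested hash-table access with a reader and a setf function.
(defun hash-ref (table &rest keys)
  (if (null (cdr keys))
      (gethash (car keys) table)
      (apply #'hash-ref (gethash (car keys) table) (cdr keys))))

(defun (setf hash-ref) (value table &rest keys)
  (if (null (cdr keys))
      (setf (gethash (car keys) table) value)
      ;; Create intermediate tables on demand, using an EQUAL test so
      ;; string keys work too.
      (let ((inner (or (gethash (car keys) table)
                       (setf (gethash (car keys) table)
                             (make-hash-table :test 'equal)))))
        (apply #'(setf hash-ref) value inner (cdr keys)))))

;; (setf (hash-ref table :foo :ban :barry) 17)
```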

[I was going to send this to sbcl-devel, but I've already pestered the folks on the list with a couple of trivial matters this week and I'm sure they have better things to do than listen to me whine about perceived (by me) weaknesses in SBCL without a patch to address them. So, here goes...]

I'm sure there are more pressing matters in getting SBCL to 0.9, 1.0 and beyond, but I'd like to suggest that the SBCLHOME/INSTALLROOT thing is too fragile and should be fixed. Requiring that INSTALLROOT be set at build and install and runtime seems like overkill. I would prefer to see SBCLHOME/INSTALLROOT be set at build and install time, leaving an sbcl executable that doesn't require setting SBCLHOME. I think this will help make SBCL usable for new users. I understand that 1) this is a relatively trivial bug, 2) there are a number of simple (including some currently documented) workarounds, and 3) there are more important things to work on, but, nevertheless, I think it would make life easier for folks who are new to either Lisp or SBCL.

It works pretty well (i.e., it passes the few simple tests I wrote and works in the limited situations in which I tested it), but it is currently not used at all by the OODML stuff, which would be nice, as it offers the potential to greatly speed up repeated accesses to the database.

I'd like to get this stuff cleaned up and integrated into the CLSQL source tree, but it hasn't happened yet. Hopefully soon.

SBCL's sb-bsd-sockets and sb-posix packages directly export a number of symbols, instead of using the :export clause in defpackage. This seems to cause a problem on subsequent recompilation of the defpackage.lisp file, as the symbols are exported by the package but not listed in the :export clause. This patch fixes things for these two packages.
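The pattern at issue, sketched with a made-up package (this is not the actual sb-posix source):

```lisp
;; Symbols exported imperatively at load time, rather than declared in
;; the DEFPACKAGE form itself:
(defpackage :my-posix (:use :cl))   ; note: no :export clause
(in-package :my-posix)
(defun getpid-wrapper () nil)
(export 'getpid-wrapper)            ; exported outside DEFPACKAGE
;; On a later recompile, the DEFPACKAGE form no longer agrees with the
;; package's actual export list -- and re-evaluating DEFPACKAGE on a
;; package modified this way has undefined consequences.
```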

Christophe Rhodes kindly pointed out that the consequences of calling defpackage on existing packages that have been modified are undefined. This suggests that my proposed patch is probably the wrong way to handle this. Xophe suggests the following:

I've given up trying to make SBCL take characters on byte streams. The spec is unclear as to what should happen here, and it would be really nice if we had a binary stream to which one could write-char, but since a reasonable interpretation (the one taken by the SBCL developers) is that this is a bad idea and shouldn't be allowed, I've decided to byte the bullet and look for other ways to solve the problem.

There were two different places where the problem cropped up for me. First, I was trying to hand off the araneida stream to the cl-jpeg library, which was trying to write-byte and write-sequence (of bytes) to this stream. This used to work, and still worked in OpenMCL, but didn't work in Unicode-enabled SBCL. It turns out CLHS says this is illegal. Ok, fair enough. So I figured I'd make the http stream a binary stream. This didn't work, as araneida tries to read-char and write-char to the stream, and the spec is unclear on how read-char and write-char from/to binary streams should be handled. So I figured I'd fix the SBCL stream implementation to allow read-char and write-char for binary streams. This broke araneida's static file handler, which was checking the type of the stream, so I disabled that and things were working fine. But the problem is that this isn't necessarily portable, character set issues aside. So I figured I'd try to make all of this work with a stock SBCL build.

It turns out the araneida thing works, but in order to make the jpeg library work, I had to change the write-byte calls to (write-char (code-char ...)), which seems to be working, and similarly I had to change the write-sequence calls to loop over the sequence and write characters. Not the best solution, but probably better than the alternative. I'm still not sure what effect, if any, the environment variables LANG or LOCALE have on this, but things seem to be working for the moment.
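As a sketch, the substitution described amounts to something like this (the helper names are made up; the real change was inline in the cl-jpeg calls):

```lisp
;; Hypothetical shims: route byte writes through the character layer,
;; relying on code points 0-255 mapping to the corresponding characters.
(defun write-byte-as-char (byte stream)
  (write-char (code-char byte) stream))

(defun write-byte-sequence-as-chars (bytes stream)
  (loop for byte across bytes
        do (write-char (code-char byte) stream)))
```

Note this quietly assumes the stream's external format maps code points 0-255 straight through, which is presumably where the LANG/LOCALE question comes in.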