Sunday, 30 December 2007

When a language is compiled ahead of time (like C or C++), there is a step of compiling and linking the source code into a form that can be executed directly on the target hardware. This assembling of source into an executable is usually handled by a build system like SCons or Make.

The CPython implementation of the Python language uses an interpreter. There is no explicit compile step; source is implicitly compiled to bytecode when it is loaded.

Common Lisp implementations are somewhere in between in that there is a compile step, but you can still use them as interpreters. For example, SBCL creates files called FASLs, which seems to stand for "FASt Loading". Their use should be self-explanatory. The format is implementation-dependent, meaning FASLs are not portable between implementations (unlike .class files for Java, for example.) I couldn't figure out what SBCL stores in its FASLs, but my guess is that it is bytecode.

While developing a library, it is useful to have an interpreter-like environment. However, when you are using a library, you don't want to load and recompile the source every single time. What you want to do is compile the library's files into FASL format and have your implementation load them when necessary.

That is where ASDF comes in. ASDF, which stands for "Another System Definition Facility", is a way to define your projects so that they can be compiled and loaded, along with whatever they depend on, in the right order. The rest of this post covers what I did to get a project and a testing system set up. Installing ASDF is out of scope here, but if you use SBCL, you already have it.

ASDF looks for .asd (a system definition?) files in whatever paths are listed in asdf:*central-registry*. So the first thing you want to do is add your project's path to that list. The way I did it (which is not optimal) was to modify ~/.sbclrc to include the following lines:
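A sketch of those lines (the path here is a placeholder for wherever your project's .asd file lives):

```lisp
;; ~/.sbclrc -- a sketch; substitute your own project directory.
;; Note the trailing slash: registry entries are directories.
(require :asdf)
(push #p"/home/you/lisp/project/" asdf:*central-registry*)
```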

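A minimal src/package.lisp along the lines the next paragraph describes might look like this sketch:

```lisp
;; src/package.lisp -- a sketch; names follow the surrounding text.
(defpackage #:project
  (:use #:cl)
  (:export #:some-function))
```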
This uses the standard defpackage macro to define a Common Lisp package. A package is a mechanism that maps names to symbols. The above exports a symbol called some-function. Next, open up src/code.lisp and enter the following:

(in-package #:project)

(defun some-function (a) (format *standard-output* "a is: ~A" a))

This uses the standard in-package macro to set the current package to be project. If you are using Slime, you can try loading code.lisp (C-c C-l) and if you haven't loaded package.lisp, you should get a message telling you that project does not designate a package. To make it work, load package.lisp and then code.lisp. In a large project, it would be impossible to remember which order to load things in.

The next thing to do is to make this loadable via ASDF. To do this, open up project.asd and enter the following:
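A sketch of what project.asd might contain, grouping the two source files into a module:

```lisp
;; project.asd -- a sketch; the component names follow the files
;; created above, with code.lisp depending on package.lisp.
(asdf:defsystem #:project
  :components
  ((:module "src"
    :components ((:file "package")
                 (:file "code" :depends-on ("package"))))))
```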

I prefer using a module because if you have multiple modules that have dependencies, the dependencies are easier to define. For example, you might have a "model" module that depends on the "database" module.

Now go to the REPL and type (asdf:oos 'asdf:load-op :project). Assuming you set up asdf:*central-registry* as above, you should get output like this:

Even though we told ASDF to "load" the system, since we hadn't compiled the files, ASDF compiled them for us, in the right order. Type (project:some-function 5) if you want to convince yourself that it worked!

Now we want to add a system to test our code. To do this, open up project-test.asd and enter the following:
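A sketch of what the test system definition might look like:

```lisp
;; project-test.asd -- a sketch; note the :depends-on pulling in the
;; main system before the test code is compiled or loaded.
(asdf:defsystem #:project-test
  :depends-on (#:project)
  :components
  ((:module "test"
    :components ((:file "package")))))
```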

The only new thing here is the use of the :depends-on keyword argument to defsystem. Here, we are telling ASDF that before loading/compiling project-test, the project system must have been loaded and compiled successfully.

Mechanically, we add the following to test/package.lisp:

(defpackage #:project-test
  (:use :cl) ;; Could also use the project package but I like to qualify symbols
  (:export run-tests))

Thursday, 27 December 2007

I spent all day today adding some tests as I needed to do some refactoring. I wanted to choose a CL testing framework and came across Phil Gregory's great post: Common Lisp Testing Frameworks. I read through his review and as you can tell by the title, I chose to go with FiveAM. I won't go over what Phil covered in his post except to say that FiveAM has what I expect in a testing framework and more.

However, the grouping is only useful for selecting which tests to run. You can't (for example) make one set of tests dependent on another set, so the feature is useful only for organizational purposes. It isn't a deal-killer, especially since you can write a function to work around the limitation, like the one below:
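A sketch of such a workaround, assuming two FiveAM suites named basic and advanced. FiveAM's test-passed result class is internal, so treat this as illustrative rather than portable:

```lisp
;; Run the ADVANCED suite only if every test in BASIC passed,
;; chaining the suites by hand since FiveAM has no inter-suite
;; dependencies.
(defun run-suites-in-order ()
  (let ((results (5am:run 'basic)))
    (5am:explain! results)
    (when (every (lambda (r) (typep r '5am::test-passed)) results)
      (5am:explain! (5am:run 'advanced)))))
```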

The for-all macro takes a list of generators and iterates through a set of samples for all the represented values. This is done through the use of generators (gen-string and friends.) In this case, I am iterating through a distribution of strings that generates a string between 5 and 10 characters long the contents of which are in the "interesting" ASCII character range. The body of the for-all macro is dedicated to encoding the password and validating that the encoding is sane. Although it isn't important, validate-encoded-password looks like:

Sunday, 23 December 2007

With any application, it is important to ensure that the user is allowed to use the system. This is known as authorization. The first step in authorizing a user is to authenticate the user, i.e., ensure the user is who they say they are. Once the user is authenticated, the application can decide what operations/views the user is authorized to use. The de facto standard way of authenticating is forcing the user to input a user name and password. <rant>I personally hate this.</rant>

With a stateful application, such as a desktop application, authentication is pretty straightforward: just authenticate at application launch.

Unfortunately, HTTP is stateless (keep-alives aside.) Continuation-based frameworks such as Weblocks remove the problem entirely by allowing you to write your app as if it were stateful. It is quite beautiful. Continuation-based frameworks have their uses, but if they don't fit your needs, then you need a different approach.

The usual way to implement authentication for a web application is to check whether the client has been authenticated on each page request. Obviously, this is quite annoying if you have to do it yourself. Frameworks like ASP.NET handle this for you (\o/ frameworks) by some careful editing of XML files. As I understand it, you tell ASP.NET what resources are protected and how to authenticate when protected resources are accessed. You fill in the authentication blanks with some helper classes that MS wrote for you. ASP.NET then generates a random session ID and sets a cookie which is used in subsequent requests to the web application. Nothing special. This is wide open to MITM attacks if you don't use SSL or sufficiently secure your session ID.

Hunchentoot does none of the above but it has all the ingredients to make it possible. The idea is that we want to intercept every request and check if a protected resource is being accessed. If a protected resource is being accessed, then we need to either force authentication or pass the request along if the session is authenticated. One way to do this is to insert the appropriate code in every page using macros. Another way is to use Hunchentoot's dispatch table. Personally, I'm partial to the method covered here because it doesn't require you to remember to secure your pages. Another benefit with this method is that you can also protect non-function resources such as when you serve static content.

When Hunchentoot receives a request, it iterates through hunchentoot:*dispatch-table*, executing each dispatch function. Each dispatch function returns a handler function when it applies to the request, and Hunchentoot calls the first such handler it finds. Hopefully the solution I am thinking of is clear: insert a dispatch function, consulted before all others, that checks for an authenticated session and redirects to a login page if one is not found. Here is an example of such a dispatch function:
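A sketch along those lines (protected-resource-p is a hypothetical predicate on the request path; for demonstration it blindly authenticates the session):

```lisp
;; A sketch of an authenticating dispatch function. Returning NIL
;; lets Hunchentoot continue down the dispatch table; returning a
;; handler claims the request.
(defun authenticating-dispatcher (request)
  (when (and (protected-resource-p (hunchentoot:script-name request))
             (not (hunchentoot:session-value 'authenticated)))
    (lambda ()
      ;; For demonstration only: blindly mark the session as
      ;; authenticated and retry the page.
      (hunchentoot:start-session)
      (setf (hunchentoot:session-value 'authenticated) t)
      (hunchentoot:redirect (hunchentoot:script-name request)))))

;; Make sure it is consulted before every other dispatcher:
(push #'authenticating-dispatcher hunchentoot:*dispatch-table*)
```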

In real life, you would obviously not automatically log someone in and instead have the regular username / password form.

Further, to prevent session hijacking with Hunchentoot, you need to do at least the following:

Use SSL (HTTPS)

;; See Hunchentoot documentation for the meaning of these variables
(setf hunchentoot:*use-remote-addr-for-sessions* t)
(setf hunchentoot:*use-user-agent-for-sessions* t)

You also need to redefine hunchentoot::get-next-session-id because it uses sequential session IDs, which leaves you open to guessing attacks. Imagine an attacker who logs in just before you and knows that the next session ID is N+1. Not fun.

The above method for authenticating a user is secure so long as all three of the above are implemented. I haven't done the third yet, so I can't say exactly what to do there. I think you need to generate a (theoretically) truly random number somewhere.

Update: You do *not* need to redefine hunchentoot::get-next-session-id. It turns out that advice was based on the bad assumption that all the information going into the session ID string is deterministic. On reading the code more, there are two elements of randomness inserted into the session string:

The session start time

A random string generated once per server

I think the above two are sufficient to make it secure for some value of secure. I believe the secrecy of the random string is important to the security. But I am no security expert!

Friday, 21 December 2007

So Google has had this AJAX API out for some time. For the longest time, I've wanted to be able to apply regular expressions to search results, and I figured using the Google API would be the best way to do it. That is the page linked to above. View the JavaScript; it is pretty straightforward (and likely buggy!)

The API itself is pretty good. The only thing that bugs me is that I could only manage to get 8 results on which to apply my regular expression. Ideally, I would have liked to do something like this:

Get query + filter string from user

Submit query to Google using AJAX

As the results come in, filter them using the filter string in step 1

Present to user as results pass filter

So for the above page to be even slightly useful, I would have to be able to go through at least 100 relevant results.

Of course I understand that Google has limited the search for business reasons. But this is a very good way to totally make the feature useless, in my opinion.

Tuesday, 18 December 2007

I am not quite sure what REST is, but I know that following REST practices gives you URLs like http://www.mycompany.com/resources/94182. Pretty, isn't it?

Django has a URL dispatcher which allows you to specify regular expressions that match incoming URLs and call a specific handler. If you wanted function resource_page to handle requests to URLs similar to the above, you would specify /resources/\d* as the regular expression. The slashes at the beginning and end of the regular expression are optional. Effectively, you are writing /?resources/\d*/?. This would match:

/resources/

/resources/123

/resources/123/

I'm not entirely sure that it would match /resources// (try it to see!)

This would not match /resources/abcd and other similar URLs.

In Django, the handler is written as:

def resource_page(request, resourceid):
    # ....

And the dispatcher is registered like:

urlpatterns = patterns('',
    (r"/resources/\d*", resource_page),
)

A little thing I forgot to mention: if you want the matches to be bound to function arguments, you have to make sure the regex captures them. In this case, one would really want to write:

urlpatterns = patterns('',
    (r"/resources/(\d*)", resource_page),
)

As is my nature, I decided I wanted this functionality as a Hunchentoot handler. Hunchentoot works pretty much the same way, except you only specify dispatch handlers. When a request comes in, Hunchentoot iterates through the dispatch handlers, and if one of them returns a function, that function is called to handle the request; otherwise the default handler is called.

So the solution is obvious (I think!): I want to write a handler that matches the requested URL against a regex, binds any matches to function parameters and returns that function.

What I want to write in my code is something like:

;;; I like to be explicit about the slashes myself
(create-regex-dispatcher "^/resources/(\\d*)" #'resource-page)

This is obviously very simple once you have a regex engine. Fortunately, not only has Edi Weitz made a billion other libraries, including Hunchentoot, but he has also written one of the fastest regex libraries (CL-PPCRE), surpassing even Perl. Crazy.

Anyway, here is the code:

(defun create-regex-dispatcher (regex page-function)
  "Just like tbnl:create-regex-dispatcher except it extracts the matched
values and passes them onto PAGE-FUNCTION as arguments. You want to be
explicit about where the slashes go."
  (let ((scanner (cl-ppcre:create-scanner regex)))
    (lambda (request)
      (multiple-value-bind (whole-match matched-registers)
          (cl-ppcre:scan-to-strings scanner (tbnl:script-name request))
        (when whole-match
          (lambda ()
            (apply page-function (coerce matched-registers 'list))))))))

Monday, 17 December 2007

In my last post, I showed a way to implement Python's decorator syntax in Lisp, which actually seemed to work for more than just myself!

What I did not show is how you can use this in regular source files. As mentioned previously, one way to add new syntax into Lisp is to tell the reader (via *readtable*) to call a reader macro when it encounters a particular pair of characters. So the answer to using this syntax in regular source files is to locally enable it by rebinding *readtable*. As always, it helps to write out what you would like to do:

(enable-py-decorator-syntax)

#@my-decorator(defun my-function (x) ... )

#@my-decorator(defun my-other-function (x) ... )

#@my-decorator(defun yaf[yet-another-function] (x) ... )

(disable-py-decorator-syntax)

Quite simply, enable-py-decorator-syntax copies the current readtable and sets the dispatch function. It also saves the original readtable in a variable so it can be restored. Conversely, disable-py-decorator-syntax does the opposite: it sets the current readtable back to the original and sets the auxiliary variable to nil. Without further ado, the code for these functions:
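A sketch matching that description (the name of the #@ dispatch function, read-py-decorator, is an assumption):

```lisp
;; Holds the readtable in effect before the syntax was enabled.
(defvar *original-readtable* nil)

(defun enable-py-decorator-syntax ()
  (unless *original-readtable*
    (setf *original-readtable* *readtable*)
    (setf *readtable* (copy-readtable))
    (set-dispatch-macro-character #\# #\@ #'read-py-decorator)))

(defun disable-py-decorator-syntax ()
  (when *original-readtable*
    (setf *readtable* *original-readtable*)
    (setf *original-readtable* nil)))
```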

The Lisp solution is more flexible, although that flexibility (being able to use lambda functions) is probably unwarranted.

The fundamental component of program compilation or interpretation is the Lisp reader. It is responsible for parsing the textual representations of objects and producing the objects themselves. So when an object has a non-readable representation, it cannot be reconstructed in this manner. For more information on the algorithm, see the relevant ultra hyperlinked hyperspec.

The Lisp reader reads one character at a time from the input stream. Big surprise. The interesting part that makes the above possible is that you can redefine what the reader does when it encounters certain characters. This dispatch information is stored in what is known as a readtable. The current readtable, the readtable being used for dispatch when reading, is stored in the dynamic variable *readtable*. So, to modify the readtable for a subset of code, all you need to do is rebind this variable within that block of code.

The hook into the Lisp reader that I used is set-dispatch-macro-character. Among other parameters, this function takes in two characters and a function to call when the reader encounters these characters. For some reason, I decided that I wanted #@ to be the dispatch pair for the decorator implementation. I suppose I could just as easily have used set-macro-character and dispatched on @. I leave that as an exercise to the reader (if you are still reading!)

So just like when dealing with macros, it helps to write out what code you want generated. In this case, given the input:

Generate a new function that is created by successive application of each decorator function

Simple enough eh? Except when you have more than one decorator, the reader will call your dispatch function recursively. So we must disable that by temporarily rebinding the dispatch character to a simpler function. After this little tricksy bit, the rest is pretty mechanical. So without further ado, the actual code:
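A sketch of the core dispatch function (names are hypothetical, and the nested-decorator rebinding trick described above is omitted):

```lisp
;; Reads a decorator name followed by a (defun ...) form, then
;; redefines the function as the decorator applied to the original.
(defun read-py-decorator (stream char arg)
  (declare (ignore char arg))
  (let* ((decorator (read stream t nil t))
         (defun-form (read stream t nil t))
         (name (second defun-form)))
    `(progn
       ,defun-form
       (setf (symbol-function ',name)
             (funcall #',decorator (symbol-function ',name)))
       ',name)))
```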

Cut and paste into your REPL and have fun with it! If you don't have a REPL, install SBCL for your platform and give it a run. Let me know if it actually works for you, if you try it! :-)

Edit: If you want to play with this as is, the easiest way is to type (test-readtable-thing) into the REPL and use (eval *) to evaluate the output once you take a look at what it generated. You can also use (eval (test-readtable-thing)). I will write a post that shows how to enable it for normal source code soon.

At some point, the Python community reached a consensus that decorators were a useful addition to the language. Decorators were implemented to encapsulate function transformations, which usually took the following form:

In the above example, the function foo is modified to ensure it holds a lock before calling zonk(bar). Unfortunately, this transformation is placed very poorly in terms of readability: if foo were very long, the transformation would get lost in the noise at the end. So a new syntax was proposed:

def synchronized(lock):  # as before

@synchronized(lock)
def foo(bar):
    zonk(bar)

That placement is a lot better and once you understand what decorators are for, it is a lot more readable than the alternative.

I was reading through some Django code the other day (Reviewboard) and noticed that they were very heavy on usage of decorators. It is quite a handy tool it seems. So I got envious. I wondered why Lisp did not have this functionality. Is it not possible? Do you need to meet for months to put this into the Common Lisp standard? Thankfully, the answer is no. I will cover how I arrived at this in Part 2, perhaps later today, but here is the equivalent Lisp code:
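Reconstructing from the #@ reader syntax shown in the follow-up posts, the equivalent presumably looked something like this sketch (the lock parameter is glossed over, and synchronized is assumed to be a function that wraps another function in lock acquisition):

```lisp
;; A sketch using the #@ decorator reader syntax developed in the
;; later posts; SYNCHRONIZED is a hypothetical decorator function.
#@synchronized
(defun foo (bar)
  (zonk bar))
```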

It's not yet certain that class decorators will be incorporated into the language at a future point. Guido expressed skepticism about the concept, but various people have made some strong arguments [28] (search for PEP 318 -- posting draft) on their behalf in python-dev. It's exceedingly unlikely that class decorators will be in Python 2.4.

Thank goodness Guido will not stand in my way ;-)

Edit: I am fully aware that this is not a Lisp idiom. I just wanted to see if it could be done, more than anything.

Saturday, 15 December 2007

Every red-blooded programmer must splurge now and then on some technical books that have nothing to do with their current work. For me, this is typically vacation and Christmas time. While I am regularly purchasing on-topic business and technical books during the year, this year, I am spoiling myself on:

A bit schizophrenic, I admit, but I love Lisp and Rails seems to be getting used everywhere nowadays. I can't say I disagree with the value proposition of Rails, as I see it:

You know how you always do the same damn thing for every single web application? Well this framework does it all for you in a way that works for everyone.

Nothing like solving non-problems to get you thinking about your real problems!

Someone wise once told me that it is good to get into the habit of purchasing books regularly. Unfortunately, I seem to stick to non-fiction, or at least factual books. I would like to get into more political or historical reading, but I don't know any good titles!

Friday, 14 December 2007

This year, BC Hydro is proud to support the BC Children's Hospital in their quest to change the lives of kids in need. Just like with energy conservation, every little bit makes a difference. Each time our holiday card is viewed, we will make a donation to BC Children's Hospital. To view and hear the e-card, visit the following link: http://bchydro.com/holiday2007

This is something that has interested me for the longest time: why do so-called experienced developers claim that their time shouldn't be spent working on the build system?

The best developers I know, know their build systems inside out. The best among these have written their own. Remember though, correlation is not causation. Anecdotal correlation is probably the worst kind though :-)

To me, the build system is like the conductor in an orchestra. It defines the possibilities and the boundaries. It directs what you can do. Deviation from this breaks the harmony and you get a sub-optimal result.

Why anyone would not like to direct the development process in this way is beyond me. I guess the guys whose time should be better spent elsewhere are probably the same guys who complain a lot about things in general.

Companies that invest in a suitable system for builds, even if they write their own, will be much better off in the long run. Of course, you should only write your own once you are making some money!

Wednesday, 12 December 2007

The next checkpoint release of SCons will contain a change that warns users about the unreliability of -j on Windows if the Python win32 extensions are not installed.

The problem is that in a parallel build, if you create/read/modify a file in a Python action, and a command-line action is spawned, the command-line action inherits Python's open file handles and can keep them open, which may cause subsequent failures.

I followed the discussion on the mailing lists and thought it was quite cool how they solved the problem.

Monday, 10 December 2007

The difference between proprietary software and commercial software is subtle. Proprietary software is software that is usually for sale but which the user is not allowed to reverse engineer or modify for any honest purpose. These restrictions are usually laid out in pages of legalese that you either click through and never read, or read and die before you finish. Commercial software is simply software that is usually for sale.

Why do people consciously choose to close their software to the users in this manner? I think there are three reasons:

The code really sucks.

Trade secrets.

Everyone else does it.

I feel that only the first is a legitimate reason. If the only thing preventing you from distributing your source code is a trade secret, then you are on thin ice anyway. Even so, I'm on the fence about trade secrets. If the code really does suck, then distributing the source could harm your reputation, which isn't worth it. Providing good service and support is the best you can do for your customers in this instance, at least until you can rewrite it.

The trick is to notice that not all commercial software has to be proprietary. The restrictions do not have to be so onerous that you are afraid to look at an error message for fear of knowing what file it originated from. I want my customers to feel that the software helps them, not restricts them. Open source software is geared towards giving the user the freedom to at least look at and modify the source.

The license of licenses, the GPL, goes a bit further. It lets anyone reuse your software for any purpose, provided they make their modifications available. This opens up the possibility that someone may compete with you using your own software. This has happened, for example, with Redhat and CentOS (thank you for choosing unique names, it makes analyzing the trends so much easier!) Is that a problem for Redhat? I'm not sure, but their revenues have doubled in the last couple of years. They surely aren't dying, and they must be having a good time.

But Redhat is a bit different, aren't they? They don't provide a single piece of software; they provide a union of TONS of software. What about the really small guys? I'll just use the term uISV, for micro-ISV.

Compared to their huge, monolithic counterparts, I think uISVs are different in one very important way: they genuinely care about software and solving hard problems for their customers. And here is where having the source available can be important. If one of your customers needs to port your software to the Xbox 360, but you have neither the expertise nor the economic inclination to do so, your customer should have the right to do it themselves. Even further, they should be encouraged to submit their changes back to you, perhaps through some discount-on-next-version incentive program, or simply because then they don't have to maintain their patches.

So my license would allow the customer to:

Use the software

Modify the software for their own purposes

Submit their modifications back to me, if they feel it is beneficial to them

I would specifically prohibit the redistribution of my software in source or binary form because I'm not sure what's in it for me.

I think the above would work for 99% of paying customers. It extends the software support spectrum just a little bit more, which makes it more useful for them, which gives you (possibly) happier customers.

See Up the tata without a tutu by Joel Spolsky for another discussion of this subject. I don't know where he gets these titles from.

Saturday, 8 December 2007

A comment in an earlier post got me to thinking whether the technology used to develop an application is important anymore.

I believe that what really matters is not the length or conciseness of code, but the functionality made available to your users. Yet, correct and judicious application of a technology can make or break your product.

So yes, I do believe the technology behind an application is important. This about sums up my thoughts on the subject:

A million mediocre programmers, pair-programming at five-hundred thousand mediocre computers, using mediocre software, will never produce anything that is not mediocre. However, a few excellent programmers, programming on their own machines, using open source software, will eventually produce Linux.

Friday, 7 December 2007

As I mentioned, if you want to do model validation with Weblocks, you currently have to write it yourself. But my choice of method to specialize was sub-optimal. Ideally, you would want your model to be validated no matter how you are modifying it, say with a gridedit or a dataform. So instead of hijacking dataform-submit-action, I should really have created an :around method for update-object-from-request. This is below:
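A sketch of such an :around method; the lambda list of the real Weblocks generic is an assumption here, and the model class is taken to be the create-login class from the earlier post:

```lisp
;; Run the standard Weblocks update first; only if it succeeds do we
;; run our own consistency check, propagating its (success errors)
;; values.
(defmethod update-object-from-request :around ((obj create-login) &rest args)
  (declare (ignorable args))
  (multiple-value-bind (success errors) (call-next-method)
    (if success
        (validate-new-login obj)
        (values success errors))))
```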

validate-new-login returns multiple values in the same manner as the generic update-object-from-request: a boolean indicating success and, on failure, a list of errors. It looks like this:

(defun validate-new-login (new-login)
  "If the Weblocks validation succeeds, then all required values
are already there, so we only need to check the consistency. Returns
(values success errors) where success is t if there were no errors, otherwise
success is nil and errors is an association list of (slot . error)"
  (let (errors)
    (validate-slot "password"
                   (equal (password new-login) (verify-password new-login))
                   "Both passwords must match!")
    (values (null errors) errors)))

Which brings me to my next update. Weblocks nicely renders all the errors for you. Unfortunately, I made that part really difficult: my use of continuations would always create a new dataform, which would then not show any errors that needed to be displayed. Since the continuations are stored in the widgets themselves, I should have just yielded the dataform widget itself and been done with it:

I think you're referring to the new merge tracking features of 1.5. By following Subversion-devel, I can tell you that it's not even clear at this point what kind of merge support should be expected in 1.5 and what will be deferred to 1.6 or even later.

Why this is the case is beyond me. Collabnet has had a merge tracking beta for quite some time now. My guess is that they haven't had enough feedback. Regardless, you can count on a quality implementation once the feature is actually released.

If you are currently using Subversion, I think the best choice is still doing whole-tree merges. Cherry-picking using svnmerge.py should only be used for release branches.

With a well-defined process, you barely notice lack of merge tracking. Except when someone screws it up!

Wednesday, 5 December 2007

So one of the things you want to do when you start an application is configure it. Ideally you wouldn't configure anything, but sometimes, at the very least, you need an administrator login to be set up. I will talk about how to do just that using Weblocks. I assume you have already installed Weblocks and created an application using weblocks:make-application.

First, I figured I wanted to store application configuration somewhere. To start with, I decided to just use the simple associative-container that comes with cl-containers (use ASDF to install it.) Then I added a bunch of configuration-related functions that would encapsulate the storage somewhat:

(in-package #:myapp)

(defparameter *config* (make-container 'associative-container))

#|
Configuration variables:

config-first-config-complete-p: Whether the first configuration has been
completed or not
|#

(defun load-config-from-file (filename)
  (declare (ignore filename))
  ;; just set some defaults for now
  (set-config-value 'config-first-config-complete-p nil))
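The set-config-value helper used above was not shown; presumably it and its reader are thin wrappers over the cl-containers API, something like:

```lisp
;; A sketch of the accessor pair, using cl-containers' item-at to
;; read and (setf item-at) to write the associative container.
(defun config-value (key)
  (item-at *config* key))

(defun set-config-value (key value)
  (setf (item-at *config* key) value))
```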

When you called weblocks:make-application, it created the file myapp.lisp, which contains the code that starts and stops your application. Insert a call to load-config-from-file there, with a dummy argument for the filename for now (you will have to fill that in later - hint: use cl-store).

So now you can get and set arbitrary configuration values. The configuration key that I named above, namely config-first-config-complete-p is initially set to nil when the application starts for obvious reasons (hint: it is the topic of this post!)

Another file generated by make-application is init-session.lisp. If you are at all familiar with Weblocks, you know what this function does: it initializes the session for the connecting user. You are supposed to set up a bunch of widgets and let the client have at them.

So unless the first-time configuration has been completed (which is determined by checking the configuration value at runtime), we return the result of first-time-setup, which is obviously where the real magic happens.

I created another file, login.lisp, that I used to keep all the login logic. Right now, it only has the logic for creating a login but you can use your imagination. Anyway, the first-time-setup function looks like this:
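A sketch of first-time-setup as described in the next paragraph; make-create-login-widget is a hypothetical constructor for the login widget built later in this post:

```lisp
;; Create a composite to hold the continuation, drive it with
;; with-flow, and mark configuration as complete once the flow
;; returns (i.e., once the login was created successfully).
(defun first-time-setup ()
  (let ((comp (make-instance 'composite)))
    (with-flow comp
      (yield (make-create-login-widget (make-instance 'create-login)))
      (set-config-value 'config-first-config-complete-p t))
    comp))
```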

When yielding continuations in Weblocks, the continuation is stored in a widget. That is why we need to create the composite widget and use it with the with-flow macro.

When you create a login, the minimum pieces of information you need are usually the user name and the password. Typically, you also need to verify the password. We need to create a widget that will let us do this.

Weblocks comes with a widget called the dataform which nicely wraps up editing server-side data structures on the client. All you need to pass it is an instance of your class, and it generates the appropriate form. Quite nice, if you ask me.

So the data model that I used to store the login creation was unimaginatively called create-login. As you can see, it is a normal CLOS class and there is nothing suspicious about it:
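It was presumably along these lines (a sketch; the exact slot list beyond the two password slots is an assumption):

```lisp
;; A plain CLOS class; the PASSWORD type on the two password slots is
;; what drives the rendering override discussed below.
(defclass create-login ()
  ((user-name :accessor user-name
              :initarg :user-name
              :initform nil)
   (password :accessor password
             :initarg :password
             :initform nil
             :type password)
   (verify-password :accessor verify-password
                    :initarg :verify-password
                    :initform nil
                    :type password)))
```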

The reason I gave an explicit type to the password slots is that if we just let them be, Weblocks renders the password field as a text input rather than a password input. We will use the type to override this behaviour.

I defined the password type using (deftype password () 'string).

When a class slot value is rendered to HTML, the function render-form-value is called. As mentioned before, we want to override this behaviour for the password type. We do this as follows:
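A sketch only; the real generic's lambda list is an assumption here. The point is specializing on the password type and using attributize-name to produce the input's name attribute:

```lisp
;; Render password-typed slots as <input type="password">.
(defmethod render-form-value ((obj create-login) slot-name
                              (slot-type (eql 'password)) value &rest keys)
  (declare (ignore keys))
  (with-html
    (:input :type "password"
            :name (attributize-name slot-name)
            :value (or value ""))))
```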

I love CLOS. Pay special attention to the call to attributize-name. It took me a while to figure that out!

So now, we need to actually create our widget that will let us add a login to our system. Actually, we are already done. The dataform does it for us:

(make-instance 'dataform :data login)

But what if the user just presses submit without actually entering any information? We should rap their knuckles for that, or at least give them a message. We can use the flash widget for that. Since this will be part of the adding-a-user action, we create a widget that contains a flash message:

We need to call render-widget because the function becomes a widget when you yield it. A little subtlety that I only came across by trial and error (and with help from the mailing list, of course!) The key thing to note is that we only return from the continuation (i.e., call answer) if the form submits successfully, passing all validation.

By default, Weblocks does very limited validation of form submissions. For example, it can validate whether any required slots are missing. But in this case, we need to make sure that (for example) the password and verify-password slot values match exactly. This validation takes place when the form is submitted and Weblocks calls the function dataform-submit-action. If you haven't guessed, we need to override this function and add our own validation:

Quite simple. The function update-object-from-request updates the data model (i.e., the create-login instance) and returns t when everything succeeded, or (nil failed-slots) if something failed. For some reason, I ignored the fail case. Go figure. The check-login-and-flash-messages function then does the actual validation, adds a bunch of messages to the flash object (referenced via (login-message obj)), and returns t if everything was ok, nil otherwise.

If this function returns t, then Weblocks considers that the submission has succeeded and calls the on-success function, which we neatly set up to return from the continuation.

In real life, you would obviously add the actual user to some database, but that is essentially the meat of what I did. In the end, you get something like the following:

Tax-free of course. Otherwise the title would be: If I had 12 dollars :-)

I would spend some non-trivial amount funding the development of Weblocks and then write some kick-ass apps with it.

In my very humble opinion, Weblocks is the way web apps should be written. It enforces modern web design practices in that there is not much futzing with HTML (unless you are adding UI widgets), and CSS is where you do your layout. Think about it. This way, your apps are "skinnable" from the get-go.

I've had my frustrations with it of course, due to it being a very young framework. But wow, it is pretty damn good given the scarce resources that have been responsible for its development. Once Slava gets the object store integrated into it, watch out.

By comparison, I've been looking at Django and it makes me cringe, even though it is probably one of the best frameworks out there.

Anyway, I'd take half of the rest of the money over to my friend Asadullah in securities... (If you don't get the reference, go watch Office Space!)

Monday, 3 December 2007

That is the question... I recently came across the post "Installable Software" on the 37signals blog, which has got me thinking. If I were using some mission-critical application, would I trust my jack-of-all-trades IT guy to handle it, or someone who really understands the system? When the question is posed in that manner, obviously the latter.

So as far as I am concerned, it comes down to one thing: performance. Are you selling a web app? If so, host it and you are done. I think there is no intelligence involved in that decision.

But what if you are selling something in which a web interface is only one part? A silly example: game servers. With most multi-player online games, there is a game server component as well as a web component for administration. By the way, people do make some money running tournaments, so it isn't just a bunch of kids wasting their time.

So the question is, would I pay someone to host my game server that I charge for over the Internet? If the performance was sufficient, and it usually is, I think the answer is yes.

I don't think there are many arguments against hosting or perhaps I have blinded myself to them. The comforting feeling I get knowing that I can control what people see and their upgrade experiences makes me feel warm and fuzzy. Still, I can't help but think that someone needs to blog about "Installable Software" and why it is better than "Hosted Software".

It is interesting to note that Mr. Software Blog himself, Joel Spolsky, has started a hosted service for FogBugz.

I just snagged a new set of Altec Lansing speakers for my computer. The bass in these bad boys is awesome.

What was interesting about this purchase was that the speakers were not set up so you could listen to them in the store. Apparently the other manufacturers pay the store to get theirs set up so you can listen to them. A bit of unauthorized rejigging of wires later, and I was previewing the sound.

Recently, the Linux kernel went from using a proprietary source control system (BitKeeper) to an open-source, so-called distributed, source control system. Of course, the replacement was written by none other than Linus Torvalds. The software is known as Git. I don't have much direct experience with distributed source control myself, but I am watching a few people use it regularly. I think Git definitely fills a huge niche for the open source model. It encourages forks and merging of the best forks. It is a natural evolution in open source development.

But a lot of us don't have a need to create a fork of any random software package. Most of the time, we work in project teams that are hand-picked, not random. In this case, we mostly need centralized source control and more and more, we use Subversion (or Perforce, if you're into that sort of thing.)

Increasingly, I am finding that I really need to check in my changes but I am nowhere near the Internet (really!) or my server. I am searching for a solution to the problem. Many people claim that I really want distributed source control. No, I don't think I do!

What I do want is disconnected operation. An "offline-mode", if you will. This is how I would do it, say for Subversion. From the user's perspective:

$ svn up $DIR --offline

This command would create a local repository that I could check into while I was "disconnected", rooted at $DIR. The first revision would correspond to the latest revision that I had of each file. Then checkins would create deltas from this revision, but they would be local.

Once I am connected, I would want to type:

$ svn resync $DIR

to send all my changes back to the centralized server.

What it seems I really want is SVK. I've heard some good things about it. I suppose I will check it out (no pun intended!)

Sunday, 2 December 2007

Portable, efficient, lazily initialized, thread-safe singletons are something that is needed fairly often in the C++ wild. I don't intend to cover why you are insane for wanting this. I intend to cover a naive solution that should work, but doesn't, and another solution that should work, given my information. This solution addresses three of the four characteristics; efficiency is the one dropped. I prefer correctness, and programmer time is important (and good programmers are expensive!)

To cover why you are insane for wanting these characteristics in the first place, I refer you to this and this.

Now, you may ask what makes me so special as to pretend to be an authority on this. Let me clarify: I don't pretend to be an authority. I am not. I have walked through this minefield a little bit and I will mark the mines for you as best I can. But if I may boast a little bit: I did find a bug in OpenSolaris's pthread_once implementation (on x86) and another thread-safety issue in Boost Serialization. The reason I point these bugs out is not that the authors were deficient or incompetent. They are quite the opposite. I point them out because I feel that people don't realize what is waiting to bite them in the butt in this area of software development. Indeed, it is easy to point the finger and say "haha, you made a bug," but we don't like those people around here. I have a shotgun and some bullets reserved for you if you insist on staying. Anyway, if even our experts can make this mistake, you have made it and you don't even know about it.

If by now you haven't read through the above linked pages, I suggest you do so now. I don't have the writing capacity to summarize them for you without blathering on like an idiot. Still here? Go! Use the tabbed feature of your browser. If your browser doesn't have tabs, I have another box of bullets for you!

Done? Good.

So, quite often, we think we want a single instance of some object but we don't want the object to be constructed on program startup. The simplest way to ensure this is the following:

MyObject& seductive()
{
    static MyObject t;
    return t;
}

That is a very seductive pattern, and if you are not concerned about thread-safety (i.e., you don't have multiple threads), then you are done. You may go Google Britney Spears. This is also known as the Meyers singleton. Don't know which Meyers. Don't actually care :-)

Now, what happens when multiple threads enter seductive() at the same time? You guessed it, you can get multiple initializations and all sorts of bad stuff happening. If you didn't guess that, then you definitely didn't read this.

All right. What the fudge? What are we supposed to do?!! I say: drop optimal efficiency as a requirement. You are probably copying a huge vector somewhere anyway before writing it to a file. That opens up many new worlds for solving this problem. The most straightforward, using Boost Threads functionality (it won't compile, I'm sure):
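A sketch of that straightforward (and flawed) approach, with std::mutex standing in for the boost::mutex of the era; MyObject and the function name are invented for illustration:

```cpp
#include <mutex>

struct MyObject { int value; MyObject() : value(42) {} };

// A namespace-scope mutex guarding lazy initialization. The flaw discussed
// below: a mutex type with a constructor is not statically initializable,
// so (in the C++ of the time) nothing guaranteed mtx was constructed
// before the first thread tried to lock it.
static std::mutex mtx;
static MyObject* instance = 0;

MyObject& naive_locked_singleton()
{
    std::lock_guard<std::mutex> lock(mtx); // may run "before" mtx exists
    if (!instance)
        instance = new MyObject;
    return *instance;
}
```
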

Even if it does compile, it won't work. "Hang on," you say. "That will work just fine. You're crazy. I'm going to go Google Britney Spears." You are wrong. That will not work. It will only appear to work most of the time. Why? Because:

Boost Mutex is not statically initializable

What the heck does that mean? The details are a bit fuzzy in my head, but the bottom line is that the C++ standard does not guarantee that the mtx variable is initialized before main(). Essentially, if a datatype has a constructor, you are SOL. ES. OH. EL. I don't know why the heck they didn't make the mutex statically initializable, but I guess it starts with "Win" and ends with "dows". I know pthread mutexes are statically initializable by design.

So what the fudge? We are still screwed. Yep. Pretty much.

So let us recap our problems:

If we use a mutex, it must be statically initializable

Double checked locking is broken except when the moon is blue and you are standing on your tippy toes

Popcorn gives me gas

If we decide to use Boost Threads (a fine choice, but sometimes limited), it has the first problem and there is no general, portable solution to the second. The third involves less popcorn.

Ah, but there is a silver lining. A couple of years back, I was at SD West when Scott Meyers was giving his "Double checked locking (DCL) is broken" talk. Most of the crowd had no flipping idea what DCL was (I sure didn't) but Meyers has this ability to communicate that I envy. So by the end of the talk, everyone knew what he was talking about and people were discussing ways to make it work. David Abrahams, of Boost fame, said: "Why don't you just use pthread_once?" And I thought: "Duh!" So the idea is not mine, but the implementation is! I present to you "Captain Sabraham's (or Dohail's) Singleton" :-)
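A minimal sketch of that pthread_once idea, written here with C++11's std::call_once (the standardized descendant of boost::call_once); MyObject is a placeholder type, not the author's original code:

```cpp
#include <mutex>

struct MyObject { int value; MyObject() : value(7) {} };

static MyObject* instance = 0;
static std::once_flag flag; // statically initializable, unlike a mutex with a constructor

void create_instance()
{
    static MyObject obj; // constructed exactly once
    instance = &obj;
}

// Every thread may call this; the runtime guarantees create_instance runs
// exactly once, and all other callers block until it has finished.
MyObject& singleton()
{
    std::call_once(flag, create_instance);
    return *instance;
}
```
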

Now we did drop efficiency, and that makes you very very sad, Mr Premature Optimizer. There is good news and bad news. The bad news is that the Boost Thread implementation of boost::call_once is slow on Windows. The good news is that it is fast on everything else.

Oh by the way, I lied. There is no part 2.

Disclaimer: This code may not work at all. You are free to not use it.

Friday, 30 November 2007

Emacs, as we all know, is quite flexible. One of its most useful features is that it allows you to define indentation styles in a portable, text format. I think it is useful to set up per-project coding styles, if you work on various teams, or even just to document the style. For example, I have the following in my .emacs file:

This is pretty much the stroustrup coding style, taken from cc-styles.el, except for the comment about the "odd one". Here, I am (for some reason) deciding that I want to indent my comments above the line they reference. So for example, instead of:

// Hello, how are you
hello.how.are.you();

I want:

    // Hello, how are you
hello.how.are.you();

If Emacs can support that weird style, it can definitely support your style! Do yourself a favour and document it using Emacs so your code looks as consistent as possible.

Wednesday, 28 November 2007

Software testing is one of those things that you can't really nail down. Thanks to a lot of advertising and ease-of-use due to the *Unit libraries, unit testing is quite popular nowadays. But really, unit testing is just the tip of the iceberg.

A unit test tests the tiniest part of a program that can be tested. This can be a free function, a member function, or sometimes even a whole class. But that's it. However, most people are really writing component tests. A component is the combination of multiple units to give some larger functionality. A component test therefore tests this combination. Then there are integration tests, then regression tests, etc., etc. There is a whole spectrum of testing. You can find more information in Code Craft.
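To make the unit/component distinction concrete, here is a small illustration (the functions are invented for this example): each free function is a "unit" you would test in isolation, while a component test exercises their combination the way a caller would.

```cpp
#include <cctype>
#include <string>

// Unit 1: strip leading and trailing spaces.
std::string trim(const std::string& s)
{
    std::size_t b = s.find_first_not_of(' ');
    if (b == std::string::npos) return "";
    std::size_t e = s.find_last_not_of(' ');
    return s.substr(b, e - b + 1);
}

// Unit 2: a user name is valid if it is non-empty and alphanumeric.
bool is_valid_username(const std::string& s)
{
    if (s.empty()) return false;
    for (char c : s)
        if (!std::isalnum(static_cast<unsigned char>(c))) return false;
    return true;
}

// The "component": the two units wired together as the application uses them.
bool accept_username(const std::string& raw)
{
    return is_valid_username(trim(raw));
}
```

A unit test asserts on trim and is_valid_username separately; a component test asserts on accept_username.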

In a lot of projects, there is a huge gap between developer testing and QA testing. Usually the only types of testing that involve both groups are unit testing and usability testing, if you are lucky! And in a lot of these cases, neither the unit tests nor the tests QA runs are automated; they usually require some manual intervention.

With a little bit of work, you can really boost the way you test your C++ code. There are a couple of things you need to do first:

Automated build

Automated deployment

That is, you need to be able to push a button to build the distributable from scratch, and push another button to deploy the application. And yes, as you guessed, you need to wire one button to the other. If you used batch files to accomplish the above, do it again using SCons, Ant, FinalBuilder, or one of the bazillion tools created especially for automating builds. I shouldn't have to tell you this, but you would be surprised!

Anyway, the next step is to choose a tool for writing tests in C++. I have used Boost.Test with great success. For whatever this is worth, I highly recommend this library. You want to write your unit and component tests in this library, for obvious reasons.

Automate the running of these tests and add them to the automated build.

If you have been keeping up, and since you are a C++ guru, you are also proficient in a language like Python. Now pay attention: you will be writing at least half of your tests in Python! Really. Just because all your tests are either unit tests or component tests does not mean you don't need to be doing this! In my opinion, this is where most teams miss out. They write their tests in CppUnit or Boost.Test and then it is off to QA. And the main reason is that C++ is not a glue language. It is an application/library development language. Python (or other languages with a REPL) shines at gluing components together. You can effortlessly write a whole bunch of portable tests by choosing to go this way. The Boost.Python library with some code generation will get you started with a minimum of fuss.

Whether you are shipping a library or an application, the addition of a Python API can be very useful: for a library, it is another checkbox in the "we support these languages" box. For an application it is a bullet point: "Fully portable scripting." Not to mention the benefits you'll reap.

The addition of this simple functionality means that there is now no excuse not to automate all your non-UI tests. And if you are on Windows, with a choice few Windows API functions, you can do your UI testing in the same language. Quite handy.

Chances are, now you are going to have a bunch of Python scripts lying around which still need to be invoked manually. To close the loop, I would recommend QMTest to manage your tests and their dependencies.

Now don't lie, you aren't testing as much as you can in C++. You need Python for that :-)

Update: I forgot one very important aspect of this whole thing: continuous integration. I've used BuildBot with great success and would recommend its use. One little thing you might want to add onto the bot is the ability to create builds for each platform, for each branch, automatically. I've done this using Subversion triggers and it is beautiful to watch in action.

Tuesday, 27 November 2007

I am pretty clueless in general. But to my credit, I relentlessly pursue knowledge, some of it being useful! As you may know, I have been looking at Weblocks as I feel I would eventually reinvent it badly.

Anyway, one of the things that the author recently added was support for continuations. A very simple example was demonstrated, and lo and behold it worked! So I went off, confident that I understood what a continuation was. I created an app skeleton and started using with-flow. And it didn't work. I fixed some parentheses. It still didn't work. "Hmm," I thought, "perhaps continuations aren't just things to gloss over." So I went back and read Continuations-Based Web Applications in CL with Weblocks again. Still didn't get it. I finally ended up at Wikipedia's continuation entry. And it all made sense. I even understood what a delimited continuation was.

I went back to my code, moved a couple of more parentheses, added a couple of functions, and lo and behold, it worked!

Thank you, Slava Akhmechet, for making this useful feature available. Note that neither he nor I am claiming that Weblocks is the first system to use continuations.

The use of continuations for web applications should be more popular. I would assume the main reason it is so rare is that most languages don't support them. To Java's credit, however, this article discusses using continuations. However, true to Java form, the third listing uses XML. Sigh...

Why would you use the standard algorithms? Glad you asked. In my opinion, the main reason is readability. The above-mentioned algorithms were the ones that came to mind immediately because they are the ones I've seen used the most. The aha moment for me (and, I believe, for anyone I've mentored) is when you realize that the algorithms communicate intent and not instructions. As we all know, declarative languages are so much better than imperative languages (except no one uses them.) But especially when it comes to data structure manipulation, being able to read what is going on rather than how it is being accomplished is great for readability. My intent is not to convince you of this fact, but just know that this is the main benefit for me.
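As a small illustration of "intent, not instructions" (the example is mine): the two functions below compute the same thing, but the second one names what it does.

```cpp
#include <algorithm>
#include <vector>

// Imperative: the reader must trace the loop to discover that it is a count.
int count_negatives_loop(const std::vector<int>& v)
{
    int n = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        if (v[i] < 0)
            ++n;
    return n;
}

// Declarative: std::count_if announces the intent in its name.
int count_negatives_algo(const std::vector<int>& v)
{
    return static_cast<int>(
        std::count_if(v.begin(), v.end(), [](int x) { return x < 0; }));
}
```
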

Unfortunately, there isn't as much use of these algorithms as there should be. On the bright side, through the Boost libraries, people are being exposed (albeit indirectly) to their use more and more. This gives birth to four types of programmers, or the Four Stooges (purely an affectionate term):

Eager beaver

Wise old fart

Scaredy-cat

Yes, I know there are three here :-)

The eager beaver is the (likely junior) programmer who must use a stdlib algorithm no matter how convoluted the result appears. He has read The C++ Standard Library three or four times and can more or less flip to the exact page he needs. This is a very good programmer to have. Eventually he will gain experience in when to use the algorithms, when not to use them, and how to really use std::remove_if, becoming quite pragmatic.

The wise old fart is the (likely senior) programmer who has been programming in C++ since back when it was compiled with Cfront. He still thinks the C++ standard library is unportable because of the time HP changed that one thing with the IOStreams and his program crashed because of it (I made that up.) But he is usually open to other people using it and doesn't chide them for doing so. He makes sure that whoever is using it knows what he is doing, though. This is a very good programmer to have. Eventually, he will start using the algorithms and, because of his experience, will be very pragmatic.

The scaredy-cat is very difficult to root out. Usually, he is disguised as the wise old fart: experienced, apparently deliberate, questioning what the other programmers are doing, among other generally paranoid activities. The dead giveaway, though, is that there is not one use of the C++ standard algorithms or data structures in his code. He will stick to arrays and for loops. This programmer is good to have because of his experience, but he writes potentially buggy code: for example, code where resizing a container to hold one less element would cause a buffer overrun later. No, you shouldn't "have known that would happen." That is why containers that know their size exist. I don't know how you deal with this guy. If he is very stubborn, it is a tough situation.
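A sketch of the kind of bug meant here (names and numbers are invented): with a raw array, the element count lives in a separate constant that can silently go stale, while a vector always knows its own size.

```cpp
#include <vector>

// Raw-array style: N must be kept in sync with the data by hand. If the
// data later shrinks to three elements but N stays 4, the loop reads one
// element past the real data -- the buffer overrun described above.
const int N = 4;
int raw_data[N] = {1, 2, 3, 4};

int sum_raw()
{
    int s = 0;
    for (int i = 0; i < N; ++i)
        s += raw_data[i];
    return s;
}

// Container style: the size travels with the data, so it cannot drift.
int sum_vec(const std::vector<int>& v)
{
    int s = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        s += v[i];
    return s;
}
```
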

The first two stooges can eventually become the fourth stooge: the pragmatic programmer. This programmer does not magically appear but is always an evolution of the first two stooges. He is the super stooge. He understands that sometimes you just need a for loop.

Saturday, 24 November 2007

So last time, I mentioned that my defpage macro does not handle POST arguments. To recap, when writing the backend for a website, I'd like to treat my page definition as just a function call (sound familiar?) That is, if I was writing a page that returned the value of the submitted argument and added one to it, I'd really love to just write:

(defpage add-1 (a) #p"tmpl/add-one.tmpl" (list :a-plus-1 (1+ a)))

Well now that dream is a reality. If you have been following along at home (I'm pretty sure I'm talking to myself just now), what we need to do is modify the defpage macro to iterate through each argument, look up the parameters using hunchentoot:get-parameter, setq them to something sensible, and call the body of our page.

Let's look at a macroexpansion of our macro now (yeah, that's right, I took a screenshot):

You can see in the above code that the macro generates some very bad code! But, it gets the job done for now. Here is the finished macro:

You must be saying: "You can't be finished, Sohail, you don't handle non-string types in the arguments yet! Not to mention optional arguments! You are lazy!" You hurt my feelings by thinking that. But maybe, JUST MAYBE, I SHOULD JUST USE define-easy-handler. Believe me, I swore pretty loudly when I realized that existed. Oh well, at least I learned something useful.

Friday, 23 November 2007

You know the mantra: "Only throw an exception in exceptional circumstances." A close brother of this mantra is: "Exceptions as flow-control are bad."

I submit to you, the single reader of this article: Exceptions as implemented by most languages are the solution to the problem: "Something bad happened! OMG! <commence-running-around-like-headless-chicken>." The problem should really be stated as: "Something bad happened! Is there anyone who can tell me what to do? If not, I'll just launch the debugger and wait for someone." It is a subtle but important difference.

Consider enabling some code to exit early because you want to have a quick test of the current parameters for validation. In C++, you might write:
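A sketch of what such early-exit code might look like, with an exception (ab)used purely for flow control; all the names here are made up for illustration:

```cpp
struct validation_failed {}; // thrown purely for flow control -- the "no-no"

bool params_ok(int param) { return param > 0; } // stand-in for a real check

// Quick pre-flight: deep inside setup, abandon the whole run the moment a
// parameter turns out to be invalid, instead of threading a status code up
// through every caller.
void run_with_quick_check(int param)
{
    if (!params_ok(param))
        throw validation_failed();
    // ... the expensive real work would continue here ...
}
```
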

Oops! Except you can't do that can you? In a code review, inevitably someone is going to say: "HEY MAN NO EXCEPTIONS AS FLOW CONTROL. IT SAYS RIGHT HERE IN THE CODING STANDARD 1.9! (I'm so telling the boss)" And they are absolutely right. Well, except the snitching. We don't like those people. They aren't TEAM PLAYERS.

As you are new to the team, you go back and rewrite your code so you are no longer throwing the exception:
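The exception-free rewrite presumably looks roughly like this sketch (hypothetical names), with every layer checking and propagating a status code by hand:

```cpp
enum Status { STATUS_OK, STATUS_INVALID_PARAMS };

Status check_params(int param)
{
    return param > 0 ? STATUS_OK : STATUS_INVALID_PARAMS;
}

Status run_with_quick_check(int param)
{
    // Each step is followed by the same check-and-return boilerplate,
    // and every caller up the chain must repeat it.
    Status s = check_params(param);
    if (s != STATUS_OK)
        return s;
    // ... more steps, each followed by the same dance ...
    return STATUS_OK;
}
```
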

Ugh. Mr. (or even Dr.) Senior Developer says: "Hey, that's the way this is done in C++. Whaddayagonnado?" and gives your code approval.

Mr. Senior Developer is right. But for the wrong reason. C++ compilers have been heavily, heavily optimized to have no overhead in non-exceptional circumstances. See this Going Deep episode on C++ exceptions for an understanding of how that is accomplished. Interesting stuff. But damnit, what about non-exceptional circumstances? And damnit, what about when I may not decide to take any action and want the code to keep going when this condition occurs?! So as you can see, he is right only because there is a big gaping deficiency in the C++ language and trying to patch the hole either gives you a performance issue or it makes the code ugly.

So hopefully you see now why the problem was restated. But that form still doesn't apply here. What we really want to say is: "Hey, I'm in some state, does anyone care? I'm just going to keep going otherwise." This is exactly what Lisp provides with the condition system.

Thursday, 22 November 2007

From what I can tell, JeOS (Just enough OS) is a stripped down version of Ubuntu which can be of use to ISVs looking to create virtual appliances. It has been optimized for VMware. I have no idea what that means.

I think virtual appliances are a great idea. There is no configuration anyone needs to do to get started and, more importantly, only one install configuration to support. I do wonder how the underlying OS stays up-to-date. With Ubuntu, it's usually apt-get update && apt-get upgrade. I suppose one can schedule that if all else fails.

So where I left off last time, Bopinder and I had just created a sample page where (as a web designer) he was able to work on the application's front end without futzing around in Lisp. However, to get him there, I had to write a bunch of messy code. See the end of the above linked article for the gory details.

In this article, there was the ever-present "business logic" intertwined with the code to generate the dynamic page (calls to html-template). Yuck. I would have to do this for every page! No good. So after writing the second page in this manner, I decided it was time to write a macro. Ideally, I would like to write:

That is, my page definition would amount to returning some named values. A formidable task! But a good separation in my opinion as then I could easily turn the results into a JSON (or XMSmell) service without blinking too much.

Since I have not yet written the macro to grab the values of the HTTP POST parameters that are posted to the page, let's pretend that functionality works as I expect, that is:

http://mywebsite.com/add-one?a=1

Would eventually call the function add-one where a is 1. Then the code inside the add-one function just has to concern itself with doing the actual calculations, not yucky HTML. I do have a nagging feeling that I will eventually reinvent Weblocks badly, but that is how I learn!

The most important thing is that I want to be able to give hunchentoot a function called add-one which pulls out the required POST parameters, executes the body which returns our list with named values and passes this list to whichever template I specify. So here is the macro which now allows me to separate content generation from presentation generation (yay):