We've got SICPv2 starting at the Toronto Computer Science Reading Group. Or rather, it started last week. Anyway, when we did this the first time, a few people found out about it three-quarters of the way through, and expressed sentiments like "I wish I found out about this when you were starting out". It was enough people that they've managed to organize a second round. And, yes, if there are 9 or fewer core group members, we'll totally be handing these out:

Last time, I mentioned getting nix-the-package-manager up and running on my machine. And I mentioned setting up a Haskell environment with it. What I didn't mention is that some Haskell libraries are currently failing to install. As of this writing, that seems to include all of the Haskell web-frameworks other than scotty and snap. Yesod and happstack both error at compilation time with some odd type failures that I don't know enough about to diagnose. The specific problem I had this week involved that last one, which also happens to be the server used internally by gitit.

I just said "fuck it" and built my own. It's not generally the sort of thing I do, but I judged that it would be a lot more fun and somewhat easier than installing Mediawiki and its markdown plugin. And I think I happened to be right in this case; the whole thing took about two hours or so, plus a half hour of cosmetic changes for very mild ease-of-use.

Module import boilerplate. Interestingly, though I put it up top out of habit, Python seems to allow you to keep your imports until the end so that they don't have to destroy reader flow. I've made a mental note to do something about that.

A page in the wiki is represented as a file on disk. A wiki is actually just a directory with a git repo for history support. There are two ways we might want to look at a single file; either as raw markdown when we're making edits, or as HTML when we're just reading. The view_raw_page function takes a relative path as well as a wiki directory, and loads the given file from that wiki. If the specified file exists somewhere outside of the given wiki directory, we raise a NotInRepo error instead of doing anything. This prevents requesters from getting arbitrary file-system access to our machine by passing .. as part of their request paths. If the given path would be inside of the given wiki, and merely doesn't exist, we instead raise a PageNotFound error. We'll exploit this for page creation code later.
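The actual code isn't reproduced here, but based on that description, view_raw_page plausibly looks something like the following sketch. The names view_raw_page, NotInRepo and PageNotFound come from the prose; the is_in check is described further down, so a minimal version is inlined here to keep the sketch self-contained:

```python
import os

# assumed custom errors; their real definitions live at the bottom of wiki.py
class NotInRepo(Exception): pass
class PageNotFound(Exception): pass

def is_in(path, directory):
    # canonicalize both paths, then do a simple prefix check
    return os.path.realpath(path).startswith(os.path.realpath(directory))

def view_raw_page(page, wiki_dir="."):
    full = os.path.join(wiki_dir, page)
    if not is_in(full, wiki_dir):
        # the request escaped the wiki root, probably via ".."
        raise NotInRepo(page)
    if not os.path.isfile(full):
        raise PageNotFound(page)
    with open(full) as f:
        return f.read()
```

view_page would presumably just run this result through a markdown renderer before returning it.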

Unlike the view_(raw_)?page functions above, delete_page makes changes to the underlying filesystem. Specifically, it deletes a file in the repo and additionally deletes its containing directory if it's empty after the initial deletion|1|. Just as in view_raw_page, we check that the page we've been given exists inside the given repo. As much as we don't want to let random HTTP requesters see arbitrary files on our system, letting them delete arbitrary files would probably be worse. If the page exists, we delete it, then run rmdir on its containing directory|2|, then commit the changes with a mildly descriptive message. If the path given to delete_page is actually a directory, we instead throw an IsADirectory error. Arguably, we should let users delete subdirectories and do the obvious thing as a result, but I can't see it coming up in the kind of uses I'm planning to put this to. Finally, if the specified page doesn't exist, we raise a PageNotFound error. Again, arguably, we could just silently eat this error, since the result is still "the specified page no longer exists", but I'm being explicit for the moment.
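A sketch of what that might look like, under the same assumptions as before (the helpers are repeated so it runs on its own; the guard against rmdir-ing the wiki root itself is an addition the prose doesn't mention):

```python
import os
from subprocess import call

class NotInRepo(Exception): pass
class PageNotFound(Exception): pass
class IsADirectory(Exception): pass

def is_in(path, directory):
    return os.path.realpath(path).startswith(os.path.realpath(directory))

def delete_page(page, wiki_dir="."):
    full = os.path.join(wiki_dir, page)
    if not is_in(full, wiki_dir):
        raise NotInRepo(page)
    if os.path.isdir(full):
        raise IsADirectory(page)
    if not os.path.isfile(full):
        raise PageNotFound(page)
    os.remove(full)
    parent = os.path.dirname(full)
    if os.path.realpath(parent) != os.path.realpath(wiki_dir):
        try:
            # only succeeds if the containing directory is now empty
            os.rmdir(parent)
        except OSError:
            pass
    call(["git", "add", "--all"], cwd=wiki_dir)
    call(["git", "commit", "-m", "Deleted %s" % page], cwd=wiki_dir)
```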

Creating a page follows the same principles as delete_page. First, we check that the specified path will fall inside of the target wiki. If the page already exists, we return the explicit PageExists error rather than silently ignoring the condition. Then, we make sure that the full directory tree leading up to our new file exists, create the file with a default title equal to its path, and finally commit the changes.
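In sketch form, again assuming the names and helpers from above:

```python
import os
from subprocess import call

class NotInRepo(Exception): pass
class PageExists(Exception): pass

def is_in(path, directory):
    return os.path.realpath(path).startswith(os.path.realpath(directory))

def create_page(page, wiki_dir="."):
    full = os.path.join(wiki_dir, page)
    if not is_in(full, wiki_dir):
        raise NotInRepo(page)
    if os.path.exists(full):
        # be explicit rather than silently ignoring the existing page
        raise PageExists(page)
    parent = os.path.dirname(full)
    if parent and not os.path.isdir(parent):
        # make sure the full directory tree leading to the new file exists
        os.makedirs(parent)
    with open(full, "w") as f:
        f.write("# %s\n" % page)  # default title equal to the page's path
    call(["git", "add", "--all"], cwd=wiki_dir)
    call(["git", "commit", "-m", "Created %s" % page], cwd=wiki_dir)
```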

Having seen the previous three functions, it should be perfectly obvious how we go about editing an existing page. Sing along this time.

Check it's in the repo, one two

Apply the given changes, three four

Commit the file, five six,

Raise an error if it doesn't exist, seven eight
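The sing-along, transcribed into a sketch under the same assumptions as the earlier ones:

```python
import os
from subprocess import call

class NotInRepo(Exception): pass
class PageNotFound(Exception): pass

def is_in(path, directory):
    return os.path.realpath(path).startswith(os.path.realpath(directory))

def edit_page(page, contents, wiki_dir=".", message=None):
    full = os.path.join(wiki_dir, page)
    if not is_in(full, wiki_dir):       # check it's in the repo, one two
        raise NotInRepo(page)
    if not os.path.isfile(full):        # raise an error if it doesn't exist
        raise PageNotFound(page)
    with open(full, "w") as f:          # apply the given changes, three four
        f.write(contents)
    call(["git", "add", "--all"], cwd=wiki_dir)  # commit the file, five six
    call(["git", "commit", "-m", message or "Edited %s" % page], cwd=wiki_dir)
```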

Now for the internals.

def initialize(repo="."):
    call(["git", "init"], cwd=repo)

initialize is actually not called anywhere at the moment. We instead assume that the user has set up their own repo somewhere before telling wik to serve it. If we were automating that step, this is how we'd do it.

This is another function that isn't really being called yet. It will be at some point, but at the moment I'm not extending a reversion interface to HTTP clients, so we just have the definition.
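Since the definition isn't shown, here's a guess at its shape: a thin wrapper over git revert, with the argv split out into its own helper so the command line is easy to inspect. Both names and the choice of git revert (rather than, say, checkout of an old blob) are assumptions:

```python
from subprocess import call

def revert_args(commit_hash):
    # builds the git command; --no-edit keeps it non-interactive
    return ["git", "revert", "--no-edit", commit_hash]

def revert(commit_hash, repo="."):
    # not wired up to any HTTP handler yet, matching the prose
    call(revert_args(commit_hash), cwd=repo)
```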

def identity(a):
    return a

Apparently Python doesn't have a built-in identity. Even though some built-in higher-order functions assume the identity function in certain argument slots. I guess "there should only be one way to do it" doesn't quite translate to "if many users want it, we should implement it once".

Almost done. is_in_repo is the function that takes a path and a repo and checks if the first is inside the second. It does this by checking that the given path both is_in the given repo and is not is_in that repo's .git subdirectory. is_in just takes two pathnames, canonicalizes them using os.path.realpath, and checks if the first has the second as a prefix.
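That description translates fairly directly into code; something like the following sketch. One caveat worth noting: a bare string-prefix check considers /tmp/wiki-evil to be inside /tmp/wiki, so a stricter version would compare path components rather than characters:

```python
import os

def is_in(path, directory):
    # canonicalize both, then check that `directory` is a prefix of `path`
    # (NB: a component-wise comparison would be stricter; see caveat above)
    return os.path.realpath(path).startswith(os.path.realpath(directory))

def is_in_repo(path, repo="."):
    # inside the repo, but not poking around in its .git internals
    return is_in(path, repo) and not is_in(path, os.path.join(repo, ".git"))
```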

The last bit of wiki.py just defines the custom exceptions you've seen being thrown above. They don't do anything other than pass, because the only thing we really care about is that we can tell them apart from built-in errors. We don't actually need to store any additional information for our purposes at this point, though I do reserve the right to change that in the future.
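Given the errors raised above, the definitions presumably amount to:

```python
# empty subclasses: all we need is distinct types we can catch separately
class NotInRepo(Exception):
    pass

class PageNotFound(Exception):
    pass

class PageExists(Exception):
    pass

class IsADirectory(Exception):
    pass
```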

On to main.py

import tornado.ioloop, tornado.web, json, os, sys, re
import wiki

Again, import boilerplate; forgiveness please. Though I guess that I should point out I'm building this mini wiki on top of the tornado asynchronous web server.

The ShowPage handler takes a path variable. If that path designates a directory, or the wiki root "", we instead list the given directory by calling the list_template. If that path designates an existing file, we show it by calling wiki.view_page, and writing the result into the view_template. Finally, if the path doesn't designate an existing file, we show the create_template. We'll see all of those templates shortly.
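The branching logic, extracted from its tornado handler into a plain function for illustration (the handler itself would call the appropriate template instead of returning its name, and the template names here are assumed from the prose):

```python
import os

def show_dispatch(path, wiki_root):
    # mirrors ShowPage's branching: a directory (or the wiki root "") gets a
    # listing, an existing file gets rendered, anything else gets the create form
    full = os.path.join(wiki_root, path)
    if path == "" or os.path.isdir(full):
        return "list_template"
    elif os.path.isfile(full):
        return "view_template"
    else:
        return "create_template"
```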

The EditPage handler takes a path, and just writes out the edit_template, filled with the result of a call to wiki.view_raw_page.

Those were the only two handlers that return actual HTML. The rest of them, as you're about to see, merely redirect the caller. Ideally, they'd only return some kind of JSON-encoded ack, but that would complicate writing a dumb interface. Maybe something for a future version.

Those three handlers do the appropriate thing for the wiki calls delete_page, create_page and edit_page respectively. The only one that's even mildly complicated is EditAPI, which potentially has to pass along a commit_message from the client as well as a path. Before we get to the cosmetics, let's skip ahead a bit and see where all these path parameters to our handlers are coming from.

As you can see, the URL dispatch table pairs a regex with a particular handler class. The group in each one is going to be passed as an argument to the appropriate method. Note that in this case, they all capture most of the incoming URI, but that's certainly not a requirement. You can capture path pieces exactly how you'd think. The only setting we're interested in is static_path, and that should point at the static directory relative to this file rather than relative to the directory in which wik will eventually be run.
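To make the capture-group mechanics concrete, here's a toy dispatcher over a hypothetical table in the same (pattern, handler) shape tornado expects. The routes and handler names are illustrative, not the app's actual table:

```python
import re

# the capture group in each pattern becomes the `path` argument to the handler;
# order matters, since the catch-all ShowPage pattern would swallow everything
routes = [
    (r"/edit/(.*)", "EditPage"),
    (r"/api/delete/(.*)", "DeleteAPI"),
    (r"/(.*)", "ShowPage"),
]

def dispatch(uri):
    for pattern, handler in routes:
        m = re.match(pattern + "$", uri)
        if m:
            return handler, m.group(1)
```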

Last couple of things. I'm keeping WIKI_ROOT as a global constant, because I'm working under the assumption that a particular instance of tornado will only serve one wiki. This may end up being a faulty assumption later on, in which case I'll need to re-think where and how the directory gets stored. As it stands, it'll be a single global, and as you can see from the __main__ block, we set it from the first and only command-line arg. At the moment, I'm not even parameterizing the port number, opting instead to use the literal 4848. That's a note to self; the right thing to do in this situation would be to import and appropriately configure/call argparse so that we could pass in a target directory, as well as a port, and maybe some other configuration options. So, you know. Get on that, self.
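For what it's worth, that note-to-self is about five lines of work; a sketch of what the argparse version might look like (flag names are my choice, with 4848 kept as the default port):

```python
import argparse

def parse_args(argv):
    # the configuration the prose asks for: a target directory plus options
    p = argparse.ArgumentParser(description="Serve a wik wiki")
    p.add_argument("wiki_root", help="directory containing the wiki's git repo")
    p.add_argument("-p", "--port", type=int, default=4848,
                   help="port to listen on (default: 4848)")
    return p.parse_args(argv)
```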

The last bit we need to go over is the code defining our basic cosmetic templates. I'm fully aware of tornado-template, but didn't bother with it for stuff this minimal|4|.

The main_template contains the basic html/head/body tags, and expects to be passed some contents and a path. The contents are naively templated into a div#content tag, while the path is passed to breadcrumbs for processing.
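Something like this, presumably; crumbs would normally be the result of calling breadcrumbs on the path, but it's passed in here to keep the sketch self-contained:

```python
def main_template(contents, path, crumbs):
    # naive %-interpolation, matching the no-template-library approach
    return ('<html><head><title>wik - %s</title></head>'
            '<body>%s<div id="content">%s</div></body></html>'
            % (path, crumbs, contents))
```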

I found it kind of odd that this was the most complicated single procedure in the entire application. Not exposing a named directory without allowing URL injection, not tracking edits, not even figuring out the history of a particular file. It's that stupid little breadcrumb trail of links across the top of every page. So it goes sometimes. If the given path is the root, we just return home. No links or paths or any other kind of processing. Otherwise, we split the path on slashes and see what we get back. If the result is a list of one element, we return something like home/foo, where home is a link to the root and foo is the name of the single path element. We do basically the same thing with a path of length 2 that has the empty string in the first position. The reason both of these conditions are here is that I did some interpreter testing and found that certain versions of Python split a path like /blah into ["blah"], while others gave ["", "blah"], and I wanted to cover at least all the options I've personally observed. Finally, if none of the above are the case, we return something like home/foo/bar/baz/mumble/file, and make sure that every path element except for the last one has the appropriate link attached.
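A sketch of the breadcrumb logic as described. One liberty taken: instead of keeping the one-element and ["", "foo"] cases as separate conditions, this version normalizes away a leading empty string first, which collapses the two branches into one:

```python
def breadcrumbs(path):
    # the root gets no links or processing at all
    if path in ("", "/"):
        return "home"
    parts = path.split("/")
    # some splits of "/foo" give ["", "foo"]; normalize that away
    if parts and parts[0] == "":
        parts = parts[1:]
    if len(parts) == 1:
        return '<a href="/">home</a>/%s' % parts[0]
    links = ['<a href="/">home</a>']
    for i, p in enumerate(parts[:-1]):
        # every element except the last gets a link to its cumulative path
        links.append('<a href="/%s">%s</a>' % ("/".join(parts[:i + 1]), p))
    links.append(parts[-1])
    return "/".join(links)
```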

The edit, create and view templates aren't interesting enough to dwell on. They each show some basic controls, and do the appropriate thing on submit. I should say, they're not interesting enough to dwell on yet. I'm still planning to drop codemirror into this project so that you can have pretty highlighting and a comfortable experience in the edit interface, but that's about it. From the create template, you can create a new page, and from the view template, you can either edit or delete the current page.

And it does exactly what you'd expect; returns a giant ul tag with links to each file and directory visible from the specified path into the wiki. This is another place I'm planning some improvements. Specifically, it would be nice if the entries were arranged alphabetically, with all directories coming before any files, and with icons marking each entry as a file or a directory appropriately. I'll let you know how it goes.
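A plausible sketch, deliberately unsorted to match the current not-yet-improved state, and assuming hidden entries like .git get filtered out of the listing:

```python
import os

def list_directory(path, wiki_root="."):
    # one big <ul> with a link per visible entry; dotfiles (including the
    # wiki's .git directory) are skipped
    full = os.path.join(wiki_root, path)
    entries = [e for e in os.listdir(full) if not e.startswith(".")]
    links = ['<li><a href="/%s">%s</a></li>'
             % (os.path.join(path, e).lstrip("/"), e)
             for e in entries]
    return "<ul>%s</ul>" % "".join(links)
```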

Oh, actually, I guess there were a few utility functions still left to go over, though they're all hopefully self-explanatory.

As a complete aside, writing wik was the first time I used entr seriously. Because editing the above, especially those templates, required a lot of server restarting, eventually I just started up a separate terminal running

ls *py static/css/*css | entr -r python main.py ~/wiki-data

which started up my server, and killed/restarted it each time I saved any .py or .css files I was working on. It's pretty useful having this sort of thing automated, though it doesn't quite do what I want for C development. Really, what I'd want there is something more like hsandbox, but running on a file I specified. That's something I may put some work into at some point soon.

Something I've been seriously meaning to get into is some basic math. It's surprising, and somewhat embarrassing, how long I've gone without doing that. So this past week, I finally registered an account over at Khan Academy and plowed through the Combinatorics/Probability lessons as well as I could. It still feels like I need to practice and study more, but I have a less shaky grasp of n-choose-k problems than I used to. I'm not prepared to swear by the information yet, given that I haven't battle-tested it at this point, but I can tentatively recommend the lessons|5|. They certainly help retention over the moderate term.

I was going to mention the recent Cabal memory-management-fest, in which the current core members got together to discuss the implementations they'd spent the week building. Mine's up here, while Scott's are over here|6|, and dann hasn't posted anything yet as far as I know. I was going to go over each of those, but this piece is already quite a bit longer than I was expecting. Fuck, also, I've been putting some work into exercises for Learn Lisp the Hard Way. At the moment, I'm just working on section 1-04, but I'm hoping to claw some time together over the next couple of weeks. It's an interesting effort, and I guess technically the second book I've contributed to. I can't wait to see what kind of impact it has.

Now that I've done an initial proof of this article, it occurs to me that I opened with "There's not much going on".

Given that the above just gives you some minor thumbnails, and doesn't include anything from my personal life, I have no idea why I did that.

1 - |back| - It does not, as of this writing, do that recursively, but probably should. Note to self.

2 - |back| - Ignoring the potential OSError thrown if the directory still has something in it.

3 - |back| - The --all is really only necessary for deletions, but it's easier to call it everywhere instead of dispatching, or exposing an extra flag argument to let the caller decide whether to add it.

4 - |back| - Also, I'm not sold on the idea of mixing HTML with random code in arbitrary languages. :cl-who and similar have taught me to expect somewhat more elegant generation machinery.

5 - |back| - Though I will say that I'm not sure I'd recommend any of the lessons that have anything to do with code. They all take an extremely imperative bent and pretty severely over-complicate some problems. The particular offender that sticks out to me is Merge sort, which I learned through the very simple functional approach, but which they expect you to do in-place. Not that knowing that is bad, but it seems backwards to teach it first.