a lesser wizard's notebook

Menu

For some time now I’ve been meaning to write about why libraries (and librarians!) are important. After all, I’m a child of the library. In particular, I’m a child of thislibrary, where I spent many happy hours growing up. As the son of a blue-collar family that was anything but “bookish”, well, I don’t know where I’d be without libraries.

What follows are a collection of random thoughts (linky-blurbs, really) about the general amazingness of libraries and librarians:

Librarians fought, and are still fighting, surveillance-state idiocy like the PATRIOT Act.

On a lighter note, the Internet Archive also has this collection of classic MS-DOS games that you can play in your browser. When I saw titles like Street Fighter, Sim City, Donkey Kong, and Castle Wolfenstein in this list, I have to admit I kinda freaked out. The future is amazing. Growing up, I had to memorize my friends’ phone numbers! And now we have magic tiny emulated computers in our browser tabs.

There are many text markup languages that purport to allow you to write in a simple markup format and publish to the web. Markdown has arguably emerged as the “king” of these formats. I quite like it myself when it’s used for writing short documents with relatively simple formatting needs. However, it falls a bit short when you start to do more elaborate work. This is especially the case when you are trying to do any kind of “serious” technical authoring.

I know that “Markdown” has been used to write technical books. Game Programming Patterns is one excellent example; you can read more about the author’s use of Markdown here, and the script he uses to extend Markdown to meet his needs is here. (I recommend reading all of his essays about how he wrote the book, by the way. They’re truly inspiring.). Based on that author’s experience (and some of my own), I know that Markdown can absolutely be used as a base upon which to build ebooks, websites, wikis, and more. However, this is exactly why I used the term “Markdown” in quotes at the beginning of this paragraph. By the time you’ve extended Markdown to cover your more featureful technical authoring use cases, it really isn’t “just” Markdown anymore. This is fine if you just want to get something done quickly that meets your own needs, but it’s not ideal if you want to work with a meaningful system can be standardized and built on.

Below I’ll address just a few of the needs of “industrial” technical writing (the kind that I do, ostensibly) where Markdown falls a little short. Lest this come off as too negative, it’s worth stating for the record that a homegrown combination of Markdown and a few scripts in a git repo with a Makefile is still an absolute paradise compared to almost all of the clunky proprietary tooling that is marketed and sold for the purposes of “mainstream” technical writing. I have turned to such a homebrewed setup myself in times of need. I’ve even written about how awesome writing in Markdown can be. However, this essay is an attempt to capture my thoughts on Markdown’s shortcomings. Like any good internet crank, I reserve the right to pull a Nickieben Bourbaki at a later date.

I. No native table support

If you are doing any kind of large-scale tech docs, you need tables. Although constraints are always good, and a simple list can probably replace 80% of your table usage if you’re disciplined, there are times when you really just need a big honkin’ table. And as much as I’m used to editing raw XML and HTML directly in Emacs using its excellent tooling to completely sidestep the unwanted “upgrade” to the Confluence editor at $WORK, most writers probably don’t want to be authoring tables directly in HTML (which is the “native” Markdown solution).

II. No native table of contents support

Yes, I can write a script myself to do this. I can also use one of the dozens of such scripts written by others. However, I’d rather have something built in, and consider it a weakness of the format.

III. Forcing the user to fall back to inline HTML is not really OK

Like tables, there are a number of other formatting and layout use cases that Markdown can’t handle natively. As with tables, you must resort to just slapping in some raw HTML. Two reasons why this isn’t so amazing are:

It’s hard for an editor to support well, since editing “regular” text markup and tag-based markup languages are quite different beasts

It punts complexity to thousands of users in in order to preserve implementation simplicity for a small number of implementors

I can sympathize with the reasoning behind this design decision, since I am usually the guy making his own little hacks that meet simple use cases, but again: not really OK for serious work.

IV. Too many different ways to express the same formatting

This has lead to a number of incompatibilities among the different “Markdown” renderers out there. Just a few of the areas where ambiguity exists are: headers, lists, code sections, and links. For an introduction to Markdown’s flexible semantics, see the original syntax docs. Then, for a more elaborate description of the inconsistencies and challenges of rendering Markdown properly, see Why is a spec needed?, written by the CommonMark folks.

V. Too many incompatible flavors

There are too many incompatible flavors of Markdown that each render a document slightly differently. For a good description of the ways different Markdown implementations diverge, see the Babelmark 2 FAQ.

The “incompatible flavors” issue will hopefully be addressed with the advent of the CommonMark Standard, but if you read the spec it doesn’t address points I, II, or III at all. This makes sense from the perspective of the author of a standards document: a spec isn’t very useful unless you can achieve consensus and adoption among all the slightly different implementations out there right now, and Markdown as commonly understaood doesn’t try to support those cases anyway.

VI. No native means of validation

There will of course be a reference implementation and tests for CommonMark, which will ensure that the content is valid Markdown, but for large-scale documentation deployments, you really need the ability to validate that the documentation sets you’re publishing have certain properties. These properties might include, but aren’t limited to:

“Do all of the links have valid targets?”

“Is every page reachable from some other page?”

Markdown doesn’t care about this. And to be fair it never said it would! You are of course free to use other tools to perform all of the validations you care about on the resulting HTML output. This isn’t necessarily so bad (in fact it’s not as bad as points I and II in my opinion, since those actually affect you while you’re authoring), but it’s an issue to be aware of.

This is one area where XML has some neat tooling and properties. Although I suppose you could do something workable with a strict subset of HTML. You could also use pandoc to generate XML, which you then validate according to your needs.

Conclusion

Markdown solves its original use case well, while punting on many others in classic Worse is Better fashion. To be fair to Markdown, it was never purported to be anything other than a simple set of formatting conventions for web writing. And it’s worth saying once more that, even given its limitations, a homegrown combination of Markdown and a few scripts in a git repo with a Makefile is still an absolute paradise compared to almost all of the clunky proprietary tooling that is marketed and sold for the purposes of “mainstream” technical writing.

Even so, I hope I’ve presented an argument for why Markdown is not ideal for large scale technical documentation work.

This post details how to go about installing perlbrew on a Fedora 20 Linux system for a user whose default shell is in the ksh family, specifically mksh, which is the shell I use.

The instructions in each section can be used separately if you are just a Fedora user or just a (m)ksh user.

Installing Perlbrew on Fedora 20

If you try to install Perlbrew using the instructions on its website, Fedora barfs. Fortunately, the workaround you need is detailed in this Github issue. (Hint: read the whole thread and then do what wumpus says.)

Using Perlbrew with ksh or mksh

The perlbrew bashrc file uses a few “bashisms” (unnecessarily, I think, but it is a “bash” rc after all). I managed to hack them out and, instead of sourcing perlbrew’s default bashrc file, I now source a kshrc built by applying the patch below.

It’s working well for me so far under mksh, but feel free to leave a comment if it doesn’t work for you and I’ll try to help.

Inspired by words from Ryan Tomayko and Jacob Kaplan-Moss, I set out to see how difficult it would be to implement a tiny UNIX echo server in scsh. It turns out to be fairly easy. In this post, we’ll cover:

How the UNIX socket programming model works at a high level

How to write a basic echo server in scsh

Unlike the echo servers of Tomayko and Kaplan-Moss, this one doesn’t use fork (yet). We will add forking to the server in a follow-up post. Hopefully this will make it easier for less experienced UNIX programmers (like me) to get started with.

The UNIX socket programming model for dummies

At a high level, the steps to open a server-side socket are:

create a socket “thing”

toggle some settings on it so the OS knows it’s for interweb use

bind it to an address

listen for connections on it

start accepting those connections

scsh does its part to make this easy by providing a high-level Scheme procedure, BIND-LISTEN-ACCEPT-LOOP, which handles this stuff for you, in addition to some nifty error handling and other bookkeeping chores. But we’re going to ignore that for now and write everything by hand to see how it’s done.

You should be able to enter all of the code in the next section directly in the scsh top-level. You don’t need to open any additional packages; this is just plain old scsh.

Our echo server

Our server takes two arguments: the PROC is a procedure that causes the server to actually do something; we’ll write in a minute. The PORT is the port you want to serve on, e.g., 49995:

As you can see, our ECHO procedure takes a socket and an address. (We don’t do anything with the address in this example, but our procedure needs to “implement this interface”, as they say, in order to work. You can see this for yourself in the scsh 0.6.7 tarball; it’s in scsh/network.scm.)

In our LET*-binding we create some convenient locally bound names for the socket structure’s input and output ports, and then we read a line from the socket’s input port.

Since this is echo server, we just write the same data back out with FORMAT. We call FORCE-OUTPUT to flush output to the terminal immediately. Otherwise Scheme will wait for the operating system’s output buffer to fill before writing out, and it will appear to the user that nothing is happening.

Trying it out

Let’s start it up and see if it works. Assuming you’ve loaded the procedures above in your scsh image somehow, enter this at the REPL:

> (server echo 49995)

The REPL should appear to “hang” while the server is running. Now go to your terminal and connect to the echo server. There are several ways to do it; here’s what I used:

Hopefully that was enough to get you started playing with sockets in scsh. I’ll write a followup post where we talk about the UNIX forking process model and update this example to make it a “preforking” server.

Most Scheme 48 and scsh programmers are familiar with the \,dis command. It allows you to disassemble Scheme code (a procedure or continuation) into the “assembly” understood by the Scheme 48 virtual machine. Here’s an example using scsh’s fork/pipe procedure:

This is pretty cool. However, I thought it would be even better if Geiser could run the disassembler on a region of Scheme code, and pop open the disassembler output in a new buffer. Geiser already supports similar functionality for macro-expansion via GEISER-EXPAND-REGION. There’s no reason why it shouldn’t do the same for the disassembler. So that’s what I taught it to do – in this screenshot I’m disassembling the LETREC expression:

(I have to admit I was inspired to work on this by my jealousy of the disassembler functionality available to Common Lispers via the SLIME-DISASSEMBLE-{DEFINITION,SYMBOL} commands in SLIME.)

How it Works

Here’s how it works:

Highlight a region of Scheme (scsh) code in a buffer

Call ‘M-x geiser-scsh-disassemble-region‘ (a.k.a., ‘C-c RET C-d‘).

Emacs sends the Scheme code to the running scsh process for disassembly via a newly exported Geiser macro, GE:DISASSEMBLE. This macro wraps the code in a LAMBDA before sending it to the disassembler to work around the fact that the S48 DISASSEMBLE procedure won’t accept arbitrary Scheme expressions such as (+ 1 1). (Wrapping expressions in LAMBDA like this is not ideal for reasons I’ll explain below.)

Future Work

This implementation is a proof of concept – don’t believe in it. Think of it as a prototype of how this functionality could work if built into Geiser proper. Here are some of the issues with it from an implementation perspective:

It doesn’t use the correct Geiser protocols for sending Scheme code to evaluate, instead just piping in raw strings. This was expedient because of the way s48/scsh use non-reader-friendly strings like {#Uncategorized} as the final lines of output for procedures whose outputs are not defined by the standard. I think this can be fixed by coming up with a better implementation of the Geiser code evaluation protocols (in scsh/geiser/evaluation.scm) so that they handle all of the weird cases in S48 output.

Related to the previous point, I’m doing some ugly regex stuff on the stringified output of the disassembler to make it nicer before piping it into the temp buffer.

This functionality should really be added to Geiser itself, via the GEISER-DEBUG-* namespace. Then it would be forced to address both of the above points. Right now it’s just an ugly little hack in geiser-scsh.el. In principle, with the right infrastructure in GEISER-DEBUG-*, there’s nothing preventing a Guile or Racket implementation (here’s the Guile disassembler in action – you can see that it’s not so different from S48):

The disassembler, like the macro-expansion functionality, should be able to work recursively. That is, in places where the S48 assembly makes procedure calls like (global ge:really-fork/pipe) (as it does in our first example way at the top of this post), you should be able to ask for the disassembled output of GE:REALLY-FORK/PIPE as well, all the way down to the core primitives. Currently, there are cases where you still need to call \,dis <PROCEDURE> at the command prompt. I think this limitation is created by the way we wrap the expression in a LAMBDA before sending it to the disassembler. A better design is needed, one that (for example) calls PROCEDURE? on the proposed input, and then decides whether the expression needs to be wrapped in a LAMBDA before going to the disassembler.

A while back, Ben Horowitz came to give a talk at the company where I work, AppNexus. During that talk, I discovered that he is an investor. After Mr. Horowitz’s visit, the company generously bought every employee a copy of his book The Hard Thing about Hard Things.

It took a while to get around to it, but I’ve finally given it a read on my long bus rides. My impressions are as follows:

It’s bullshit free. There is no management-speak, jargon, or other obfuscatory nonsense anywhere in this book. It is very information-dense and doesn’t waste your time.

It’s honest. Mr. Horowitz describes how gut-wrenchingly hard it is to be the CEO, especially when things aren’t going well. His experience is particularly informative because he’s been a CEO during many times when things weren’t going well. He relates his wife’s observation regarding how his service as CEO was met by some: “They called you everything but a child of God”.

It tells good stories. The book illustrates its salient points with stories taken from real experience. Most management books are written by consultants (as Mr. Horowitz mentions), this one is written by someone who’s been there.

It uses metaphor. The two that stick in my mind are (1) the difference between “Wartime CEO” and “Peacetime CEO”, and (2) something called “The Torture” (read the book to find out more).

It’s concrete. The author has real opinions, and he expresses them. For example, he makes it clear that Andy Grove’s High Output Management is one of his favorite business books. He also describes in frank terms how Ernst & Young almost tanked the sale of his company (when he was a client of theirs!), and thus are not his fave, to say the least. It’s hard to trust people who don’t express concrete opinions; Mr. Horowitz is not in that group.

That’s all very nice, but what does any of this high-falutin’ CEO talk have to do with my life as a technical writer? After all, it’s not always clear how books aimed at executives are of use to an individual contributor.

Even though this book is for and about executives, it has given me some insight into how an executive might be thinking about different functions in the company and their value. It seems that technical writing’s value to the company from this perspective is as follows:

Generate a positive impression of the company: For example, I’ve had someone who works at AppNexus tell me that, when he was working at another company, he and his team really valued our wiki’s API docs (I’d link but they’re behind a login). Good API docs made us the easiest platform for his team to build against. Because we were the easiest to build against, we were the first company they would integrate with. Not only is it nice to hear things like this, it really makes the value to customers clear. I would imagine that this is helpful for sales as well.

Reduce support costs: This is the flip-side of the previous comment. If customers can get things done faster by using information in the docs than they can by sending a support request, you win.

Stave off imformational collapse: This sounds negative, but it must be accounted for. As a company grows, you can no longer scale by bringing new people up to speed with a “come sit next to me and I’ll show you” workflow. This also applies to existing employees who are working in an area of the product/platform that’s new to them. Someone has to do the nasty work of writing stuff down in order to scale your internal technical communication. Often engineers will do this work, but sometimes you need technical writers. This is why Google is hiring for Technical Writer, Internal Software Engineering positions right now (and has been for quite a while from what I’ve seen). At a certain size, word-of-mouth just doesn’t work that well anymore.

A final point worth noting is that the job of CEO is stressful as hell. After finishing this book, I can say that I’m much happier in my role than I would be as an executive. It doesn’t seem very glamorous at all based on Mr. Horowitz’s description. Although I’m sure the money is nice, it’s been shown over and over again that money is not much of a motivator past a certain point, nor does it buy happiness.

When writing in a source format such as Markdown, it’s nice to be able to see your changes show up automatically in the output. One of my favorite ways to work is to have Emacs and Firefox open side by side (as shown above). Whenever I save my Markdown file, I want Emacs to automatically build a new HTML file from it, and I want Firefox to automatically refresh to show the latest changes.

Once you have this set up, all you have to do is write and save, write and save.

As it happens, fellow Redditor goodevilgenius was looking to accomplish just this workflow. I originally posted this answer on Reddit, but I’m reposting it here in the hope that it will help some kindly internet stranger someday.