is a document generator in the sense that it generates documentation from given source code.

is also a literate programming tool in the sense that it extracts source code scattered in a (Markdown) document and so turns it into an application.

uses Markdown as the principal (intermediate) document format in these kinds of conversions, but also as a very convenient and powerful syntax for the source code documentation itself.

is universal, as it works for all kinds of programming languages. In fact, it is not only a document generator, but a document generator generator for any new type of code.

is also universal in the variety of available documentation formats.[1]

is very easy to learn for all these languages and formats, because it comprises only two or three syntactical rules to manage all situations.

CodeDown is implemented in Haskell, and codedown is the executable that can be called from the command line to perform any of these conversions. See the corresponding appendices below for instructions on how to install and use it.

CodeDown is deliberately not smart. Its basic idea is extremely simple and requires understanding only two or three syntax rules for the core conversion. However, this simplicity is built on the inherent properties of the Markdown lightweight markup language, so you need to become familiar with Markdown. We therefore start with a short recap of its features.

CodeDown is designed, implemented, and presented here in two steps. First, Core CodeDown defines the back and forth conversions between the different types of code and Markdown. The simple rules for these conversions are introduced here, for the concrete example of PHP code, and for all other types of code in general. In practice, however, one usually wants to generate document formats other than Markdown, say HTML. To integrate this into a single convenient codedown command, the universal document converter Pandoc[2] is merged into CodeDown. Pandoc CodeDown thus explains the options.

Markdown was originally designed as a way to ease the generation and comprehension of HTML source code. Meanwhile, however, a number of Markdown extensions and implementations (including Pandoc) promote Markdown as a default authoring format for documents in general.

For example, suppose we want to publish an HTML file example.html with the following content:

<h1> Overview </h1>
<p>
Originally, <a href="http://daringfireball.net/projects/markdown"> Markdown </a>
introduced a simplified style for a couple of HTML inline and block elements, like the mentioned
ones. All other HTML features still remained present by writing out the HTML tag syntax.
</p>
<p>
Meanwhile, a couple of <a href="http://en.wikipedia.org/wiki/Markdown_extensions"> extensions </a>
have been defined and implemented that introduce lightweight versions for tables, definition lists,
footnotes etc and Markdown is evolving into a standard for writing documents in general.
</p>
<p>
There are many <a href="http://http://xbeta.org/wiki/show/Markdown"> implementations </a> by now. We use
<a href="http://johnmacfarlane.net/pandoc/"> Pandoc </a>, written by John MacFarlane, that not only
offers many Markdown extensions, but is a universal converter between all kinds of text and
documentation formats.
</p>

Instead of writing out this "tag soup", we could just create a file, say example.markdown, containing this:

# Overview

Originally, [Markdown][] introduced a simplified style for a couple of HTML inline and block
elements, like the mentioned ones. All other HTML features still remained present by writing out
the HTML tag syntax.

Meanwhile, a couple of [extensions][] have been defined and implemented that introduce
lightweight versions for tables, definition lists, footnotes etc. and Markdown is evolving into a
standard for writing documents in general.

There are many [implementations][] by now. We use [Pandoc][], written by John MacFarlane, that not
only offers many Markdown extensions, but is a universal converter between all kinds of text and
documentation formats.

[Markdown]: http://daringfireball.net/projects/markdown
[extensions]: http://en.wikipedia.org/wiki/Markdown_extensions
[implementations]: http://http://xbeta.org/wiki/show/Markdown
[Pandoc]: http://johnmacfarlane.net/pandoc/

and then generate example.html from example.markdown with the original Perl executable

Markdown is an excellent format for the documentation of programming source code! If you ever have to write a manual for some program or application, this is a very convenient format. It is very easy to read and write; in particular, the just mentioned syntax for inline code and code blocks is efficient and intuitive. The large number of Markdown converter implementations, including some online tools, makes it ubiquitously available. And they convert not only to HTML, but to any documentation format you could possibly wish for: groff man pages, PDF, RTF, LaTeX, DocBook XML, you name it. Besides, it is very readable even in its plain text form.

By the way, this very document CodeDownManual.html was originally written in Markdown and then converted to HTML.[5] The source text CodeDownManual.markdown should thus be a good example of the ease and beauty of the Markdown syntax (in the extended Pandoc version).

Obviously, converting a file from PHP to Markdown changes the content and character of the file. Let us first introduce the conversion for the concrete example of PHP code.

Recall that PHP has two kinds of comments:

A line comment, where everything after a // symbol until the end of the line is considered a comment.[6]

A block comment, which includes everything between an opening /* and a closing */.

As usual, a comment is a part of the source code that is ignored by code applications like interpreters or compilers.

By a variation of these comments, CodeDown distinguishes the following areas in the PHP source code:

A Markdown document line starts with a // // (i.e. 2 slashes, 1 space, 2 slashes, 1 space) at the beginning of the line. Everything that follows on the line is carried over by the conversion to Markdown. More precisely, all parts of the form

// // ... some Markdown text ...

in the PHP source are converted to

... some Markdown text ...

in the Markdown target text, i.e. the text of the comment is preserved as it is.

A Markdown document block is everything between a /*** and a ***/, where both symbols have to be placed at the beginning of a line. In other words, all parts of the form

/***
... some Markdown text ...
... more Markdown text ...
***/

in the PHP source are preserved as such

... some Markdown text ...
... more Markdown text ...

in the Markdown target text.

A literal code block is everything between a line that starts with ///BEGIN/// and another line that starts with ///END///. Note that each of these two delimiter lines is a PHP line comment, so that the code lines in between are not comments, but PHP code that will be processed by the PHP interpreter. In the conversion to Markdown, these code blocks are wrapped in Markdown code blocks (with 4 spaces before each line of code), and each of these is in turn placed into a block quote (with a preceding > and another space). In other words, all parts of the form

We just described how CodeDown defines the Markdown document generator for the PHP programming language. But CodeDown is a Markdown document generator for just about any (mainstream) code language. In fact, the implementation is designed so that adding a new document generator for yet another type of code XYZ requires just a few lines of code.[7] In this sense, CodeDown is a true "document generator generator".

In what follows, we explain the document generation with CodeDown for arbitrary types of code.

Once the general principles of the document generation are understood, it should be possible to work with codedown without consulting this manual anymore. All specific information is then available from the following calls:

A call of codedown --help=CODE displays the two or three syntax rules for the document generation of the given CODE, where CODE is one of the members of the previous list of codes. The CODE value is case-insensitive, e.g. JavaScript, JAVASCRIPT, javascript etc. are considered the same.

We will soon explain how the answer is to be interpreted, but here is an example:

i.e. a help call with the symbols value finally provides a tabular overview of all implemented types of code and all the original and modified comment symbols involved. The response of this help call is shown and explained below.

codedown --help

finally is the general help call, which also lists the previous help options.

All mainstream programming languages allow the insertion of comments into the source code. These are text parts that are ignored by the machine (i.e. the interpreter or compiler). The syntax for comments always works according to at least one of the following two principles:

line comment

There is a special symbol (a single character or a certain short character string), after which the rest of the line is ignored. In C, JavaScript and Java this is "//", in Scheme and Lisp it is ";".

block comment

This is a text part of arbitrary length, wrapped between a begin and an end symbol. For example, in JavaScript, block comments are enclosed between "/*" and "*/". In SML, the delimiters are "(*" and "*)".

Again, every modern programming language provides at least one of these two kinds of comments. Some only have line comments, such as Scheme, bash scripts or Perl.[8] Others only know block comments, such as SML and SQL. And languages like C and Haskell have both.[9]
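The three cases just listed can be written down as a tiny classifier. This is only a sketch in Python (CodeDown itself is written in Haskell); the numbering 1, 2, 3 follows the footnote on the CoreCodeDown.hs module, while the function name comment_type is made up for this illustration.

```python
def comment_type(has_line_comment: bool, has_block_comment: bool) -> int:
    """Classify a language by the kinds of comments it offers.

    1 = line comments only, 2 = block comments only, 3 = both.
    (Hypothetical helper; the numbering follows the manual's footnote.)
    """
    if has_line_comment and has_block_comment:
        return 3
    if has_line_comment:
        return 1
    if has_block_comment:
        return 2
    # The manual assumes every supported language has at least one kind.
    raise ValueError("a language without comments cannot be supported")


# Examples as classified in this manual:
print(comment_type(True, False))   # Scheme, bash
print(comment_type(False, True))   # SML
print(comment_type(True, True))    # C, Haskell
```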

The universal CodeDown document generator modifies the comments of a given code language so that each source contains certain designated parts:

Markdown document parts

These are comments written in Markdown format, which are preserved during the conversion. Depending on the comments defined in the code language, these parts are Markdown document lines, in case the code language has line comments, or Markdown document blocks, in case block comments are defined.

In PHP, document lines are lines that start with a "// //", and document blocks start with a line "/***" and end with a line "***/".

Literal code blocks

These are parts of code enclosed between special comment delimiters. In PHP, for example, a literal code block is opened with a "///BEGIN///" and closed with an "///END///". Literal code blocks are preserved as code blocks in the documentation.

All other source code outside Markdown document parts and literal code blocks is ignored during the document generation.

Note that the special CodeDown symbols (e.g. "// //", "/***" and "***/", "///BEGIN///" and "///END///" in PHP) always have to be at the beginning of a line.
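To make the rules above concrete, here is a minimal Python sketch of the PHP to Markdown direction. It is not the codedown implementation (which is written in Haskell); the function name php_to_markdown is hypothetical, and the sketch assumes that no blank lines are added between the generated parts, which the real tool may handle differently.

```python
def php_to_markdown(source: str) -> str:
    """Apply the PHP rules: document lines, document blocks, literal code blocks."""
    doc_line = "// // "      # 2 slashes, 1 space, 2 slashes, 1 space
    quote_prefix = ">     "  # '>' plus one space plus the 4 spaces of a code block
    out = []
    mode = "code"            # "code", "doc" (inside /*** ... ***/) or "literal"
    for line in source.splitlines():
        if mode == "doc":
            if line.startswith("***/"):
                mode = "code"
            else:
                out.append(line)                  # Markdown text, kept as it is
        elif mode == "literal":
            if line.startswith("///END///"):
                mode = "code"
            else:
                out.append(quote_prefix + line)   # code block inside a quote
        elif line.startswith(doc_line):
            out.append(line[len(doc_line):])      # drop the CodeDown symbol
        elif line.startswith("/***"):
            mode = "doc"
        elif line.startswith("///BEGIN///"):
            mode = "literal"
        # every other line is ordinary source code and is ignored
    return "\n".join(out)
```

All symbol checks use startswith, which reflects the requirement that the CodeDown symbols stand at the beginning of a line.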

This is the __Hello world program__ in C.

Implementation of the `main` function:

>     int main (void) {
>       printf ("Hello world\n"); // prints a message
>       return 0; // exit normally
>     }

To compile the program and generate an executable `hello`, call

    gcc -o hello HelloWorld.c

Subsequently, you apply it by calling

    ./hello

It will answer with

    Hello world

Some of the document types are also code types, namely LaTeX, HTML and XML. But if they are considered as such, we attach a _code suffix to the name, i.e. the values are LATEX_CODE, HTML_CODE and XML_CODE.

The list of all possible code types is shown by calling one of the following two help commands

codedown --help=codes
codedown -h codes

The names of all these formats need to be specified in the source (--from or --read) and target (--to or --write) options of the codedown command. These name values are case-insensitive. For example, LaTeX, latex and LATEX are equally possible.

The source format option (--from or --read) specifies the format of the input source. If this option is not specified, codedown first attempts to determine it from the extension of the (first) file specified by the --input (or -i) option. If this fails, too, then the FORMAT is set to markdown as the default input.

The target format option (--to or --write) specifies the format of the output target. If this option is not specified, codedown first attempts to determine it from the extension of the file specified by the --output (or -o) option. If this fails, too, then the FORMAT is set to markdown as the default output.
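The fallback order described for both options (explicit value, then file extension, then markdown) can be sketched as follows. This is an illustration in Python, not codedown's code: the function resolve_format and the small extension table are assumptions, and the real table of recognized extensions is not reproduced here.

```python
import os

# Hypothetical extension table, for illustration only.
EXTENSION_FORMATS = {
    ".php": "php",
    ".html": "html",
    ".markdown": "markdown",
}


def resolve_format(explicit, files, default="markdown"):
    """Return the format name to use for the source or target side."""
    if explicit:
        return explicit.lower()          # format names are case-insensitive
    if files:
        # Only the first file is inspected, as described above.
        extension = os.path.splitext(files[0])[1].lower()
        if extension in EXTENSION_FORMATS:
            return EXTENSION_FORMATS[extension]
    return default                       # nothing else worked


print(resolve_format(None, ["Example.php", "notes.html"]))
```

For the target side, the same function would be called with a one-element list holding the --output file.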

--input=FILE_1,...,FILE_N or -i FILE_1 ... FILE_N

specifies the input text. If the list of files is empty, the input is read from standard input. If there is more than one file in the list, the contents of these files are concatenated.

Note that this is the only CodeDown option that has no equivalent in Pandoc. There, the input files are listed as such, without a preceding --input or -i key. (There is, however, a Pandoc key -i, which is short for --incremental and makes list items in Slidy or S5 slideshows display incrementally. If you want to control this Pandoc option from a codedown call, you have to use the long form --incremental.)
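The input rule for --input can be sketched in a few lines of Python. The function read_input is hypothetical, and the sketch assumes that the file contents are joined directly, without any separator; whether codedown inserts one is not stated in this manual.

```python
import sys


def read_input(files, stdin=None):
    """Return the input text: standard input if no files are given,
    otherwise the concatenated contents of all files, in order."""
    if not files:
        stream = sys.stdin if stdin is None else stdin
        return stream.read()
    parts = []
    for name in files:
        with open(name, encoding="utf-8") as handle:
            parts.append(handle.read())
    return "".join(parts)   # assumption: no separator between files
```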

--output=FILE or -o FILE

defines the file for the output of the conversion. If FILE does not yet exist, it will be created; otherwise it will be overwritten. If this option is not specified, all output goes to standard output (except when the target format is set to odt, epub or pdf).

A special case is the possibility to set one of the two formats to code. If the source format is set to code (i.e. --from=code) and the target format is markdown, then the whole input is put into a single code block. This can be useful if you need to display an entire source file in standard CodeDown layout. Conversely, if --from=markdown and --to=code, then the code blocks (lines preceded by > plus at least 5 spaces) are extracted as code, while everything else is ignored.
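Both directions of this special case fit into a few lines. The following Python sketch is an illustration under stated assumptions rather than the actual implementation: the names code_to_markdown and markdown_to_code are made up, and the prefix is taken to be exactly ">" followed by 5 spaces, so that any further spaces count as indentation of the code itself.

```python
PREFIX = ">" + " " * 5   # '>' plus 5 spaces marks a CodeDown code line


def code_to_markdown(code: str) -> str:
    """--from=code --to=markdown: put the whole input into one code block."""
    return "\n".join(PREFIX + line for line in code.splitlines())


def markdown_to_code(markdown: str) -> str:
    """--from=markdown --to=code: keep only the code lines, drop the rest."""
    return "\n".join(
        line[len(PREFIX):]
        for line in markdown.splitlines()
        if line.startswith(PREFIX)
    )


text = "Intro\n" + code_to_markdown("x = 1\nprint(x)") + "\nOutro"
print(markdown_to_code(text))
```

Under these assumptions the two functions are inverse to each other on code that contains no trailing newline.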

comment line denotes the native line comment symbol of the given language, if there is one.

comment begin and comment end contain the native block comment symbols of the given language.

doc line denotes the CodeDown symbol for Markdown document lines. This is only defined when the language has a comment line symbol "x", and in that case the default rule for the CodeDown symbol is "x x" (i.e. x, space, x, space). Note that this symbol has to be placed at the beginning of a line.

doc begin and doc end are the delimiters for Markdown document blocks. These are only defined when the language provides block comments. Note that each of the two delimiters has to start a line.

literal begin and literal end are the delimiters for literal code blocks. Again, each delimiter has to be at the beginning of a line.

Note that

Perl, Ruby and Python do have block comments, but they are not used as plain comments but have their own markup syntax and function.
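The fields of the symbols overview can be pictured as one record per language. The following Python sketch is an illustration only: the class Symbols is hypothetical and is not the data type used in CoreCodeDown.hs. It fills in the PHP values given in this manual and derives doc line from the default rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Symbols:
    """Comment and CodeDown symbols of one language (hypothetical record)."""
    comment_line: Optional[str] = None
    comment_begin: Optional[str] = None
    comment_end: Optional[str] = None
    doc_begin: Optional[str] = None
    doc_end: Optional[str] = None
    literal_begin: Optional[str] = None
    literal_end: Optional[str] = None

    @property
    def doc_line(self) -> Optional[str]:
        """Default rule: x, space, x, space; undefined without line comments."""
        if self.comment_line is None:
            return None
        return f"{self.comment_line} {self.comment_line} "


# The PHP symbols as described in this manual.
PHP = Symbols("//", "/*", "*/", "/***", "***/", "///BEGIN///", "///END///")
print(repr(PHP.doc_line))
```

Only the default rule is modelled here; a language could in principle deviate from it.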

[1] This is entirely thanks to John MacFarlane's Pandoc, which does all the hard work behind the scenes.

[2] Compared to the simplicity of CodeDown, Pandoc is a huge and very sophisticated program written by John MacFarlane, which does all the heavy conversion work between the different document formats.

[3] The syntax of the codedown command is very similar to the pandoc command syntax. There is one big difference, however, namely the --input option, which does not exist for pandoc. There, the input files are added at the end of the call, as the example shows.

[4] There is yet another version of code blocks in Markdown, but only in the extended Markdown version of Pandoc, namely delimited code blocks between tilde lines, with an option to use syntax highlighting for many types of code. You can use that, too, but the official version of CodeDown does not mention this explicitly.

[5] The conversion was done with the command codedown --from=markdown --to=html --input CodeDownManual.markdown --output=CodeDownManual.html --table-of-contents --standalone --css=CodeDown.css, which has the same effect as pandoc --from=markdown --to=html --output=CodeDownManual --table-of-contents --standalone --css=CodeDown.css CodeDownManual.markdown.

[6] In fact, there are two versions of a line comment in PHP, namely the // and the # symbol. But CodeDown takes only one of the two. By taking // and neglecting #, PHP behaves the same way as the other languages of the C-like syntax family, like JavaScript, C and Java.

[7] See the CoreCodeDown.hs.html documentation of the Haskell CoreCodeDown.hs module, which explains how a new programming language is added to the supported types of code. This simple customization of CodeDown is complete; it even includes the automatic generation of the help messages, i.e. a call of codedown --help=XYZ.

[8] In this context, Perl, Python and Ruby are considered languages that only have line comments, because their block comments use a special markup for their own document converters.

[9] In the implementation of the general document generators in the CoreCodeDown.hs module we say that a code language is of type 1 if it has a line comment, but no block comment. If it is the other way round, we call it a type 2 code language. If it has both line and block comments, it is of type 3. For example, Scheme and bash are type 1, SML and SQL are type 2, and C and (Common) Lisp are type 3.

[10] PDF output is generated via LaTeX and is supported by the markdown2pdf wrapper, included in the Pandoc installation. By using codedown, all this is done automatically. For example, calling codedown -f markdown -t pdf -i example.markdown -o example.pdf should work just fine.

[11] To be precise, the order of the options in a codedown call is not entirely arbitrary, namely when you specify the same option several times. But this is never intended, and average users will avoid doing that anyway.

[12] As is common for one-letter UNIX command options without values, these one-letter flags can be condensed into a single one. For example, in UNIX, a call of ls -A -l -r -R -S is equivalent to ls -AlrRS. This works in CodeDown and Pandoc, too, but the time and space needed to mention this are probably not worth the time that can be saved by using these abbreviations.