1 Introduction to PlainDoc

PlainDoc is a document production system based on plain text files. It
tries to keep most of the document in human-readable form, with the
intent that the PlainDoc source itself serves as the plain text
version of the document.

Fig-1: Generation of pdf from sources (simplified)

The PlainDoc system was developed by Sampo Kellomäki (sampo@iki.fi) from
around 2002 onwards, with the following document authoring goals in
mind:

Getting a neophyte to a reasonable level of productivity and achievement should be easy. A college freshman should
be able to use PlainDoc after one hour of training, provided
that all the tool chains have already been installed.

It must be very difficult to fatally corrupt a document; fixing corruption should be as simple as editing the file

It must be possible to do diffs between versions of the document

Version control with cvs (or git) should be well supported (this helps to avoid fatal loss of the document)

Enable use of plain text productivity environments like emacs(1)

The PlainDoc system MUST be serious enough to produce almost any type of document and thus end the need to use any other system

Typeset-quality output in paper and web formats

As of September 2007, PlainDoc has been around for more than five years
and it has been used successfully to produce

Major IT specifications conforming to formatting rules (120 page range)

PlainDoc acknowledges its LaTeX legacy and does not aim at WYSIWYG
(except in plain text document production, of course :-). However, we
are not totally against visual formatting either, so many
hooks for accessing the underlying document formatter's capabilities
have been made available, such as

Direct entry of TeX code (allows setting margins, etc.)

Direct entry of DocBook code

Direct entry of HTML code

Explicit line and page breaks

Raw image placement (i.e. NOT using floats)

These should allow you to get your job done without the system
philosophy standing too much in the way, while for the most part
leveraging the automatic formatting of standard constructs.

1.1 Tool chains

The PlainDoc system is actually composed of multiple programs. The most
important of them is the pd2tex formatter (which, despite its
name, actually produces other formats too), but no meaningful
output other than HTML can be obtained without a properly configured backend
formatting tool chain, such as the LaTeX system or the DocBook tool chain.
Some additional frontend tools may be helpful if you need to
add diagrams or images to your documents.

Table 1:Backend Tools used in a PlainDoc environment

Tool              Purpose
================  ===============================================
pd2tex            The main PlainDoc processor itself
LaTeX (teTeX)     Typesetting system, PostScript and PDF backends
gs (GhostScript)  Rendering backend
make              Automate document generation and maintenance
cvs, svn, git     Version control and collaboration (optional)
perl              Tools are written in perl, but use few modules
gcc               For compiling the tools (optional)

Table 2:Frontend Tools used in a PlainDoc environment (all optional)

Tool              Purpose
================  =============================================
GraphViz / dot    Draw graphs (vectorial) from textual input
gnuplot           Draw graphs (vectorial) from statistical data
dia               Vectorial diagram (hand drawn) support
gimp              Bitmap graphics and photography support
ImageMagick       Automated processing of bitmap graphics
gv (GhostView)    Previewing tool for PostScript and PDF
acroread          Previewing tool for PDF
xpdf              Another previewing tool for PDF
emacs             Edit text, GUI for invoking commands

1.2 Data flow

The PlainDoc system is best understood as a process rather than an
application. Complex documents are easier to understand if you
think about which files are the sources, how data flows from them to
intermediate files, how it is finally assembled into the document, and
how it is possibly converted to the target format. Programmers will
recognize that pd2tex behaves very much like make(1): it checks which
source files, such as images, have changed, runs the commands necessary
to convert them to pdf, and then triggers the LaTeX system to produce
the final document.
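The make(1)-like behaviour described above boils down to a timestamp comparison. The following Python function is a minimal sketch of that idea, not pd2tex's actual code (pd2tex is written in perl); the name needs_rebuild and the file names in the demo are hypothetical.

```python
import os
import tempfile


def needs_rebuild(source: str, target: str) -> bool:
    """make(1)-style check: rebuild when the target is missing or older than its source."""
    if not os.path.exists(target):
        return True
    return os.path.getmtime(source) > os.path.getmtime(target)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "diagram.dia")  # hypothetical source image
        pdf = os.path.join(d, "diagram.pdf")  # its converted counterpart
        open(src, "w").close()
        print(needs_rebuild(src, pdf))  # True: target does not exist yet
        open(pdf, "w").close()
        os.utime(src, (1000, 1000))     # force the source to be older
        os.utime(pdf, (2000, 2000))
        print(needs_rebuild(src, pdf))  # False: target is up to date
```

Only files whose sources changed get reconverted, which is what keeps repeated pd2tex runs cheap.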

Fig-2: Data flow and image conversions

2 Invocation

Usually all you need to do is

pd2tex your-doc.pd

This will generate a tex/your-doc.pdf file that you can view with acroread(1). It
also generates the html/your-doc.html and ./your-doc.dbx versions of the document. If
the document contains images, automatic steps are taken to
convert them to the .pdf and .png formats needed by the various outputs.

For full option listing, please try

pd2tex -h

which produces (you should still run it to see what options your copy of pd2tex supports):

3 Syntax

I recommend you just start writing as if you were writing a plain text
email. Then come back here and see how you can apply some formatting.
The best way really is to learn by doing (running pd2tex a million times
in the process). Trying to learn the system before you start writing
will just lead to frustration. About the only important thing
you should remember up front is

A paragraph break is created by putting an empty line between paragraphs, i.e. a single newline will not break a paragraph; you need two.

3.1 Section structure

PlainDoc uses underlined titles to indicate section headers. Different
types of underlining indicate different levels. Generally you should
make the underlining the same length as the section title text, but pd2tex
actually allows for some slop, so do not get overly worried about this.

Usually you will put section numbers in front of section titles, but the
underlying document formatting system will assign the numbers
sequentially anyway, ignoring your numbers. This means that
any numbers in the .pd file are only for the benefit of those who
read or edit the .pd file. It also means that there is
no particularly urgent need to renumber if you happen
to add new sections or change their order: the PDF output
will have sequential numbers irrespective of whether
you make them sequential in the .pd.

If you would like pd2tex NOT to number sections automatically,
add the following near the beginning of your document

<<pdflags: secnum=0>>

You may also find

<<pdflags: stripsecnum=0>>

useful, as this allows you to control the section numbering manually.

The underlining scheme only works if the underline is at least four
characters long and there is an empty line before the title. In some
exceptional cases you need section titles shorter than that - or
pd2tex gets confused for some other reason. In these situations you
can use the following special forms

N.B. Although the above look like tags, there is no closing
tag. The section simply ends when another section of the
same level starts.

N.B. The fourth level (1.1.1.1 Subsubsubsection) is only available for
documents of style "book". For other document styles you may get LaTeX
errors about subsubsubsection not being supported.

The sectioning markers actually take a couple of optional
extra arguments:

<<sec:id:short title: Section title>>

The ID argument is used for internal references, such as see
specifications and paginated HTML file names. By default
the ID is formed from the text of the section title by
squashing certain special characters. You may want to
choose the ID explicitly if you anticipate changing the
section title and need a stable ID for your see references.
Another reason to pick an ID is that your ID can be much shorter
than the automatically generated one.

The short title argument allows you to specify an alternate, shorter
section title that is used in the headers and footers as well as in
the table of contents. This only works with the LaTeX / PDF backend.
You may want to pick a shorter title so the headers will format better.

3.2 Document preamble

Usually you start PlainDoc documents with a preamble that controls the
formatting template and provides metadata like revision control and
authorship information. All these tags are optional and have
reasonable defaults. (In the following, the two starting angle brackets are
separated by a space to prevent interpretation. In your own document you
would omit the space.)

The first line that starts with the hash character is an optional
comment that identifies the file as PlainDoc file. If you have
emacs pd-mode installed, it will automatically be switched on.

class

The class tag takes as an argument a string which can be divided into up to 6 parts separated by exclamation marks. The
first part is the LaTeX document class name.

The second part is for optional arguments to LaTeX document
class. This is typically used to specify paper size and point
size of main font.

The third part contains optional arguments to pass to the LaTeX babel
package, which deals with language specifics. Usually you would pass the ISO
language code (e.g. "pt" for Portuguese). The default is English.

The fourth part is an optional string to be included in the footer
or header of your document. Usually it would be an abbreviated
identification of the document, or perhaps your name. Exactly
how this gets used depends on the format template.

The fifth part is also optional. Some format templates display
it after page number, thus permitting you to create effects
like "page 5 of 37".

The sixth part, which is also optional, can supply additional
options. Currently defined options include

lineno

Turns on line numbering (at least in tex/pdf output)

In absence of class tag, the default document class is article.
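To make the six exclamation-mark separated parts concrete, here is a hypothetical Python sketch of how such a class string could be split. It is not pd2tex's parser: the field names are invented for illustration, and treating the sixth part as a comma-separated option list (such as lineno) is an assumption.

```python
from typing import List, NamedTuple


class DocClass(NamedTuple):
    latex_class: str    # part 1: LaTeX document class
    class_options: str  # part 2: e.g. paper size and font point size
    babel: str          # part 3: babel language option
    footer_id: str      # part 4: string for header/footer
    page_suffix: str    # part 5: e.g. total pages for "page 5 of 37"
    extra: List[str]    # part 6: extra options such as lineno (assumed comma list)


def parse_class(value: str) -> DocClass:
    parts = value.split("!")[:6]      # at most six parts
    parts += [""] * (6 - len(parts))  # missing parts become empty
    parts = [p.strip() for p in parts]
    return DocClass(
        latex_class=parts[0] or "article",  # article when no class is given
        class_options=parts[1],
        babel=parts[2] or "english",        # English when no language is given
        footer_id=parts[3],
        page_suffix=parts[4],
        extra=[o.strip() for o in parts[5].split(",") if o.strip()],
    )
```

For instance, parse_class("article!a4paper,11pt!finnish!MyDoc!37!lineno") yields the class article with options a4paper,11pt and the lineno option turned on.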

cvsid

Intended to hold revision control identifier, usually used for CVS Id tag.

version

Allows version of the document to be formally declared. Typically this is the externally visible version designation and most
of the time this has nothing to do with cvsid.

author

Indicates document author, and often email, too. The author information is used to generate the title page. There is no
special formatting for author information, but if you include
an email address, you may want to put it in parentheses rather
than the customary angle brackets to avoid confusion about
where the tag ends.

credit

Indicates other (minor) authors or people who should be given credit for the work. The string on the tag line will be
used as title of the credits section. All subsequent lines
describe the worthy contributors, one per line. It is
customary to separate the company name by a comma.

history

Change log of the document. The string on the tag line specifies the title of the change log and
rest of the tag is formatted as description list with
bulleted sub lists. Usually the description title (the
part before double colon (::)) is the revision
number of the document. This is followed, on the same
line, by date and editor, separated by a comma. All
subsequent lines should be formatted as single level
bulleted list, one list item per line (i.e. wrapping
lines does not work). The bulleted items must be
indented by exactly four spaces because it is a
sublist of the description list (see list below).

You may have a change log in CVS. If you want to
use that, I suggest you write a perl script that
extracts it from cvs, formats it according to
the conventions of the history tag, and then
just use the file inclusion facility to bring it in.
In other words, we do not support this very well yet; patches are welcome.

abstract

Used for a short description of the document, usually the abstract of a scientific paper. No
special formatting requirements.

See also moretexpreamble, texpreamble, dbxpreamble, additionalarticleinfodbx, and
htmlpreamble.

3.3 Paragraphs and text emphasis

A new paragraph is started by an empty line (or a paragraph ends in an
empty line, if you like). There is no special marker for this. A mere
newline does not start a new paragraph: you need two newlines in
sequence. This allows paragraph body text to be wrapped with simple
newlines. Note that the formatter
will not respect the simple line breaks; it will still format the
paragraph as a whole.
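The two-newline rule can be illustrated with a short Python sketch (a simplification, not pd2tex's implementation; split_paragraphs is a hypothetical name): blank lines separate paragraphs, and single newlines within a paragraph are folded into spaces.

```python
import re
from typing import List


def split_paragraphs(text: str) -> List[str]:
    """Split on blank lines; fold single newlines inside a paragraph into spaces."""
    # One or more blank (or whitespace-only) lines end a paragraph
    blocks = re.split(r"\n\s*\n", text.strip())
    return [" ".join(line.strip() for line in block.splitlines())
            for block in blocks if block.strip()]
```

For example, "First line\nstill first.\n\nSecond." yields two paragraphs, the first of which reads "First line still first.".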

You can introduce some emphasis formatting
using special characters

*bold*
+italic+
~computeroutput~
[REF]

Sometimes your document is so hairy that pd2tex gets confused in
detecting whether a star or plus really means emphasis (they could mean
a mathematical formula or even a bulleted list). In these cases you can
use the following forms to disambiguate. One particular case where this is
necessary is when you want to make just a single character italic or
computer output.
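As a rough illustration of why star and plus can be ambiguous, the following Python sketch converts the three emphasis markers to the LaTeX commands \textbf, \textit and \texttt only when the opening marker is not glued to a preceding word and the enclosed text does not start or end with a space. These heuristics are assumptions for illustration; pd2tex's real detection rules are more involved.

```python
import re


def _marker_pattern(mark):
    """Match mark...mark not glued to a word outside, with no space just inside."""
    m = re.escape(mark)
    edge = rf"[^\s{m}]"  # first/last character between the markers
    return re.compile(rf"(?<!\w){m}({edge}(?:[^{m}]*{edge})?){m}(?!\w)")


# Hypothetical mapping from PlainDoc markers to LaTeX commands
_MARKERS = (("*", "textbf"), ("+", "textit"), ("~", "texttt"))


def emphasis_to_latex(text):
    for mark, command in _MARKERS:
        pattern = _marker_pattern(mark)
        text = pattern.sub(lambda mo: "\\" + command + "{" + mo.group(1) + "}", text)
    return text
```

With these rules, "2*3*4" and "a + b + c" are left alone, which is roughly the kind of distinction pd2tex has to make heuristically.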

If you are aiming only at using the LaTeX based formatter, you
can also access the TeX math mode using dollar signs:

Einstein's famous formula, $E=mc^2$, is very simple...

3.3.1 Verbatim text

If you want to create a bigger block of verbatim text, just indent it
by two spaces more than the surrounding document (this technique is used to
generate most of the inset monospaced (Courier) blocks, such as the one that
follows).

And the listing follows
  function foo(bar) {
    a = bar;
    return a+3;
  }
As can be seen, the code is trivial.

For formal specification writing you may want to use
the special schema tag

Usually this produces just verbatim output, but may allow
some automated processing on the schema.

Similar code and logoutput tags exist for illustrating
program code and logs, respectively. All these forms of
verbatim output may eventually evolve to support some
form of syntax highlighting.

3.3.2 Block quotes

To create an indented block quote, start each line of the quote with
a greater-than symbol, in the manner of quoting in email or Usenet (news)
postings.

> Block quote example
> second line
>
> Second paragraph.

Would render as

Block quote example second line

Second paragraph.

As can be seen, the specific positions of single newlines within a block
quote are ignored: all of it is formatted as an indented paragraph. If
you want to create paragraph breaks in a block quote, just follow the
two-newline rule.

3.3.3 Footnotes

Footnotes are created using the footnote tag, whose text may
wrap to several lines.

<<footnote: Example footnote>>

There are no special formatting requirements for the text of the
footnote, except that you have to be careful about not confusing
pd2tex about where the footnote ends.

3.4 Bulleted and numbered lists

Bulleted lists are started by placing a bullet character
and a space at the left edge and then providing
the text for the list item. If the text wraps to two or more
lines, you need to indent the subsequent lines to the
column where the text begins on the bullet line. A top level
list can only start after an empty line (this is to avoid
misdetection of bullet characters appearing as the first character
of a line in an ordinary paragraph).

Numbered lists work similarly to bulleted lists: you simply start the
line with a number, a dot, and a space, followed by the text of the
list item, indenting correctly if it wraps. Instead of Arabic
numerals, you can also use letters. The actual numbering of the
ordered list items is done automatically by the underlying formatter,
so the numbers that you provide do not matter (but you must provide
a number for pd2tex to understand that you are creating an ordered
list); they are only for your own reference, or for the reference of
those who want to view your document in plain text format.
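The point that your own numbers are ignored can be shown with a small hypothetical Python sketch that renumbers top level ordered items sequentially, accepting either digits or a single letter as the typed label. In PlainDoc the renumbering is actually left to the underlying formatter; this only mimics the visible effect.

```python
import re
from typing import List

# An item label is digits or a single letter, then a dot and whitespace
_ITEM = re.compile(r"^(\d+|[A-Za-z])\.\s+(.*)$")


def renumber(lines: List[str]) -> List[str]:
    """Renumber top-level ordered items 1, 2, 3, ... whatever numbers were typed."""
    out, n = [], 0
    for line in lines:
        match = _ITEM.match(line)
        if match:
            n += 1
            out.append(f"{n}. {match.group(2)}")
        else:
            out.append(line)  # indented continuation lines pass through unchanged
    return out
```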

Description lists are introduced with a double colon. The
text before the double colon is the description title
and the text that follows is the description body. The
body can be wrapped to multiple lines, but you need
to indent the subsequent lines by four spaces.
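A minimal Python sketch of the description list rule follows; parse_description is a hypothetical helper, not part of pd2tex. The text before :: becomes the title, and lines indented by four spaces continue the body of the previous item.

```python
from typing import List, Tuple


def parse_description(lines: List[str]) -> List[Tuple[str, str]]:
    """Collect (title, body) pairs from 'title:: body' lines."""
    items: List[Tuple[str, str]] = []
    for line in lines:
        if line.startswith("    ") and items:
            # Four-space indent: continuation of the previous item's body
            title, body = items[-1]
            items[-1] = (title, (body + " " + line.strip()).strip())
        elif "::" in line:
            title, _, body = line.partition("::")
            items.append((title.strip(), body.strip()))
        # Any other line is ignored in this simplified sketch
    return items
```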

PlainDoc supports arbitrary nesting of lists of different types. Also
verbatim code and certain other constructs can be nested in
lists.

This renders as shown in table 3 (which may appear on a separate page due
to the underlying formatter's float placement algorithm).

Table 3:example caption

Header1

Header2

Header3

row1col1

row1col2

row1col3

row2col1

row2col2

row2col3 last col overflowing

row3col1

row3col2

row3col3

row4col1 n.b. empty line starts "row mode" table where each line

row4col2 represents a cell and the amount of text in each cell

row4col3 can exceed the width of the column (wraps to multiple lines)

row5col1

row5col2

row5col3

row6col1

row6col2

row6col3

The longtable keyword can also be used. It causes the table
to be split across several pages (if it is long enough).

The minitable keyword causes the table, which should be small,
to be placed inset in the text, i.e. the text will wrap around
the table.

Table 4:Minitable caption

Col1  Col2
====  =======================
Abc   This is minitable row 1
Def   This is 2nd row

Column widths are controlled by the number of equals signs under the
table header. They are NOT computed automatically. You can tweak the
table by adding or deleting equals signs. The amount of space per
equals sign is controlled by $texcolwidfactor and
$dbxcolwidfactor in the pd2tex source code. Rather than tweaking
these factors, you are encouraged to experiment with the number
of equals signs in your document until you are happy. Eventually you
will gain insight into what a good number of equals signs is.
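The relationship between equals signs and column widths is a simple multiplication. In the Python sketch below, the factor is an arbitrary value chosen for the example and is NOT the real $texcolwidfactor; column_widths is a hypothetical name.

```python
import re
from typing import List

# Hypothetical millimetres per "=": an invented value, NOT pd2tex's $texcolwidfactor
COLWIDFACTOR = 2.0


def column_widths(equals_row: str, factor: float = COLWIDFACTOR) -> List[float]:
    """Each run of '=' under a header defines one column; width = run length * factor."""
    return [len(run) * factor for run in re.findall(r"=+", equals_row)]
```

So adding two equals signs under a column widens it by two times the factor, which is why iterating on the equals signs is a practical way to tune a table.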

When composing a table, you usually horizontally align the columns. This
means that the text MUST fit under the column header. However,
sometimes it would be better if the text wrapped to multiple lines
instead of forcing the column very wide. For the last column of the table
this is accomplished simply by letting the text run off the right edge. However,
for the other columns, you need a different trick:

If an empty line is encountered in a table definition, the next row is described by having one column per line. The number of lines you supply must match exactly the number of columns in the table. Otherwise pd2tex will get confused and misformat your table - and quite often most of the rest of the document.

The table facility is not fully flexible,
but gets the job done for most simple and medium cases. If you really
need a complex table, you will need to use the tex or dbx tag to
insert your formatter-dependent code directly.

If the line immediately following the equals signs has the keyword WIDTHS: followed by a comma-separated list of numbers, then these
numbers are used for the table column widths. An empty specification
leaves the column width as specified by the equals signs. A plain
number specifies the width in absolute millimeters. A number prefixed
by a plus or minus sign makes the column that much wider or narrower,
respectively.
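The WIDTHS: rules (empty keeps the width, a plain number is absolute millimeters, a signed number is relative) can be sketched as follows. The function name, and the assumption that the base widths are already expressed in millimeters, are hypothetical.

```python
from typing import List


def apply_widths(base: List[float], spec: str) -> List[float]:
    """Adjust column widths (mm) using the text that follows WIDTHS:."""
    result = list(base)
    fields = [f.strip() for f in spec.split(",")]
    for i, field in enumerate(fields[:len(base)]):
        if not field:
            continue                            # empty: keep equals-sign width
        if field[0] in "+-":
            result[i] = base[i] + float(field)  # signed: that much wider or narrower
        else:
            result[i] = float(field)            # plain number: absolute mm
    return result
```

For example, with base widths of 20, 30 and 40 mm, the specification "25, , -10" gives 25, 30 and 30 mm.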

If the line immediately following the equals signs has the keyword OPTIONS:,
the rest of the line is parsed for table options. The first option
specifies the reference tag for the table (e.g. for use in a see specification).

If a table does not fit on one page, consider using the longtable keyword, which
requires long table support at the LaTeX level as well. If you do not want the table to
float, you can use the rawtable keyword.

3.5.1 Comma Separated Values tables

Many spreadsheet programs allow exporting or saving the spreadsheet
in .csv format, i.e. as comma separated values. Such a file can be
imported into a PlainDoc document as a table using the csv tag:

<<csv: file,topleft,botright,options: Legenda>>

where topleft and botright are cell specifications
in the letter-number syntax customary in spreadsheets. For example,
given the following demo.csv

The first row is always assumed to contain column titles. The equals
signs row is necessary and is used to determine column widths in the
output. Any further rows are considered to be normal data.

The default separator is the comma. Thus commas must not appear anywhere in the
values; double quotes are not sufficient to protect them.
If you need commas in the values, you need to use some other
character as the separator. Currently the only other separator available is the
pipe symbol "|". To use it, specify pipeysep as an option, e.g.:

<<csv: file,B1,D2,pipeysep: Example>>
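The letter-number cell syntax and the plain (unquoted) splitting described above can be illustrated with the following hypothetical Python sketch. It deliberately splits on the separator without honoring double quotes, mirroring the stated limitation; cell and csv_range are invented names.

```python
from typing import List, Tuple


def cell(ref: str) -> Tuple[int, int]:
    """Convert a spreadsheet reference such as 'B3' to zero-based (row, col)."""
    letters = "".join(ch for ch in ref if ch.isalpha()).upper()
    digits = "".join(ch for ch in ref if ch.isdigit())
    col = 0
    for ch in letters:  # base-26 letters: A=1 ... Z=26, AA=27
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return int(digits) - 1, col - 1


def csv_range(text: str, topleft: str, botright: str,
              pipeysep: bool = False) -> List[List[str]]:
    """Cut out the rectangle topleft..botright (inclusive) from separated text."""
    sep = "|" if pipeysep else ","
    rows = [line.split(sep) for line in text.splitlines()]  # no quote handling
    (r0, c0), (r1, c1) = cell(topleft), cell(botright)
    return [row[c0:c1 + 1] for row in rows[r0:r1 + 1]]
```

With pipeysep, a value such as "b,c" survives intact because only "|" splits the line.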

3.6 Images

You can include any general image using the following constructs. The image will
be converted to .pdf (via an .eps intermediate, unless it is already in one of these formats).

where posspec is a LaTeX position spec. The file parameter
specifies the file name without any extension. The extension is not
relevant because pd2tex will automatically attempt conversion from a
variety of file formats. If the automatic conversion fails, you may
need to manually convert the image to .pdf format and place it in tex/
subdirectory (where it would have been placed by the automatic
conversion).

Table 6:LaTeX position specs (with extensions)

Spec  Meaning
====  ===================================================================
!     Try harder
H     Here, forces image here
h     here (if only spec, forces image here)
b     bottom
t     top
p     float page
!hp   Try hard here or float page
Www   Wrap text around figure. Figure width ww cm.
R     Raw. Do not use float. Must leave caption empty.
*     Causes figure* to be used, as may be needed in twocolumn documents.

The sizespec comes in two variants: either symbolic
or as hard coordinates.

Table 7:Size specs

Spec      Meaning
========  ========================================================
wXh       Hard absolute width by height (both can have units)
2cmX3cm   2 by 3 cm
th        LaTeX Text Height (can also be used as unit)
tw        LaTeX Text Width (can also be used as unit)
1twX1thS  S is the stretch flag
n         Natural, size taken from image itself (no forced resize)
1         The default, corresponds to 1twX1th
15        1.5, 67% size
2         Half size (50%)
3         Third size (33%)
4         Quarter size (25%)
8         Eighth size (12.5%)
80        0% size

The trimspec permits the image to be cropped. It has the format

L1B2R3T4

where the first number specifies the number of points to trim from the left,
the second the points to trim from the bottom,
the third the points to trim from the right,
and the fourth how much to trim from the top. Use
this option for cropping badly behaving eps images (e.g. if the
original image is missing a bounding box and ends up occupying
a whole page).
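The trimspec order (left, bottom, right, top) is the same order used by the trim key of the LaTeX graphicx package. The following Python sketch parses a trimspec and builds a graphicx-style option string; whether pd2tex generates exactly this string is an assumption, as is the restriction to integer values.

```python
import re
from typing import Tuple

_TRIM = re.compile(r"^L(\d+)B(\d+)R(\d+)T(\d+)$")


def parse_trimspec(spec: str) -> Tuple[int, int, int, int]:
    """Return (left, bottom, right, top) in points from e.g. 'L1B2R3T4'."""
    match = _TRIM.match(spec)
    if match is None:
        raise ValueError(f"bad trimspec: {spec!r}")
    left, bottom, right, top = (int(g) for g in match.groups())
    return left, bottom, right, top


def graphicx_trim(spec: str) -> str:
    """Build an \\includegraphics option string; graphicx also uses l b r t order."""
    left, bottom, right, top = parse_trimspec(spec)
    return f"trim={left}pt {bottom}pt {right}pt {top}pt,clip"
```

The clip key is needed because trim alone only changes the bounding box; without clip the trimmed parts would still be drawn.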

If you are frustrated with LaTeX floats going all over the place, try

<<img: foo.png,R,n: >>

This causes Raw positioning (without float) and uses "natural"
image size, i.e. whatever the original size of the image is, without
any attempt to squeeze or stretch the image. Note that if you use R,
you MUST NOT supply a caption. If you use this approach and are not happy
with the image size, you should edit your image in your favorite image editor (this
exercise may eventually make you appreciate the built-in scaling features).

3.6.1 Dia diagrams with layers

Often it's convenient to prepare a diagram with multiple overlays to
illustrate multiple aspects of the same topic. In dia(1) this is usually
done by creating the overlays as layers and then controlling the
visibility of the layers when exporting the image.

To make this task easier, PlainDoc supports specification
of the layers using a special tag:

<<dia: file,posspec,sizespec,trimspec:layer1,layer2: Legenda>>

This is almost the same as the img tag, with the twist
that the layers are specified between the first and second colon. Use commas
to separate layer names if you have several. See the above section
on images for a description of the other specs.

3.6.3 gnuplot diagrams

You can create gnuplot diagrams as normal images. pd2tex has support
to automatically invoke gnuplot if there is a file whose name
corresponds to a missing image and ends in the extension .gnuplot. The
file must contain gnuplot commands, but due to gnuplot's ability to
process inline data (file name '-' in the plot command), it can also contain
the data itself.

Another way to create a gnuplot diagram is to use the gnuplot directive
and include the gnuplot commands and data inline in your .pd
file. For example:

Note how '-' was specified to include the data inline, and the last
line is e to indicate the end of the data. Your data SHOULD start with a
set terminal postscript eps stanza. If this line is missing, one will be
supplied using default arguments. If you do not want to use Latin 1
(ISO-8859-1) encoding, you should specify the desired encoding on the
first line. See the gnuplot(1) documentation for further information. The
above would create the output in Fig-4.

Fig-4: Legend for gnuplot diagram.
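The rule about a missing set terminal line can be sketched in Python as follows. The exact default arguments pd2tex supplies are not documented here, so the DEFAULT_TERMINAL value is an assumption. The demo script uses gnuplot's inline data convention: plot '-' reads data lines until a line containing only e.

```python
DEFAULT_TERMINAL = "set terminal postscript eps"  # assumed default; real arguments may differ


def ensure_terminal(script: str) -> str:
    """Prepend a terminal stanza unless the first non-blank line already sets one."""
    lines = script.splitlines()
    first = next((line.strip() for line in lines if line.strip()), "")
    if not first.startswith("set terminal"):
        lines.insert(0, DEFAULT_TERMINAL)
    return "\n".join(lines) + "\n"


# Inline data: plot '-' reads the following lines until a line holding just "e"
demo = """plot '-' with lines
1 1
2 4
3 9
e
"""
print(ensure_terminal(demo))
```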

3.6.4 GraphViz or dot graphs

You can create dot(1) diagrams as normal images. pd2tex has support
to automatically invoke dot(1) if there is a file whose name
corresponds to a missing image and ends in the extension .dot. The
file must contain a description of a graph in dot(1) format.
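The lookup of a generator file for a missing image, as described for .gnuplot files in the previous section and for .dot files above, might look like this hypothetical Python sketch. It only decides which tool would be invoked; the actual command lines pd2tex runs are not shown here.

```python
import os
import tempfile
from typing import Optional, Tuple

# Generator file extension -> tool that would be invoked
GENERATORS = {".gnuplot": "gnuplot", ".dot": "dot"}


def find_generator(image_base: str, directory: str = ".") -> Optional[Tuple[str, str]]:
    """For a missing image, return (tool, source) if a same-named generator file exists."""
    for ext, tool in GENERATORS.items():
        candidate = os.path.join(directory, image_base + ext)
        if os.path.exists(candidate):
            return tool, candidate
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        open(os.path.join(d, "arch.dot"), "w").close()  # hypothetical graph file
        print(find_generator("arch", d))
```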

Another way to create a dot diagram is to use the dot directive
and include the dot graph inline in your .pd
file. For example:

In the references section you describe the references. You start a
reference with the bracketed tag that was used in the text to refer to it and
follow that with a description of the reference. No special structure
exists for the description.

If you want to use a structured database to keep and format
your descriptions, you can write a perl(1) program to generate
the references in the format you like from the database
and use the PlainDoc inclusion facilities to bring them
into your document.

It is possible to have more than one bibliography, simply use
different title for them, e.g. "Normative" vs. "Informative".
If you do not supply any title, the default title of the
underlying formatting system is used.

3.8 Referencing Sections, Tables and Figures

It's fairly common for a document to reference a figure, e.g. "see
Fig-1.2". However, since sections, tables, and figures are
automatically renumbered as needed, you can't safely just
hard code a number in the document. Instead you should
use the see construct:

The identifier for a section is derived from the section title by
substituting all problematic characters with an underscore. For
example, see 3.3
or see the Syntax section.

The identifier for a figure is derived from the figure file name by
substituting all problematic characters with an underscore. Figure
identifiers are always prefixed with fig:.

The identifier for a table is derived from the OPTIONS specification
within the table; if there is no OPTIONS spec, the table
cannot be referenced. Table identifiers are always prefixed with
table:.
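Deriving identifiers can be sketched in Python as follows. Which characters count as problematic is not specified above, so this sketch assumes everything except ASCII letters, digits and the underscore; make_id and the example figure and table names are hypothetical.

```python
import re


def make_id(text: str, prefix: str = "") -> str:
    """Replace each 'problematic' character with an underscore (assumed character set)."""
    return prefix + re.sub(r"[^A-Za-z0-9_]", "_", text.strip())


section_id = make_id("Data flow")              # from a section title
figure_id = make_id("arch-v2", prefix="fig:")  # from a hypothetical figure file name
table_id = make_id("budget", prefix="table:")  # from a hypothetical OPTIONS tag
```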

3.9 Creating Index

To enable the index, you must include somewhere in
your document

< <makeindex: 1>>

This triggers index generation and will insert a section containing the
index.

Creating an index involves marking the words to be indexed
with the ix construct, like this:

< <ix: Dickens>> said that...

All bibliographical references, function names, path names, URLs, and
email addresses are automatically included in the index. You can also
specify words, concepts, and people indexes as follows

In general all of the above accept one indexable phrase per line and
then make a great effort to detect occurrences of said phrase in the text
of the document. This generally avoids cluttering most of the
text with ix declarations, but has the disadvantage that even
irrelevant mentions of the phrase will get indexed. Also, there is
no easy way of indicating the most relevant index entry.

Indexing currently only works with LaTeX backend.

3.10 Including other files into document

The file inclusion facility of PlainDoc is a very powerful
way to assemble large documents from smaller bits and
pieces. Typically you would have one .pd file for each
chapter and then a master document that pulls them
all together.

To include a file you simply enclose its name in double
angle brackets (n.b. we had to insert a space between the
angle brackets to prevent their special interpretation
here).

< <path.ext> >
< <includerange: path.ext: start-end> >

The includerange tag allows you to include only selected lines from
the other file. Line numbers are zero based (i.e. first line is 0) and
both must be specified, however it's ok for the end to be out of range,
e.g. use 9999 to include everything until the end of the file.
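The includerange semantics can be sketched in Python. Line numbers are zero-based as stated; treating the end number as inclusive is an assumption. Python slicing naturally tolerates an out-of-range end such as 9999. The helper names are hypothetical.

```python
from typing import List, Tuple


def parse_includerange(body: str) -> Tuple[str, int, int]:
    """Parse 'path.ext: start-end' (the text after 'includerange:')."""
    path, _, span = body.rpartition(":")
    start, end = (int(n) for n in span.strip().split("-"))
    return path.strip(), start, end


def select_range(lines: List[str], start: int, end: int) -> List[str]:
    """Zero-based lines start..end, end assumed inclusive; slicing clamps an end like 9999."""
    return lines[start:end + 1]


def include_range(body: str) -> List[str]:
    path, start, end = parse_includerange(body)
    with open(path, encoding="utf-8") as f:
        return select_range(f.read().splitlines(), start, end)
```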

Generally all includes are processed in a special preprocessing step
before other tags and formatting are processed.

3.11 URLs, email addresses, paths, and function names

Some constructs used by programming and web documentation have
distinctive syntactical structure that is fairly easy to
recognize and therefore is formatted specially.

Email addresses are recognized by the at character (@). For example

sampo@iki.fi

introduces an email address, which is formatted using the teletype
font like this: sampo@iki.fi.

URLs are recognized by :// somewhere near the
beginning of a string, e.g.:

However, some well known file extensions are recognized
separately. For example foo.pl is not a URL in Poland, but rather a
file with extension .pl (as in perl(1) script). Similar exceptions
apply to foo.cc and foo.hh which are common extensions for C++ source
code.

The presence of a slash anywhere in a string, or of a dot in the middle of
a string, causes the string to be considered a filesystem path and to be
formatted using the teletype font. Examples:

would format as foo.ext or /foo or foo/bar or foo/bar.ext or
foo/wee/bar or foo/wee/bar.ext or foo/ or .ext.

Dotted quad format IP addresses are recognized. There are some
provisions for wildcarding or indicating the netmask. The following
should work:

192.168.1.*
192.168.1.0/24
192.168.1.1

and format as 192.168.1.*, 192.168.1.0/24, or 192.168.1.1.

Uniform resource names are recognized if they start with urn and a colon,
like urn:liberty:foo.

For the benefit of documenting XML, structures like <tag> are recognized
and rendered as computer output.

PlainDoc follows the old Unix convention of suffixing function names
and manual page entries with parentheses, so input like this
function()
fork(2)
strlen(3)
proce_dure(a,b,c)

would format as function() or fork(2) or strlen(3) or proce_dure(a,b,c).

The PlainDoc formatter recognizes these structures and formats
them using the italic font. In this context the underscore
character loses its special meaning (i.e. the LaTeX math mode
subscript command).
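The recognition rules of this section can be summarized in a simplified Python classifier. The regular expressions are assumptions that approximate the rules as stated (for example, any string containing a slash counts as a path, which is why a phrase like and/or may need protection); they are not pd2tex's actual patterns, and tokens are assumed to have trailing punctuation already removed.

```python
import re

# Simplified, assumed patterns; checked in this order
_PATTERNS = [
    ("email", re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$")),
    ("url", re.compile(r"^[a-z][a-z0-9+.-]{0,10}://\S+$", re.IGNORECASE)),
    ("ip", re.compile(r"^(\d{1,3}\.){3}(\d{1,3}|\*)(/\d{1,2})?$")),
    ("urn", re.compile(r"^urn:\S+$")),
    ("function", re.compile(r"^[A-Za-z_]\w*\([^()]*\)$")),
]


def classify(token: str) -> str:
    """Return the kind of automatic formatting a bare token would get."""
    for kind, pattern in _PATTERNS:
        if pattern.match(token):
            return kind
    # Slash anywhere, a dot between word characters, or a leading .ext: path
    if "/" in token or re.search(r"\w\.\w", token) or re.match(r"\.\w", token):
        return "path"
    return "text"
```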

You can prevent the automatic formatting from happening by wrapping
the text in the e tag, like:

<<e: and/or>>

If you do not want automatic formatting to happen under any
circumstances, you can specify:

<<pdflags: autoformat=0>>

3.12 Other special formatting

(*** TODO items)
< <ignore: comments out a block> >

Todo items, written as an opening parenthesis, three stars, some text,
and a closing parenthesis, do not appear in the formatted document. They
allow the editor to add notes where she needs to revisit something.

The ignore tag allows you to "comment out" sections
of the document. Ignore blocks do not appear in the
formatted output - this is a bit difficult to illustrate.
For commenting out really large sections, it may be easier to use <<if: 0>> blocks,
see below in "Conditional processing" section.

3.12.1 Passing Comments to Backend

< <comment: Your comment here >>

will produce in HTML and DBX output

<!-- Your comment here -->

and in TEX output

% Your comment here

The difference between ignore and comment is that the former
prevents the text from reaching the backend at all while the
latter will pass the text to the backend, but use the backend's
comment syntax to escape it (so it will typically not render
even if it is in the file).
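A hypothetical Python sketch of mapping a comment to each backend's own syntax follows. It uses % for TeX, since % is TeX's comment character and a TeX comment ends at the end of the line (hence one % per line), and <!-- --> for HTML and DocBook, rejecting -- inside the text because XML does not allow that string within a comment. backend_comment and the backend names are invented for illustration.

```python
def backend_comment(text: str, backend: str) -> str:
    """Wrap text in the comment syntax of the given backend (sketch)."""
    text = text.strip()
    if backend in ("html", "dbx"):
        if "--" in text:
            raise ValueError("'--' is not allowed inside an HTML/XML comment")
        return f"<!-- {text} -->"
    if backend == "tex":
        # A TeX comment runs from % to the end of the line, so prefix every line
        return "\n".join("% " + line for line in text.splitlines())
    raise ValueError(f"unknown backend: {backend!r}")
```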

N.B. If you want to pass comments only to a specific type of backend, you can use the backend-specific tag, such as < <html: <!-- HTML only comment --> >> or < <tex: % TEX only comment >>

3.13 Special support for grammars

You can include fragments from a schema grammar file as figures with

<<sgfrag:sgfile:yoursection:xsdfile.xsd: Caption>>

The sgfile specifies the name of the file without the .sg extension.

The yoursection argument selects the block delimited by

#sec(yoursection)
foo
#endsec(yoursection)

inside the schema grammar file and extracts the content (foo in this case).

The xsdfile.xsd specifies optional xsd file (see below).

The Caption is the caption for the resulting figure.
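Extracting a #sec(...) / #endsec(...) block from a schema grammar file can be sketched with a regular expression. This Python version is a hypothetical illustration, assuming the markers sit at the start of their own lines.

```python
import re


def extract_sg_section(text: str, name: str) -> str:
    """Return the lines between #sec(name) and #endsec(name)."""
    n = re.escape(name)
    pattern = re.compile(rf"^#sec\({n}\)[ \t]*\n(.*?)^#endsec\({n}\)", re.S | re.M)
    match = pattern.search(text)
    if match is None:
        raise KeyError(name)
    return match.group(1).rstrip("\n")
```

Applied to the example above, extract_sg_section(text, "yoursection") returns foo.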

If you want to render schema grammar fragments as underlying xsd,
you can specify

<<pdflags: showsgasxsd=0>> Display schema grammar as schema grammar. The default.
<<pdflags: showsgasxsd=1>> Includes the XSD file using DocBook or XML include
<<pdflags: showsgasxsd=2>> Inlines the contents of the XSD file

3.14 Outputting verbatim blocks as files

Sometimes you want to keep some schema fragments inline in the document,
but would like to output them as files for other mechanized processing
as well. For this you should use the schema, code, or logoutput tag
with the optional file argument as follows:

< <schema:filepath: verbatim data
more data
>>
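A rough Python sketch of the idea: scan the source for schema, code, and logoutput tags that carry a file argument and collect their verbatim bodies so they can be written out. The regular expression is an assumption based on the syntax above (a file path containing no colon, followed by ": "), and it assumes the body contains no ">>". Tags without a file argument are skipped. The function names are hypothetical.

```python
import re
from pathlib import Path

# <<tag:filepath: body>> where tag is schema, code, or logoutput.
# The path must not start with a space (that would mean "no file argument").
TAG = re.compile(r"<<(schema|code|logoutput):([^:\s][^:]*?): (.*?)>>", re.DOTALL)


def collect_verbatim_files(pd_text: str) -> dict:
    """Map each file argument to the verbatim body of its tag."""
    return {m.group(2): m.group(3) for m in TAG.finditer(pd_text)}


def write_verbatim_files(files: dict, outdir: str) -> None:
    """Write the collected bodies below outdir, creating directories as needed."""
    for rel_path, body in files.items():
        target = Path(outdir) / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(body)
```

For example, collect_verbatim_files would map "out/a.xsd" to the body of a <<schema:out/a.xsd: ...>> tag, and write_verbatim_files(files, "build") would then create build/out/a.xsd.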

3.15 DocBook only

< <dbxpreamble: > >
< <additionalarticleinfodbx: > >
< <dbx: > >

N.B. This section may be illegible in some output formats. Please
consult the original sampo-plaindoc.pd source.

3.16 HTML only

< <htmlpreamble: > >
< <html: > >

N.B. This section may be illegible in some output formats. Please
consult the original sampo-plaindoc.pd source.

You can also create hyperlinks using

<<link:url: Text>>

For example: ZXID. The URL itself may contain colons (e.g. as in
http://...), since only a colon followed by a space starts the text. If no
text is supplied, the URL itself is used as the text. For example
symlabs.com. There must not be a space after the first colon, and there
MUST be a space after the colon that ends the URL.
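The splitting rule can be sketched in Python as follows. The part after "link:" is split at the first colon that is followed by a space; colons inside the URL (as in http://) do not match because they are followed by "/". The function names, the example URLs, and the exact HTML produced are illustrative assumptions, not pd2tex's actual output.

```python
import html


def parse_link(body: str):
    """Split the text after 'link:' into (url, text)."""
    if body.startswith(" "):
        raise ValueError("there must be no space after the first colon")
    url, _sep, text = body.partition(": ")
    text = text.strip()
    # Without link text, the URL itself is used as the text.
    return url, text or url


def link_to_html(body: str) -> str:
    url, text = parse_link(body)
    return f'<a href="{html.escape(url)}">{html.escape(text)}</a>'


print(link_to_html("http://zxid.org/: ZXID"))
# <a href="http://zxid.org/">ZXID</a>
print(link_to_html("http://symlabs.com/"))
# <a href="http://symlabs.com/">http://symlabs.com/</a>
```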

3.16.1 Multipage HTML

Multipage HTML allows each section, subsection, etc., to become a file
of its own. The file name is generally formed from the document base name
and the label of the corresponding section.

The HTML headers and footers for the files can be specified with

< <htmlpreamble2: > >
< <htmlpostamble2: > >

The pre- and postambles can be customized using the following bang-bang (!!) macros:

TITLE

Page title, composed of the section number and the section title

BASE

The document base name

PREV

Link to the page of the previous section in navigation order

NEXT

Link to the page of the next section in navigation order
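How such macros get filled in can be sketched in Python. This is a hypothetical illustration: it assumes each macro is written as !! followed by its upper-case name (e.g. !!TITLE) and that the values are supplied per page. What PREV and NEXT actually expand to (a bare URL or a complete link) is up to pd2tex, so the values below are made-up placeholders.

```python
import re


def expand_bangbang(template: str, values: dict) -> str:
    """Replace !!NAME with values[NAME]; unknown macros are left untouched."""
    def substitute(match):
        return values.get(match.group(1), match.group(0))
    return re.sub(r"!!([A-Z]+)", substitute, template)


preamble2 = "<html><head><title>!!TITLE</title></head><body>\n!!PREV | !!NEXT\n"
page = {
    "TITLE": "3.16 HTML only",
    "BASE": "mydoc",                         # hypothetical base name
    "PREV": '<a href="prev.html">Prev</a>',  # placeholder link
    "NEXT": '<a href="next.html">Next</a>',  # placeholder link
}
print(expand_bangbang(preamble2, page))
```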

3.16.2 HTML Info Boxes

An HTML infobox is an HTML table that can be shown or hidden using JavaScript.
It is a convenient means of saving real estate on the page while still including
text in easily accessible form.

< <infobox:id:link:tableargs: Content> >

id

The HTML object ID that JavaScript uses to refer to the box

link

The text of the link that makes the box visible

tableargs

Additional arguments for the <table> tag, usually used to control width, alignment, and style.
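A hypothetical Python sketch of the kind of HTML an infobox could turn into: a link whose onclick handler toggles the hidden state of the table with the given id. The markup, the use of the HTML hidden property, and the function name are assumptions for illustration; pd2tex's real output may look different. The content is inserted as-is because it is assumed to be already-rendered HTML.

```python
import html


def infobox(box_id: str, link_text: str, tableargs: str, content: str) -> str:
    """Render a hidden table plus a link that toggles its visibility."""
    toggle = (
        f"var e=document.getElementById('{box_id}');"
        "e.hidden=!e.hidden;return false;"
    )
    return (
        f'<a href="#" onclick="{html.escape(toggle)}">{html.escape(link_text)}</a>\n'
        f'<table id="{html.escape(box_id)}" hidden {tableargs}>'
        f"<tr><td>{content}</td></tr></table>"
    )


print(infobox("box1", "Show details", 'width="50%" align="center"', "Hello"))
```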

3.17 TeX only

3.18 Conditional processing

PlainDoc supports conditional processing using

< <if: MACRO> >
foo
< <else: > >
bar
< <fi: > >

where MACRO is defined either with the < <define1st: MACRO!VAL> >
construct or passed as a -DMACRO=val command line flag (N.B. the
usual !! in front of the macro is not used). The else block is mandatory
(but can be empty). Macros defined using the < <define: MACRO!VAL> >
construct are only processed after the first expansion of includes and
conditionals, so they cannot be tested with if.
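The if/else/fi logic can be sketched in Python as a small stack machine over lines. Assumptions, not confirmed by pd2tex: each directive stands on a line of its own, a condition is either a macro name or a literal number such as 0, an undefined macro counts as false, and (as in Perl) the values "" and "0" are false. The macros dictionary stands in for values from define1st and -D flags. Nesting and the mandatory else are handled.

```python
import re

IF_RE = re.compile(r"<<if:\s*(\w+)\s*>>")
ELSE_RE = re.compile(r"<<else:\s*>>")
FI_RE = re.compile(r"<<fi:\s*>>")


def is_true(name: str, macros: dict) -> bool:
    """Macro value if defined, the literal itself if numeric, else false."""
    if name in macros:
        value = macros[name]
    elif name.isdigit():
        value = name
    else:
        value = ""
    return str(value) not in ("", "0")


def process_conditionals(lines, macros):
    out = []
    stack = []      # one [parent_active, condition, seen_else] per open if
    active = True   # are lines at the current position being kept?
    for line in lines:
        s = line.strip()
        m = IF_RE.fullmatch(s)
        if m:
            cond = is_true(m.group(1), macros)
            stack.append([active, cond, False])
            active = active and cond
        elif ELSE_RE.fullmatch(s):
            if not stack or stack[-1][2]:
                raise SyntaxError("else without matching if")
            parent, cond, _ = stack[-1]
            stack[-1][2] = True
            active = parent and not cond
        elif FI_RE.fullmatch(s):
            if not stack:
                raise SyntaxError("fi without matching if")
            parent, _, seen_else = stack.pop()
            if not seen_else:
                raise SyntaxError("the else block is mandatory")
            active = parent
        elif active:
            out.append(line)
    if stack:
        raise SyntaxError("unterminated if")
    return out


src = ["a", "<<if: DRAFT>>", "draft text", "<<else: >>", "final text", "<<fi: >>", "b"]
print(process_conditionals(src, {"DRAFT": "1"}))  # ['a', 'draft text', 'b']
print(process_conditionals(src, {}))              # ['a', 'final text', 'b']
```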

3.19 Summary of Special Characters and Their Meaning

PlainDoc works by giving some punctuation and special characters special
meaning. Usually these characters work in the normal way unless used
in a special context. Generally you should not worry about them too much
when editing documents, but if the output shows that PlainDoc has
confused a punctuation character used in its plain meaning with its
special meaning, you may need to take some steps to disambiguate
it. Often this involves adding whitespace or some rearrangement,
but in extreme cases you may need to resort to special PlainDoc
syntax or LaTeX syntax.

This enables special page size and margins that are useful for
creating slides. It also creates a page break after each section
(there may be other page breaks if you have more material than will
fit on one slide). Of course you can always add more page breaks
using the

<<newpage: >>

construct.

The moretexpreamble content is raw LaTeX code that gives you fine
control over the headers, footers, and background of your
slides. The overlay feature is especially great for giving your slides
a "corporate look". If you do not understand what it
does, ask a LaTeX expert. One caveat: paths to .pdf files
used with \includegraphics are relative to the tex/ directory.

If you need to get just one or two more lines on a page, you may
find

< <tex: \enlargethispage*{\baselineskip}> >

useful.

In slide mode, the sections and subsections are not numbered. If
you want numbering, you should simply add the numbers manually.

You can include images and figures in your slides in the normal
way. However, at times it may be useful to omit the legend
from the figures. You can do this by supplying "0" (zero) as
the legend.

The tricky part is getting the landscape slides ordered so they read
naturally, while most 4-up printing software (like mpage(1)) is geared
towards portrait printing. If you print one, or even two, slides per
page, this is not likely to be a problem. "Natural" two sided printing
is left as an exercise to the reader.

5 Installing the tool chains

It's easiest if you get your PlainDoc system already compiled and
installed by someone else, but if you are familiar with building open
source software, building all of your own tool chains is certainly
feasible. pd2tex itself is a perl(1) program, so it does not need
any compilation, but it depends on many other programs, which together
make up the "tool chain". In this chapter I explain how I built mine
and try to give some tips.

At the very minimum you will need perl(1). Generally perl comes with
just about any Linux distribution and with most other Unixes so this
is not a major obstacle. With perl only, you will be able to generate
HTML output as well as .dbx and .tex intermediate files. To further
process the latter two, you will need to install additional tools.

The teTeX distribution of LaTeX usually ships with Linux distributions
and is easily obtained and installed for other Unixes. For Windows,
MiKTeX is the best alternative. DocBook tool chains are not explained
any further here: refer to your favorite web search.

Since a lot of information here depends on the particular versions of
the software packages and is always in flux, you should expect some
discrepancies when you actually build your own system. If my recipe
does not work for you, please study the documentation (usually INSTALL
and/or README files in the top directory of each software package's
source code tree) and try to build it the way they recommend.

These recipes were created around Sept. 2004. You can expect that
these instructions will be updated from time to time.

N.B. gcc(1), binutils(1), and glibc(3) are probably only worth worrying about
if you plan to build everything from sources.

The perl dependency is not very sensitive, because pd2tex(1) does
not use any perl modules (except those distributed with perl as
standard). While development currently (Apr 2006) happens on a
perl-5.8.4 system, no exotic features are used, so it should work with
perl-5.6 and may even work with perl-5.003. I'm interested in patches
to ensure backwards compatibility.

5.1 Preliminaries

Most of these preliminaries are likely to have already been satisfied
by your Linux distribution.

5.1.1 zlib-1.2.1

Nearly all Linux and Unix platforms ship with zlib, so usually this
requirement is trivially satisfied.

The bug #153606 is most relevant for enabling automated exports. Bug #153607
may be relevant for European language use. Bug #153609 contains an
important patch to work around the problem (disabling font cache).

5.4 teTeX or other LaTeX

You will need some sort of LaTeX system to generate PDFs. The teTeX-2.0.2
that ships with nearly every Linux distribution (as of 2005) is adequate.
More recent Linux distributions have TeX Live, which is good.

sudo apt-get install texlive-full # Works on Ubuntu 12

Windows users should get MiKTeX.

5.4.1 Additional LaTeX packages

Installing additional LaTeX packages is optional for most situations.

floatflt

already included in teTeX-2.0.2, but sometimes missing on Ubuntu, see below.

lineno

only needed if you want line numbers; needs installation and adding to the preamble,
or specifying lineno as moreopts in class.

longtable

only needed for long table support

textpos

only needed for arbitrary placement of text and graphics (needs installation)

everyshi

Required by textpos (already included in teTeX-2.0.2)

enumitem

Control list spacing (optional)

listings

Special formatting of program listings (think code tag)

Usually you install additional LaTeX packages (you can download them
from ctan.org) as follows:

cd /apps/teTeX/2.0.2/share/texmf/tex/latex
tar xvzf /t/textpos.tar.gz
texhash   # refresh the filename database so TeX finds the new package

The package directory should appear as an immediate subdirectory
of the share/texmf/tex/latex directory.

5.6 Graphviz-2.0

Graphviz is a neat tool for generating diagrammatic graphs from
textual input files. The syntax of the graphing language is very
natural and easy to learn. Furthermore, the PlainDoc system
fully supports Graphviz, specifically its dot(1) tool.
You can find more about Graphviz at graphviz.org, including
how to download and install this great tool.

However, if you do not wish to draw graphs using Graphviz, there is
no need to install it.

sudo apt-get install graphviz # works on Ubuntu
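For a feel of the input language, here is a small Python sketch that writes a Graphviz DOT description of a directed graph and, if dot(1) happens to be installed, renders it to PDF with dot -Tpdf. The helper names and the example graph are hypothetical, and this is not the PlainDoc markup for including graphs.

```python
import shutil
import subprocess


def edges_to_dot(name: str, edges) -> str:
    """Build a DOT digraph from (source, target) pairs."""
    def quote(node: str) -> str:
        return '"' + node.replace('"', '\\"') + '"'
    lines = [f"digraph {name} {{"]
    for src, dst in edges:
        lines.append(f"  {quote(src)} -> {quote(dst)};")
    lines.append("}")
    return "\n".join(lines) + "\n"


def render_pdf(dot_text: str, out_path: str) -> bool:
    """Render with dot(1) if available; returns False when dot is missing."""
    if shutil.which("dot") is None:
        return False
    subprocess.run(["dot", "-Tpdf", "-o", out_path], input=dot_text, text=True, check=True)
    return True


print(edges_to_dot("pipeline", [("doc.pd", "doc.tex"), ("doc.tex", "doc.pdf")]))
```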

5.7 GhostScript (gs-8.53)

Ghostscript is the real workhorse behind PlainDoc. Many of pd2tex's image
conversions rely heavily on Ghostscript, and it is also used by viewers
like gv and GSview, so life without Ghostscript is
nearly impossible. The good news is that pd2tex is not very sensitive to
the version of Ghostscript, and most gs(1) binaries in the mainstream
Linux distributions work fine. Ghostscript web site: www.ghostscript.com

6.2 PlainDoc vs. other formats

What about perl pod?

Perl pod (Plain Old Documentation) is a pretty good system and, in
hindsight, I guess I could simply have improved it, but
at the time (2002) it did not seem of high enough calibre for
serious technical document production (its apparent main focus
is on generating software documentation). POD appeals
only a little to the neophyte audience.

PlainDoc looks like a Wiki, why invent another format?

Wikis have some "plain text" merits, but the formatting of bulleted
lists or section titles does not really follow the usenet news / email
convention or culture. Besides, the Wikis have not managed
to agree on any common markup. If there ever is a common
Wiki markup, we will probably support importing and exporting it.

Why not just edit LaTeX directly?

Pure LaTeX is not human readable, and format conversions from LaTeX to, say, DocBook
or HTML were at the time (2002) far from perfect. LaTeX
does not appeal to the neophyte audience.

Why not just edit DocBook directly?

Pure DocBook is not human readable, and the syntax (as with most XML syntax) is
too baroque for human editing. Sure, you can edit it using
emacs, but you will soon start to think "there's gotta
be a better way". If you use some GUI or structured editor
like OpenOffice to edit DocBook, you will not be
able to meaningfully diff the files. DocBook does
not appeal to the neophyte audience.

What about LyX?

LyX is a GUI. I do not want a GUI. LyX files are quite TeX-like, thus not very human readable, and thus
the LyX document can not be used as the plain text document.
Back in 2002 LyX plain text output left much to be desired.
Sure, LyX does appeal to a certain category of neophyte user,
but I think it does not help to wean people off the GUI and
WYSIWYG model (despite claims to the contrary by the LyX team).
LyX documents can not be easily diffed, since the GUI is
liable to reformat the entire underlying file any time
you make any change.

Word will do the job!

No. Word is a GUI. Word is not a plain text format, and Word documents are very prone to
corruption. Word plain text output leaves much to be desired.
Word does not run on all platforms. Word documents
can not be diffed using simple tools.

OpenOffice?

Mostly the same gripes as with Word. The OpenOffice XML file format (or DocBook format) still suffers from the GUI capital
crime: make any change to the document and the entire XML is liable
to be reformatted. This makes diffing them hell (it also does
not play nice with cvs, but this is a minor point).

6.3 LaTeX tips

Unfortunately it is possible that, during the pdflatex run, you will
hit TeX-related errors that stop the process (pdflatex will
print a lot of scary-looking messages, but unless it stops you can
ignore them without much harm done). First, do not panic. You can get
out of pdflatex by typing X and Enter. This will abort the
TeX process.

When an error happens, you should understand why. The first task is
finding where in the document it is happening. The line numbers
reported by TeX refer to the .tex intermediate file corresponding to
your .pd. You may examine this file and try to understand the cause,
or you may just try searching in the .pd source for the text that
appears to be causing trouble.
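Rather than reading the log by eye, a short Python sketch can pull the errors and their line numbers out of the .log file that pdflatex writes next to the .tex file. It assumes TeX's usual log format, where an error line starts with "! " and a later context line starts with "l." followed by the line number; as noted above, those numbers refer to the .tex intermediate file, not your .pd. The run_pdflatex helper passes pdflatex's -interaction=nonstopmode and -halt-on-error options, so the run ends at the first error instead of waiting for you to type X. The function names are hypothetical.

```python
import re
import subprocess


def tex_errors(log_text: str):
    """Return (message, tex_line) pairs; tex_line is None if no l.<n> follows."""
    lines = log_text.splitlines()
    errors = []
    for i, line in enumerate(lines):
        if not line.startswith("! "):
            continue
        tex_line = None
        for follow in lines[i + 1:i + 15]:
            if follow.startswith("! "):
                break  # next error reached without a line number
            m = re.match(r"l\.(\d+)", follow)
            if m:
                tex_line = int(m.group(1))
                break
        errors.append((line[2:], tex_line))
    return errors


def run_pdflatex(tex_file: str) -> subprocess.CompletedProcess:
    """Run pdflatex without stopping at the interactive error prompt."""
    return subprocess.run(
        ["pdflatex", "-interaction=nonstopmode", "-halt-on-error", tex_file],
        capture_output=True, text=True,
    )

# Usage: run_pdflatex("mydoc.tex"), then
# tex_errors(open("mydoc.log", encoding="latin-1").read())
```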

Unless the cause is trivial, or you are a TeXpert, the chances are
you are stuck. At this point, either try to get TeX help (read a book,
try Google) or try trial and error to see which part of the document
is causing the indigestion. You can eliminate parts of document
by enclosing them in ignore clauses, or just by deleting them
entirely. Often this is an iterative process of trying a fix,
regenerating, and previewing. Do not give up.

Be suspicious of special characters in complex constructs being
misinterpreted.

Beware that sometimes a structure that does not close may cause
weird errors far down the line. A very common case of this
is when you use the empty line hack to introduce wide table
columns one per line and you get out of sync.

To cram a little more onto a page, use

< <tex: \enlargethispage*{\baselineskip}> >

6.4 Some common LaTeX errors

Too deeply nested

Apparently this really means what it says. Maybe something is not closed?

Float too large

The picture or table is too large to fit in the available space on the page. Ignore.

Overfull vbox

Means that something didn't really fit. May cause misformatting and ugliness. Ignore, it's only a warning.

Missing $ inserted

Automatic switch to math mode: a character (e.g. underscore) that is only allowed in math mode was seen, and LaTeX "helpfully" switched
to math mode. Generally fixed either by eliminating the suspect
character, enclosing the text in a < <tt: ...> > block, or
some other form of escaping.

Provided that you did not screw up the mental gymnastics regarding
geometry and transformations that relate to inserting the paper
in the right orientation for the second printing pass, you should now
have a stack of double-sided A4 sheets that you can fold in the middle
and staple in the center to make your booklet. Folding will often
produce an uneven right edge. The best fix is to simply
use a good guillotine to even it out.

6.8 Known bugs

Use of underscore outside math mode will confuse TeX. The right fix is to escape the underscore. Unfortunately this is not
done automatically, so you have to do it manually. Underscore
works right in verbatim blocks and function_names(). A similar
problem exists for the caret.
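For text that a script generates or preprocesses before it reaches TeX, the escaping can be done mechanically. The following Python sketch maps each character that TeX treats specially outside math mode, including underscore and caret, to a LaTeX-safe form in a single pass, so the backslashes it inserts are not escaped again. It is a hypothetical helper, not part of pd2tex, and whether a given escape passes through pd2tex unchanged in a .pd file is an assumption you should test.

```python
# Hypothetical helper, not part of pd2tex: make plain text safe for LaTeX.
TEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "{": r"\{",
    "}": r"\}",
    "$": r"\$",
    "&": r"\&",
    "#": r"\#",
    "%": r"\%",
    "_": r"\_",
    "^": r"\textasciicircum{}",
    "~": r"\textasciitilde{}",
}


def escape_tex_text(text: str) -> str:
    """Escape TeX special characters in one pass (no double escaping)."""
    return "".join(TEX_ESCAPES.get(ch, ch) for ch in text)


print(escape_tex_text("call function_name() with x^2"))
# call function\_name() with x\textasciicircum{}2
```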

I am not a LaTeX expert or TeXpert. I wrote this software to avoid learning LaTeX :-), so there are probably better ways of doing things if you are in the know.

If the rendered document starts with "<1sp" after you added the

\usepackage{lineno}
\linenumbers

clause, this is due to an ordering dependency between packages. It appears
that the lineno package needs to appear before longtable, and possibly
before fancyhdr. Solution: use 'lineno' as the moreopt parameter of the class. Otherwise,
you will have to hand-construct a texpreamble.

6.9 Reporting bugs

Currently there is no bug tracker or mailing list. If you are willing to set up such things, please let me know. Until then, mail
all bug reports, fixes, and feature requests to sampo-plaindoc@iki.fi
(this alias will help me sort my mail).

I do not have the resources or time to provide much end user support, especially LaTeX error debugging support. Please make a serious effort to investigate
and work around the problem before mailing me. If you must include
your document or command output, please trim it to a minimal test case
that will reproduce your problem.

No confidentiality treatment is available for any communication you have with me regarding PlainDoc support. If you must have such treatment, you
must pay for it.

Please use common sense when reporting bugs. If I see missing version numbers or stupid mistakes, I will not reply.

I am a plain text person and a laggard in mail technologies. Some of the surest ways of getting your mail ignored are to use attachments,
use HTML content, quote the entire message without trimming away irrelevancies,
fail to put your comments inline, or send any content that looks like
spam. Say what you have to say directly in the message body, including
any code listings or command output. Do not use attachments!

7 Legal, Copyright, GPLv2 License

The PlainDoc system is distributed under the GNU General Public
License, version 2, unless otherwise agreed with the author.
Please contact the author if you need other licensing terms.

PlainDoc system and its components and documentation come with
NO WARRANTY whatsoever.

Improvements to PlainDoc system and documentation are encouraged under
the terms of GPLv2. However, please make sure your modifications
are either funneled to the main distribution maintained by the
author, or you clearly mark them as your own hacks by using a
different name. You MUST document in ChangeLog any changes you make.