Gentoo Linux XML Guide

Daniel Robbins
John P. Davis
Jorge Paulo
This guide shows you how to compose web documentation using the new lightweight
Gentoo GuideXML syntax. This syntax is the official format for Gentoo Linux
documentation, and this document itself was created using GuideXML. This guide
assumes a basic working knowledge of XML and HTML.

Version 2.2, October 15, 2003

Guide basics

Guide XML design goals

The guide XML syntax is lightweight yet expressive, so that it is easy to
learn yet also provides all the features we need for the creation of web
documentation. The number of tags is kept to a minimum -- just those we need.
This makes it easy to transform guide into other formats, such as DocBook
XML/SGML or web-ready HTML.

The goal is to make it easy to create and transform guide XML
documents.

How to transform guide XML into HTML

Before we take a look at the guide syntax itself, it's helpful to know how
guide XML is transformed into web-ready HTML. To do this, we use a special
file called guide.xsl, along with a command-line XSLT processing
tool (also called an "engine"). The guide.xsl file describes
exactly how to transform the contents of the source guide XML document to
create the target HTML file. The processing tool that Gentoo Linux uses
is called xsltproc, which is found in the libxslt package.

# emerge libxslt

Now that we have the way, we need the means, so to speak. In other words,
we need some Gentoo XML documents to transform. Gentoo has two types of tarballs
that are available for download:

The first type contains the entire up-to-date Gentoo Linux website.
Included are our XSL templates, so if you are planning to transform any
documentation, you will need this tarball. The tarball can be found here.

The second type contains daily snapshots of our XML documentation source
in every language that we offer. Please note that it is impossible to transform
documentation with this tarball alone, because it does not include the XSL
templates, so download the web tarball if you want to fully develop your own
documentation. The snapshot tarballs are especially useful for translators.
These tarballs can be found here.

After the web tarball is downloaded and extracted, go to the directory where
the tarball was extracted, and enter the htdocs directory. Browse
around and get comfortable with the layout, but note the xsl and
doc directories. As you might have guessed, the XSL stylesheets are
in xsl, and our documentation is in doc. For testing
purposes, we will be using the Gentoo Linux CD Installation Guide, located at
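doc/en/gentoo-x86-install.xml. Now that the locations of the XSL
and XML files are known, we can do some transforming with xsltproc,
run from inside the htdocs directory:

# xsltproc xsl/guide.xsl doc/en/gentoo-x86-install.xml > /tmp/install.html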

If all went well, you should have a web-ready version of
gentoo-x86-install.xml at /tmp/install.html. For
this document to display properly in a web browser, you may have to copy some
files from htdocs to /tmp, such as
css/main.css and (to be safe) the entire images
directory.

Guide XML

Basic structure

Now that you know how to transform guide XML, you're ready to start learning
the GuideXML syntax. We'll start with the initial tags used in a guide
XML document:
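
The opening listing is not reproduced in this copy, so here is a minimal sketch of what such an opening could look like. The author titles, the email addresses and the abstract wording are placeholders chosen for illustration; the version and date are the ones shown at the top of this guide. The <guide> element is deliberately left open, because the chapters of the document follow it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<guide>
<title>Gentoo Linux XML Guide</title>

<!-- title= is optional; the roles and addresses below are made-up examples -->
<author title="Author">
<mail link="author@example.org">Daniel Robbins</mail>
</author>
<author title="Editor">
<mail link="editor@example.org">John P. Davis</mail>
</author>

<abstract>
This guide shows you how to compose web documentation using GuideXML.
</abstract>

<license/>

<version>2.2</version>
<date>15 Oct 2003</date>
```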

On the first line, we see the requisite tag that identifies this as an XML
document. Following it, there's a <guide> tag -- the entire
guide document is enclosed within a <guide> </guide> pair.
Next, there's a <title> tag, used to set the title for the entire
guide document.

Then, we come to the <author> tags, which contain information
about the various authors of the document. Each <author> tag
allows for an optional title= attribute, used to specify the author's
relationship to the document (author, co-author, editor, etc.). In this
particular example, the authors' names are enclosed in another tag -- a
<mail> tag, used to specify an email address for this particular
person. The <mail> tag is optional and can be omitted, and only
one <author> element is required per guide document.

Next, we come to the <abstract>, <version> and
<date> tags, used to specify a summary of the document, the
current version number, and the current version date (in DD MMM YYYY format)
respectively. This rounds out the tags that should appear at the beginning of
a guide document. Besides the <title> and <mail>
tags, these tags shouldn't appear anywhere else except immediately inside the
<guide> tag, and for consistency it's recommended (but not
required) that these tags appear before the content of the document.

Finally we have the <license/> tag, used to publish the document under
the Creative Commons - Attribution / Share Alike license, as required by the
Documentation Policy.

Chapters and sections

Once the initial tags have been specified, you're ready to start adding
the structural elements of the document. Guide documents are divided into
chapters, and each chapter can hold one or more sections. Every chapter
and section has a title. Here's an example chapter with a single section,
consisting of a paragraph. If you append this XML to the XML in the previous
excerpt and append a </guide> to the end of the file, you'll have a valid
(if minimal) guide document:
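
A minimal sketch of such a chapter follows. The titles and the paragraph text are placeholder wording; only the nesting of the elements is taken from the description in this section.

```xml
<chapter>
<title>This is my chapter</title>
<section>
<title>This is section one of my chapter</title>
<body>

<p>
This is the actual text content of my section.
</p>

</body>
</section>
</chapter>
```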

Above, I set the chapter title by adding a child <title>
element to the <chapter> element. Then, I created a section by
adding a <section> element. If you look inside the
<section> element, you'll see that it has two child elements -- a
<title> and a <body>. While the <title>
is nothing new, the <body> is -- it contains the actual text
content of this particular section. We'll look at the tags that are allowed
inside a <body> element in a bit.

A <guide> element can contain multiple <chapter>
elements, and a <chapter> can contain multiple
<section> elements. However, a <section>
element can only contain one <body> element.

An example <body>

Now, it's time to learn how to mark up actual content. Here's the XML code for an example <body> element:

<p>
This is a paragraph. <path>/etc/passwd</path> is a file.
<uri>http://www.gentoo.org</uri> is my favorite website.
Type <c>ls</c> if you feel like it. I <e>really</e> want to go to sleep now.
</p>
<pre>
This is text output or code.
# <i>this is user input</i>
Make HTML/XML easier to read by using selective emphasis:
&lt;foo&gt;<i>bar</i>&lt;/foo&gt;
<codenote>This is how to insert an inline note into the code block</codenote>
</pre>
<note>
This is a note.
</note>
<warn>
This is a warning.
</warn>
<impo>
This is important.
</impo>

Now, here's how this <body> element is rendered:

This is a paragraph. /etc/passwd is a file.
http://www.gentoo.org is my favorite website.
Type ls if you feel like it. I really want to go to sleep now.

This is text output or code.
# this is user input
Make HTML/XML easier to read by using selective emphasis:
<foo>bar</foo>
This is how to insert an inline note into the code block

This is a note.
This is a warning.
This is important.

The <body> tags

We introduced a lot of new tags in the previous section -- here's what you
need to know. The <p> (paragraph), <pre> (code
block), <note>, <warn> (warning) and
<impo> (important) tags all can contain one or more lines of text.
Besides the <table> element (which we'll cover in just a bit),
these are the only tags that should appear immediately inside a
<body> element. Another thing -- these tags should not be
stacked -- in other words, don't put a <note> element inside a
<p> element. As you might guess, the <pre> element
preserves its whitespace exactly, making it well-suited for code excerpts.
You can also name the <pre> tag:
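
The listing that belongs here is missing from this copy. As a sketch, and assuming that caption= is the attribute used to name a code block, a named <pre> might look like this (the caption text is made up; the command is the one used earlier in this guide):

```xml
<!-- caption= is assumed here to be the attribute that names the listing -->
<pre caption="Installing libxslt">
# <i>emerge libxslt</i>
</pre>
```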

The <path>, <c> and <e> elements can
be used inside any child <body> tag, except for
<pre>.

The <path> element is used to mark text that refers to an
on-disk file -- either an absolute or relative path, or a
simple filename. This element is generally rendered with a monospaced
font to offset it from the standard paragraph type.

The <c> element is used to mark up a command or user
input. Think of <c> as a way to alert the reader to something
that they can type in that will perform some kind of action. For example, all
the XML tags displayed in this document are enclosed in a <c>
element because they represent something that the user could type in that is
not a path. By using <c> elements, you'll help your readers
quickly identify commands that they need to type in. Also, because
<c> elements are already offset from regular text, it is rarely
necessary to surround user input with double-quotes. For example, don't
refer to a "<c>" element like I did in this sentence. Avoiding
the use of unnecessary double-quotes makes a document more readable -- and
adorable!

<e> is used to apply emphasis to a word or phrase; for example:
I really should use semicolons more often. As you can see, this text is
offset from the regular paragraph type for emphasis. This helps to give your
prose more punch!

<mail> and <uri>

We've taken a look at the <mail> tag earlier; it's used to link
some text with a particular email address, and takes the form <mail
link="foo@bar.com">Mr. Foo Bar</mail>.

The <uri> tag is used to point to files/locations on the
Internet. It has two forms -- the first can be used when you want to have the
actual URI displayed in the body text, such as this link to
http://www.gentoo.org. To create this link, I typed
<uri>http://www.gentoo.org</uri>. The alternate form is
when you want to associate a URI with some other text -- for example, the Gentoo Linux website. To create
this link, I typed <uri link="http://www.gentoo.org">the
Gentoo Linux website</uri>.

Figures

Here's how to insert a figure into a document -- <figure
link="mygfx.png" short="my picture" caption="my favorite picture of all
time"/>. The link= attribute points to the actual graphic image,
the short= attribute specifies a short description (currently used for
the image's HTML alt= attribute), and the caption= attribute supplies the
caption. Not too difficult :) We also support the standard HTML-style
<img src="foo.gif"/> tag for adding images without captions, borders, etc.

Tables and lists

Guide supports a simplified table syntax similar to that of HTML. To start
a table, use a <table> tag. Start a row with a <tr>
tag. However, for inserting actual table data, we don't support the
HTML <td> tag; instead, use the <th> if you are inserting a
header, and <ti> if you are inserting a normal informational
block. You can use a <th> anywhere you can use a <ti>
-- there's no requirement that <th> elements appear only in the
first row. Currently, these tags don't support any attributes, but some will
be added (such as a caption= attribute for <table>) soon.
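
As an illustration of the table tags described above, here is a small sketch. The cell contents are invented for the example, based on the directory layout mentioned earlier in this guide.

```xml
<table>
<tr>
  <th>Directory</th>
  <th>Contents</th>
</tr>
<tr>
  <ti>xsl</ti>
  <ti>XSL stylesheets</ti>
</tr>
<tr>
  <ti>doc</ti>
  <ti>Documentation sources</ti>
</tr>
</table>
```

Note that <th> and <ti> take the place of the HTML <td> tag, which guide XML does not support.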

To create ordered or unordered lists, simply use the HTML-style
<ol>, <ul> and <li> tags. List tags
should only appear inside a <p>, <ti>,
<note>, <warn> or <impo> tag.
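
A short sketch of an unordered list placed inside a <p> element, as the rule above requires. The item text is illustrative only.

```xml
<p>
The web tarball contains, among other things:
<ul>
  <li>The XSL stylesheets, in the xsl directory</li>
  <li>The documentation sources, in the doc directory</li>
</ul>
</p>
```

An ordered list works the same way, with <ol> in place of <ul>.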

Intra-document references

Guide makes it really easy to reference other parts of the document using
hyperlinks. You can create a link pointing to Chapter
One by typing <uri link="#doc_chap1">Chapter
One</uri>. To point to section two of
Chapter One, type <uri link="#doc_chap1_sect2">section two of
Chapter One</uri>. To refer to figure 3 in chapter 1, type <uri
link="#doc_chap1_fig3">figure 1.3</uri>. Or, to refer to code listing 2 in
chapter 2, type <uri link="#doc_chap2_pre2">code listing 2.2</uri>. We'll be
adding other auto-link abilities (such as table support) soon.

Coding Style

Introduction

Since all Gentoo documentation is a joint effort and several people will
most likely change existing documentation, a coding style is needed.
The coding style has two parts. The first concerns internal coding: how
the XML tags are placed. The second concerns the content: how to avoid
confusing the reader.

Word-wrapping must be applied at 80 characters except inside
<pre>. Deviate from this rule only when there is no other choice (for
instance, when a URL exceeds the maximum number of characters). In that case,
wrap the line at the first whitespace that follows.

Indentation may not be used, except for XML constructs whose parent
tags are <tr> (from <table>), <ul> and <ol>. Where indentation is
used, it must be exactly two spaces per level: no tabs and no additional
spaces.

In case word-wrapping happens in <ti>, <th> or
<li> constructs, indentation must be used for the content.
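
The following sketch shows one reading of these indentation rules: children of <tr>, <ul> and <ol> are indented by two spaces, and content that wraps inside <ti>, <th> or <li> is indented by a further two. The text in the cells and list items is made up for the example.

```xml
<table>
<tr>
  <th>Tag</th>
  <th>Purpose</th>
</tr>
<tr>
  <ti>path</ti>
  <ti>
    Marks text that refers to an on-disk file: an absolute path, a relative
    path or a simple filename
  </ti>
</tr>
</table>

<p>
<ul>
  <li>Children of a list are indented by two spaces</li>
  <li>
    Content that has to wrap inside a list item is indented once more, again
    by two spaces
  </li>
</ul>
</p>
```

The <ul> itself is not indented inside the <p>, since <p> is not one of the parent tags that permit indentation.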

Inside tables (<table>) and lists (<ul> and <ol>), periods (".") should
not be used unless an entry contains multiple sentences. In that case, every
sentence should end with a period (or another punctuation mark).

Every sentence, including those inside tables and listings, should start
with a capital letter.

Try to use <uri> with the link attribute as much as
possible. In other words, the Gentoo
Website is preferred over http://www.gentoo.org.

When you comment something inside a <pre> construct, only use
<codenote> if the content is a C or C++ code snippet. Otherwise,
use <comment> and parentheses. Also place the comment before
the subject of the comment.

(Substitute "john" with your user name)
# id john
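
The rendered example above could come from markup along these lines. This is a sketch: marking the typed command with <i> follows the earlier <body> example and is not taken from a listing in this section.

```xml
<pre>
<comment>(Substitute "john" with your user name)</comment>
# <i>id john</i>
</pre>
```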

Resources

Start writing

Guide has been specially designed to be "lean and mean" so that developers
can spend more time writing documentation and less time learning the actual XML
syntax. Hopefully, this will allow developers who aren't unusually "doc-savvy"
to start writing quality Gentoo Linux documentation. If you'd like to help (or
have any questions about guide), please post a message to the gentoo-doc mailing list stating what you'd
like to tackle. Have fun!