This package provides commands to parse text written in the
docidx markup language and convert it into the canonical
serialization of the keyword index encoded in the text.
See the section Keyword index serialization format for the
specification of that format.

This is an internal package of doctools, for use by the higher level
packages handling docidx documents.

The command takes the string contained in text and parses it
under the assumption that it contains a document written using the
docidx markup language. An error is thrown if this assumption
is found to be false. The format of these errors is described in
section Parse errors.

When successful, the command returns the canonical serialization of
the keyword index encoded in the text.
See the section Keyword index serialization format for the
specification of that format.
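
The call and its failure mode can be sketched as follows. This is a
minimal illustration, not the package's own code: the helper name
parse_docidx is hypothetical, and the invocation
"::doctools::idx::parse text" is an assumption taken from the
published Tcllib documentation, since the command signature is not
shown in this text.

```tcl
# Hypothetical helper: parse docidx markup held in a string.
# Assumption: the parser is invoked as "::doctools::idx::parse text <string>".
proc parse_docidx {text} {
    package require doctools::idx::parse

    if {[catch {::doctools::idx::parse text $text} result options]} {
        # The message in $result is meant for humans only. The
        # machine-readable details are in the -errorcode entry.
        return [list error [dict get $options -errorcode]]
    }
    # On success the result is the canonical serialization.
    return [list ok $result]
}

# Usage, with $markup supplied by the caller:
#   lassign [parse_docidx $markup] status value
```

Returning a status/value pair keeps the caller free of catch logic;
a caller that prefers exceptions can simply call the parser directly.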

This method adds the path to the list of paths searched when
looking for an include file. The call is ignored if the path is
already in the list of paths. The method returns the empty string as
its result.

This method removes the path from the list of paths searched
when looking for an include file. The call is ignored if the path is
not contained in the list of paths. The method returns the empty
string as its result.

This method adds the variable name to the set of predefined
variables known to the vset markup command during processing,
and gives it the specified value. The method returns the empty
string as its result.
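
The three configuration methods above might be combined with a parse
as in the sketch below. The subcommand spellings "include add",
"include remove", "var set" and "text" are assumptions taken from the
published Tcllib documentation, because the signatures are not shown
in this text. The helper name parse_with_setup and its parameters are
hypothetical. The try/finally construct needs Tcl 8.6 or later.

```tcl
# Hypothetical helper: configure the parser, then parse a string.
# Assumption: the subcommands are spelled as shown here.
proc parse_with_setup {text includeDir varName varValue} {
    package require doctools::idx::parse

    # Make includeDir searchable for include files.
    # The call is ignored if the path is already listed.
    ::doctools::idx::parse include add $includeDir

    # Predefine a variable for the vset markup command.
    ::doctools::idx::parse var set $varName $varValue

    # Parse, and drop the include path again even if parsing throws.
    try {
        return [::doctools::idx::parse text $text]
    } finally {
        ::doctools::idx::parse include remove $includeDir
    }
}
```

Note that the sketch leaves the predefined variable in place after
the call, and that it removes the include path even if the path was
already listed before the helper ran.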

The format of the parse error messages thrown when encountering
violations of the docidx markup syntax is human readable and
not intended for processing by machines. As such it is not documented.

However, the errorCode attached to the message is
machine-readable and has the following format:

The error code will be a list, each element describing a single error
found in the input. The list has at least one element, possibly more.

Each error element is a list of six strings describing one error in
detail. In order, the strings are:

The path of the file the error occurred in. This may be empty.

The range of the token the error was found at. This range is a
two-element list containing the offset of the first and last character
in the range, counted from the beginning of the input (file). Offsets
are counted from zero.

The line the first character after the error is on.
Lines are counted from one.

The column the first character after the error is at.
Columns are counted from zero.

The message code of the error. This value can be used as the argument
to msgcat::mc to obtain a localized error message, provided that the
application has called doctools::msgcat::init to initialize the
necessary message catalogs (see package doctools::msgcat).

A list of details for the error, like the markup command involved. In
the case of message code docidx/include/syntax this value is
the set of errors found in the included file, using the format
described here.
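
The structure described above can be walked with a few lines of Tcl.
The sketch below works on hand-written sample data, so it does not
need the package itself. The helper name format_parse_errors is
hypothetical, and the message codes in the sample other than
docidx/include/syntax are invented for the illustration.

```tcl
# Hypothetical helper: turn an error list of the shape described above
# into one line per error, descending into errors from included files.
proc format_parse_errors {errors {indent ""}} {
    set lines {}
    foreach err $errors {
        lassign $err path range line column code details
        lassign $range first last
        if {$path eq ""} { set path "<input>" }
        lappend lines "${indent}${path}:${line}:${column} \[${first}..${last}\] ${code}"
        if {$code eq "docidx/include/syntax"} {
            # For this code the details are themselves a list of errors.
            lappend lines {*}[format_parse_errors $details "$indent  "]
        }
    }
    return $lines
}

# Hand-written sample: one plain error and one include error that
# carries a nested error from the included file.
set sample {
    {{} {10 14} 2 3 docidx/example/code {key}}
    {main.idx {40 60} 5 0 docidx/include/syntax {
        {inc.idx {0 3} 1 4 docidx/example/other {}}
    }}
}
puts [join [format_parse_errors $sample] \n]
```

In real use the list passed to the helper would come from the
errorCode of a failed parse rather than from a literal.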

Here we specify the format used by the doctools v2 packages to
serialize keyword indices as immutable values for transport,
comparison, etc.

We distinguish between regular and canonical
serializations. While a keyword index may have more than one regular
serialization, exactly one of them is canonical.

regular serialization

An index serialization is a nested Tcl dictionary.

This dictionary holds a single key, doctools::idx, and its
value. This value holds the contents of the index.

The contents of the index are a Tcl dictionary holding the title of
the index, a label, and the keywords and references. The relevant keys
and their values are

title

The value is a string containing the title of the index.

label

The value is a string containing a label for the index.

keywords

The value is a Tcl dictionary, using the keywords known to the index
as keys. The associated values are lists containing the identifiers of
the references associated with that particular keyword.

Any reference identifier used in these lists has to exist as a key in
the references dictionary, see the next item for its
definition.

references

The value is a Tcl dictionary, using the identifiers for the
references known to the index as keys. The associated values are
2-element lists containing the type and label of the reference, in
this order.

Any key here has to be associated with at least one keyword,
i.e. occur in at least one of the reference lists which are the values
in the keywords dictionary, see previous item for its
definition.

The type of a reference is one of two values:

manpage

The identifier of the reference is interpreted as a symbolic file
name, referring to one of the documents the index was made for.

url

The identifier of the reference is interpreted as a URL, referring to
some external location, such as a website.
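
As an illustration of the format, the sketch below builds a regular
serialization by hand and checks the two cross-reference rules stated
above. All keywords, identifiers and labels are made up, the helper
name check_index is hypothetical, and the type name manpage is an
assumption taken from the published Tcllib documentation.

```tcl
# A hand-written regular serialization. All names are invented.
set index {
    doctools::idx {
        label    demo
        title    {Demo index}
        keywords {
            parsing {parse.man http://example.com/}
            markup  {parse.man}
        }
        references {
            parse.man           {manpage {Parser reference}}
            http://example.com/ {url {Example site}}
        }
    }
}

# Hypothetical check of the cross-reference rules.
# Returns a list of problems, which is empty when both rules hold.
proc check_index {serial} {
    set idx  [dict get $serial doctools::idx]
    set refs [dict get $idx references]
    set used {}
    set problems {}
    dict for {kw ids} [dict get $idx keywords] {
        foreach id $ids {
            dict set used $id 1
            if {![dict exists $refs $id]} {
                lappend problems "keyword '$kw' uses unknown reference '$id'"
            }
        }
    }
    foreach id [dict keys $refs] {
        if {![dict exists $used $id]} {
            lappend problems "reference '$id' is not used by any keyword"
        }
    }
    return $problems
}

puts [check_index $index]   ;# prints an empty line: no problems
```

The sample is deliberately not canonical: its keys are not sorted,
which is allowed for a regular serialization.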

canonical serialization

The canonical serialization of a keyword index has the format
specified in the previous item, and additionally satisfies the
constraints below, which make it unique among all the possible
serializations of the keyword index.

The keys found in all the nested Tcl dictionaries are sorted in
ascending dictionary order, as generated by Tcl's builtin command
lsort -increasing -dict.

The references listed for each keyword of the index, if any, are
listed in ascending dictionary order of their labels, as
generated by Tcl's builtin command lsort -increasing -dict.
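
The two ordering constraints can be sketched in plain Tcl as shown
below. This is an illustration of the rules only, not the code the
package uses, and it does not normalize anything beyond the ordering.
The helper names sort_keys and canonicalize and the sample data are
hypothetical, and the type name manpage is an assumption taken from
the published Tcllib documentation.

```tcl
# Hypothetical helper: rebuild a dictionary with its keys in
# "lsort -increasing -dict" order.
proc sort_keys {d} {
    set out {}
    foreach k [lsort -increasing -dict [dict keys $d]] {
        dict set out $k [dict get $d $k]
    }
    return $out
}

# Hypothetical helper: apply both ordering rules to a regular
# serialization and return the reordered serialization.
proc canonicalize {serial} {
    set idx  [dict get $serial doctools::idx]
    set refs [dict get $idx references]

    # Order each keyword's reference ids by the label of the reference.
    set keywords {}
    dict for {kw ids} [dict get $idx keywords] {
        set pairs {}
        foreach id $ids {
            lappend pairs [list [lindex [dict get $refs $id] 1] $id]
        }
        set sorted {}
        foreach p [lsort -increasing -dict -index 0 $pairs] {
            lappend sorted [lindex $p 1]
        }
        dict set keywords $kw $sorted
    }

    # Sort the keys of every nested dictionary.
    dict set idx keywords   [sort_keys $keywords]
    dict set idx references [sort_keys $refs]
    return [dict create doctools::idx [sort_keys $idx]]
}

# Invented, deliberately unordered input.
set messy {doctools::idx {title T label L references {b {url Zeta} a {manpage Alpha}} keywords {z {a b} k {b a}}}}
puts [canonicalize $messy]
```

For this input, keyword k lists its references as "a b" afterwards,
because the label Alpha of reference a sorts before the label Zeta of
reference b.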

This document, and the package it describes, will undoubtedly contain
bugs and other problems.
Please report them in the category doctools of the
Tcllib Trackers.
Please also report any ideas for enhancements you may have for either
package and/or documentation.

Note further that attachments are strongly preferred over
inlined patches. Attachments can be made by going to the Edit
form of the ticket immediately after its creation, and then using the
left-most button in the secondary navigation bar.