a_parse_by_mimetype : select the parser by the mime type of the document
(taken from the HTTP header). When the mime type is "text/html",
the HTML parser (parsec or tagsoup) is used; when it is
"text/xml" or "text/xhtml", the XML parser (parsec or tagsoup) is used.
For any other mime type no further processing is performed, and
the contents are returned to the application as a single text node.
If the default document encoding (a_encoding) is set to isoLatin1, this even enables processing
of arbitrary binary data.
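
The dispatch rule described above can be sketched in plain Haskell. This is only an illustration of the rule as stated, not HXT's implementation: the type ParserChoice and the function selectParser are hypothetical names, and exact, case-sensitive matching of the mime type string is an assumption.

```haskell
-- | Hypothetical summary of how a document body is handled.
data ParserChoice
  = UseHtmlParser -- mime type "text/html"
  | UseXmlParser  -- mime type "text/xml" or "text/xhtml"
  | KeepAsText    -- anything else: no parsing, a single text node
  deriving (Eq, Show)

-- | Choose a parser from the mime type string (hypothetical helper).
-- Assumption: the mime type is compared exactly as given.
selectParser :: String -> ParserChoice
selectParser mimeType =
  case mimeType of
    "text/html"  -> UseHtmlParser
    "text/xml"   -> UseXmlParser
    "text/xhtml" -> UseXmlParser
    _            -> KeepAsText
```

Under these assumptions, selectParser "image/png" yields KeepAsText, which corresponds to the case where the contents are handed back unparsed.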

a_ignore_none_xml_contents: ignore the contents of non-XML/HTML documents.
This option can be useful for implementing crawler-like applications, e.g. a URL checker.
In those cases network traffic can be reduced.

a_strict_input : file input is done strictly using the Data.ByteString input functions. This ensures that files are closed correctly, especially when working with
the tagsoup parser and not processing the whole input. Default is off. ByteString input is usually not faster than the built-in hGetContents
for strings.

a_options_curl : deprecated, but still supported for compatibility reasons.
Passes additional options to the curl binding.
Instead of using this option to set a whole bunch of curl options at once,
it is recommended to use the curl-.* option syntax described below.

a_mime_types : set the mime type table for file input from the given file. The file must use the syntax of a Debian Linux "mime.types" config file.

a_if_modified_since : read the document conditionally. Only if the document is newer than the given date and time argument are the contents delivered;
otherwise just the root node with the meta data is returned. The date and time must be given in System.Locale.rfc822DateFormat.
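
A date string in this format can be produced as in the following minimal sketch. It assumes the current time package, whose Data.Time.Format module exports a constant with the same name as the System.Locale one mentioned above; that the two denote the same format string is an assumption. The names ifModifiedSinceValue and exampleStamp are hypothetical.

```haskell
import Data.Time.Calendar (fromGregorian)
import Data.Time.Clock (UTCTime (..))
import Data.Time.Format (defaultTimeLocale, formatTime, rfc822DateFormat)

-- | Render a point in time using the RFC 822 date format
-- (hypothetical helper, not part of HXT).
ifModifiedSinceValue :: UTCTime -> String
ifModifiedSinceValue = formatTime defaultTimeLocale rfc822DateFormat

-- | Example value: midnight UTC on 15 June 2009.
exampleStamp :: String
exampleStamp = ifModifiedSinceValue (UTCTime (fromGregorian 2009 6 15) 0)
```

The resulting string would then be passed as the value of the a_if_modified_since option.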

curl options : the HTTP interface with libcurl can be configured with a lot of options. To support these options in an easy way, there is a naming convention:
every option whose name has the prefix curl, and whose remaining part forms an option as described in the curl man page, is passed to the curl binding library.
See Text.XML.HXT.IO.GetHTTPLibCurl.getCont for examples. Currently most of the options concerning HTTP requests are implemented.
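
The naming convention can be illustrated with a small, self-contained sketch. It is not the code used by HXT: the function curlOptions and the local Attributes synonym are hypothetical, and the requirement that the remainder of the name starts with a dash is an assumption derived from the curl-.* pattern mentioned for a_options_curl.

```haskell
import Data.List (stripPrefix)
import Data.Maybe (mapMaybe)

-- | Option list as name/value pairs (local synonym for this sketch).
type Attributes = [(String, String)]

-- | Keep only the options following the curl naming convention and
-- strip the "curl" prefix, leaving the option name as it would appear
-- in the curl man page (hypothetical helper).
curlOptions :: Attributes -> Attributes
curlOptions = mapMaybe toCurl
  where
    toCurl (name, value) =
      case stripPrefix "curl" name of
        -- Assumption: the remainder must start with '-', e.g. "--max-redirs".
        Just rest@('-' : _) -> Just (rest, value)
        _                   -> Nothing
```

For example, an option named "curl--max-redirs" would be forwarded as "--max-redirs", while an option without the prefix would be left out of the result.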

All attributes not evaluated by readDocument are stored in the created document root node, so that the various
options are easily accessible, e.g. in the input/output modules.

If the document name is the empty string or a URI of the form "stdin:", the document is read from standard input.

examples:

readDocument [ ] "test.xml"

reads and validates a document "test.xml", no namespace propagation, only canonicalization is performed

reads an SVG document from standard input and sets the mime type by looking in the system mime type config file; the default encoding is isoLatin1, and
parsing is done with the lightweight tagsoup parser, which implies no validation.