mnoGoSearch understands the
text/plain, text/html
and text/xml MIME types out of the box
and can index files of these types using its built-in parsers.
For other file types it can use external parsers.

An external parser is an executable program that converts
a file of some MIME type to one of the types
supported by mnoGoSearch.
For example, if you want mnoGoSearch
to index PostScript files, you can
do it with the help of the ps2ascii parser,
which reads a PostScript file from
STDIN and writes plain text output to STDOUT.
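
The STDIN-to-STDOUT contract described above can be sketched in a few lines of Python. This is only an illustration, not a parser shipped with mnoGoSearch: the input format (lines starting with "%" are treated as markup and dropped) and the names to_plain_text and run_filter are hypothetical.

```python
import io

def to_plain_text(raw: bytes) -> str:
    """Toy conversion for a hypothetical format: drop lines starting with '%'."""
    text = raw.decode("latin-1")
    kept = [line for line in text.splitlines() if not line.startswith("%")]
    return "\n".join(kept) + "\n"

def run_filter(src, dst) -> None:
    """Read the whole document from src and write plain text to dst."""
    dst.write(to_plain_text(src.read()).encode("latin-1"))

# Demo with in-memory streams. A real filter would instead import sys and
# call run_filter(sys.stdin.buffer, sys.stdout.buffer).
demo_out = io.BytesIO()
run_filter(io.BytesIO(b"%meta\nHello\nWorld\n"), demo_out)
print(demo_out.getvalue().decode("latin-1"), end="")
```

Any program that follows the same read-everything, write-plain-text pattern can act as an external parser, whatever language it is written in.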

Note: External parsers are also often referred to
in this manual as filters or converters.

Some parsers cannot read from STDIN and require a file to read from.
In this case indexer can create a
temporary file in /tmp and remove it when the parser is done.
Use the $1
macro in the parser command line to substitute the temporary file name. For example,
the Mime command for the catdoc MS-Word-to-text converter can look like this:

Mime application/msword text/plain "/usr/bin/catdoc -a $1"

If your parser writes the result
into an output file, use the $2 macro.
indexer will replace $2
with the output temporary file name, then start the parser,
read the result from this temporary file and delete the file. For example:

Mime application/msword text/plain "/usr/bin/catdoc -a $1 >$2"

The parser above will read data
from the first temporary file and write results to the second file. Both
temporary files will be deleted after the parser results have been read. Note that
this command is effectively the same as the previous example. They
differ only in the execution method used by indexer:
file-to-STDOUT versus file-to-file.
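
To make the two execution methods concrete, here is a minimal sketch of a converter that supports both. It is a hypothetical script, not part of mnoGoSearch: the function name convert and the whitespace-collapsing "conversion" are placeholders, and the assumption is that such a script would receive the $1 file name as its first argument and, optionally, the $2 file name as its second.

```python
import os
import sys
import tempfile

def convert(in_path, out_path=None):
    """Read in_path (the $1 file); write to out_path (the $2 file) or to STDOUT."""
    with open(in_path, "r", encoding="utf-8") as src:
        # Placeholder conversion: collapse all whitespace runs.
        text = " ".join(src.read().split()) + "\n"
    if out_path is None:
        sys.stdout.write(text)  # file-to-STDOUT
    else:
        with open(out_path, "w", encoding="utf-8") as dst:
            dst.write(text)     # file-to-file

# Demo: the temporary directory stands in for the files indexer creates.
with tempfile.TemporaryDirectory() as tmp_dir:
    src_path = os.path.join(tmp_dir, "in.txt")
    dst_path = os.path.join(tmp_dir, "out.txt")
    with open(src_path, "w", encoding="utf-8") as f:
        f.write("Hello   external\n parser")
    convert(src_path)            # corresponds to a command using only $1
    convert(src_path, dst_path)  # corresponds to a command using $1 and $2
    with open(dst_path, encoding="utf-8") as f:
        file_result = f.read()
```

Both calls produce the same text; only the place where the result ends up differs, which mirrors the difference between the two Mime commands.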

To prevent indexer from getting stuck on a parser execution
you can specify the amount of time (in seconds) indexer
waits for an external parser to return results.
Use the ParserTimeOut command in indexer.conf for this purpose. For example:

ParserTimeOut 600

The default value is 300 seconds (5 minutes).
If an external parser does not return results within this period of time,
indexer will kill the parser process, remove the
associated temporary files and continue with the next document in
the queue.
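
The timeout behaviour can be pictured with the following sketch. It is not indexer's actual implementation, only an approximation under stated assumptions: the helper name run_parser is hypothetical, the command is given as an argument list in which the literal "$1" is replaced by the temporary file name, and Python's subprocess module stands in for indexer's process handling.

```python
import os
import subprocess
import sys
import tempfile

def run_parser(argv, data, timeout_s=300.0):
    """Run a parser on data; return its STDOUT, or None if it timed out."""
    fd, tmp_path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        # Substitute the temporary file name for the "$1" placeholder.
        cmd = [tmp_path if arg == "$1" else arg for arg in argv]
        try:
            done = subprocess.run(cmd, stdout=subprocess.PIPE, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before raising this exception.
            return None
        return done.stdout
    finally:
        os.remove(tmp_path)  # the temporary file is removed in every case

# Demo: a stand-in "parser" that simply echoes the file it is given.
echo_parser = [sys.executable, "-c",
               "import sys; sys.stdout.write(open(sys.argv[1]).read())", "$1"]
print(run_parser(echo_parser, b"hello"))
```

A caller that gets None back would skip the document and move on to the next one, which is the behaviour described above.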

Some parsers can produce output in a character set
different from the one given in the
LocalCharset command.
You can specify the output character set in a parser command line
to make indexer convert the parser output to
LocalCharset.
For example, if catdoc is configured
to produce output in windows-1251 character
set but LocalCharset is set to koi8-r,
you can use this command for parsing MS Word documents:

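Mime application/msword "text/plain; charset=windows-1251" "/usr/bin/catdoc -a $1"

A search result for a document indexed through an external parser, here an RPM package, can look like this:

3. RPM: mysql 3.20.32a-3 (Applications/Databases) [4]
Mysql is a SQL (Structured Query Language) database server.
Mysql was written by Michael (Monty) Widenius. See the CREDITS
file in the distribution for more credits for mysql and related
things....
(application/x-rpm) 2088855 bytes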
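
The character set conversion applied to parser output, as described earlier for windows-1251 and koi8-r, amounts to a decode followed by an encode. The sketch below only illustrates that idea with Python's standard codecs; the function name recode is hypothetical and this is not how indexer is implemented internally.

```python
def recode(parser_output: bytes, parser_charset: str, local_charset: str) -> bytes:
    """Convert parser output from its own charset to the local charset."""
    text = parser_output.decode(parser_charset, errors="replace")
    return text.encode(local_charset, errors="replace")

# Demo: the same Russian word as windows-1251 bytes and as koi8-r bytes.
word = "поиск"
cp1251 = word.encode("windows-1251")
koi8 = recode(cp1251, "windows-1251", "koi8-r")
print(cp1251.hex(), "->", koi8.hex())
```

The byte sequences differ even though the text is identical, which is why the parser's output character set has to be declared when it does not match LocalCharset.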

If you're using an external parser not listed here,
please contribute your parser configuration
to <general@mnogosearch.org>.