TERMite BATch

The BATch module is a TERMite add-on component that facilitates very large-scale analysis of documents by the TERMite engine.

BATch provides command-line access to run large TERMite jobs in parallel on multi-core CPUs.
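The general idea of spreading annotation work across CPU cores can be sketched as follows. This is only an illustration of the technique, not BATch's actual interface or implementation, which this overview does not describe. The names `DICTIONARY`, `annotate`, and `annotate_all` are hypothetical, and the toy dictionary lookup merely stands in for a call to the TERMite engine.

```python
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

# Hypothetical toy dictionary; a real job would use TERMite vocabularies.
DICTIONARY = {"aspirin": "DRUG", "brca1": "GENE"}


def annotate(text):
    """Stand-in for an annotation call (assumption, not the TERMite API).

    Returns the sorted (term, type) pairs whose term appears as a
    whitespace-separated token in the text. Punctuation is not handled.
    """
    tokens = set(text.lower().split())
    return sorted((tok, DICTIONARY[tok]) for tok in tokens if tok in DICTIONARY)


def annotate_all(texts, workers=None, executor_cls=ProcessPoolExecutor):
    """Annotate documents in parallel and return results in input order.

    executor_cls defaults to a process pool so that work is spread over
    several cores; ThreadPoolExecutor can be passed instead for quick tests.
    """
    workers = workers or os.cpu_count() or 1
    with executor_cls(max_workers=workers) as pool:
        # chunksize batches documents per task to reduce process overhead
        # (it has no effect when a thread pool is used).
        return list(pool.map(annotate, texts, chunksize=16))


if __name__ == "__main__":
    docs = ["Aspirin inhibits BRCA1 expression", "No known terms here"]
    for doc, hits in zip(docs, annotate_all(docs, workers=2)):
        print(doc, "->", hits)
```

Because each document is annotated independently, the work divides cleanly across worker processes, which is why throughput in this kind of job tends to scale with the number of available cores.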

The main use case for BATch is processing millions of documents, such as the entire Medline database or large collections of patents or internal documents. For instance, on a standard 4-core PC, the entire Medline database can be annotated with over 20 key life science dictionaries in under 5 hours.

BATch also works across distributed file systems such as Hadoop, enabling very large-scale document processing.