tm - Text Mining Package

The tm package offers functionality for managing text
documents, abstracts the process of document manipulation,
and eases the use of heterogeneous text formats in R. The
package has integrated database back-end support to
minimize memory demands. Advanced metadata management is
implemented for collections of text documents, which eases
working with large document sets enriched with metadata.
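As a minimal sketch of the metadata handling, assuming a current tm release where `VCorpus()`, `VectorSource()` and the `meta()` accessor are available, the following attaches document-level and corpus-level metadata to a small in-memory corpus. The author name and the years are made-up values used only for illustration.

```r
library(tm)

# Build a small in-memory corpus; VectorSource() treats each string as a document
docs <- c("Text mining in R.", "Metadata attached to documents.")
corpus <- VCorpus(VectorSource(docs))

# Document-level metadata: set and read a tag on the first document
# ("Jane Doe" is a made-up value for illustration)
meta(corpus[[1]], "author") <- "Jane Doe"
print(meta(corpus[[1]], "author"))

# Indexed (corpus-level) metadata: one value per document, kept in a data frame
# (the years are made-up values for illustration)
meta(corpus, "year") <- c(2008, 2010)
print(meta(corpus))
```

Document-level tags travel with each document, while indexed metadata is stored once for the whole collection, which is what keeps large, metadata-rich corpora manageable.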

The package provides native support for reading in several classic file
formats (e.g. plain text, PDFs, or
XML files). There is also a plug-in mechanism to
handle additional file formats.
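The sketch below, assuming tm's `DirSource()` source and its bundled `readPlain` reader, builds a corpus from a temporary directory of plain-text files. It also illustrates the plug-in idea under the assumption that a reader is an ordinary R function with the `(elem, language, id)` interface that `readPlain` uses; `readUpper` is a hypothetical reader written only for this example, not part of tm. Reading PDFs is not shown, since `readPDF` depends on external tools.

```r
library(tm)

# Show the readers that ship with tm (readPlain, readPDF, readXML, ...)
print(getReaders())

# Write two plain-text files into a temporary directory for the example
dir <- file.path(tempdir(), "tm-demo")
dir.create(dir, showWarnings = FALSE)
writeLines("First plain text document.", file.path(dir, "a.txt"))
writeLines("Second plain text document.", file.path(dir, "b.txt"))

# DirSource() enumerates the files; readerControl picks the reader explicitly
corpus <- VCorpus(DirSource(dir, pattern = "\\.txt$"),
                  readerControl = list(reader = readPlain, language = "en"))
print(content(corpus[[1]]))

# Hypothetical plug-in reader (not part of tm): same (elem, language, id)
# interface as readPlain, but upper-cases the text while creating the document
readUpper <- function(elem, language, id)
    PlainTextDocument(toupper(elem$content), id = id, language = language)

upper <- VCorpus(DirSource(dir, pattern = "\\.txt$"),
                 readerControl = list(reader = readUpper, language = "en"))
print(content(upper[[2]]))
```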

The data structures and algorithms can be extended to fit
custom needs: the package is designed in a modular way to
enable easy integration of new file formats, readers,
transformations, and filter operations.
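One way to plug in a custom transformation is tm's `content_transformer()`, which wraps an ordinary character-to-character function so that `tm_map()` can apply it to every document. In this sketch `removeEmails` is a hypothetical helper defined only for the example, and its regular expression is a rough approximation of an e-mail address, not a validator.

```r
library(tm)

corpus <- VCorpus(VectorSource(c("user@example.com wrote: hello",
                                 "no address here")))

# content_transformer() turns a plain character -> character function into a
# transformation that tm_map() applies to each document's content.
# removeEmails is a hypothetical helper; the regex is only a rough e-mail match.
removeEmails <- content_transformer(function(x) gsub("\\S+@\\S+", "", x))

cleaned <- tm_map(corpus, removeEmails)
print(content(cleaned[[1]]))
```

Because only the content is rewritten, each document's metadata is left untouched by the transformation.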

tm provides easy access to preprocessing and manipulation
mechanisms such as whitespace removal, stemming, and stopword
deletion. Furthermore, a generic filter architecture is
available for filtering documents by certain criteria or
performing full-text search. The package also supports
exporting document collections to term-document matrices.
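A minimal preprocessing pipeline might look like the following, assuming the SnowballC package is installed (tm's `stemDocument()` relies on it). It chains standard tm transformations through `tm_map()`, filters the corpus with `tm_filter()` using a simple pattern match as a stand-in for full-text search, and exports the result with `TermDocumentMatrix()`. The sample sentences are invented for illustration.

```r
library(tm)  # stemDocument() below also needs the SnowballC package installed

docs <- c("The cats are   running  in the garden.",
          "A dog was running after the cats.",
          "Stocks fell sharply today.")
corpus <- VCorpus(VectorSource(docs))

# Preprocessing: each tm_map() call applies one transformation to every document
corpus <- tm_map(corpus, content_transformer(tolower))      # lowercase first
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english")) # stopword deletion
corpus <- tm_map(corpus, stripWhitespace)                   # collapse runs of spaces
corpus <- tm_map(corpus, stemDocument, language = "english")

# Filtering: keep documents whose text matches a pattern (a simple full-text search)
cats <- tm_filter(corpus, FUN = function(doc) any(grepl("cat", content(doc))))
print(length(cats))

# Export to a term-document matrix (terms as rows, documents as columns)
tdm <- TermDocumentMatrix(corpus)
print(as.matrix(tdm))
```

Lowercasing before `removeWords()` matters here: the English stopword list is lowercase, so a capitalized "The" would otherwise survive into the matrix.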