extractText(InputStream stream, String type, String encoding)
Returns a reader for the text content of the given binary document.
Returns an empty reader if an error occurred while extracting text from
the Word document.

MsWordTextExtractor

extractText

Returns a reader for the text content of the given binary document.
The content type and character encoding (if available and applicable)
are given as arguments. The given content type is guaranteed to be
one of the types reported by TextExtractor.getContentTypes() unless the
implementation explicitly permits other content types.
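The guarantee above only holds if the caller routes each document to an extractor that declared its type. The following is a minimal sketch of such a caller-side dispatcher, assuming MIME types are compared in lower case. TypedExtractor and ExtractorRegistry are hypothetical names used for illustration; TypedExtractor is a stand-in for the getContentTypes() part of the contract, not the real TextExtractor interface.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in exposing only getContentTypes(); not the real interface
interface TypedExtractor {
    String[] getContentTypes();
}

// Hypothetical dispatcher: a document is only handed to an extractor that
// declared its (lower-cased) MIME type via getContentTypes()
class ExtractorRegistry {
    private final Map<String, TypedExtractor> byType = new HashMap<>();

    void register(TypedExtractor extractor) {
        for (String type : extractor.getContentTypes()) {
            byType.put(type.toLowerCase(Locale.ROOT), extractor);
        }
    }

    // Returns null when no registered extractor declared this type
    TypedExtractor lookup(String type) {
        return type == null ? null : byType.get(type.toLowerCase(Locale.ROOT));
    }
}
```

With this shape, an extractor never sees a type it did not advertise unless the registry is deliberately configured to allow it.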

The implementation can choose either to read and parse the given
document immediately or to return a reader that does so incrementally.
The only constraint is that the implementation must close the given
stream no later than when the returned reader is closed. The caller, on
the other hand, is responsible for closing the returned reader.
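A minimal sketch of this ownership split, assuming an incremental extractor that simply decodes the stream as UTF-8 text (a real Word extractor would parse the binary format instead). IncrementalPlainTextExtractor, TrackingStream and ReaderContractDemo are hypothetical names. The sketch relies on the documented behavior of java.io.InputStreamReader, whose close() also closes the wrapped stream, so the caller's try-with-resources on the reader satisfies the extractor's obligation to close the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical incremental extractor: nothing is parsed up front; the reader
// decodes lazily and closes the wrapped stream when it is itself closed.
// For brevity it ignores type and encoding and always decodes UTF-8.
class IncrementalPlainTextExtractor {
    Reader extractText(InputStream stream, String type, String encoding) {
        return new InputStreamReader(stream, StandardCharsets.UTF_8);
    }
}

// Test helper that records whether close() was called
class TrackingStream extends ByteArrayInputStream {
    boolean closed = false;

    TrackingStream(byte[] data) { super(data); }

    @Override
    public void close() throws IOException {
        closed = true;
        super.close();
    }
}

class ReaderContractDemo {
    // The caller owns the returned reader and closes it with try-with-resources;
    // closing the reader in turn closes the stream passed to the extractor
    static String readAll(IncrementalPlainTextExtractor extractor, InputStream in) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader reader = extractor.extractText(in, "text/plain", null)) {
            char[] buf = new char[256];
            int n;
            while ((n = reader.read(buf)) != -1) {
                text.append(buf, 0, n);
            }
        }
        return text.toString();
    }
}
```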

The implementation should only throw an exception on transient
errors, i.e. when it can expect to be able to successfully extract
the text content of the same binary document at another time. An effort
should be made to recover from syntax errors and other similar problems.
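One way to apply this rule is sketched below under stated assumptions: I/O failures while reading the stream are treated as possibly transient and propagated as IOException, while content errors are treated as permanent and turned into an empty reader, which matches the behavior this class documents for Word documents. ErrorPolicyExtractor, MalformedDocumentException and the "DOC:" header check are hypothetical and exist only to make the sketch runnable.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;

class ErrorPolicyExtractor {
    // Hypothetical marker for non-transient, content-related failures
    static class MalformedDocumentException extends Exception {
        MalformedDocumentException(String msg) { super(msg); }
    }

    Reader extractText(InputStream stream, String type, String encoding) throws IOException {
        try {
            // An IOException here may be transient (e.g. a dropped connection), so it propagates
            byte[] data = readFully(stream);
            return new StringReader(parse(data));
        } catch (MalformedDocumentException e) {
            // Retrying would fail the same way, so degrade to an empty reader
            return new StringReader("");
        } finally {
            // The document is parsed eagerly, so the stream can be closed right away
            stream.close();
        }
    }

    private static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    // Hypothetical parser: data not starting with "DOC:" counts as malformed
    private static String parse(byte[] data) throws MalformedDocumentException {
        String s = new String(data, StandardCharsets.UTF_8);
        if (!s.startsWith("DOC:")) {
            throw new MalformedDocumentException("bad header");
        }
        return s.substring(4);
    }
}
```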

This method should be thread-safe, i.e. it is possible that this
method is invoked simultaneously by different threads to extract the
text content of different documents. On the other hand, the returned
reader does not need to be thread-safe.
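A simple way to meet this requirement is to keep the extractor free of mutable fields, so that all per-document state lives in local variables and in the reader it returns. The sketch below assumes such a stateless extractor (StatelessExtractor and ConcurrentExtractionDemo are hypothetical names) and shares one instance across a thread pool, while each task uses its own stream and its own, non-thread-safe, reader.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical extractor with no mutable fields, safe to share between threads
final class StatelessExtractor {
    Reader extractText(InputStream stream, String type, String encoding) {
        return new InputStreamReader(stream, StandardCharsets.UTF_8);
    }
}

class ConcurrentExtractionDemo {
    static List<String> extractAll(StatelessExtractor extractor, List<byte[]> docs) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (byte[] doc : docs) {
                // Only the extractor is shared; each task owns its stream and reader
                futures.add(pool.submit(() -> {
                    try (Reader r = extractor.extractText(new ByteArrayInputStream(doc), "text/plain", null)) {
                        StringBuilder sb = new StringBuilder();
                        int c;
                        while ((c = r.read()) != -1) {
                            sb.append((char) c);
                        }
                        return sb.toString();
                    }
                }));
            }
            // Collect results in submission order
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```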

Returns an empty reader if an error occurred while extracting text from
the Word document.

Parameters:

stream - binary document from which to extract text

type - MIME type of the given document, in lower case

encoding - the character encoding of the binary data, or null if not available
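Because encoding may be null, and the source only says it is passed "if available and applicable", an extractor for text-based types needs a fallback. For a binary format such as a Word document the argument may not apply at all. The sketch below assumes UTF-8 as the fallback and treats an unknown charset name as a recoverable problem rather than a transient error; EncodingHelper and its methods are hypothetical names.

```java
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

class EncodingHelper {
    // Resolves the nullable encoding argument, falling back when it is absent or unknown
    static Charset resolve(String encoding, Charset fallback) {
        if (encoding == null) {
            return fallback;
        }
        try {
            return Charset.forName(encoding);
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            // A bad charset name will not fix itself on retry, so recover instead of throwing
            return fallback;
        }
    }

    // Wraps the stream in a reader using the resolved charset (UTF-8 if unresolved)
    static Reader openReader(InputStream stream, String encoding) {
        return new InputStreamReader(stream, resolve(encoding, StandardCharsets.UTF_8));
    }
}
```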