Class Tika

Facade class for accessing Tika functionality. This class hides much of
the underlying complexity of the lower level Tika classes and provides
simple methods for many common parsing and type detection operations.

Method Detail

detect

Detects the media type of the given document. The type detection is
based on the content of the given document stream and any given
document metadata. The document stream can be null,
in which case only the given document metadata is used for type
detection.

If the document stream supports the mark feature (see
InputStream.markSupported()), then the stream is marked and reset to
the original position before this method returns. Only a limited
number of bytes are read from the stream.
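
The mark-and-reset contract described above can be illustrated with plain JDK streams. This is a minimal sketch, not Tika's implementation: the class MarkResetSketch and the helper peek are hypothetical names, and the 8-byte limit is an arbitrary choice for the demo.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class MarkResetSketch {

    // Hypothetical helper: reads up to `limit` bytes, then rewinds the stream
    // so the caller still sees the document from its original position.
    static byte[] peek(InputStream stream, int limit) {
        if (!stream.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        try {
            stream.mark(limit);                        // remember the current position
            byte[] prefix = stream.readNBytes(limit);  // read at most `limit` bytes
            stream.reset();                            // rewind to the marked position
            return prefix;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "%PDF-1.7 example payload".getBytes(StandardCharsets.US_ASCII);
        // BufferedInputStream adds mark support to streams that lack it.
        try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(data))) {
            byte[] prefix = peek(in, 8);
            System.out.println(new String(prefix, StandardCharsets.US_ASCII));
            // The next read starts again at the first byte of the document.
            System.out.println((char) in.read());
        }
    }
}
```

Wrapping a stream in BufferedInputStream before handing it to a detection method is a common way to make sure the stream can be reused for parsing afterwards.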

detect

Detects the media type of the given document. The type detection is
based on the first few bytes of a document and the document name.

For best results, at least a few kilobytes of the document data
are needed. If you have more than just the document prefix available,
the other detect() methods are better alternatives for type detection.

Parameters:

prefix - first few bytes of the document

name - document name

Returns:

detected media type

Since:

Apache Tika 0.9
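
As an illustration of how a caller might prepare the two inputs this method expects, the following sketch reads a bounded prefix from a file and derives the document name from its path. It uses only the JDK. The class name PrefixSketch, the helper readPrefix, and the 8 KB prefix size are assumptions made for this example, not values taken from Tika.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class PrefixSketch {

    // Assumed prefix size: "a few kilobytes", as recommended above.
    static final int PREFIX_BYTES = 8 * 1024;

    // Hypothetical helper: returns at most PREFIX_BYTES bytes from the start
    // of the file; shorter files yield a shorter array.
    static byte[] readPrefix(Path file) {
        try (InputStream in = Files.newInputStream(file)) {
            return in.readNBytes(PREFIX_BYTES);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("sample", ".txt");
        try {
            Files.writeString(file, "hello");
            byte[] prefix = readPrefix(file);
            String name = file.getFileName().toString();
            // `prefix` and `name` correspond to the two parameters listed above.
            System.out.println(name + ": " + prefix.length + " prefix bytes");
        } finally {
            Files.delete(file);
        }
    }
}
```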

detect

Detects the media type of the given document. The type detection is
based on the first few bytes of a document.

For best results, at least a few kilobytes of the document data
are needed. If you have more than just the document prefix available,
the other detect() methods are better alternatives for type detection.
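
To make the idea of detection from the first few bytes concrete, here is a deliberately tiny sketch that matches a prefix against two well-known file signatures. It is not Tika's detection algorithm, which is far more extensive; the class MagicSketch and its methods are hypothetical, and only the PDF and PNG signatures are checked.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class MagicSketch {

    // Well-known leading byte signatures ("magic numbers").
    private static final byte[] PDF = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] PNG = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};

    // True if `data` begins with all bytes of `magic`.
    static boolean startsWith(byte[] data, byte[] magic) {
        return data.length >= magic.length
                && Arrays.equals(data, 0, magic.length, magic, 0, magic.length);
    }

    // Hypothetical sniffer: returns a media type for the two known signatures
    // and a generic fallback for everything else.
    static String sniff(byte[] prefix) {
        if (startsWith(prefix, PDF)) {
            return "application/pdf";
        }
        if (startsWith(prefix, PNG)) {
            return "image/png";
        }
        return "application/octet-stream";
    }

    public static void main(String[] args) {
        byte[] prefix = "%PDF-1.7\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(sniff(prefix)); // prints "application/pdf"
    }
}
```

Signatures for many formats sit deeper in the file than the first handful of bytes, which is one reason a longer prefix tends to give better results.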

parse

Parses the given document and returns the extracted text content.
Input metadata like a file name or a content type hint can be passed
in the given metadata instance. Metadata information extracted from
the document is returned in that same metadata instance.

The returned reader is responsible for closing the given stream.
The stream and any associated resources are closed at or before
the time when the Reader.close() method is called.
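
The ownership rule above means that closing the returned reader is enough; the caller does not close the stream separately. The sketch below shows the same pattern with JDK classes only. InputStreamReader merely stands in for the reader returned by parse, and ReaderOwnershipSketch, TrackingStream, openReader, and readAllAndClose are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ReaderOwnershipSketch {

    // Demo stream that records whether close() was called on it.
    static class TrackingStream extends ByteArrayInputStream {
        boolean closed = false;

        TrackingStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    // Stand-in for a method that returns a Reader which owns the stream.
    static Reader openReader(InputStream stream) {
        return new InputStreamReader(stream, StandardCharsets.UTF_8);
    }

    // Reads everything, then closes the reader (and with it the stream).
    static String readAllAndClose(InputStream stream) {
        try (Reader reader = openReader(stream)) {
            StringBuilder text = new StringBuilder();
            char[] buffer = new char[1024];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        TrackingStream stream = new TrackingStream("extracted text".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAllAndClose(stream));
        System.out.println("stream closed: " + stream.closed);
    }
}
```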

parseToString

Parses the given document and returns the extracted text content.
The given input stream is closed by this method.

To avoid unpredictable excess memory use, the returned string contains
at most the first getMaxStringLength() characters extracted
from the input document. Use the setMaxStringLength(int)
method to adjust this limit.
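
The effect of such a length cap can be sketched with a plain Reader: extraction stops once the limit is reached, so memory use stays bounded no matter how large the document is. This is an illustration only and does not reflect how Tika enforces the limit internally; TruncationSketch and readUpTo are hypothetical names.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class TruncationSketch {

    // Hypothetical helper: returns at most `maxLength` characters from the
    // start of the reader's content and never buffers more than that.
    static String readUpTo(Reader reader, int maxLength) {
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[4096];
        try {
            while (text.length() < maxLength) {
                // Never request more characters than the cap still allows.
                int wanted = Math.min(buffer.length, maxLength - text.length());
                int n = reader.read(buffer, 0, wanted);
                if (n == -1) {
                    break; // end of content reached before the cap
                }
                text.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(readUpTo(new StringReader("abcdefghij"), 4)); // prints "abcd"
        System.out.println(readUpTo(new StringReader("abc"), 100));      // prints "abc"
    }
}
```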

NOTE: Unlike most other Tika methods that take an
InputStream, this method will close the given stream for
you as a convenience. With other methods you are still responsible
for closing the stream or a wrapper instance returned by Tika.

parseToString

Parses the given document and returns the extracted text content.
The given input stream is closed by this method. This method lets
you control the maxStringLength per call.

To avoid unpredictable excess memory use, the returned string contains
at most the first maxLength characters (as given by the maxLength
parameter) extracted from the input document.

NOTE: Unlike most other Tika methods that take an
InputStream, this method will close the given stream for
you as a convenience. With other methods you are still responsible
for closing the stream or a wrapper instance returned by Tika.

parseToString

Parses the given document and returns the extracted text content.
The given input stream is closed by this method.

To avoid unpredictable excess memory use, the returned string contains
at most the first getMaxStringLength() characters extracted
from the input document. Use the setMaxStringLength(int)
method to adjust this limit.

NOTE: Unlike most other Tika methods that take an
InputStream, this method will close the given stream for
you as a convenience. With other methods you are still responsible
for closing the stream or a wrapper instance returned by Tika.

parseToString

Parses the file at the given path and returns the extracted text content.

To avoid unpredictable excess memory use, the returned string contains
at most the first getMaxStringLength() characters extracted
from the input document. Use the setMaxStringLength(int)
method to adjust this limit.

parseToString

Parses the given file and returns the extracted text content.

To avoid unpredictable excess memory use, the returned string contains
at most the first getMaxStringLength() characters extracted
from the input document. Use the setMaxStringLength(int)
method to adjust this limit.

parseToString

Parses the resource at the given URL and returns the extracted
text content.

To avoid unpredictable excess memory use, the returned string contains
at most the first getMaxStringLength() characters extracted
from the input document. Use the setMaxStringLength(int)
method to adjust this limit.