Detectors Missing

For most detectors, that means the tika-parsers jar and its dependencies (the container detectors are generally stored along with the parsers).

One of:

a Tika Config which explicitly lists the detector class

a Tika Config (e.g. the default one) which uses DefaultDetector, and a service file for the detector

To check what detectors you have, see Identifying what Detectors your Tika install supports

To check if any detectors were defined but failed to load, see Identifying if any Detectors failed to be loaded
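As a rough way to confirm that a service file for your detector is actually on the classpath, the sketch below lists every class name registered in the META-INF/services/org.apache.tika.detect.Detector files it can find. It uses only the JDK; the class and method names (DetectorServiceFiles, listedClasses, stripComment) are made up for this illustration, and it assumes the thread context class loader sees the same jars that Tika does. A class appearing in this list only means it is registered, not that it loaded successfully.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

// Hypothetical helper, not part of Tika
class DetectorServiceFiles {
    // Service file in which Detector implementations are registered
    static final String SERVICE_FILE = "META-INF/services/org.apache.tika.detect.Detector";

    // Drops a trailing "# comment" and surrounding whitespace from a service file line
    static String stripComment(String line) {
        int hash = line.indexOf('#');
        return (hash >= 0 ? line.substring(0, hash) : line).trim();
    }

    // Collects the class names from every copy of the resource visible on the classpath
    static List<String> listedClasses(String resourceName) {
        List<String> names = new ArrayList<>();
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        try {
            Enumeration<URL> urls = loader.getResources(resourceName);
            while (urls.hasMoreElements()) {
                try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                        urls.nextElement().openStream(), StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        String name = stripComment(line);
                        if (!name.isEmpty()) {
                            names.add(name);
                        }
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        // Prints one registered detector class name per line (nothing if no service file is found)
        listedClasses(SERVICE_FILE).forEach(System.out::println);
    }
}
```

If your detector's class name is missing from the output, the jar containing its service file is probably not on the classpath.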

Mime Type Missing

If Tika doesn't know about your mime type out of the box, you need to add a custom mimetypes file. See the quick guide for how.

If you have written a custom mimetypes file, it needs to be present on your classpath at runtime with the exact name org/apache/tika/mime/custom-mimetypes.xml. Double check that you added it to your classpath and that it has exactly that name (no typos, no prefix directories, no suffixes etc.), then use Identifying what Mime Types your Tika install supports to see whether it has been loaded.
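A quick classpath check along these lines can help rule out a misnamed or misplaced file. This is only a sketch using the JDK's resource lookup; the class name CustomMimeTypesCheck and the locate method are invented for the example, and it assumes the thread context class loader sees the same classpath that Tika uses.

```java
import java.net.URL;

// Hypothetical helper, not part of Tika
class CustomMimeTypesCheck {
    // The exact resource name the custom mimetypes file must have
    static final String CUSTOM_MIMETYPES = "org/apache/tika/mime/custom-mimetypes.xml";

    // Returns the URL of the resource, or null if it is not on the classpath
    static URL locate(String resourceName) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        return loader.getResource(resourceName);
    }

    public static void main(String[] args) {
        URL url = locate(CUSTOM_MIMETYPES);
        if (url == null) {
            System.out.println("Not found on classpath: " + CUSTOM_MIMETYPES);
        } else {
            System.out.println("Found at: " + url);
        }
    }
}
```

Finding the file this way only shows that it is visible under the right name; whether its types were loaded still needs the mime type listing mentioned above.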

Identifying your Tika Version

Tika App

java -jar tika-app-<version>.jar --version

Tika Server

Go to http://localhost:9998/version
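If you would rather check from code than from a browser, the same URL can be fetched with the JDK's built-in HTTP client (Java 11 or later). This is a minimal sketch: the class and method names are invented for the example, and it assumes the server is running on localhost at port 9998 as in the URL above.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper, not part of Tika
class TikaServerVersion {
    // Builds the /version URL for a Tika Server at the given host and port
    static URI versionUri(String host, int port) {
        return URI.create("http://" + host + ":" + port + "/version");
    }

    // Performs a plain GET and returns the response body with surrounding whitespace removed
    static String fetchVersion(URI uri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body().trim();
    }

    public static void main(String[] args) throws Exception {
        // Assumes a Tika Server is listening on localhost:9998
        System.out.println(fetchVersion(versionUri("localhost", 9998)));
    }
}
```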

Tika Facade

// Get your Tika object, eg
Tika tika = new Tika();
// Call toString() to get the version
String version = tika.toString();

Identifying if any Parsers failed to be loaded

When starting your JVM, if you pass in -Dorg.apache.tika.service.error.warn=true then you'll get warnings logged if any Parsers or Detectors couldn't be loaded. With the default logging configuration, you'll see something like this printed to the standard output of the JVM:

WARNING: Unable to load org.apache.tika.parser.microsoft.OfficeParser
java.lang.NoClassDefFoundError: org/apache/poi/poifs/filesystem/DirectoryEntry
at java.lang.Class.getDeclaredConstructors0(Native Method)
at java.lang.Class.privateGetDeclaredConstructors(Class.java:2585)
at java.lang.Class.getConstructor0(Class.java:2885)
at java.lang.Class.newInstance(Class.java:350)
at org.apache.tika.config.ServiceLoader.loadStaticServiceProviders(ServiceLoader.java:315)
at org.apache.tika.parser.DefaultParser.getDefaultParsers(DefaultParser.java:52)
at org.apache.tika.parser.DefaultParser.<init>(DefaultParser.java:61)
at org.apache.tika.parser.DefaultParser.<init>(DefaultParser.java:66)
at org.apache.tika.config.TikaConfig.getDefaultParser(TikaConfig.java:76)
at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:182)
at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:291)
at org.apache.tika.Tika.<init>(Tika.java:115)
at org.apache.tika.cli.TikaCLI.version(TikaCLI.java:629)
at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:365)
at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:134)
Caused by: java.lang.ClassNotFoundException: org.apache.poi.poifs.filesystem.DirectoryEntry
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 15 more

In this case, the error is telling us that we're missing the Apache POI jars which are a required dependency of Tika Parsers, and of the org.apache.tika.parser.microsoft.OfficeParser parser.
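To confirm a diagnosis like this, you can test directly whether the class named in the NoClassDefFoundError can be found. The sketch below uses only the JDK; DependencyCheck and isOnClasspath are names made up for the example, and it checks the class loader that loaded the helper itself, which may differ from the one Tika uses in a container or application server.

```java
// Hypothetical helper, not part of Tika
class DependencyCheck {
    // Returns true if the named class can be located by this class's class loader
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, don't run its static initialisers
            Class.forName(className, false, DependencyCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The class reported as missing in the stack trace above
        String poiClass = "org.apache.poi.poifs.filesystem.DirectoryEntry";
        System.out.println(poiClass + " available: " + isOnClasspath(poiClass));
    }
}
```

Run it with the same classpath as your Tika application; a result of false for the POI class points to the POI jars being absent.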

Identifying if any Detectors failed to be loaded

When starting your JVM, if you pass in -Dorg.apache.tika.service.error.warn=true then you'll get warnings logged if any Parsers or Detectors couldn't be loaded. With the default logging configuration, you'll see something like this printed to the standard output of the JVM:

WARNING: Unable to load org.apache.tika.parser.microsoft.POIFSContainerDetector
java.lang.NoClassDefFoundError: org/apache/poi/poifs/filesystem/DirectoryEntry
at java.lang.Class.getDeclaredConstructors0(Native Method)
at java.lang.Class.privateGetDeclaredConstructors(Class.java:2585)
at java.lang.Class.getConstructor0(Class.java:2885)
at java.lang.Class.newInstance(Class.java:350)
at org.apache.tika.config.ServiceLoader.loadStaticServiceProviders(ServiceLoader.java:315)
at org.apache.tika.detect.DefaultDetector.getDefaultDetectors(DefaultDetector.java:55)
at org.apache.tika.detect.DefaultDetector.<init>(DefaultDetector.java:66)
at org.apache.tika.config.TikaConfig.getDefaultDetector(TikaConfig.java:71)
at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:183)
at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:291)
at org.apache.tika.Tika.<init>(Tika.java:115)
at org.apache.tika.cli.TikaCLI.version(TikaCLI.java:629)
at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:365)
at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:134)
Caused by: java.lang.ClassNotFoundException: org.apache.poi.poifs.filesystem.DirectoryEntry
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 14 more

In this case, the error is telling us that we're missing the Apache POI jars which are a required dependency of Tika Parsers, and of the org.apache.tika.parser.microsoft.POIFSContainerDetector detector.

If that shows the same problem, it's a PDFBox bug. Please file an Apache PDFBox bug report and attach at least one failing file to the bug. Once that is fixed, Tika will get the fix when it picks up the new PDFBox release.

If PDFBox ExtractText works fine, it may* be a Tika bug. Please report an Apache Tika bug, attach at least one failing file, and mention that PDFBox ExtractText doesn't have the issue.

*PDFBox's ExtractText does not pull text from Annotations or AcroForms, so a problem that ExtractText doesn't encounter may still come from a PDFBox bug in Annotations or AcroForms handling; it might be a bug in Tika, too. When in doubt, ask.