Sorry if this looks like an ad, but we created Ambar, which integrates ES + Tika + PDFBox + Tesseract. It can parse any file and search through it. It also has a nice web UI. It's available on GitHub: https://github.com/RD17/ambar