jacob-ogre/pdftext: Extract Text from Text- and Image-based PDFs

Version 0.2.1

A large batch of PDFs often contains a mix of text-based and image-based
files, and the text must be extracted from all of them for analysis. This
package offers a single primary function for text extraction: it first tries
the Poppler library through its wrapper in pdftools; if that fails, it uses
ImageMagick, unpaper, and Tesseract to perform Optical Character Recognition
(OCR).
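
The try-then-fall-back flow described above can be sketched roughly as follows. This is an illustration of the control flow only, written in Python, and is not the package's R implementation. The names `extract_text`, `text_extractor`, `ocr_extractor`, and `min_chars` are hypothetical, and treating empty output as a "failure" is an assumption, since the description does not say exactly what counts as a failed text extraction.

```python
from typing import Callable, Tuple


def extract_text(
    path: str,
    text_extractor: Callable[[str], str],
    ocr_extractor: Callable[[str], str],
    min_chars: int = 1,
) -> Tuple[str, str]:
    """Return (text, method), where method is "text" or "ocr".

    text_extractor stands in for a text-layer reader (such as a Poppler
    wrapper); ocr_extractor stands in for an image-cleanup-plus-OCR
    pipeline. Both are hypothetical callables supplied by the caller.
    """
    try:
        text = text_extractor(path)
    except Exception:
        # Assumption: any error from the text-layer reader means "try OCR".
        text = ""

    # Assumption: whitespace-only output also counts as a failure, because
    # image-based PDFs often yield an empty text layer rather than an error.
    if len(text.strip()) >= min_chars:
        return text, "text"

    return ocr_extractor(path), "ocr"


if __name__ == "__main__":
    # Stub extractors so the sketch runs without any PDF tooling installed.
    def empty_text_layer(path: str) -> str:
        return ""

    def fake_ocr(path: str) -> str:
        return "text recovered by OCR"

    print(extract_text("scanned.pdf", empty_text_layer, fake_ocr))
```

Passing the two extractors in as arguments keeps the fallback decision separate from the tools that do the work, so the same logic applies whichever text-layer reader or OCR pipeline is actually installed.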