Textricator is a tool for extracting text from PDFs and generating structured data (CSV or JSON). It can even work on OCR'ed documents. You describe what the document's contents look like in a YAML file, and it extracts the data according to those fields. It can also be used as a Java library.