The Portable Document Format (PDF) was not designed for tabular data. Sadly, a lot of open data is shared as PDFs and getting tables out for analysis and record-keeping is a pain. Excalibur makes PDF table extraction very easy. You can download the extracted tables as CSVs or an Excel spreadsheet. All data remains on your machine.
There are both open-source and closed-source tools that are widely used for PDF table extraction. They either give good output or fail miserably, with little room to adjust in between. Excalibur is powered by Camelot (written by one of the authors), which gives users complete control over table extraction. If you don't get the desired output with the default settings, you can tweak them and get the job done!
Excalibur can detect tables in your PDFs automatically. For cases where it doesn't, you can tweak the table extraction settings, save them as presets, and then apply them to other PDFs with similar table structures. After v0.5.0, Excalibur will have a web API that can be used to start table extraction jobs and download the extracted tables when the jobs finish.
Excalibur can be configured with MySQL and Celery to execute table extraction jobs in a parallel and distributed manner. By default, jobs are executed sequentially. You can check out the documentation at https://excalibur-py.readthedocs.io for more details.
You can upload your PDF using the web interface. You can also see previous uploads. All file storage and processing happens on your own local or remote machine, which means that you have complete control over your data.
You don't need to draw table areas and column separators in most cases, because Excalibur can do that automatically.
You can draw table areas and also add column separators in cases where the tables are buried deep inside the text on the page.
Each new table extraction rule (table areas, column separators and other settings) is saved by default. You can load it next time you see a PDF with a similar table structure.
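To make the idea of a reusable rule concrete, here is a minimal sketch in Python of how such a rule could be stored and handed to Camelot. The `ExtractionRule` class, its JSON layout, and the `extract_with_rule` helper are hypothetical and are not Excalibur's actual storage format. The `flavor`, `table_areas`, and `columns` keyword names follow Camelot's `read_pdf`, and the coordinates are illustrative values only.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Any, Dict, List


@dataclass
class ExtractionRule:
    """Hypothetical container for one saved rule (not Excalibur's real schema)."""

    # Camelot ships two parsers: "lattice" (ruled tables) and "stream".
    flavor: str = "lattice"
    # Each area is an "x1,y1,x2,y2" string (left-top, right-bottom)
    # in PDF coordinate space.
    table_areas: List[str] = field(default_factory=list)
    # Each entry is a comma-separated string of x-coordinates,
    # one string per table area.
    columns: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the rule so it can be saved and reloaded later."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "ExtractionRule":
        """Rebuild a rule from the JSON produced by to_json()."""
        return cls(**json.loads(text))

    def to_camelot_kwargs(self) -> Dict[str, Any]:
        """Translate the rule into keyword arguments for camelot.read_pdf."""
        kwargs: Dict[str, Any] = {"flavor": self.flavor}
        if self.table_areas:
            kwargs["table_areas"] = self.table_areas
        # Column separators are only passed along for the stream parser.
        if self.columns and self.flavor == "stream":
            kwargs["columns"] = self.columns
        return kwargs


def extract_with_rule(pdf_path: str, rule: ExtractionRule, pages: str = "1"):
    """Apply a saved rule to a PDF. Requires Camelot to be installed."""
    import camelot  # imported lazily so the rule class works without it

    return camelot.read_pdf(pdf_path, pages=pages, **rule.to_camelot_kwargs())


# Save a rule once, then reload it for another PDF with the same layout.
saved = ExtractionRule(
    flavor="stream",
    table_areas=["316,499,566,337"],
    columns=["72,95,209"],
).to_json()
rule = ExtractionRule.from_json(saved)
print(rule.to_camelot_kwargs())
```

Keeping the rule as plain data means the same settings can be replayed on any PDF that shares the layout, for example with `extract_with_rule("report.pdf", rule)`, where `report.pdf` is a placeholder file name.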
Excalibur can be easily installed using pip, the package manager for Python.
Or you can just download the executable and run it directly!
Copyright © Camelot Dev 2018
Made in New Delhi, India