Excalibur

PDF Table Extraction for Humans

Do you want us to build a new feature? Just holler!


About


Extracting tables from PDFs is hard

The Portable Document Format (PDF) was not designed for tabular data. Sadly, a lot of open data is shared as PDFs, and getting tables out for analysis and record-keeping is a pain. Excalibur makes PDF table extraction easy. You can download the extracted tables as CSVs or as an Excel spreadsheet. All data remains on your machine.

Why another tool?

Both open and closed-source tools are widely used for PDF table extraction. They either give a nice output or fail miserably. Excalibur is powered by Camelot (written by one of the authors), which gives users complete control over table extraction. If you don't get the desired output with the default settings, you can tweak them and get the job done!
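As a rough sketch of what tweaking the settings can look like underneath, the snippet below collects options for Camelot's `camelot.read_pdf` call. The helper names `build_read_pdf_kwargs` and `extract_tables` are hypothetical and not part of Excalibur or Camelot. The `flavor`, `pages`, `table_areas` and `columns` options follow Camelot's documented keyword arguments, where `columns` is described for the stream flavor.

```python
def build_read_pdf_kwargs(flavor="lattice", pages="1", table_areas=None, columns=None):
    """Collect keyword arguments for camelot.read_pdf (hypothetical helper)."""
    if flavor not in ("lattice", "stream"):
        raise ValueError("flavor must be 'lattice' or 'stream'")
    kwargs = {"flavor": flavor, "pages": pages}
    if table_areas:
        # Each area is a string "x1,y1,x2,y2" in PDF coordinate space.
        kwargs["table_areas"] = list(table_areas)
    if columns:
        # Column separators are x-coordinates; assumed here to be stream-only.
        if flavor != "stream":
            raise ValueError("columns only apply to the stream flavor")
        kwargs["columns"] = list(columns)
    return kwargs


def extract_tables(pdf_path, **settings):
    """Run Camelot with the given settings and return its table list."""
    # Imported lazily so the helper above works without Camelot installed.
    import camelot

    return camelot.read_pdf(pdf_path, **build_read_pdf_kwargs(**settings))
```

With Camelot installed, something like `extract_tables("foo.pdf", flavor="stream")[0].df` would give the first table as a pandas DataFrame; `foo.pdf` is a placeholder file name.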

Automate your workflow

Excalibur can detect tables in your PDFs automatically. For cases where it doesn't, you can tweak the table extraction settings, save them as presets and then apply them to other PDFs with similar table structures. After v0.5.0, Excalibur will have a web API which can be used to start table extraction jobs and download the extracted tables when the jobs finish.

Built for scale

Excalibur can be configured with MySQL and Celery to execute table extraction jobs in a parallel and distributed manner. By default, jobs are executed sequentially. You can check out the documentation at https://excalibur-py.readthedocs.io for more details.
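To illustrate the difference between sequential and parallel job execution, here is a minimal sketch using only Python's standard library. It is not Excalibur's Celery setup: a thread pool stands in for a distributed worker queue, and `run_jobs` and its `job` argument are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def run_jobs(job, pdf_paths, parallel=False, max_workers=4):
    """Apply job to every path, one at a time or through a worker pool."""
    if not parallel:
        # Sequential mode: each job starts only after the previous one ends.
        return [job(path) for path in pdf_paths]
    # Parallel mode: jobs are handed to a pool of workers.
    # pool.map returns results in the same order as the inputs.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(job, pdf_paths))
```

Both modes return results in input order, so callers do not need to care which one is active.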

Usage


Upload your PDF

You can upload your PDF using the web interface. You can also see previous uploads. All file storage and processing happen on your own local or remote machine, which means that you have complete control over your data.

Auto-detect table areas

You don't need to draw table areas and column separators in most cases, because Excalibur can do that automatically.

Or draw table areas and/or column separators

You can draw table areas and also add column separators in cases where the tables are buried deep inside the text on the page.

Or load a saved table extraction rule

Each new table extraction rule (table areas, column separators and other settings) is saved by default. You can load it next time you see a PDF with a similar table structure.
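The idea of a reusable rule can be sketched as a small JSON file that is saved once and loaded again later. The key names below are illustrative assumptions, not Excalibur's actual rule schema, and `save_rule` and `load_rule` are hypothetical helpers.

```python
import json
from pathlib import Path


def save_rule(path, rule):
    """Write an extraction rule (a plain dict) to a JSON file."""
    text = json.dumps(rule, indent=2, sort_keys=True)
    Path(path).write_text(text, encoding="utf-8")


def load_rule(path):
    """Read a previously saved extraction rule back into a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Hypothetical rule: these keys are illustrative, not Excalibur's schema.
example_rule = {
    "flavor": "stream",
    "pages": "1",
    "table_areas": ["316,499,566,337"],
    "columns": ["72,95,209,327,442,529,566,606,683"],
}
```

A rule saved this way could be applied to any later PDF that shares the same layout, which is the point of the preset.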

Download extracted tables in structured formats

You can view the extracted tables and then download them as CSVs or an Excel spreadsheet. Excalibur also supports JSON and HTML.
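For a sense of how one extracted table maps onto these formats, the sketch below serializes a table held as a list of rows, with the header first, using only Python's standard library. The function names are hypothetical and this is not how Excalibur implements its exports. Excel output is left out because it needs a third-party library.

```python
import csv
import io
import json
from html import escape


def to_csv(rows):
    """Serialize rows (header first) as CSV text."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """Serialize rows as a JSON list of objects keyed by the header row."""
    header, body = rows[0], rows[1:]
    return json.dumps([dict(zip(header, row)) for row in body])


def to_html(rows):
    """Serialize rows as a minimal HTML table with escaped cell text."""
    header, body = rows[0], rows[1:]
    lines = ["<table>"]
    lines.append("<tr>" + "".join(f"<th>{escape(c)}</th>" for c in header) + "</tr>")
    for row in body:
        lines.append("<tr>" + "".join(f"<td>{escape(c)}</td>" for c in row) + "</tr>")
    lines.append("</table>")
    return "\n".join(lines)
```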

Download


Available for Windows, macOS and Linux

Excalibur can be easily installed using pip, the package manager for Python.

Or you can just download the executable and run it directly!

Download Now!

Team


Vinayak Mehta

Nikhil Sikka
