Excalibur

PDF Table Extraction for Humans

About


Excalibur is a web interface for extracting tabular data from PDFs. Several widely used open- and closed-source tools exist for this task, but they tend to either produce clean output or fail outright, with no way to adjust in between. That is unhelpful because real-world PDF table extraction is fuzzy, and it pushes people to write ad-hoc extraction scripts for each type of PDF table.

Excalibur uses Camelot under the hood, which gives users complete control over table extraction. If the default settings don't produce the desired output, you can tweak them until they do.
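As a rough illustration of tweaking settings when the defaults fall short, the sketch below tries a list of Camelot settings in order and keeps the first one that finds a table. The helper name `extract_with_fallback` and the retry order are assumptions made for this example and are not part of Excalibur or Camelot; `flavor="lattice"` and `flavor="stream"` are Camelot's two parsing modes.

```python
def extract_with_fallback(read_pdf, path, pages="1", attempts=None):
    """Try each settings dict in order and return (tables, settings)
    for the first attempt that finds at least one table.

    `read_pdf` is passed in so the helper can be used with
    camelot.read_pdf or with a stand-in during testing.
    """
    if attempts is None:
        # Hypothetical retry order: ruled tables first, then whitespace-based.
        attempts = [
            {"flavor": "lattice"},
            {"flavor": "stream"},
        ]
    for settings in attempts:
        tables = read_pdf(path, pages=pages, **settings)
        if len(tables) > 0:
            return tables, settings
    return None, None


# Usage with Camelot installed (the file name is a placeholder):
#   import camelot
#   tables, used = extract_with_fallback(camelot.read_pdf, "report.pdf")
```

Passing the reader in as an argument keeps the retry logic separate from Camelot itself, so the same loop works with any extra settings you want to try.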

The documentation is available at https://excalibur-py.readthedocs.io/.

How-to


Upload PDF

You can upload your PDF using the web interface and see previous uploads. All file storage and processing happen on your own machine, whether local or remote, so you keep complete control over your data.

Draw table areas and/or column separators

You can draw a table area and add column separators for cases where the table is buried in the surrounding text on the PDF page.
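To show how drawn areas and separators could map onto Camelot's settings, here is a minimal sketch. Camelot's `read_pdf` accepts `table_areas` as strings of the form `"x1,y1,x2,y2"` (top-left and bottom-right corners, with the origin at the bottom-left of the page) and `columns` as comma-separated x-coordinates, one string per table area. The helper names, the input dictionary layout, and the assumption that rectangles are drawn on a page image with the origin at the top-left are all hypothetical and are not taken from Excalibur's code.

```python
def rect_to_table_area(rect, image_size, pdf_size):
    """Convert a rectangle drawn on a page image (pixels, origin top-left)
    into a Camelot-style "x1,y1,x2,y2" string (PDF points, origin bottom-left)."""
    x1, y1, x2, y2 = rect
    img_w, img_h = image_size
    pdf_w, pdf_h = pdf_size
    scale_x = pdf_w / img_w
    scale_y = pdf_h / img_h
    left, right = sorted((x1 * scale_x, x2 * scale_x))
    # Flip the y-axis: image y grows downward, PDF y grows upward.
    bottom, top = sorted((pdf_h - y1 * scale_y, pdf_h - y2 * scale_y))
    return ",".join(str(round(v)) for v in (left, top, right, bottom))


def separators_to_columns(xs, image_width, pdf_width):
    """Scale separator x-positions to PDF points and join them, left to right."""
    scale = pdf_width / image_width
    return ",".join(str(round(x * scale)) for x in sorted(xs))


def build_camelot_kwargs(drawn_areas, image_size, pdf_size):
    """Build keyword arguments for camelot.read_pdf from drawn areas.

    Each item in `drawn_areas` is a dict with a "rect" and optional "separators"
    (a hypothetical layout chosen for this example).
    """
    table_areas, columns = [], []
    for area in drawn_areas:
        table_areas.append(rect_to_table_area(area["rect"], image_size, pdf_size))
        # An empty string means "no separators for this area".
        columns.append(
            separators_to_columns(area.get("separators", []), image_size[0], pdf_size[0])
        )
    # Column separators apply to Camelot's stream flavor.
    return {"flavor": "stream", "table_areas": table_areas, "columns": columns}


# Usage with Camelot installed (the file name is a placeholder):
#   import camelot
#   kwargs = build_camelot_kwargs(areas, image_size=(1200, 1600), pdf_size=(600, 800))
#   tables = camelot.read_pdf("report.pdf", pages="1", **kwargs)
```

Leaving out `table_areas` and `columns` altogether lets Camelot look for tables on its own.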

Or auto-detect table areas

You won't need to draw table areas and column separators in most cases, because Excalibur can do that on its own.

Download in multiple formats

You can view the extracted tables and then download them in widely used tabular data formats. Excalibur supports CSV, Excel, JSON and HTML.
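For a sense of what the same table looks like in three of these formats, the sketch below serializes a small list of rows to CSV, JSON, and HTML using only the Python standard library. It is an illustration, not Excalibur's export code: the function names are made up for this example, Excel is left out because the standard library has no writer for it, and the exact layout of Excalibur's own output may differ. With Camelot directly, a `TableList` can be written out with `tables.export(path, f="csv")`.

```python
import csv
import io
import json
from html import escape


def rows_to_csv(rows):
    """Serialize rows (first row is the header) to a CSV string."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def rows_to_json(rows):
    """Serialize rows to a JSON array with one object per data row."""
    header, *body = rows
    return json.dumps([dict(zip(header, row)) for row in body])


def rows_to_html(rows):
    """Serialize rows to a minimal HTML table, escaping cell text."""
    header, *body = rows
    parts = ["<tr>" + "".join(f"<th>{escape(c)}</th>" for c in header) + "</tr>"]
    for row in body:
        parts.append("<tr>" + "".join(f"<td>{escape(c)}</td>" for c in row) + "</tr>")
    return "<table>" + "".join(parts) + "</table>"


# Sample data invented for this example.
rows = [["item", "qty"], ["bolt, M4", "12"], ["nut <M4>", "30"]]
print(rows_to_csv(rows))
print(rows_to_json(rows))
print(rows_to_html(rows))
```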

Developers


Vinayak Mehta

Nikhil Sikka