
Docling Models

This page contains the models that power the PDF document conversion package docling.
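
These models are downloaded and applied automatically when you convert a document with docling. A minimal sketch, assuming docling's v2 Python API (`DocumentConverter` and `export_to_markdown`; check the docling documentation for your installed version):

```python
# Minimal sketch: convert a PDF with docling, which fetches and runs the
# layout and TableFormer models from this repository under the hood.
from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert("https://arxiv.org/pdf/2408.09869")  # local path or URL
print(result.document.export_to_markdown())
```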

Layout Model

The layout model takes a page image and applies an RT-DETR model to find the layout components on the page. It currently detects the following labels: Caption, Footnote, Formula, List-item, Page-footer, Page-header, Picture, Section-header, Table, Text, and Title. As a reference, the table below (from the DocLayNet paper) compares the performance of standard object detection methods on the DocLayNet dataset with human evaluation:

| Class          | Human | MRCNN R50 | MRCNN R101 | FRCNN R101 | YOLO v5x6 |
|----------------|-------|-----------|------------|------------|-----------|
| Caption        | 84-89 | 68.4      | 71.5       | 70.1       | 77.7      |
| Footnote       | 83-91 | 70.9      | 71.8       | 73.7       | 77.2      |
| Formula        | 83-85 | 60.1      | 63.4       | 63.5       | 66.2      |
| List-item      | 87-88 | 81.2      | 80.8       | 81.0       | 86.2      |
| Page-footer    | 93-94 | 61.6      | 59.3       | 58.9       | 61.1      |
| Page-header    | 85-89 | 71.9      | 70.0       | 72.0       | 67.9      |
| Picture        | 69-71 | 71.7      | 72.7       | 72.0       | 77.1      |
| Section-header | 83-84 | 67.6      | 69.3       | 68.4       | 74.6      |
| Table          | 77-81 | 82.2      | 82.9       | 82.2       | 86.3      |
| Text           | 84-86 | 84.6      | 85.8       | 85.4       | 88.1      |
| Title          | 60-72 | 76.7      | 80.4       | 79.9       | 82.7      |
| All            | 82-83 | 72.4      | 73.5       | 73.4       | 76.8      |
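
If you want to run the layout model on its own rather than through the full docling pipeline, something like the sketch below should work. This is a hedged sketch: the `LayoutPredictor` import path, its constructor arguments, and the `model_artifacts/layout` folder name are assumptions based on the companion docling-ibm-models package and this repository's layout, and may differ between versions.

```python
# Hedged sketch: run the RT-DETR layout model directly.
# LayoutPredictor and the artifact folder name are assumptions taken from
# the docling-ibm-models package; verify against your installed version.
from huggingface_hub import snapshot_download
from PIL import Image

from docling_ibm_models.layoutmodel.layout_predictor import LayoutPredictor

# Fetch the model weights from this repository.
artifacts = snapshot_download(repo_id="docling-project/docling-models")
predictor = LayoutPredictor(artifact_path=f"{artifacts}/model_artifacts/layout")

page_image = Image.open("page.png")  # hypothetical rendered page image
for prediction in predictor.predict(page_image):
    # Each prediction holds a label (one of the classes above), a confidence
    # score, and the bounding box of the detected component.
    print(prediction)
```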

TableFormer

The TableFormer model recovers the structure of a table, starting from an image of the table. It uses the table regions predicted by the layout model to locate the tables on a page. TableFormer achieves state-of-the-art table structure recognition, measured in TEDS scores:

| Model (TEDS) | Simple table | Complex table | All tables |
|--------------|--------------|---------------|------------|
| Tabula       | 78.0         | 57.8          | 67.9       |
| Traprange    | 60.8         | 49.9          | 55.4       |
| Camelot      | 80.0         | 66.0          | 73.0       |
| Acrobat Pro  | 68.9         | 61.8          | 65.3       |
| EDD          | 91.2         | 85.4          | 88.3       |
| TableFormer  | 95.4         | 90.1          | 93.6       |
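
Since TableFormer consumes the table regions found by the layout model, the simplest way to get recovered table structures is through docling itself. A short sketch, assuming the `tables` attribute and the `export_to_dataframe` helper of docling's document object (check the docling documentation for your version):

```python
# Sketch: extract recognized tables through docling, which chains the layout
# model (table detection) and TableFormer (structure recognition).
from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert("report.pdf")  # hypothetical input PDF

for table in result.document.tables:
    # export_to_dataframe() turns the recovered row/column structure
    # into a pandas DataFrame.
    print(table.export_to_dataframe())
```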

References

```bibtex
@techreport{Docling,
  author = {Deep Search Team},
  month = {8},
  title = {{Docling Technical Report}},
  url = {https://arxiv.org/abs/2408.09869},
  eprint = {2408.09869},
  doi = {10.48550/arXiv.2408.09869},
  version = {1.0.0},
  year = {2024}
}
```


```bibtex
@article{doclaynet2022,
  title = {DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis},
  doi = {10.1145/3534678.3539043},
  url = {https://arxiv.org/abs/2206.01062},
  author = {Pfitzmann, Birgit and Auer, Christoph and Dolfi, Michele and Nassar, Ahmed S and Staar, Peter W J},
  year = {2022}
}
```

```bibtex
@InProceedings{TableFormer2022,
  author = {Nassar, Ahmed and Livathinos, Nikolaos and Lysak, Maksym and Staar, Peter},
  title = {TableFormer: Table Structure Understanding With Transformers},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2022},
  pages = {4614-4623},
  doi = {10.1109/CVPR52688.2022.00457}
}
```
