High-quality, AI-powered document parsing and data extraction

Accurate document layout parsing, table and image extraction, OCR, and more
for document intelligence, ingestion for LLM-based apps, and RAG frameworks

Up to 6x more accurate and 5x cheaper

Aryn's document parsing service, DocParse, runs a compound deep learning model trained on 80k+ enterprise documents, followed by post-processing steps. It is up to 6x more accurate and 5x cheaper than alternative systems, and it returns output as JSON or Markdown.

Supports 30+ file formats, including PDF and Microsoft Office

Document layout parsing with bounding boxes labeled by element type (e.g., header, text, table)

Scales to documents with thousands of pages

Supports OCR in 60+ languages
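
To make the idea of typed, labeled bounding boxes concrete, here is a minimal Python sketch that groups parsed elements by type. The JSON shape and the field names (`elements`, `type`, `page`, `bbox`, `text`) are assumptions made for illustration and are not taken from the DocParse documentation; the sample data is invented.

```python
import json
from collections import defaultdict

# Hypothetical parser output. The field names and the normalized
# [x1, y1, x2, y2] bounding-box convention are assumptions.
SAMPLE_OUTPUT = """
{"elements": [
  {"type": "header", "page": 1, "bbox": [0.10, 0.05, 0.90, 0.10], "text": "Quarterly Report"},
  {"type": "text",   "page": 1, "bbox": [0.10, 0.12, 0.90, 0.40], "text": "Revenue grew in Q3."},
  {"type": "table",  "page": 2, "bbox": [0.10, 0.20, 0.90, 0.60], "text": "Region | Revenue"}
]}
"""


def group_by_type(elements):
    """Return a dict mapping each element type to the elements of that type."""
    grouped = defaultdict(list)
    for element in elements:
        grouped[element["type"]].append(element)
    return dict(grouped)


def bbox_area(bbox):
    """Area of an [x1, y1, x2, y2] box; degenerate boxes yield 0."""
    x1, y1, x2, y2 = bbox
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


elements = json.loads(SAMPLE_OUTPUT)["elements"]
by_type = group_by_type(elements)
tables = by_type.get("table", [])

for table in tables:
    print(f"table on page {table['page']}, area {bbox_area(table['bbox']):.2f}")
```

Grouping by type first keeps downstream steps simple: tables can go to a table-specific handler while headers and text feed a chunking step.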

Tame your tables and data extraction

Complex tables with odd layouts, spanning rows, and lots of text? Trying to extract data from your documents? DocParse can handle both. It combines best-in-class compound table extraction with LLM-powered data extraction to pull accurate information from documents.

Complex Tables

Preserve complex table formatting

Data Extraction

Leverage GenAI to analyze docs
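
One reason spanning rows and columns are hard to work with is that downstream code usually wants a plain rectangular grid. The sketch below shows one common way to expand spanning cells into such a grid. The cell representation (`row`, `col`, `rowspan`, `colspan`, `text`) is a hypothetical one chosen for illustration, not the DocParse output schema.

```python
def expand_table(cells, n_rows, n_cols):
    """Expand cells with row/column spans into a rectangular grid.

    Each cell's text is copied into every grid position it covers, so
    that each row can be read on its own without tracking spans.
    """
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for cell in cells:
        row_end = cell["row"] + cell.get("rowspan", 1)
        col_end = cell["col"] + cell.get("colspan", 1)
        for r in range(cell["row"], row_end):
            for c in range(cell["col"], col_end):
                grid[r][c] = cell["text"]
    return grid


# Invented example: a two-level header where "Region" spans two rows
# and "Revenue" spans two columns.
CELLS = [
    {"row": 0, "col": 0, "text": "Region", "rowspan": 2},
    {"row": 0, "col": 1, "text": "Revenue", "colspan": 2},
    {"row": 1, "col": 1, "text": "Q1"},
    {"row": 1, "col": 2, "text": "Q2"},
    {"row": 2, "col": 0, "text": "EMEA"},
    {"row": 2, "col": 1, "text": "10"},
    {"row": 2, "col": 2, "text": "12"},
]

grid = expand_table(CELLS, n_rows=3, n_cols=3)
for row in grid:
    print(" | ".join(row))
```

Repeating the spanned text in every covered position is a deliberate trade-off: it loses the visual merge but makes each row self-describing, which tends to help when rows are later passed to an LLM or loaded into a dataframe.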

Easily integrate with only a few lines of code

Add DocParse to your document processing workflows with a few lines of code using the Aryn SDK, or use the Playground UI to visually inspect parsing and extraction results.

Use sync or async APIs with the Aryn SDK

Use the DocParse Playground UI to visualize parsing and extraction

Supports the open-source Sycamore document ETL library

Available as SaaS, private cloud, or on-prem deployment
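
To illustrate when an async workflow is worth choosing over a sync one, here is a minimal Python sketch that parses several documents concurrently with a bounded number of in-flight requests. The function `parse_document` is a hypothetical stand-in for a blocking parse call; it is not the Aryn SDK API, and the file names are invented. It requires Python 3.9 or later for `asyncio.to_thread`.

```python
import asyncio


def parse_document(path):
    """Hypothetical stand-in for a blocking parse call (not the Aryn SDK)."""
    return {"path": path, "status": "done"}


async def parse_many(paths, max_concurrent=4):
    """Parse documents concurrently, capped at max_concurrent at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def parse_one(path):
        async with semaphore:
            # Run the blocking call in a worker thread so the event
            # loop stays free to schedule the other documents.
            return await asyncio.to_thread(parse_document, path)

    # gather returns results in the same order as the input paths.
    return await asyncio.gather(*(parse_one(p) for p in paths))


results = asyncio.run(parse_many(["a.pdf", "b.docx", "c.pdf"]))
for result in results:
    print(result["path"], result["status"])
```

A single document is usually simplest to handle with a sync call; the concurrent pattern starts to pay off for batches, where the semaphore keeps the client from exceeding whatever request limits the service imposes.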
