Up to 6x more accurate, 5x faster, 5x cheaper
1. Use Aryn DocParse to easily chunk and extract data from your documents into structured JSON.
2. Take the JSON output and run additional ETL steps to load your vector database with Aryn DocPrep.
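As a rough illustration of where step 2 begins, the sketch below takes a small JSON document shaped like a parser's output and flattens it into records that a later ETL step could embed and load. The `elements`, `type`, `text_representation`, and `properties.page_number` field names are assumptions made for this example, as is the `to_records` helper; check the actual DocParse output for the real schema.

```python
import json

# Hypothetical parser output; the field names are assumptions for illustration.
SAMPLE_JSON = """
{
  "elements": [
    {"type": "Title", "text_representation": "Quarterly Report", "properties": {"page_number": 1}},
    {"type": "Text", "text_representation": "Revenue grew in Q3.", "properties": {"page_number": 1}},
    {"type": "Image", "properties": {"page_number": 2}},
    {"type": "Table", "text_representation": "Region | Revenue", "properties": {"page_number": 2}}
  ]
}
"""


def to_records(parsed: dict) -> list[dict]:
    """Flatten parsed elements into records, skipping elements with no text."""
    records = []
    for index, element in enumerate(parsed.get("elements", [])):
        text = (element.get("text_representation") or "").strip()
        if not text:
            continue  # e.g. an image with no extracted text
        records.append(
            {
                "id": f"element-{index}",
                "type": element.get("type", "Unknown"),
                "page": element.get("properties", {}).get("page_number"),
                "text": text,
            }
        )
    return records


records = to_records(json.loads(SAMPLE_JSON))
for record in records:
    print(record)
```

Keeping the element type and page number next to the text means later steps can filter or cite by them without re-reading the source document.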
Try your own doc in the Aryn Playground
Chunk, embed, and load your data
[Diagram: documents pass through Aryn DocParse (doc to JSON), which produces JSON, and then through Aryn DocPrep (ETL for docs).]
Why Aryn?
Higher quality chunking
Aryn DocParse is up to 6x more accurate and 5x faster than alternatives. Structure and extract data from PDFs, HTML, presentations and more using purpose-built AI models. Tackle complex documents with tables, images, text, graphs, and infographics.
Use declarative dataflows
Aryn DocPrep generates ETL pipeline code for processing and loading your unstructured data into your vector databases. Choose from a variety of chunking strategies and vector embedding models. Customize your pipeline code with data extraction transforms and more.
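One common chunking strategy is to greedily merge neighboring elements until a size budget is reached. The sketch below is a minimal version of that idea in plain Python, not DocPrep's generated code: `merge_chunks` and the word-count budget are hypothetical, and a real pipeline would typically count tokens with the embedding model's tokenizer instead of splitting on whitespace.

```python
def merge_chunks(texts: list[str], max_words: int) -> list[str]:
    """Greedily merge consecutive texts so each chunk stays within max_words.

    A single text longer than max_words is kept as its own oversized chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    current_words = 0
    for text in texts:
        words = len(text.split())
        # Start a new chunk if adding this text would exceed the budget.
        if current and current_words + words > max_words:
            chunks.append(" ".join(current))
            current, current_words = [], 0
        current.append(text)
        current_words += words
    if current:
        chunks.append(" ".join(current))
    return chunks


elements = [
    "Install the bracket.",           # 3 words
    "Tighten all four bolts.",        # 4 words
    "Connect the power cable last.",  # 5 words
]
chunks = merge_chunks(elements, max_words=8)
print(chunks)
```

With a budget of 8 words, the first two elements fit in one chunk and the third starts a new one. Merging in document order keeps related sentences together, which is the main reason to prefer it over fixed-size character windows.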
Reliably load vector databases
Easily load vector databases and hybrid search engines using Aryn DocPrep's connectors for targets such as Pinecone, OpenSearch, Weaviate, Elasticsearch, Qdrant, and DuckDB. DocPrep's generated ETL pipelines scale from a single document to thousands.
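Loading at scale usually means writing in batches rather than one record at a time. The sketch below shows that pattern against an in-memory stand-in for a vector store. `InMemoryStore`, `toy_embed`, `batched`, and `load_in_batches` are hypothetical names, and `toy_embed` is a placeholder, not a real embedding model. Nothing here uses DocPrep's connectors or any vendor client.

```python
from itertools import islice
from typing import Iterable, Iterator


def toy_embed(text: str, dims: int = 4) -> list[float]:
    """Placeholder embedding: character codes summed into dims slots."""
    vector = [0.0] * dims
    for position, char in enumerate(text):
        vector[position % dims] += ord(char)
    return vector


class InMemoryStore:
    """Stand-in for a vector database that accepts batched upserts."""

    def __init__(self) -> None:
        self.rows: dict[str, tuple[list[float], str]] = {}
        self.batches_written = 0

    def upsert(self, batch: list[tuple[str, list[float], str]]) -> None:
        for doc_id, vector, text in batch:
            # Writing the same id again overwrites it, so reruns are safe.
            self.rows[doc_id] = (vector, text)
        self.batches_written += 1


def batched(items: Iterable, size: int) -> Iterator[list]:
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def load_in_batches(store: InMemoryStore, chunks: dict[str, str], batch_size: int) -> None:
    """Embed each chunk lazily and write the results in fixed-size batches."""
    rows = ((doc_id, toy_embed(text), text) for doc_id, text in chunks.items())
    for batch in batched(rows, batch_size):
        store.upsert(batch)


store = InMemoryStore()
chunks = {f"chunk-{i}": f"text number {i}" for i in range(5)}
load_in_batches(store, chunks, batch_size=2)
print(len(store.rows), store.batches_written)
```

Keying each row by a stable id makes the load idempotent: running the same pipeline twice leaves the store with the same rows instead of duplicates.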
Open source and cloud native
Aryn DocParse's base AI model is open source and available on Hugging Face. Aryn DocPrep generates ETL pipelines using the Sycamore document ETL library, which is 100% open source (Apache License v2.0). Sycamore is customizable with data transforms and user-defined functions (UDFs).
Use cases
Developers use Aryn in financial services, healthcare, manufacturing, eCommerce, and customer support.
Research and discovery
Prepare data for apps that enable analysts and researchers to ask hard questions on complex documents that include tables, infographics, and complicated layouts. Discover and use critical information that would otherwise be missed.
Reporting on unstructured data feeds
Create structured reports from unstructured data to answer key business questions. Run scheduled pipelines that extract, enrich, and store information from diverse datasets, such as Salesforce data, health records, or contracts.
Technical knowledge bases
Empower technical knowledge workers with AI assistants by processing manuals, technical documents, installation guides, and catalogs for RAG systems. Answer technical questions and find information from properly chunked data.
Customer support
Deliver high-quality data to copilots that assist customer support teams and healthcare professionals, or let customers directly query knowledge bases, support tickets, FAQs, healthcare records, and other information sources.
Installation
Installing the SDK for Aryn DocParse and the Sycamore library is quick and simple.