
Document processing system for RAG and unstructured analytics.

Load your vector databases and hybrid search engines with higher-quality data. ETL for RAG.


Use Aryn to easily prepare and chunk your complex PDFs, HTML, presentations, transcripts, manuals, and more.

Aryn's engine, Sycamore, is 100% open source (Apache License v2.0) and customizable.


Better data chunking means better answers from LLMs.
Process your unstructured data with Aryn. 


Higher quality chunking

Segment, label, enrich, and process PDFs, HTML, presentations, and more using modern AI models and powerful data transforms. Tackle complex documents with tables, images, text, graphs, and infographics.

Use declarative dataflows

Write your processing job using DocSets, a declarative abstraction. It's like an Apache Spark DataFrame, but for collections of unstructured documents. With DocSets, reliable dataflow processing and observability are built in.
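To make the DataFrame comparison concrete, here is a minimal sketch of what a declarative, lazily executed document pipeline can look like. This is not Sycamore's API: `MiniDocSet`, `Doc`, `split_paragraphs`, and `label_length` are hypothetical names invented for illustration. The idea is only that each call records a step in a plan, and nothing runs until `execute()` is called.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class Doc:
    """Hypothetical document record: text plus free-form properties."""
    text: str
    properties: dict = field(default_factory=dict)


class MiniDocSet:
    """Hypothetical stand-in for a DocSet: records a plan, runs it lazily."""

    def __init__(self, source: Iterable[Doc], plan: tuple = ()):
        self._source = source
        self._plan = plan  # ordered (kind, function) steps, not yet executed

    def map(self, fn: Callable[[Doc], Doc]) -> "MiniDocSet":
        return MiniDocSet(self._source, self._plan + (("map", fn),))

    def filter(self, pred: Callable[[Doc], bool]) -> "MiniDocSet":
        return MiniDocSet(self._source, self._plan + (("filter", pred),))

    def flat_map(self, fn: Callable[[Doc], Iterable[Doc]]) -> "MiniDocSet":
        return MiniDocSet(self._source, self._plan + (("flat_map", fn),))

    @staticmethod
    def _apply(kind: str, fn: Callable, docs: Iterator[Doc]) -> Iterator[Doc]:
        if kind == "map":
            return (fn(d) for d in docs)
        if kind == "filter":
            return (d for d in docs if fn(d))
        return (out for d in docs for out in fn(d))  # flat_map

    def execute(self) -> list:
        """Run the recorded plan over the source, step by step."""
        docs = iter(self._source)
        for kind, fn in self._plan:
            docs = self._apply(kind, fn, docs)
        return list(docs)


def split_paragraphs(doc: Doc) -> list:
    # Naive chunker: one chunk per blank-line-separated paragraph.
    parts = [p.strip() for p in doc.text.split("\n\n") if p.strip()]
    return [Doc(p, dict(doc.properties)) for p in parts]


def label_length(doc: Doc) -> Doc:
    # Enrich each chunk with a simple derived property.
    return Doc(doc.text, {**doc.properties, "chars": len(doc.text)})


raw = [Doc("Intro text.\n\nA table follows.", {"source": "manual.pdf"})]
chunks = (
    MiniDocSet(raw)
    .flat_map(split_paragraphs)
    .map(label_length)
    .filter(lambda d: d.properties["chars"] > 5)
    .execute()
)
```

Because the plan is plain data until `execute()` runs, a real engine has room to add retries, logging, or parallelism around each step without changes to the job definition.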

Reliably load vector databases

Easily load vector databases and search engines, such as OpenSearch, Weaviate, Pinecone, Elasticsearch, and PostgreSQL (pgvector). Quickly load vectors, metadata, and keyword indexes for hybrid search.
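As a rough illustration of what "vectors, metadata, and keyword indexes" means for a single chunk, the sketch below loads records into a toy in-memory index. It does not use any real database client: `ChunkRecord` and `InMemoryHybridIndex` are hypothetical names, and the two-element vectors are made-up placeholders rather than real embeddings.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ChunkRecord:
    """Hypothetical shape of one chunk prepared for hybrid search."""
    chunk_id: str
    text: str
    vector: list            # placeholder embedding, not from a real model
    metadata: dict = field(default_factory=dict)


class InMemoryHybridIndex:
    """Toy index holding the three things a hybrid search engine needs."""

    def __init__(self):
        self.vectors = {}                  # chunk_id -> embedding
        self.metadata = {}                 # chunk_id -> properties for filtering
        self.keywords = defaultdict(set)   # token -> chunk_ids (inverted index)

    def load(self, records):
        for r in records:
            self.vectors[r.chunk_id] = r.vector
            self.metadata[r.chunk_id] = r.metadata
            for token in r.text.lower().split():
                self.keywords[token.strip(".,;:")].add(r.chunk_id)


index = InMemoryHybridIndex()
index.load([
    ChunkRecord("c1", "Torque specs for the pump.", [0.1, 0.9], {"page": 4}),
    ChunkRecord("c2", "Pump warranty terms.", [0.8, 0.2], {"page": 12}),
])
```

A production target such as OpenSearch or pgvector stores the same three pieces per chunk, so a query can combine vector similarity, keyword matching, and metadata filters.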

Open source and cloud native

100% open source (Apache License v2.0) with no lock-in. Highly customizable with your choice of AI models, prompts, and UDFs. Quickly and easily launch stacks in your VPC using Aryn Cloud or run with Docker.

Use cases.

Developers use Aryn Cloud to do ETL for RAG and unstructured analytics applications in financial services, healthcare, manufacturing, eCommerce, and customer support. 

Research and discovery

Prepare data for apps that enable analysts and researchers to ask hard questions on complex documents that include tables, infographics, and complicated layouts. Discover and use critical information that would otherwise be missed.

Reporting on unstructured data feeds

Create structured reports from unstructured data to answer key business questions. Run scheduled pipelines that extract, enrich, and store information from diverse datasets, such as Salesforce data, health records, or contracts.

Technical knowledge bases

Empower technical knowledge workers with AI assistants by processing manuals, technical documents, installation guides, and catalogs for RAG systems. Answer technical questions and find information from properly chunked data.

Customer support

Deliver high-quality data to copilots that support customer support teams and healthcare professionals, or let customers directly query knowledge bases, support tickets, FAQs, healthcare records, and other information sources.

Meet Aryn.

Aryn (pronounced "air-in") means "high mountain." Over 90% of data today is unstructured, leaving enterprises with mountains of data that are difficult to conquer. Our goal is to help you summit that peak.

Leadership Team.

Aryn's team includes leaders with decades of big data, AI, and cloud experience from AWS, Google Cloud, Stripe, Dremio, HP, IBM, Yahoo!, and Meta.

Sign up for Aryn Cloud

Receive an invite to Aryn Cloud by sharing the information below.



Aryn has a $7.5M seed investment from Factory HQ, 8VC, Lip-Bu Tan, Amarjit Gill, and other notable angels and advisors. Read 8VC's investment announcement.
