What’s New in DocParse: Vision Pipelines and Voting

Ben Sowell, CTO

February 24, 2026

Today we are happy to announce two new preview features in DocParse: VLM-based vision pipelines for document parsing, and LLM voting for agentic property extraction. Both of these features are available at no additional charge during the preview period.

Vision Pipelines

At Aryn, we have always seen our value as combining the best available models for working with documents. When we started, that meant building our own models for segmentation and table extraction. We have continued to invest in these models, and they remain excellent options for balancing accuracy, cost, and performance. At the same time, we have seen vision language models, or VLMs, become increasingly capable at document parsing tasks. For some time we have supported VLMs for portions of the parsing pipeline, such as OCR and structured table extraction, but only recently have these models become capable and performant enough to replace the entire pipeline.

Today we are introducing a new option in DocParse that lets you replace the entire parsing pipeline with a VLM. We are currently using the PaddleOCR-VL 1.5 model to parse the document and convert it to our DocParse JSON format. In our testing, PaddleOCR-VL performed particularly well on complex documents containing tables, figures, and formulas, and it tops the OmniDocBench leaderboard on end-to-end document parsing.

To use the new pipeline, set the pipeline to “Vision” in the DocParse Playground. Our documentation describes how to configure the pipeline through the API.

Voting for Agentic Property Extraction

We launched Agentic Property Extraction in December of last year to enable you to extract structured information from your unstructured documents. Since then, we’ve had the opportunity to work with a variety of customers in several domains, like insurance and finance. The feedback we’ve gotten has been remarkably consistent: the most important thing for customers is quality, and it’s critical to build mechanisms to verify the accuracy of the output. Element-level attribution is one tool we provide for verification, and today we are launching another: voting.

While careful context engineering goes a long way toward improving accuracy, LLMs can still hallucinate or make mistakes. We can help mitigate this risk by performing the extraction with LLMs from different providers and having them vote on the correct answer. If multiple LLMs agree, there is a good chance the value is correct. This helps reduce errors, and it’s also a good way to identify properties that may be underspecified in the schema.
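The idea can be illustrated with a minimal sketch. The provider names and the plain majority rule below are illustrative, not DocParse’s actual aggregation logic:

```python
from collections import Counter

def vote(answers: dict[str, str]) -> tuple[str, int]:
    """Return the value the most providers agree on, with its vote count.

    `answers` maps a provider name to the value that provider
    extracted for a single property.
    """
    counts = Counter(answers.values())
    value, votes = counts.most_common(1)[0]
    return value, votes

# Three providers extract an effective date; two of them agree:
answers = {
    "provider_a": "2024-03-01",
    "provider_b": "2024-03-01",
    "provider_c": "2024-01-03",  # likely a day/month transposition
}
winner, votes = vote(answers)  # ("2024-03-01", 2)
```

A property where the providers routinely split their votes is a useful signal that its description in the schema is ambiguous and worth tightening.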

To enable voting, select the “Enable Voting” toggle in the UI under “Property Extraction Options”.

Once processing completes, you will see the vote count for each extracted property. We automatically select the value with the most votes, but display all of the candidate values for informational purposes. You can use these to refine your schema in order to improve extraction. In cases where there is no agreement, we select the value you would have gotten had you not enabled voting.
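The selection rule described above can be sketched as follows. This is a simplified model, assuming a designated default provider whose answer is what single-LLM extraction would have returned:

```python
from collections import Counter

def select_value(answers: dict[str, str], default_provider: str) -> str:
    """Pick the majority value; if no two providers agree, fall back
    to the default provider's answer (i.e., the single-LLM result)."""
    counts = Counter(answers.values())
    value, votes = counts.most_common(1)[0]
    if votes > 1:
        return value
    return answers[default_provider]

# No two providers agree, so the default provider's answer wins:
answers = {"provider_a": "Acme Corp", "provider_b": "Acme Corporation", "provider_c": "Acme Inc"}
chosen = select_value(answers, default_provider="provider_a")  # "Acme Corp"
```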

Since voting calls multiple LLMs, it will take more time than extraction with a single LLM. The calls are made in parallel, but you will need to wait for the slowest LLM to complete.
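The latency behavior is the usual fan-out pattern: total wall time tracks the slowest call, not the sum. A small simulation with sleeps standing in for LLM calls:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def call_llm(provider: str, latency: float) -> str:
    time.sleep(latency)  # stand-in for a real provider API call
    return f"{provider}: answer"

# Simulated per-provider latencies in seconds (illustrative values).
latencies = {"fast": 0.05, "medium": 0.1, "slow": 0.2}

start = time.monotonic()
with ThreadPoolExecutor() as pool:
    results = list(pool.map(call_llm, latencies, latencies.values()))
elapsed = time.monotonic() - start
# elapsed is roughly 0.2 s (the slowest call), well under the
# 0.35 s the three calls would take sequentially.
```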

We are excited to share these features and we would love your feedback!