Agentic Document Extraction
Updated on 14 Apr 2025
The Agentic Document Extraction API is a tool that can extract structured information out of documents with different layouts. It returns the extracted data in a structured hierarchical format containing text, tables, pictures, charts, and other information.
You can send documents to the Agentic Document Extraction API through our web-based app here. You can also programmatically send documents using our agentic-doc library.
The Agentic Document Extraction API is built on VisionAgent, the agentic framework from LandingAI.

Agentic Document Extraction Library
Extend the functionality of the Agentic Document Extraction tool with our agentic-doc library. This Python library wraps around the Agentic Document Extraction API to add more features and support to the document extraction process. For example, using this library allows you to process much longer documents.
Use the library here.
File Support
The Agentic Document Extraction tool supports these files:
- PNG, JPEG, PDF (up to 2 pages)
- Max size: 50MB per file
If you need support for other file types or larger files, use our agentic-doc library.
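Before calling the API directly, you may want to check a file against these limits on the client side. The following Python sketch is a hypothetical helper, not part of the API or the agentic-doc library. It assumes that "50MB" means 50 * 1024 * 1024 bytes (the service's exact cutoff may differ) and maps each extension to the `image` or `pdf` form field used when calling the API directly. It does not count PDF pages, so the 2-page limit is not enforced.

```python
import os

# Assumption: "50MB" is read as 50 * 1024 * 1024 bytes; the service's exact cutoff may differ.
MAX_FILE_BYTES = 50 * 1024 * 1024

# Supported extensions mapped to the multipart form field used for direct API calls.
FORM_FIELD_BY_EXTENSION = {".png": "image", ".jpg": "image", ".jpeg": "image", ".pdf": "pdf"}


def check_file(path: str) -> str:
    """Hypothetical pre-upload check.

    Returns the form field name ("image" or "pdf") for a supported file and raises
    ValueError otherwise. The 2-page PDF limit is NOT checked here.
    """
    ext = os.path.splitext(path)[1].lower()
    field = FORM_FIELD_BY_EXTENSION.get(ext)
    if field is None:
        raise ValueError(f"unsupported file type {ext!r}; consider the agentic-doc library")
    size = os.path.getsize(path)
    if size > MAX_FILE_BYTES:
        raise ValueError(f"file is {size} bytes, over the {MAX_FILE_BYTES}-byte limit")
    return field
```

Files that fail the check are the ones the agentic-doc library is meant to handle.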
Rate Limits and Pricing
Each usage tier has rate limits. To change your pricing plan, go to the Pricing page.
For higher usage, additional features, and custom solutions, please schedule a call to discuss our enterprise plans.
If you exceed the rate limit, you will get a 429 (Too Many Requests) error response.
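A common way to handle 429 responses is to retry with exponential backoff. This page does not say how long to wait or whether a `Retry-After` header is returned, so the delays below are arbitrary assumptions. `call_with_backoff` is a hypothetical helper that accepts any function performing one request and returning `(status_code, body)`.

```python
import random
import time
from typing import Callable, Tuple

Response = Tuple[int, bytes]  # (HTTP status code, response body)


def call_with_backoff(send: Callable[[], Response], max_attempts: int = 5,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call send() and retry with exponential backoff while it returns HTTP 429.

    Any other status is returned immediately. After max_attempts tries, the last
    429 response is returned so the caller can decide what to do.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Wait base_delay * 2**attempt seconds, plus up to base_delay of random jitter.
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
    return status, body
```

Injecting `sleep` keeps the helper testable. With Python's `urllib`, a 429 is raised as `urllib.error.HTTPError`, so a `send` wrapper would catch it and return `err.code`.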
Send Files in the Document Extraction App
To send files in the web-based Agentic Document Extraction app:
- Go to the Agentic Document Extraction app.
- Upload a file or click one of the Examples.
(Figure: Upload or Select a File)
- The Agentic Document Extraction app processes the file. This might take a few moments.
- After the app processes the file, it displays the results:
  - The document preview shows a bounding box around each extracted element.
  - The API response displays in the right-hand panel. You can toggle between the JSON and Markdown output.
(Figure: Extracted Data)
- Use the Chat with Document tool in the right-hand panel to interact with the document.
- Happy with the output? Try sending the files programmatically with our agentic-doc library! Share your projects in our developer-focused Discord server.
Chat with Document
After the Agentic Document Extraction app processes the file, you can use the Chat with Document tool in the right-hand panel to interact with the document.
The Chat with Document tool is an LLM layered on top of the Agentic Document Extraction API. The chat tool showcases how the API accurately extracts and understands document data, including element locations. Use the chat tool for ideas on custom solutions you can build on top of the API.
The Chat with Document tool suggests a few prompts based on your document. You can also enter your own prompts.

JSON Schema
The Agentic Document Extraction tool returns extracted data using the following JSON schema.
{
  "$defs": {
    "Chunk": {
      "description": "An extracted chunk from the document",
      "properties": {
        "text": {
          "description": "A Markdown representation of the chunk (except for tables, which are represented in HTML).",
          "title": "Text",
          "type": "string"
        },
        "grounding": {
          "description": "The specific spatial location(s) of this chunk within the original document. A chunk can have multiple groundings, for example if it is a single paragraph split across two columns.",
          "items": {
            "$ref": "#/$defs/ChunkGrounding"
          },
          "title": "Grounding",
          "type": "array"
        },
        "chunk_type": {
          "$ref": "#/$defs/ChunkType",
          "description": "The detected type of the chunk, matching its role within the document."
        },
        "chunk_id": {
          "description": "A UUID for the chunk. This matches UUIDs in the HTML comments in the Markdown output.",
          "title": "Chunk Id",
          "type": "string"
        }
      },
      "required": [
        "text",
        "grounding",
        "chunk_type",
        "chunk_id"
      ],
      "title": "Chunk",
      "type": "object"
    },
    "ChunkGrounding": {
      "description": "Grounding for a chunk, specifying the location within the original document",
      "properties": {
        "box": {
          "$ref": "#/$defs/ChunkGroundingBox",
          "description": "A bounding box (in relative coordinates) establishing the chunk's spatial location within the page."
        },
        "page": {
          "description": "The chunk's 0-indexed page within the original document.",
          "title": "Page",
          "type": "integer"
        }
      },
      "required": [
        "box",
        "page"
      ],
      "title": "ChunkGrounding",
      "type": "object"
    },
    "ChunkGroundingBox": {
      "description": "Bounding box, expressed in relative coordinates (float from 0 to 1)",
      "properties": {
        "l": {
          "title": "L",
          "type": "number"
        },
        "t": {
          "title": "T",
          "type": "number"
        },
        "r": {
          "title": "R",
          "type": "number"
        },
        "b": {
          "title": "B",
          "type": "number"
        }
      },
      "required": [
        "l",
        "t",
        "r",
        "b"
      ],
      "title": "ChunkGroundingBox",
      "type": "object"
    },
    "ChunkType": {
      "description": "Type of the chunk, signifying its role within the document",
      "enum": [
        "title",
        "page_header",
        "page_footer",
        "page_number",
        "key_value",
        "form",
        "table",
        "figure",
        "text"
      ],
      "title": "ChunkType",
      "type": "string"
    }
  },
  "properties": {
    "markdown": {
      "description": "A Markdown representation of the document, potentially with HTML comments at the end of each chunk. You can use this as context for an LLM.",
      "title": "Markdown",
      "type": "string"
    },
    "chunks": {
      "description": "List of chunks extracted from the document in reading order.",
      "items": {
        "$ref": "#/$defs/Chunk"
      },
      "title": "Chunks",
      "type": "array"
    }
  },
  "required": [
    "markdown",
    "chunks"
  ],
  "title": "APIResponse",
  "type": "object"
}
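To work with a response in Python, you can map this schema onto small dataclasses. This is a sketch; the class and function names are hypothetical and not part of any LandingAI library. The pixel conversion assumes that `l`, `t`, `r`, and `b` are the left, top, right, and bottom edges expressed as fractions of the page width and height, measured from the top-left corner. The schema states only that the values are relative coordinates from 0 to 1.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Box:
    l: float
    t: float
    r: float
    b: float

    def to_pixels(self, width: int, height: int) -> Tuple[int, int, int, int]:
        """Scale relative (0-1) coordinates to pixel (left, top, right, bottom)."""
        return (round(self.l * width), round(self.t * height),
                round(self.r * width), round(self.b * height))


@dataclass
class Grounding:
    box: Box
    page: int  # 0-indexed page within the original document


@dataclass
class Chunk:
    text: str
    chunk_type: str
    chunk_id: str
    grounding: List[Grounding]


def parse_chunks(response: dict) -> List[Chunk]:
    """Convert the "chunks" array of an APIResponse dict into dataclasses."""
    return [
        Chunk(
            text=c["text"],
            chunk_type=c["chunk_type"],
            chunk_id=c["chunk_id"],
            grounding=[Grounding(box=Box(**g["box"]), page=g["page"]) for g in c["grounding"]],
        )
        for c in response["chunks"]
    ]
```

Because a chunk can have several groundings, drawing a chunk's bounding boxes means iterating over `chunk.grounding` and rendering each box on its own `page`.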
Call the API Directly
You can call the Agentic Document Extraction API directly with the cURL commands below. Note that PDFs sent directly to the API are limited to 2 pages. To process longer documents, use the agentic-doc library instead.
# Send an image (PNG or JPEG)
curl --request POST \
  --url https://api.va.landing.ai/v1/tools/agentic-document-analysis \
  --header 'Authorization: Basic {{your_api_key}}' \
  --form "image=@{{path_to_file}}"

# OR, send a PDF (up to 2 pages)
curl --request POST \
  --url https://api.va.landing.ai/v1/tools/agentic-document-analysis \
  --header 'Authorization: Basic {{your_api_key}}' \
  --form "pdf=@{{path_to_file}}"
Optional API Parameters
You can include the following optional parameters when calling the Agentic Document Extraction API.
| Optional Parameter | Data Type | Description |
|---|---|---|
| include_marginalia | BOOL | TRUE by default. When TRUE, the output contains page headers, footers, and page numbers. If you plan to use the output for RAG, you may want to set this parameter to FALSE. |
| include_metadata_in_markdown | BOOL | TRUE by default. When TRUE, the markdown property in the JSON response contains HTML comments at the end of each chunk that specify the visual grounding. These HTML comments are invisible to most Markdown renderers. The HTML comments are useful if you plan to use the output with an LLM. If you prefer pure Markdown output, set this parameter to FALSE. |
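If you already have output produced with the defaults, you can approximate both parameters on the client side. This sketch assumes that page headers, footers, and page numbers correspond to the `page_header`, `page_footer`, and `page_number` chunk types in the JSON schema, and that removing every `<!-- ... -->` comment is acceptable (it would also remove any HTML comments that came from the document itself). The helper names are hypothetical; setting the parameters in the request is the documented approach.

```python
import re
from typing import Dict, List

# Assumption: these chunk types are the "page headers, footers, and numbers" (marginalia).
MARGINALIA_TYPES = {"page_header", "page_footer", "page_number"}

# Non-greedy match so each comment is removed separately; DOTALL lets comments span lines.
_HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_grounding_comments(markdown: str) -> str:
    """Remove all HTML comments (such as per-chunk grounding metadata) from Markdown."""
    return _HTML_COMMENT.sub("", markdown)


def drop_marginalia(chunks: List[Dict]) -> List[Dict]:
    """Keep only chunks that are not page headers, page footers, or page numbers."""
    return [c for c in chunks if c["chunk_type"] not in MARGINALIA_TYPES]
```

Filtering `chunks` does not change the `markdown` string, so if you feed Markdown to a RAG pipeline, rebuild it from the remaining chunks' `text` fields or request the output with `include_marginalia` set to FALSE.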
Troubleshooting
If you receive any of the following errors when you try to upload a file, the service might be experiencing latency or availability issues. Wait a few minutes and try uploading the file again.
- LLM provider error
- Timeout error
- Availability error