PDF API
POST https://llmfoundry.straive.com/-/pdf converts one or more PDF files to LLM-friendly Markdown. For example:
curl -X POST https://llmfoundry.straive.com/-/pdf \
  -F "file=@/path/to/file.pdf" \
  -H "Sec-Fetch-Mode: cors"
This returns the Markdown contents of the PDF file(s).
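The same request can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the helper names `build_multipart` and `pdf_to_markdown` are hypothetical, the response is assumed to be UTF-8 text, and any authentication the service may require is not shown in the curl example and is not handled here.

```python
import urllib.request
import uuid
from pathlib import Path

PDF_API_URL = "https://llmfoundry.straive.com/-/pdf"


def build_multipart(fields, files):
    """Encode form fields and files as a multipart/form-data body.

    fields: dict mapping field name -> string value
    files:  list of (field_name, filename, content_bytes) tuples
    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode()
        )
    for name, filename, content in files:
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            "Content-Type: application/pdf\r\n\r\n"
        )
        parts.append(header.encode() + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def pdf_to_markdown(path, url=PDF_API_URL):
    """POST one PDF as the "file" form field and return the response text."""
    pdf = Path(path)
    body, content_type = build_multipart({}, [("file", pdf.name, pdf.read_bytes())])
    request = urllib.request.Request(
        url,
        data=body,
        # Mirrors the header sent in the curl example.
        headers={"Content-Type": content_type, "Sec-Fetch-Mode": "cors"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        # Assumption: the Markdown is returned as UTF-8 text.
        return response.read().decode("utf-8")
```

Calling `pdf_to_markdown("/path/to/file.pdf")` would then return the Markdown as a string. The `fields` argument of `build_multipart` is there so extra form fields can be sent alongside the file.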
To get the PDF as a JSON array of page-level text, add mode=page:
curl -X POST https://llmfoundry.straive.com/-/pdf \
  -F "file=@/path/to/file.pdf" \
  -F "mode=page" \
  -H "Sec-Fetch-Mode: cors"
This returns the page_chunks output from PyMuPDF4LLM:
[
  {
    "metadata": {
      "format": "PDF 1.3",
      "title": "...",
      "page_count": 227,
      "page": 1
    },
    "toc_items": [], // list of TOC items pointing to this page as [level, title, pagenumber]
    "tables": [], // list of tables, each with "bbox", "row_count" and "col_count"
    "images": [], // list of images from PyMuPDF.Page.get_image_info
    "graphics": [], // list of vector drawings' bounding boxes as per PyMuPDF.Page.cluster_drawings()
    "text": "...", // text of the page
    "words": [] // empty list
  }
]
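Once parsed, the page-level array is easy to work with. The following is a minimal sketch, assuming the real response is plain JSON (without the explanatory comments shown above). The helper names are hypothetical and the sample values are made up for illustration.

```python
import json


def load_page_chunks(response_text):
    """Parse a mode=page response body into a list of page dicts."""
    chunks = json.loads(response_text)
    if not isinstance(chunks, list):
        raise ValueError("expected a JSON array of page chunks")
    return chunks


def page_texts(chunks):
    """Map each page number to the text of that page."""
    return {chunk["metadata"]["page"]: chunk["text"] for chunk in chunks}


def pages_with_tables(chunks):
    """Return the page numbers whose "tables" list is non-empty."""
    return [chunk["metadata"]["page"] for chunk in chunks if chunk.get("tables")]


# Made-up sample shaped like the response described above.
SAMPLE_RESPONSE = """
[
  {"metadata": {"page_count": 2, "page": 1}, "toc_items": [], "tables": [],
   "images": [], "graphics": [], "text": "# Title", "words": []},
  {"metadata": {"page_count": 2, "page": 2}, "toc_items": [],
   "tables": [{"bbox": [0, 0, 100, 50], "row_count": 3, "col_count": 2}],
   "images": [], "graphics": [], "text": "Second page", "words": []}
]
"""

chunks = load_page_chunks(SAMPLE_RESPONSE)
print(page_texts(chunks))         # {1: '# Title', 2: 'Second page'}
print(pages_with_tables(chunks))  # [2]
```

Keying the text by `metadata.page` keeps the page numbering the API reports, rather than relying on the position of each item in the array.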