PDF API

POST https://llmfoundry.straive.com/-/pdf converts one or more PDF files to LLM-friendly Markdown. For example:

curl -X POST https://llmfoundry.straive.com/-/pdf \
  -F "file=@/path/to/file.pdf" \
  -H "Sec-Fetch-Mode: cors"

This returns the Markdown contents of the PDF file(s).
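The same upload can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the helper names `build_pdf_request` and `pdf_to_markdown` and the `extra_fields` parameter are hypothetical, sending several PDFs by repeating the `file` field is an assumption, and any authentication the server may require is not shown.

```python
import os
import uuid
import urllib.request

PDF_API_URL = "https://llmfoundry.straive.com/-/pdf"


def build_pdf_request(pdf_paths, extra_fields=None, url=PDF_API_URL):
    """Build (without sending) a multipart/form-data POST for the PDF API.

    Assumption: several PDFs are uploaded by repeating the "file" form field.
    """
    boundary = uuid.uuid4().hex
    body = b""
    # Plain form fields, the equivalent of curl's -F "name=value".
    for name, value in (extra_fields or {}).items():
        body += (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        ).encode("utf-8")
    # File fields, the equivalent of curl's -F "file=@/path/to/file.pdf".
    for path in pdf_paths:
        filename = os.path.basename(path)
        with open(path, "rb") as handle:
            content = handle.read()
        body += (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
            "Content-Type: application/pdf\r\n\r\n"
        ).encode("utf-8")
        body += content + b"\r\n"
    body += f"--{boundary}--\r\n".encode("utf-8")
    headers = {
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        "Sec-Fetch-Mode": "cors",  # same header as the curl example
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def pdf_to_markdown(pdf_paths):
    """Send the request and return the response body as text (needs network access)."""
    with urllib.request.urlopen(build_pdf_request(pdf_paths)) as response:
        return response.read().decode("utf-8")
```

Calling `pdf_to_markdown(["/path/to/file.pdf"])` should then return the same Markdown as the curl command. Building the request separately from sending it makes it possible to inspect the body and headers without touching the network.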

To get the PDF as a JSON array of page-level text, add mode=page:

curl -X POST https://llmfoundry.straive.com/-/pdf \
  -F "file=@/path/to/file.pdf" \
  -F "mode=page" \
  -H "Sec-Fetch-Mode: cors"

This returns the page_chunks output of PyMuPDF4LLM as a JSON array with one object per page:

[
  {
    "metadata": {
      "format": "PDF 1.3",
      "title": "...",
      "page_count": 227,
      "page": 1
    },
    "toc_items": [], // list of TOC items pointing to this page as [level, title, pagenumber]
    "tables": [], // list of tables, each with keys "bbox", "row_count" and "col_count"
    "images": [], // list of images, as returned by PyMuPDF's Page.get_image_info()
    "graphics": [], // list of bounding boxes of vector drawings, as returned by PyMuPDF's Page.cluster_drawings()
    "text": "...", // text of the page
    "words": [] // empty list
  }
]
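Once the response is parsed (for example with `json.loads`), the page objects can be handled as ordinary Python dictionaries. The sketch below assumes the keys shown above; the sample data and the helper names `join_page_text`, `pages_with_tables` and `collect_toc` are hypothetical and only illustrate the shape of the response.

```python
# Hypothetical two-page response, already parsed from JSON and trimmed
# to the keys shown in the example above.
SAMPLE_CHUNKS = [
    {
        "metadata": {"format": "PDF 1.3", "title": "Example", "page_count": 2, "page": 1},
        "toc_items": [[1, "Introduction", 1]],
        "tables": [],
        "images": [],
        "graphics": [],
        "text": "# Introduction\n\nFirst page.",
        "words": [],
    },
    {
        "metadata": {"format": "PDF 1.3", "title": "Example", "page_count": 2, "page": 2},
        "toc_items": [[2, "Results", 2]],
        "tables": [{"bbox": [72, 100, 500, 300], "row_count": 4, "col_count": 3}],
        "images": [],
        "graphics": [],
        "text": "## Results\n\nSecond page.",
        "words": [],
    },
]


def join_page_text(chunks, separator="\n\n"):
    """Concatenate the text of all pages in page order."""
    ordered = sorted(chunks, key=lambda chunk: chunk["metadata"]["page"])
    return separator.join(chunk["text"] for chunk in ordered)


def pages_with_tables(chunks):
    """Return the page numbers of pages that contain at least one table."""
    return [chunk["metadata"]["page"] for chunk in chunks if chunk.get("tables")]


def collect_toc(chunks):
    """Flatten the per-page toc_items into (level, title, page) tuples."""
    return [tuple(item) for chunk in chunks for item in chunk.get("toc_items", [])]


print(pages_with_tables(SAMPLE_CHUNKS))
print(collect_toc(SAMPLE_CHUNKS))
```

Keeping the per-page structure is useful when an LLM prompt needs page citations or when only pages with tables should be processed further.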