PdfluentError::InvalidDocument
Error: failed to parse PDF structure at byte offset 0

How to fix the invalid PDF parse error in PDFluent

This error means PDFluent could not parse the file as a valid PDF. Common causes are passing the wrong file, a truncated download, or a corrupted cross-reference table.

Why this happens

The file is not a PDF

The file does not start with the PDF magic bytes (%PDF-). Common cases: an HTML error page saved with a .pdf extension, a ZIP archive, or a file path that resolves to an empty or placeholder file.
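
To tell these cases apart before reporting the error, you can inspect the first few bytes yourself. The sketch below uses only the Rust standard library. The `Sniffed` enum and `sniff` function are hypothetical helpers, not part of PDFluent, and the HTML check is a heuristic that only looks for a leading `<!doctype html` or `<html` tag.

```rust
/// Rough classification of a file based on its leading bytes (hypothetical helper).
#[derive(Debug, PartialEq)]
enum Sniffed {
    Pdf,
    Empty,
    Zip,
    Html,
    Unknown,
}

/// Classify a buffer holding the first bytes of a file.
fn sniff(head: &[u8]) -> Sniffed {
    if head.is_empty() {
        return Sniffed::Empty;
    }
    if head.starts_with(b"%PDF-") {
        return Sniffed::Pdf;
    }
    // ZIP local file header signature "PK\x03\x04" (also DOCX, XLSX, ...)
    if head.starts_with(b"PK\x03\x04") {
        return Sniffed::Zip;
    }
    // Skip leading ASCII whitespace, then look for an HTML opening tag
    let trimmed: &[u8] = match head.iter().position(|b| !b.is_ascii_whitespace()) {
        Some(i) => &head[i..],
        None => return Sniffed::Unknown,
    };
    let lower = trimmed.to_ascii_lowercase();
    if lower.starts_with(b"<!doctype html") || lower.starts_with(b"<html") {
        return Sniffed::Html;
    }
    Sniffed::Unknown
}

fn main() {
    println!("{:?}", sniff(b"%PDF-1.7\n"));            // Pdf
    println!("{:?}", sniff(b"<!DOCTYPE html><html>")); // Html
    println!("{:?}", sniff(b"PK\x03\x04rest"));        // Zip
    println!("{:?}", sniff(b""));                      // Empty
}
```

To use it on a file, read the first few hundred bytes (for example with `File::open(path)?.take(512).read_to_end(&mut buf)`) and pass the buffer to `sniff`. Office formats such as DOCX and XLSX are ZIP containers, so they also classify as `Zip`.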

The file was truncated during download or transfer

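PDFs downloaded over HTTP without Content-Length validation can be incomplete. A complete PDF ends with its cross-reference section and trailer, a startxref line giving the byte offset of that section, and a %%EOF marker. A reader finds the xref section by reading that offset from the end of the file, so if the transfer cut off mid-file, PDFluent cannot find the xref table and therefore cannot locate the document catalog.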
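
A rough way to spot a truncated file is to check its tail for the startxref keyword and the %%EOF marker, and to compare the downloaded byte count with the Content-Length header when the server sent one. This is a heuristic sketch using only the standard library. `tail_looks_complete`, `read_tail`, and `length_matches` are hypothetical names, and the 1024-byte window is an assumption, not a PDFluent setting.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};

/// True if `needle` occurs anywhere in `haystack`.
fn contains(haystack: &[u8], needle: &[u8]) -> bool {
    !needle.is_empty() && haystack.windows(needle.len()).any(|w| w == needle)
}

/// Heuristic: a complete PDF ends with `startxref`, an offset, and `%%EOF`,
/// so both markers should appear near the end of the file.
fn tail_looks_complete(tail: &[u8]) -> bool {
    contains(tail, b"startxref") && contains(tail, b"%%EOF")
}

/// Read at most the last `n` bytes of the file at `path`.
fn read_tail(path: &str, n: u64) -> io::Result<Vec<u8>> {
    let mut f = File::open(path)?;
    let len = f.metadata()?.len();
    f.seek(SeekFrom::Start(len.saturating_sub(n)))?;
    let mut buf = Vec::new();
    f.read_to_end(&mut buf)?;
    Ok(buf)
}

/// Compare the received byte count with the HTTP Content-Length header.
/// Returns None when no Content-Length was sent, so nothing can be verified.
fn length_matches(received: u64, content_length: Option<u64>) -> Option<bool> {
    content_length.map(|expected| received == expected)
}

fn main() {
    // Hypothetical file name for a downloaded PDF
    let path = "download.pdf";
    match read_tail(path, 1024) {
        Ok(tail) if !tail_looks_complete(&tail) => {
            eprintln!("{path}: no startxref/%%EOF in the last 1024 bytes, likely truncated");
        }
        Ok(_) => println!("{path}: trailer markers present"),
        Err(e) => eprintln!("{path}: {e}"),
    }
}
```

Files with incremental updates contain several %%EOF markers; only the last one matters, and that is the one a tail check sees. Passing this check does not prove the rest of the file is intact; it only rules out the most common truncation pattern.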

Corrupted cross-reference table

The xref table maps each object number to the byte offset where that object begins. If a buggy tool modified the file without updating those offsets, PDFluent looks for objects at the wrong positions and returns this error. The PDF may still be recoverable with repair mode.
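
To see what a broken entry looks like, the sketch below parses one entry of a classic xref table (a 10-digit byte offset, a 5-digit generation number, and an `n` or `f` flag) and checks whether the offset actually lands on the matching `N G obj` header. It is an illustration only: PDF 1.5 and later files may store this data in a binary cross-reference stream instead, which the sketch does not handle, and `XrefEntry`, `parse_xref_entry`, and `entry_points_at_object` are hypothetical names.

```rust
/// One entry of a classic (non-stream) xref table: "nnnnnnnnnn ggggg n".
#[derive(Debug, PartialEq)]
struct XrefEntry {
    offset: u64,
    generation: u16,
    in_use: bool,
}

/// Parse a single xref entry line; returns None if it is malformed.
fn parse_xref_entry(line: &str) -> Option<XrefEntry> {
    let mut parts = line.split_whitespace();
    let offset: u64 = parts.next()?.parse().ok()?;
    let generation: u16 = parts.next()?.parse().ok()?;
    let in_use = match parts.next()? {
        "n" => true,
        "f" => false,
        _ => return None,
    };
    Some(XrefEntry { offset, generation, in_use })
}

/// Strict check that an in-use entry points at "<obj_num> <gen> obj".
/// Real files may use other whitespace; this is only an illustration.
fn entry_points_at_object(data: &[u8], obj_num: u32, e: &XrefEntry) -> bool {
    if !e.in_use {
        return false;
    }
    let expected = format!("{} {} obj", obj_num, e.generation);
    usize::try_from(e.offset)
        .ok()
        .and_then(|start| data.get(start..))
        .map_or(false, |rest| rest.starts_with(expected.as_bytes()))
}

fn main() {
    let data: &[u8] = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n";
    let good = parse_xref_entry("0000000009 00000 n").unwrap();
    let bad = parse_xref_entry("0000000012 00000 n").unwrap();
    println!("{}", entry_points_at_object(data, 1, &good)); // true
    println!("{}", entry_points_at_object(data, 1, &bad));  // false
}
```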

How to fix it

1

Check the file magic bytes

Before calling Document::open, verify the file starts with %PDF-. This catches the wrong-file-type case immediately without a more expensive parse attempt.

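use std::fs::File;
use std::io::{self, ErrorKind, Read};

/// Returns Ok(true) if the file starts with the `%PDF-` magic bytes.
/// Files shorter than 5 bytes are reported as "not a PDF", not as an I/O error.
fn is_pdf(path: &str) -> io::Result<bool> {
    let mut buf = [0u8; 5];
    let mut f = File::open(path)?;
    match f.read_exact(&mut buf) {
        Ok(()) => Ok(&buf == b"%PDF-"),
        Err(e) if e.kind() == ErrorKind::UnexpectedEof => Ok(false),
        Err(e) => Err(e),
    }
}

fn main() -> io::Result<()> {
    if !is_pdf("upload.pdf")? {
        eprintln!("File is not a PDF");
        return Ok(());
    }
    // The header looks right; pass the file to Document::open from here
    Ok(())
}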
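
If the check fails, it can help to know whether the header is missing entirely or merely not at byte 0, for example because a UTF-8 byte order mark or a stray log line was written before it. This sketch searches the first 1024 bytes for `%PDF-` and returns its offset. `find_pdf_header` is a hypothetical helper, and whether PDFluent accepts a header that is not at byte 0 is not covered here, so treat a non-zero result as a bug in whatever produced the file.

```rust
use std::fs::File;
use std::io::{self, Read};

/// Search the first `window` bytes of the file for the "%PDF-" header.
/// Some(0): normal; Some(n): n unexpected bytes precede the header; None: not found.
fn find_pdf_header(path: &str, window: u64) -> io::Result<Option<usize>> {
    let mut head = Vec::new();
    File::open(path)?.take(window).read_to_end(&mut head)?;
    Ok(head.windows(5).position(|w| w == b"%PDF-"))
}

fn main() {
    match find_pdf_header("upload.pdf", 1024) {
        Ok(Some(0)) => println!("header at byte 0"),
        Ok(Some(n)) => eprintln!("{n} unexpected bytes before %PDF-"),
        Ok(None) => eprintln!("no %PDF- header in the first 1024 bytes"),
        Err(e) => eprintln!("cannot read upload.pdf: {e}"),
    }
}
```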

2

Use repair mode for corrupted files

Document::open_repair() attempts to rebuild the xref table by scanning the file for object markers. It is slower but can recover many files with corrupted or missing xref sections.

use pdfluent::Document;

// Try a normal open first and fall back to repair mode if it fails.
// or_else discards the first error; log it beforehand if you need it for diagnostics.
let doc = Document::open("damaged.pdf")
    .or_else(|_| Document::open_repair("damaged.pdf"))?;

println!("Opened {} pages", doc.page_count());
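
Conceptually, rebuilding an xref table means scanning the raw bytes for object headers of the form `N G obj` and recording where each one starts. The sketch below shows that general technique in simplified form; it is not PDFluent's repair implementation, and all function names are hypothetical. It only looks for headers at the start of a line, ignores objects stored inside object streams, and lets a later definition of the same object number overwrite an earlier one, as an incremental update would.

```rust
use std::collections::BTreeMap;

/// Parse a run of ASCII digits at the start of `s` as a u32.
fn take_digits(s: &[u8]) -> Option<(u32, &[u8])> {
    let n = s.iter().take_while(|b| b.is_ascii_digit()).count();
    if n == 0 {
        return None;
    }
    let value: u32 = std::str::from_utf8(&s[..n]).ok()?.parse().ok()?;
    Some((value, &s[n..]))
}

/// Skip one or more spaces at the start of `s`.
fn take_spaces(s: &[u8]) -> Option<&[u8]> {
    let n = s.iter().take_while(|&&b| b == b' ').count();
    if n == 0 { None } else { Some(&s[n..]) }
}

/// If `s` starts with "<num> <gen> obj", return the object number.
fn parse_obj_header(s: &[u8]) -> Option<u32> {
    let (num, rest) = take_digits(s)?;
    let rest = take_spaces(rest)?;
    let (_generation, rest) = take_digits(rest)?;
    let rest = take_spaces(rest)?;
    if rest.starts_with(b"obj") { Some(num) } else { None }
}

/// Rebuild an object-number -> byte-offset map by scanning for object headers
/// at the start of each line. Later definitions overwrite earlier ones.
fn scan_objects(data: &[u8]) -> BTreeMap<u32, usize> {
    let mut map = BTreeMap::new();
    for i in 0..data.len() {
        let at_line_start = i == 0 || data[i - 1] == b'\n' || data[i - 1] == b'\r';
        if at_line_start {
            if let Some(num) = parse_obj_header(&data[i..]) {
                map.insert(num, i);
            }
        }
    }
    map
}

fn main() {
    let data: &[u8] = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n";
    for (num, offset) in scan_objects(data) {
        println!("object {num} at byte {offset}"); // prints "object 1 at byte 9"
    }
}
```

A real repair pass also has to find the trailer and catalog and cope with binary stream data that happens to contain header-like text, which is one reason a scan like this is slower than reading an intact xref table and does not recover every file.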

3

Verify the file path and check for empty files

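Check that the path points where you expect and that the file is not empty. A common mistake in server code is using a relative or temporary path: relative paths are resolved against the process's current working directory, which at runtime may not be the directory the file was written to.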

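use std::fs;
use pdfluent::Document;

// Assumes this runs inside a function returning Result<_, Box<dyn std::error::Error>>
let path = "document.pdf";

// fs::metadata fails with a NotFound error if the path is wrong
let meta = fs::metadata(path)
    .map_err(|e| format!("cannot read {path}: {e}"))?;
if meta.len() == 0 {
    return Err(format!("PDF file is empty: {path}").into());
}

let doc = Document::open(path)?;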
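
When the path is relative, it is resolved against the process's current working directory, so logging the fully resolved path usually reveals the mismatch at once. This sketch joins relative paths onto `std::env::current_dir()` instead of calling `std::fs::canonicalize`, because `canonicalize` fails when the file does not exist, which is exactly the case being debugged. `resolved_path` is a hypothetical helper.

```rust
use std::env;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Absolute paths are returned as-is; relative paths are joined onto the
/// current working directory. Unlike fs::canonicalize, this works even if
/// the file does not exist (it does not resolve `..` or symlinks).
fn resolved_path(path: &Path) -> io::Result<PathBuf> {
    if path.is_absolute() {
        Ok(path.to_path_buf())
    } else {
        Ok(env::current_dir()?.join(path))
    }
}

fn main() -> io::Result<()> {
    let full = resolved_path(Path::new("document.pdf"))?;
    match fs::metadata(&full) {
        Ok(m) if m.len() == 0 => eprintln!("{} exists but is empty", full.display()),
        Ok(m) => println!("{} ({} bytes)", full.display(), m.len()),
        Err(e) => eprintln!("{}: {e}", full.display()),
    }
    Ok(())
}
```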

Frequently asked questions