PdfluentError::OutOfMemory
Error: allocation failed, document requires too much memory

How to fix out-of-memory errors on large PDFs

This error occurs when PDFluent cannot allocate enough memory to process a document. Common causes are large uncompressed images, loading all pages at once, or very large XMP metadata blocks.

Why this happens

Large images are decompressed into memory all at once

PDFs with high-resolution images (300 DPI scans, high-resolution photographs) can contain image streams that expand to hundreds of MB after decompression. When PDFluent loads multiple pages simultaneously, that cost multiplies with each page held in memory.
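
As a rough back-of-envelope (assuming 24-bit RGB, i.e. 3 bytes per pixel; grayscale or CMYK scans differ), a single US Letter page scanned at 300 DPI decompresses to about 25 MB, so ten pages held at once already approach 250 MB:

```rust
// Rough estimate of decompressed image memory for scanned pages.
// Assumes 3 bytes per pixel (24-bit RGB); adjust for other color spaces.
fn decompressed_bytes(width_in: f64, height_in: f64, dpi: u32, bytes_per_px: u64) -> u64 {
    let w = (width_in * dpi as f64).round() as u64;
    let h = (height_in * dpi as f64).round() as u64;
    w * h * bytes_per_px
}

fn main() {
    // US Letter (8.5 x 11 in) at 300 DPI, RGB
    let per_page = decompressed_bytes(8.5, 11.0, 300, 3);
    println!("one page: {} MB", per_page / 1_000_000);       // ~25 MB
    println!("ten pages: {} MB", 10 * per_page / 1_000_000); // ~252 MB
}
```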

All pages are held in memory simultaneously

Document::open() by default loads the full document structure into memory. For documents with thousands of pages, iterating pages without releasing earlier ones keeps all parsed content in RAM.
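
This accumulation is a general Rust ownership effect, not specific to PDFluent. A minimal sketch with a drop-instrumented stand-in type shows that values collected into a `Vec` all stay alive until the `Vec` itself drops, while per-iteration ownership keeps at most one alive at a time:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many FakePage values are alive at any moment.
static ALIVE: AtomicUsize = AtomicUsize::new(0);

// Stands in for a parsed page with its content streams.
struct FakePage {
    _buf: Vec<u8>,
}

impl FakePage {
    fn new() -> Self {
        ALIVE.fetch_add(1, Ordering::SeqCst);
        FakePage { _buf: vec![0u8; 1024] }
    }
}

impl Drop for FakePage {
    fn drop(&mut self) {
        ALIVE.fetch_sub(1, Ordering::SeqCst);
    }
}

fn main() {
    // Anti-pattern: collecting every page keeps them all in memory.
    let pages: Vec<FakePage> = (0..1000).map(|_| FakePage::new()).collect();
    assert_eq!(ALIVE.load(Ordering::SeqCst), 1000);
    drop(pages); // everything is freed only here, all at once

    // Per-iteration ownership: at most one page alive at a time.
    for _ in 0..1000 {
        let page = FakePage::new();
        assert_eq!(ALIVE.load(Ordering::SeqCst), 1);
        let _ = &page; // process the page...
    } // `page` dropped at the end of each iteration
    assert_eq!(ALIVE.load(Ordering::SeqCst), 0);
    println!("peak alive in loop: 1");
}
```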

Large XMP metadata or embedded attachments

Some PDF generators embed large XML metadata blocks or attach binary files (Word documents, ZIP archives) inside the PDF. These are loaded with the document catalog and consume memory before any page content is read.

How to fix it

1. Use streaming mode for large files

Document::open_streaming() reads the xref table and document catalog without loading all content streams. Pages are loaded on demand and can be dropped after processing.

use pdfluent::Document;

// Streaming mode: low base memory, pages loaded on demand
let doc = Document::open_streaming("large_scan.pdf")?;

for i in 0..doc.page_count() {
    let page = doc.page(i)?;
    let text = page.extract_text()?;
    println!("Page {}: {} chars", i + 1, text.len());
    // page drops here, freeing image and content stream memory
}

2. Process one page at a time with explicit drops

In a loop, process each page and drop it before loading the next. This keeps peak memory proportional to one page rather than the whole document.

use pdfluent::Document;

let doc = Document::open_streaming("large.pdf")?;
let count = doc.page_count();

for i in 0..count {
    {
        // Inner scope: page and its resources drop at end of block
        let page = doc.page(i)?;
        let text = page.extract_text()?;
        process_text(i, &text);
    } // page memory freed here
}
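
If the extra block feels heavy, an explicit `std::mem::drop` call has the same effect: it frees a value mid-function, at the exact point you name. A plain-Rust sketch, with a byte buffer standing in for a loaded page:

```rust
// Takes ownership of the buffer and frees it before returning,
// so the caller's peak memory drops as soon as processing is done.
fn process_then_free(page_buf: Vec<u8>) -> usize {
    let n = page_buf.len(); // "process" the page
    drop(page_buf);         // equivalent to the inner scope: memory freed here
    n
}

fn main() {
    for i in 0..3 {
        let page_buf = vec![0u8; 1_000_000]; // stands in for one page's decoded content
        let n = process_then_free(page_buf);
        println!("page {}: {} bytes processed", i + 1, n);
    }
}
```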

3. Reduce image resolution before processing

If you are re-rendering or re-saving the document, downsample images first. This shrinks the in-memory image buffers and reduces the size of the output file.

use pdfluent::{Document, ImageResampleOptions};

let mut doc = Document::open_streaming("scanned.pdf")?;

// Downsample all images to max 150 DPI before further processing
doc.resample_images(ImageResampleOptions {
    max_dpi: 150,
    ..Default::default()
})?;

doc.save("scanned_compressed.pdf")?;
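
Why 150 DPI helps so much: pixel count scales with the square of the DPI, so halving the resolution cuts each image's decompressed memory to a quarter. A quick check in plain Rust:

```rust
// Pixel count scales with dpi^2, so halving DPI quarters image memory.
fn pixels(width_in: f64, height_in: f64, dpi: u32) -> u64 {
    ((width_in * dpi as f64).round() as u64) * ((height_in * dpi as f64).round() as u64)
}

fn main() {
    let at_300 = pixels(8.5, 11.0, 300); // 2550 x 3300
    let at_150 = pixels(8.5, 11.0, 150); // 1275 x 1650
    println!("300 DPI: {at_300} px, 150 DPI: {at_150} px");
    println!("ratio: {}x", at_300 / at_150); // 4x fewer pixels after downsampling
}
```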

Frequently asked questions