PdfluentError::OutOfMemory
Error: allocation failed, document requires too much memory

How to fix out-of-memory errors on large PDFs

This error occurs when PDFluent cannot allocate enough memory to process a document. Common causes are large uncompressed images, loading all pages at once, or very large XMP metadata blocks.

Why this happens

Large images are decompressed into memory all at once

PDFs with high-resolution images (300 DPI scans, high-resolution photographs) can contain image streams that expand to hundreds of MB after decompression. When PDFluent loads multiple pages simultaneously, that cost multiplies with each page held in memory.
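
As a rough back-of-envelope (assuming 24-bit RGB, i.e. 3 bytes per pixel; grayscale or CMYK scans differ), a single US Letter page scanned at 300 DPI decompresses to about 25 MB, so ten pages held at once already approach 250 MB:

```rust
// Rough estimate of decompressed image memory for scanned pages.
// Assumes 3 bytes per pixel (24-bit RGB); adjust for other color spaces.
fn decompressed_bytes(width_in: f64, height_in: f64, dpi: u32, bytes_per_px: u64) -> u64 {
    let w = (width_in * dpi as f64).round() as u64;
    let h = (height_in * dpi as f64).round() as u64;
    w * h * bytes_per_px
}

fn main() {
    // US Letter (8.5 x 11 in) at 300 DPI, RGB
    let per_page = decompressed_bytes(8.5, 11.0, 300, 3);
    println!("one page: {} MB", per_page / 1_000_000);       // ~25 MB
    println!("ten pages: {} MB", 10 * per_page / 1_000_000); // ~252 MB
}
```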

All pages are held in memory simultaneously

Document::open() by default loads the full document structure into memory. For documents with thousands of pages, iterating pages without releasing earlier ones keeps all parsed content in RAM.
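
This accumulation is a general Rust ownership effect, not specific to PDFluent. A minimal sketch with a drop-instrumented stand-in type shows that values collected into a `Vec` all stay alive until the `Vec` itself drops, while per-iteration ownership keeps at most one alive at a time:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many FakePage values are alive at any moment.
static ALIVE: AtomicUsize = AtomicUsize::new(0);

// Stands in for a parsed page with its content streams.
struct FakePage {
    _buf: Vec<u8>,
}

impl FakePage {
    fn new() -> Self {
        ALIVE.fetch_add(1, Ordering::SeqCst);
        FakePage { _buf: vec![0u8; 1024] }
    }
}

impl Drop for FakePage {
    fn drop(&mut self) {
        ALIVE.fetch_sub(1, Ordering::SeqCst);
    }
}

fn main() {
    // Anti-pattern: collecting every page keeps them all in memory.
    let pages: Vec<FakePage> = (0..1000).map(|_| FakePage::new()).collect();
    assert_eq!(ALIVE.load(Ordering::SeqCst), 1000);
    drop(pages); // everything is freed only here, all at once

    // Per-iteration ownership: at most one page alive at a time.
    for _ in 0..1000 {
        let page = FakePage::new();
        assert_eq!(ALIVE.load(Ordering::SeqCst), 1);
        let _ = &page; // process the page...
    } // `page` dropped at the end of each iteration
    assert_eq!(ALIVE.load(Ordering::SeqCst), 0);
    println!("peak alive in loop: 1");
}
```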

Large XMP metadata or embedded attachments

Some PDF generators embed large XML metadata blocks or attach binary files (Word documents, ZIP archives) inside the PDF. These are loaded with the document catalog and consume memory before any page content is read.

How to fix it

1. Use streaming mode for large files

Document::open_streaming() reads the xref table and document catalog without loading all content streams. Pages are loaded on demand and can be dropped after processing.

use pdfluent::Document;

// Streaming mode: low base memory, pages loaded on demand
let doc = Document::open_streaming("large_scan.pdf")?;

for i in 0..doc.page_count() {
    let page = doc.page(i)?;
    let text = page.extract_text()?;
    println!("Page {}: {} chars", i + 1, text.len());
    // page drops here, freeing image and content stream memory
}

2. Process one page at a time with explicit drops

In a loop, process each page and drop it before loading the next. This keeps peak memory proportional to one page rather than the whole document.

use pdfluent::Document;

let doc = Document::open_streaming("large.pdf")?;
let count = doc.page_count();

for i in 0..count {
    {
        // Inner scope: page and its resources drop at end of block
        let page = doc.page(i)?;
        let text = page.extract_text()?;
        process_text(i, &text);
    } // page memory freed here
}
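
If the extra block feels heavy, an explicit `std::mem::drop` call has the same effect: it frees a value mid-function, at the exact point you name. A plain-Rust sketch, with a byte buffer standing in for a loaded page:

```rust
// Takes ownership of the buffer and frees it before returning,
// so the caller's peak memory drops as soon as processing is done.
fn process_then_free(page_buf: Vec<u8>) -> usize {
    let n = page_buf.len(); // "process" the page
    drop(page_buf);         // equivalent to the inner scope: memory freed here
    n
}

fn main() {
    for i in 0..3 {
        let page_buf = vec![0u8; 1_000_000]; // stands in for one page's decoded content
        let n = process_then_free(page_buf);
        println!("page {}: {} bytes processed", i + 1, n);
    }
}
```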

3. Reduce image resolution before processing

If you are re-rendering or re-saving the document, downsample images first. This shrinks the in-memory image buffers and reduces the size of the output file.

use pdfluent::{Document, ImageResampleOptions};

let mut doc = Document::open_streaming("scanned.pdf")?;

// Downsample all images to max 150 DPI before further processing
doc.resample_images(ImageResampleOptions {
    max_dpi: 150,
    ..Default::default()
})?;

doc.save("scanned_compressed.pdf")?;
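
Why 150 DPI helps so much: pixel count scales with the square of the DPI, so halving the resolution cuts each image's decompressed memory to a quarter. A quick check in plain Rust:

```rust
// Pixel count scales with dpi^2, so halving DPI quarters image memory.
fn pixels(width_in: f64, height_in: f64, dpi: u32) -> u64 {
    ((width_in * dpi as f64).round() as u64) * ((height_in * dpi as f64).round() as u64)
}

fn main() {
    let at_300 = pixels(8.5, 11.0, 300); // 2550 x 3300
    let at_150 = pixels(8.5, 11.0, 150); // 1275 x 1650
    println!("300 DPI: {at_300} px, 150 DPI: {at_150} px");
    println!("ratio: {}x", at_300 / at_150); // 4x fewer pixels after downsampling
}
```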

Frequently asked questions