Extract embedded images from a PDF in Rust

Pull JPEG, PNG, and JBIG2 images out of a PDF without re-encoding. Preserves original compression and quality.

rust
use pdfluent::PdfDocument;

fn main() -> pdfluent::Result<()> {
    let doc = PdfDocument::open("document.pdf")?;

    for (page_idx, page) in doc.pages().enumerate() {
        for (img_idx, image) in page.images().enumerate() {
            let filename = format!(
                "page{}_img{}.{}",
                page_idx + 1,
                img_idx + 1,
                image.format().extension()
            );
            image.save(&filename)?;
            println!("Saved {} ({}x{})", filename, image.width(), image.height());
        }
    }

    Ok(())
}

Install: cargo add pdfluent

Step by step

1

Add PDFluent to Cargo.toml

Image extraction is part of the base crate. No extra features are required.

toml
# Cargo.toml
[dependencies]
pdfluent = "0.9"

2

Open the document and iterate images per page

page.images() returns an iterator over the images on the page. It covers image XObjects as well as inline images.

rust
use pdfluent::PdfDocument;

let doc = PdfDocument::open("document.pdf")?;

for (i, page) in doc.pages().enumerate() {
    let count = page.images().count();
    println!("Page {}: {} image(s)", i + 1, count);
}

3

Save images in their original format

image.save() writes the image bytes to a file without re-encoding. JPEG images stay JPEG, preserving the original quality.

rust
for (i, page) in doc.pages().enumerate() {
    for (j, image) in page.images().enumerate() {
        let ext = image.format().extension(); // "jpg", "png", "jbig2"
        let path = format!("output/p{}_i{}.{}", i + 1, j + 1, ext);
        image.save(&path)?;
    }
}
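
The examples write into output/, and this page does not say whether image.save() creates missing directories. The following is a minimal std-only sketch, assuming it does not: create the directory up front and build each path from the same pieces used above. image_path and ensure_output_dir are hypothetical helpers for illustration, not part of PDFluent.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: builds `<dir>/p<page>_i<image>.<ext>` from
/// zero-based page and image indices (file names are one-based).
fn image_path(dir: &Path, page_idx: usize, img_idx: usize, ext: &str) -> PathBuf {
    dir.join(format!("p{}_i{}.{}", page_idx + 1, img_idx + 1, ext))
}

/// Hypothetical helper: creates the output directory and any missing
/// parents. Succeeds without changes if the directory already exists.
fn ensure_output_dir(dir: &Path) -> io::Result<()> {
    fs::create_dir_all(dir)
}
```

Call ensure_output_dir once before the loop, then hand the result of image_path to the save call, converted to whatever path type image.save() accepts.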

4

Convert any image to PNG

Call image.to_png_bytes() to decode the image and re-encode as PNG, regardless of its original format.

rust
use std::fs;

// fs::write does not create missing parent directories.
fs::create_dir_all("output")?;

for (i, page) in doc.pages().enumerate() {
    for (j, image) in page.images().enumerate() {
        let png_bytes = image.to_png_bytes()?;
        let path = format!("output/p{}_i{}.png", i + 1, j + 1);
        fs::write(&path, &png_bytes)?;
        println!("Wrote PNG: {}", path);
    }
}
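
A quick way to sanity-check converted or extracted files is to look at their leading bytes. The sketch below compares them against the standard PNG, JPEG, and standalone JBIG2 file signatures. sniff_format is a hypothetical helper, not a PDFluent function. JBIG2 data embedded in a PDF normally omits the standalone file header, so a miss on a .jbig2 file may not indicate a problem.

```rust
/// Guesses a file format from its leading bytes.
/// Returns `None` when no known signature matches.
fn sniff_format(bytes: &[u8]) -> Option<&'static str> {
    // 8-byte PNG signature.
    const PNG: &[u8] = &[0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A];
    // 8-byte header of a standalone JBIG2 file.
    const JBIG2: &[u8] = &[0x97, b'J', b'B', b'2', 0x0D, 0x0A, 0x1A, 0x0A];
    // JPEG start-of-image marker followed by the first marker byte.
    const JPEG: &[u8] = &[0xFF, 0xD8, 0xFF];

    if bytes.starts_with(PNG) {
        Some("png")
    } else if bytes.starts_with(JPEG) {
        Some("jpg")
    } else if bytes.starts_with(JBIG2) {
        Some("jbig2")
    } else {
        None
    }
}
```

Passing the bytes returned by to_png_bytes() to a check like this should yield Some("png") if the conversion behaves as described.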

5

Filter images by size

Skip thumbnails and decorative images by checking pixel dimensions before saving.

rust
for (i, page) in doc.pages().enumerate() {
    for (j, image) in page.images()
        .filter(|img| img.width() >= 100 && img.height() >= 100)
        .enumerate()
    {
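        // save() keeps the original format, so take the extension from the image.
        let path = format!("output/p{}_i{}.{}", i + 1, j + 1, image.format().extension());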
        image.save(&path)?;
        println!(
            "Saved {}x{} image: {}",
            image.width(), image.height(), path
        );
    }
}
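
A minimum-side check catches thumbnails, and it also rejects very thin images. Wide decorative banners that are tall enough still pass it. One possible refinement, sketched here with plain integers so it does not depend on the PDFluent API, also caps the aspect ratio. is_content_image and its thresholds are hypothetical and should be tuned to your documents.

```rust
/// Hypothetical predicate: keep an image only if both sides are at least
/// `min_side` pixels and the longer side is at most `max_aspect` times
/// the shorter side.
fn is_content_image(width: u32, height: u32, min_side: u32, max_aspect: u32) -> bool {
    let (short, long) = if width <= height {
        (width, height)
    } else {
        (height, width)
    };
    // Widen to u64 so the multiplication cannot overflow.
    short >= min_side && u64::from(long) <= u64::from(short) * u64::from(max_aspect)
}
```

A closure such as |img| is_content_image(img.width(), img.height(), 100, 10) could stand in for the filter above, assuming width() and height() return values that convert to u32.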

Notes and tips

  • PDF images are stored as XObjects with their own compression. JPEG images embedded in PDFs are stored as raw JPEG streams and are extracted without quality loss.
  • JBIG2 is a common format for scanned document pages (black-and-white, highly compressed). Not all image viewers support JBIG2 natively. Use to_png_bytes() for universal compatibility.
  • The dimensions returned by image.width() and image.height() are the image's native pixel dimensions, not the size at which it is drawn on the page in PDF points.
  • Soft masks (alpha channels) attached to images are extracted separately. Call image.soft_mask() to access the mask XObject.
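
To relate pixel dimensions to points: PDF user space defaults to 72 points per inch, so an image's effective resolution is its pixel count divided by its drawn size in inches. The sketch below is plain arithmetic. This page does not show how to obtain the drawn size from PDFluent, so placed_points is passed in as an ordinary number, and effective_dpi is a hypothetical helper.

```rust
/// Default PDF user space: 72 points per inch.
const POINTS_PER_INCH: f64 = 72.0;

/// Effective resolution along one axis, given the image's pixel count and
/// the length it is drawn at on the page, in points.
/// Returns `None` if the drawn length is not a positive number.
fn effective_dpi(pixels: u32, placed_points: f64) -> Option<f64> {
    if placed_points > 0.0 {
        Some(f64::from(pixels) * POINTS_PER_INCH / placed_points)
    } else {
        None
    }
}
```

For example, a 1700-pixel-wide scan drawn across a 612-point (8.5 inch) page width works out to 200 DPI.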

Why PDFluent for this

Pure Rust

No JVM, no runtime, no DLL dependencies. Ships as a single native binary or WASM module.

Memory safe

Rust's ownership model and bounds checks prevent buffer overflows and use-after-free in safe code, the usual causes of segfaults in PDF parsers.

Runs anywhere

Same code runs server-side, in Docker, on AWS Lambda, on Cloudflare Workers, or in the browser via WASM.

Frequently asked questions