Can I extract images that span multiple pages or are repeated?

A PDF can reference the same XObject image from several pages. page.images() reports the image on every page that uses it, so a repeated image appears once per page. Use image.xobject_id() to deduplicate if needed.
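
A minimal sketch of that deduplication, written against a stand-in struct rather than PDFluent's own types. It assumes image.xobject_id() yields a hashable value such as an integer; `ImageRef` and `first_occurrences` are hypothetical names used only for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for an extracted image: the page it was found on
/// and the id of the XObject it refers to (assumed to be an integer here).
#[derive(Debug, Clone, PartialEq)]
struct ImageRef {
    page: usize,
    xobject_id: u32,
}

/// Keep only the first occurrence of each XObject id, preserving order.
fn first_occurrences(images: &[ImageRef]) -> Vec<ImageRef> {
    let mut seen = HashSet::new();
    images
        .iter()
        // `insert` returns false when the id was already present.
        .filter(|img| seen.insert(img.xobject_id))
        .cloned()
        .collect()
}

fn main() {
    // A logo (id 7) reused on pages 1 to 3, plus one unique photo (id 12).
    let found = vec![
        ImageRef { page: 1, xobject_id: 7 },
        ImageRef { page: 2, xobject_id: 7 },
        ImageRef { page: 2, xobject_id: 12 },
        ImageRef { page: 3, xobject_id: 7 },
    ];
    let unique = first_occurrences(&found);
    for img in &unique {
        println!("id {} first seen on page {}", img.xobject_id, img.page);
    }
}
```

In real extraction code the same `HashSet` check would sit inside the page loop, skipping image.save() whenever the id has already been seen.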

Does PDFluent extract images from password-protected PDFs?

Yes. Open the PDF with PdfDocument::open_with_password() and the correct password, then extract images as normal.

What formats can PDFluent extract images in?

PDFluent extracts JPEG (DCTDecode), JPEG 2000 (JPXDecode), PNG-compatible (FlateDecode), CCITT fax (CCITTFaxDecode), and JBIG2 (JBIG2Decode) images in their original format. All formats can also be converted to PNG via to_png_bytes().

How do I get the position of an image on the page?

Call image.transform() to get the 6-element transformation matrix [a b c d e f] that places the image on the page. For x, y, width, and height in PDF points, call image.bounding_box().
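
To make the relationship between the matrix and the bounding box concrete, here is a sketch of the underlying geometry. In PDF, an image is painted into the unit square, and the matrix [a b c d e f] maps a point (x, y) to (a·x + c·y + e, b·x + d·y + f). The `BBox` struct and `bbox_from_matrix` function are hypothetical names; this illustrates the conversion and is not necessarily how image.bounding_box() is implemented.

```rust
/// Axis-aligned bounding box in PDF points (origin at the bottom-left).
#[derive(Debug, PartialEq)]
struct BBox {
    x: f64,
    y: f64,
    width: f64,
    height: f64,
}

/// Map the unit square through a PDF matrix [a b c d e f] and return the
/// smallest axis-aligned box that contains the four transformed corners.
fn bbox_from_matrix(m: [f64; 6]) -> BBox {
    let [a, b, c, d, e, f] = m;
    let corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)];

    let (mut min_x, mut min_y) = (f64::INFINITY, f64::INFINITY);
    let (mut max_x, mut max_y) = (f64::NEG_INFINITY, f64::NEG_INFINITY);

    for (ux, uy) in corners {
        // Standard PDF affine transform of one corner of the unit square.
        let px = a * ux + c * uy + e;
        let py = b * ux + d * uy + f;
        min_x = min_x.min(px);
        min_y = min_y.min(py);
        max_x = max_x.max(px);
        max_y = max_y.max(py);
    }

    BBox {
        x: min_x,
        y: min_y,
        width: max_x - min_x,
        height: max_y - min_y,
    }
}

fn main() {
    // A 200 x 100 pt image whose lower-left corner sits at (72, 500).
    let bbox = bbox_from_matrix([200.0, 0.0, 0.0, 100.0, 72.0, 500.0]);
    println!("{:?}", bbox);
}
```

Taking the min and max over all four corners keeps the result correct for rotated or mirrored images, where the lower-left corner of the box is not simply (e, f).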

Extract embedded images from a PDF in Rust

Pull JPEG, PNG-compatible, and JBIG2 images out of a PDF without re-encoding, preserving their original compression and quality.

rust

use pdfluent::PdfDocument;

fn main() -> pdfluent::Result<()> {
    let doc = PdfDocument::open("document.pdf")?;

    for (page_idx, page) in doc.pages().enumerate() {
        for (img_idx, image) in page.images().enumerate() {
            let filename = format!(
                "page{}_img{}.{}",
                page_idx + 1,
                img_idx + 1,
                image.format().extension()
            );
            image.save(&filename)?;
            println!("Saved {} ({}x{})", filename, image.width(), image.height());
        }
    }

    Ok(())
}

Install: cargo add pdfluent

Step by step

Add PDFluent to Cargo.toml

Image extraction is part of the base crate. No extra features are required.

toml

# Cargo.toml
[dependencies]
pdfluent = "0.9"

Open the document and iterate images per page

page.images() returns an iterator over every image on the page, covering both XObject images and inline images.

rust

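use pdfluent::PdfDocument;

fn main() -> pdfluent::Result<()> {
    let doc = PdfDocument::open("document.pdf")?;

    // Count the images on each page (pages are reported 1-based).
    for (i, page) in doc.pages().enumerate() {
        let count = page.images().count();
        println!("Page {}: {} image(s)", i + 1, count);
    }

    Ok(())
}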

Save images in their original format

image.save() writes the image bytes to a file without re-encoding. JPEG images stay JPEG, preserving the original quality.

rust

for (i, page) in doc.pages().enumerate() {
    for (j, image) in page.images().enumerate() {
        let ext = image.format().extension(); // "jpg", "png", "jbig2"
        let path = format!("output/p{}_i{}.{}", i + 1, j + 1, ext);
        image.save(&path)?;
    }
}
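
Because the bytes are written as-is, one way to sanity-check a saved file is to compare its leading "magic" bytes with the extension it was given. The sketch below uses only the standard library and the well-known signatures for PNG, JPEG, and JPEG 2000; `sniff_format` is a hypothetical helper, not part of PDFluent. Raw CCITT and embedded JBIG2 streams carry no such file header, so they come back as `None`.

```rust
/// Guess a file format from its leading bytes.
/// Returns None for anything unrecognised, including raw CCITT or
/// embedded JBIG2 streams, which have no file signature.
fn sniff_format(bytes: &[u8]) -> Option<&'static str> {
    // PNG signature: 0x89 "PNG" CR LF 0x1A LF.
    const PNG: &[u8] = &[0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A];
    // JPEG 2000 in a JP2 container starts with a 12-byte signature box.
    const JP2: &[u8] = &[
        0x00, 0x00, 0x00, 0x0C, b'j', b'P', b' ', b' ', 0x0D, 0x0A, 0x87, 0x0A,
    ];
    // A bare JPEG 2000 codestream starts with the SOC and SIZ markers.
    const J2K: &[u8] = &[0xFF, 0x4F, 0xFF, 0x51];
    // JPEG starts with the SOI marker followed by another marker byte.
    const JPEG: &[u8] = &[0xFF, 0xD8, 0xFF];

    if bytes.starts_with(PNG) {
        Some("png")
    } else if bytes.starts_with(JP2) || bytes.starts_with(J2K) {
        Some("jp2")
    } else if bytes.starts_with(JPEG) {
        Some("jpg")
    } else {
        None
    }
}

fn main() {
    // First bytes of a typical JFIF-flavoured JPEG file.
    let jpeg_header: [u8; 6] = [0xFF, 0xD8, 0xFF, 0xE0, 0x00, 0x10];
    println!("{:?}", sniff_format(&jpeg_header));
}
```

To check a file on disk, read it with std::fs::read() and pass the resulting bytes to the helper.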

Convert any image to PNG

Call image.to_png_bytes() to decode the image and re-encode as PNG, regardless of its original format.

rust

use std::fs;

for (i, page) in doc.pages().enumerate() {
    for (j, image) in page.images().enumerate() {
        let png_bytes = image.to_png_bytes()?;
        let path = format!("output/p{}_i{}.png", i + 1, j + 1);
        fs::write(&path, &png_bytes)?;
        println!("Wrote PNG: {}", path);
    }
}

Filter images by size

Skip thumbnails and decorative images by checking pixel dimensions before saving.

rust

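for (i, page) in doc.pages().enumerate() {
    for (j, image) in page.images()
        // Keep only images that are at least 100x100 pixels.
        .filter(|img| img.width() >= 100 && img.height() >= 100)
        .enumerate()
    {
        // save() does not re-encode, so use the image's real extension
        // instead of assuming every image is a JPEG.
        let ext = image.format().extension();
        let path = format!("output/p{}_i{}.{}", i + 1, j + 1, ext);
        image.save(&path)?;
        println!(
            "Saved {}x{} image: {}",
            image.width(), image.height(), path
        );
    }
}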

Notes and tips

PDF images are stored as XObjects with their own compression. JPEG images embedded in PDFs are stored as raw JPEG streams and are extracted without quality loss.
JBIG2 is a common format for scanned document pages (black-and-white, highly compressed). Not all image viewers support JBIG2 natively. Use to_png_bytes() for universal compatibility.
The dimensions returned by image.width() and image.height() are in pixels at the image resolution, not in PDF points.
Soft masks (alpha channels) attached to images are extracted separately. Call image.soft_mask() to access the mask XObject.
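
The distinction between pixels and PDF points matters when judging image quality. A point is 1/72 inch, so the effective resolution of a placed image is its pixel size divided by its placed size in inches. The sketch below shows that arithmetic for one axis; `effective_dpi` is a hypothetical helper, and it assumes the pixel dimension is available as a `u32` and the placed size in points comes from something like image.bounding_box().

```rust
/// Effective resolution in dots per inch along one axis.
/// A PDF point is 1/72 inch, so inches = points / 72.
/// Returns None when the placed size is zero or negative.
fn effective_dpi(pixels: u32, points: f64) -> Option<f64> {
    if points <= 0.0 {
        return None;
    }
    Some(f64::from(pixels) * 72.0 / points)
}

fn main() {
    // A 2550 px wide scan placed across a US Letter page (612 pt = 8.5 in).
    if let Some(dpi) = effective_dpi(2550, 612.0) {
        println!("{:.0} dpi", dpi);
    }
}
```

The same pixel data can therefore have very different effective resolutions depending on how large it is drawn on the page.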

Why PDFluent for this

Pure Rust

No JVM, no runtime, no DLL dependencies. Ships as a single native binary or WASM module.

Memory safe

Rust's ownership model prevents buffer overflows and use-after-free. No segfaults in PDF parsing.

Runs anywhere

Same code runs server-side, in Docker, on AWS Lambda, on Cloudflare Workers, or in the browser via WASM.

Frequently asked questions
