Extract specific pages from a PDF in Rust

Pick individual pages or non-contiguous sets of pages and write them to a new PDF. Select pages by number, by page label, or with a custom predicate.

rust
use pdfluent::PdfDocument;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let doc = PdfDocument::open("report.pdf")?;

    // Extract pages 1, 3, 5 and 7
    doc.extract_pages(&[1, 3, 5, 7])?
        .save("selected.pdf")?;

    Ok(())
}

Install: cargo add pdfluent

Step by step

1

Open the source PDF

Open the document you want to extract pages from. Page count is available immediately after opening.

rust
use pdfluent::PdfDocument;

let doc = PdfDocument::open("source.pdf")?;
println!("{} pages total", doc.page_count());

2

Extract by page numbers

Pass a slice of 1-based page numbers. Pages appear in the output in the order given, so you can reorder them freely.

rust
// Extract pages 2, 5, and 8 in that order
let extracted = doc.extract_pages(&[2, 5, 8])?;
extracted.save("three_pages.pdf")?;
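
Page selections often arrive as text such as "1,3,5-7" from a CLI flag or a form field. The helper below is a minimal sketch of expanding such a string into 1-based page numbers in the order written. It uses only the standard library; the names parse_page_spec and parse_page are hypothetical and not part of PDFluent, and usize is an assumption because the integer type that extract_pages expects is not stated here.

```rust
/// Expand a page specification such as "1,3,5-7" into 1-based page numbers,
/// preserving the order in which they were written.
fn parse_page_spec(spec: &str) -> Result<Vec<usize>, String> {
    let mut pages = Vec::new();
    for part in spec.split(',') {
        let part = part.trim();
        if part.is_empty() {
            return Err("empty entry in page specification".to_string());
        }
        match part.split_once('-') {
            // A range like "5-7", inclusive on both ends.
            Some((start, end)) => {
                let start = parse_page(start)?;
                let end = parse_page(end)?;
                if start > end {
                    return Err(format!("descending range: {part}"));
                }
                pages.extend(start..=end);
            }
            // A single page like "3".
            None => pages.push(parse_page(part)?),
        }
    }
    Ok(pages)
}

/// Parse one 1-based page number, rejecting 0 and non-numeric input.
fn parse_page(text: &str) -> Result<usize, String> {
    let n: usize = text
        .trim()
        .parse()
        .map_err(|_| format!("not a page number: {text:?}"))?;
    if n == 0 {
        return Err("page numbers are 1-based; 0 is not valid".to_string());
    }
    Ok(n)
}
```

The resulting vector can be borrowed as a slice and passed wherever a list of page numbers is expected, converting the element type first if the library uses a different integer type.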

3

Extract by page label

If the PDF uses custom page labels such as "i", "ii", "A-1", pass label strings instead of integers.

rust
let extracted = doc.extract_pages_by_label(&["i", "ii", "1", "2"])?;
extracted.save("front_matter_and_intro.pdf")?;
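
When front matter is labelled with lowercase roman numerals, the label list can be generated instead of typed by hand. This is a small standard-library sketch; to_lower_roman and front_matter_labels are hypothetical helper names, and it assumes the document's labels really are plain lowercase roman numerals with no prefix.

```rust
/// Format a positive number as a lowercase roman numeral ("i", "ii", "iv", ...).
/// Returns an empty string for 0, which has no roman representation.
fn to_lower_roman(mut n: u32) -> String {
    const SYMBOLS: [(u32, &str); 13] = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
        (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
        (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ];
    let mut out = String::new();
    for &(value, symbol) in SYMBOLS.iter() {
        // Greedily take the largest symbol that still fits.
        while n >= value {
            out.push_str(symbol);
            n -= value;
        }
    }
    out
}

/// Labels for the first `count` front-matter pages: "i", "ii", "iii", ...
fn front_matter_labels(count: u32) -> Vec<String> {
    (1..=count).map(to_lower_roman).collect()
}
```

Since the example above passes string slices, the owned labels would need converting first, for instance with labels.iter().map(String::as_str).collect::<Vec<_>>().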

4

Extract with a filter predicate

Use extract_pages_where() to filter programmatically. The closure receives a PageInfo struct with page number, label, width, height, and rotation.

rust
use pdfluent::PageInfo;

// Keep only landscape pages
let extracted = doc.extract_pages_where(|p: &PageInfo| {
    p.width > p.height
})?;

extracted.save("landscape_pages.pdf")?;
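
A width-versus-height test can misclassify pages that carry a rotation. The sketch below shows one way to account for it. PageInfoSketch is a hypothetical stand-in defined here so the example compiles on its own; the real PageInfo field types may differ, and the code assumes that width and height describe the page before rotation is applied, which is not confirmed by this guide.

```rust
/// Hypothetical stand-in for the library's `PageInfo`; the real field names
/// and types may differ.
struct PageInfoSketch {
    number: usize,
    width: f64,
    height: f64,
    rotation: u16, // degrees: 0, 90, 180 or 270
}

/// True when the page is landscape as displayed, assuming `width` and
/// `height` describe the page before `rotation` is applied.
fn is_landscape_as_displayed(p: &PageInfoSketch) -> bool {
    // A quarter turn (90 or 270 degrees) swaps the displayed dimensions.
    let quarter_turn = p.rotation % 180 == 90;
    if quarter_turn {
        p.height > p.width
    } else {
        p.width > p.height
    }
}

/// 1-based page numbers of every page that satisfies `keep`.
fn select_pages<F>(pages: &[PageInfoSketch], keep: F) -> Vec<usize>
where
    F: Fn(&PageInfoSketch) -> bool,
{
    pages.iter().filter(|p| keep(*p)).map(|p| p.number).collect()
}
```

If the library already reports rotated dimensions, the plain width > height comparison is sufficient and the rotation check should be dropped.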

5

Save or return as bytes

Call save() to write to disk, or to_bytes() to get the PDF data as a Vec<u8> for streaming or further processing.

rust
let bytes = doc.extract_pages(&[1, 2, 3])?.to_bytes()?;
// Send over HTTP, write to S3, etc.
println!("Extracted PDF is {} bytes", bytes.len());
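
Once the PDF is held as bytes, it can go to any destination that implements std::io::Write. The following is a minimal sketch using only the standard library; has_pdf_header and write_pdf are hypothetical helper names. The header check relies on the fact that a well-formed PDF file begins with the marker %PDF-, and it is only a basic sanity check, not a validation of the file.

```rust
use std::io::{self, Write};

/// True when `bytes` starts with the "%PDF-" marker that opens a PDF file.
fn has_pdf_header(bytes: &[u8]) -> bool {
    bytes.starts_with(b"%PDF-")
}

/// Write `bytes` to any sink (file, socket, in-memory buffer) after a basic
/// header check. Returns the number of bytes written.
fn write_pdf<W: Write>(bytes: &[u8], sink: &mut W) -> io::Result<usize> {
    if !has_pdf_header(bytes) {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "data does not start with a PDF header",
        ));
    }
    sink.write_all(bytes)?;
    sink.flush()?;
    Ok(bytes.len())
}
```

Because the sink is generic, the same function works for a std::fs::File, a TcpStream, or a Vec<u8> used as a buffer in tests.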

Notes and tips

  • Pages are extracted in the order of the input slice. Duplicates are allowed and produce repeated pages.
  • Annotations and form fields on extracted pages are included.
  • If any page number is out of range, extract_pages returns an Err immediately, before any output is written.
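
To report a bad page number to the user before calling into the library at all, the requested pages can be checked against the page count up front. This is a standard-library sketch; first_out_of_range is a hypothetical helper name, and usize is an assumed integer type for page numbers.

```rust
/// Return the first requested page that falls outside `1..=page_count`,
/// or `None` when every page is valid. Duplicates are allowed.
fn first_out_of_range(pages: &[usize], page_count: usize) -> Option<usize> {
    pages.iter().copied().find(|&p| p == 0 || p > page_count)
}
```

A caller could pass the value from page_count() as the second argument and turn a Some(n) result into a message such as "page n does not exist".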

Why PDFluent for this

Pure Rust

No JVM, no runtime, no DLL dependencies. Ships as a single native binary or WASM module.

Memory safe

Rust's ownership model prevents buffer overflows and use-after-free. No segfaults in PDF parsing.

Runs anywhere

Same code runs server-side, in Docker, on AWS Lambda, on Cloudflare Workers, or in the browser via WASM.

Frequently asked questions