Compare the text content of two PDFs in Rust

Extract and diff the text of two PDF documents page by page to find additions, deletions, and changes.

rust
use pdfluent::{Document, diff::TextDiff};

fn main() -> pdfluent::Result<()> {
    let doc_a = Document::open("version_a.pdf")?;
    let doc_b = Document::open("version_b.pdf")?;

    let diff = TextDiff::compare(&doc_a, &doc_b)?;

    if diff.is_identical() {
        println!("Documents are text-identical.");
    } else {
        for change in diff.changes() {
            // change.page is printed with +1 to give a 1-based page number.
            println!(
                "Page {}: {:?} {:?}",
                change.page + 1,
                change.kind,
                change.text,
            );
        }
    }

    Ok(())
}

Install: cargo add pdfluent

Step by step

1. Open both documents

Open the two PDF files you want to compare. Each one is loaded as a read-only Document.

rust
let doc_a = Document::open("original.pdf")?;
let doc_b = Document::open("revised.pdf")?;

2. Run a text diff

TextDiff::compare extracts the plain text from each page and computes a line-level diff using the longest common subsequence algorithm.

rust
use pdfluent::diff::TextDiff;

let diff = TextDiff::compare(&doc_a, &doc_b)?;
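
To make the idea of a line-level diff based on the longest common subsequence more concrete, here is a minimal standalone sketch. It is not PDFluent's implementation: the `Edit` enum and the `lcs_diff` function are hypothetical names introduced for illustration, and the sketch works on plain string slices rather than on PDF pages.

```rust
/// One line-level edit produced by the diff (hypothetical type, not part of PDFluent).
#[derive(Debug, PartialEq)]
enum Edit<'a> {
    Keep(&'a str),
    Remove(&'a str),
    Add(&'a str),
}

/// Line-level diff driven by a longest-common-subsequence table.
fn lcs_diff<'a>(old: &[&'a str], new: &[&'a str]) -> Vec<Edit<'a>> {
    let (n, m) = (old.len(), new.len());
    // table[i][j] holds the LCS length of old[i..] and new[j..].
    let mut table = vec![vec![0usize; m + 1]; n + 1];
    for i in (0..n).rev() {
        for j in (0..m).rev() {
            table[i][j] = if old[i] == new[j] {
                table[i + 1][j + 1] + 1
            } else {
                table[i + 1][j].max(table[i][j + 1])
            };
        }
    }
    // Walk both inputs from the start, following the table to emit edits.
    let (mut i, mut j) = (0, 0);
    let mut edits = Vec::new();
    while i < n && j < m {
        if old[i] == new[j] {
            edits.push(Edit::Keep(old[i]));
            i += 1;
            j += 1;
        } else if table[i + 1][j] >= table[i][j + 1] {
            edits.push(Edit::Remove(old[i]));
            i += 1;
        } else {
            edits.push(Edit::Add(new[j]));
            j += 1;
        }
    }
    // Whatever is left on either side is a pure removal or addition.
    edits.extend(old[i..].iter().map(|&line| Edit::Remove(line)));
    edits.extend(new[j..].iter().map(|&line| Edit::Add(line)));
    edits
}

fn main() {
    let old = ["Title", "First draft.", "Footer"];
    let new = ["Title", "Second draft.", "Footer"];
    for edit in lcs_diff(&old, &new) {
        println!("{:?}", edit);
    }
}
```

The table costs O(n × m) memory for n and m lines, which is fine for a single page of text. A diff tool that has to handle very long inputs would typically switch to a more memory-efficient variant.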

3. Check if the documents are identical

Call is_identical() first so you can return early when there are no text differences to iterate over.

rust
if diff.is_identical() {
    println!("No text differences found.");
    return Ok(());
}

4. Iterate changes

Each DiffChange carries the zero-based page index, the change kind (Added, Removed, or Changed), and the text content.

rust
use pdfluent::diff::ChangeKind;

for change in diff.changes() {
    let marker = match change.kind {
        ChangeKind::Added   => "+",
        ChangeKind::Removed => "-",
        ChangeKind::Changed => "~",
    };
    println!(
        "Page {:>3} {} {}",
        change.page + 1,
        marker,
        change.text.trim(),
    );
}
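
The three fields described in this step are enough to build your own per-page tally. The sketch below uses hypothetical stand-in types (`Kind`, `Change`, `PageTally`) that only mirror the fields named above. They are assumptions for illustration, not PDFluent's actual definitions.

```rust
use std::collections::BTreeMap;

/// Hypothetical mirror of the change kinds named in this guide.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Added,
    Removed,
    Changed,
}

/// Hypothetical mirror of a change record: zero-based page, kind, text.
struct Change {
    page: usize,
    kind: Kind,
    text: String,
}

/// Number of changes of each kind on a single page.
#[derive(Debug, Default, PartialEq)]
struct PageTally {
    added: usize,
    removed: usize,
    changed: usize,
}

/// Groups changes by page index; BTreeMap keeps the pages in ascending order.
fn tally_by_page(changes: &[Change]) -> BTreeMap<usize, PageTally> {
    let mut tallies: BTreeMap<usize, PageTally> = BTreeMap::new();
    for change in changes {
        let tally = tallies.entry(change.page).or_default();
        match change.kind {
            Kind::Added => tally.added += 1,
            Kind::Removed => tally.removed += 1,
            Kind::Changed => tally.changed += 1,
        }
    }
    tallies
}

fn main() {
    let changes = vec![
        Change { page: 0, kind: Kind::Added, text: "New clause".to_string() },
        Change { page: 2, kind: Kind::Removed, text: "Old footer".to_string() },
        Change { page: 0, kind: Kind::Changed, text: "Revised date".to_string() },
    ];
    for change in &changes {
        println!("Page {:>3} {:?} {}", change.page + 1, change.kind, change.text);
    }
    for (page, tally) in tally_by_page(&changes) {
        println!("Page {:>3} tally: {:?}", page + 1, tally);
    }
}
```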

5. Compare page counts and report structural differences

If the documents have different page counts, pages that exist only in one document are reported as whole-page additions or deletions.

rust
println!("Pages in A: {}", doc_a.page_count());
println!("Pages in B: {}", doc_b.page_count());
println!("Total changes: {}", diff.change_count());
println!("Pages with changes: {}", diff.changed_page_count());
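
As a rough illustration of how a page-count mismatch can turn into whole-page changes, the sketch below assumes that pages are paired by index, so the unmatched pages are the trailing ones of the longer document. That pairing rule, the `PageChange` enum, and the `unmatched_pages` function are assumptions made for this example. The guide does not state how PDFluent pairs pages internally.

```rust
/// Hypothetical whole-page change; not a PDFluent type.
#[derive(Debug, PartialEq)]
enum PageChange {
    AddedPage(usize),
    RemovedPage(usize),
}

/// Returns the zero-based indices of pages that exist in only one document,
/// assuming pages are paired by index.
fn unmatched_pages(pages_a: usize, pages_b: usize) -> Vec<PageChange> {
    if pages_b > pages_a {
        // B is longer: its trailing pages count as additions.
        (pages_a..pages_b).map(PageChange::AddedPage).collect()
    } else {
        // A is longer or equal: its trailing pages (if any) count as removals.
        (pages_b..pages_a).map(PageChange::RemovedPage).collect()
    }
}

fn main() {
    for change in unmatched_pages(3, 5) {
        println!("{:?}", change);
    }
}
```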

Notes and tips

  • Text comparison ignores visual formatting (fonts, sizes, colors). Two pages that look different but have the same words will show no text differences.
  • PDFluent normalizes whitespace and Unicode before diffing. Extra spaces and different line break encodings do not produce spurious differences.
  • For visual comparison (pixel-level diff), use the render API to produce images and compare them separately.
  • Large documents with many changes may produce a large DiffChange list. Use diff.summary() to get a compact per-page summary instead.
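
To show why whitespace normalization removes spurious differences, here is a minimal sketch using only the Rust standard library. The `normalize_whitespace` function is a hypothetical helper, not PDFluent's normalizer. It handles whitespace runs and `\r\n` versus `\n` line endings only. Unicode normalization is left out because the standard library does not provide it.

```rust
/// Collapses runs of whitespace inside each line to a single space, trims
/// line ends, and rejoins lines with "\n" (so "\r\n" and "\n" inputs match).
fn normalize_whitespace(text: &str) -> String {
    text.lines()
        .map(|line| line.split_whitespace().collect::<Vec<_>>().join(" "))
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    let a = "Total:   42 \r\nDone";
    let b = "Total: 42\nDone";
    // Both inputs normalize to the same string, so a diff sees no change.
    println!("{}", normalize_whitespace(a) == normalize_whitespace(b));
}
```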

Why PDFluent for this

Pure Rust

No JVM, no runtime, no DLL dependencies. Ships as a single native binary or WASM module.

Memory safe

Rust's ownership model prevents buffer overflows and use-after-free. No segfaults in PDF parsing.

Runs anywhere

Same code runs server-side, in Docker, on AWS Lambda, on Cloudflare Workers, or in the browser via WASM.

Frequently asked questions