
Search for text in a PDF and get positions in Rust

Find all occurrences of a string in a PDF and retrieve the bounding box of each match on each page.

rust
use pdfluent::Document;

fn main() -> pdfluent::Result<()> {
    let doc = Document::open("input.pdf")?;

    let matches = doc.search("invoice number")?;

    for m in &matches {
        println!(
            "Page {}: {:?} -> \"{}\"",
            m.page + 1,
            m.rect,
            m.text
        );
    }

    println!("{} match(es) found", matches.len());
    Ok(())
}
Install: cargo add pdfluent

Step by step

1

Open the PDF

A read-only Document is sufficient for text search.

rust
let doc = Document::open("input.pdf")?;
2

Run a basic string search

doc.search() performs a case-insensitive, Unicode-normalized search across all pages. On success it returns a Vec<TextMatch>; the ? operator propagates any error to the caller.

rust
let matches = doc.search("invoice number")?;
3

Access match metadata

Each TextMatch carries the zero-based page index, the bounding Rect in page coordinates, and the matched text fragment.

rust
for m in &matches {
    println!(
        "page={} x1={:.1} y1={:.1} x2={:.1} y2={:.1}",
        m.page + 1,
        m.rect.x_min, m.rect.y_min,
        m.rect.x_max, m.rect.y_max,
    );
}
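
Once the matches are collected, a common next step is to summarize them per page. The sketch below does not call PDFluent: it uses stand-in Rect and TextMatch structs that only mirror the field names used in this guide (page, rect, text, x_min, and so on). The field types (usize, f64) and the helpers matches_per_page and sample_match are assumptions made for illustration.

```rust
use std::collections::BTreeMap;

// Stand-in types: field names follow this guide, field types are assumed.
#[derive(Debug, Clone, PartialEq)]
struct Rect {
    x_min: f64,
    y_min: f64,
    x_max: f64,
    y_max: f64,
}

impl Rect {
    // Width and height in page units, derived from the corner coordinates.
    fn width(&self) -> f64 {
        self.x_max - self.x_min
    }
    fn height(&self) -> f64 {
        self.y_max - self.y_min
    }
}

#[derive(Debug, Clone, PartialEq)]
struct TextMatch {
    page: usize, // zero-based page index
    rect: Rect,
    text: String,
}

// Hypothetical helper: count matches per one-based page number.
fn matches_per_page(matches: &[TextMatch]) -> BTreeMap<usize, usize> {
    let mut counts = BTreeMap::new();
    for m in matches {
        *counts.entry(m.page + 1).or_insert(0) += 1;
    }
    counts
}

// Hypothetical helper: build a 100 x 12 match box for the demo below.
fn sample_match(page: usize, y_min: f64) -> TextMatch {
    TextMatch {
        page,
        rect: Rect { x_min: 72.0, y_min, x_max: 172.0, y_max: y_min + 12.0 },
        text: String::from("invoice number"),
    }
}

fn main() {
    let matches = vec![
        sample_match(0, 700.0),
        sample_match(0, 650.0),
        sample_match(2, 100.0),
    ];

    for m in &matches {
        println!(
            "page {}: \"{}\" is {:.1} x {:.1}",
            m.page + 1,
            m.text,
            m.rect.width(),
            m.rect.height()
        );
    }

    // BTreeMap iterates in ascending page order.
    for (page, count) in matches_per_page(&matches) {
        println!("page {page}: {count} match(es)");
    }
}
```

With real search results, the same grouping would apply to the Vec returned by doc.search(), provided the actual field types are compatible with the ones assumed here.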
4

Search with options

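Use SearchOptions to change how the query is matched. The example below enables case-sensitive and whole-word matching; regex search is also configured through SearchOptions.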

rust
use pdfluent::text::SearchOptions;

let opts = SearchOptions::new()
    .case_sensitive(true)
    .whole_word(true);

let matches = doc.search_with("Total", opts)?;
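
To make the effect of the two flags concrete, here is a minimal sketch of case-sensitive and whole-word matching over a plain string. It is not PDFluent's implementation: find_whole_word and is_word_char are hypothetical helpers, the case folding is ASCII-only, and a "word character" is assumed to be any alphanumeric character or underscore.

```rust
// Assumed definition of a word character for boundary checks.
fn is_word_char(c: char) -> bool {
    c.is_alphanumeric() || c == '_'
}

// Hypothetical helper: return the byte offsets of whole-word occurrences
// of `needle` in `haystack`.
fn find_whole_word(haystack: &str, needle: &str, case_sensitive: bool) -> Vec<usize> {
    let mut hits = Vec::new();
    if needle.is_empty() {
        return hits;
    }
    for (start, _) in haystack.char_indices() {
        let end = start + needle.len();
        // get() yields None when `end` is out of range or splits a character.
        let Some(candidate) = haystack.get(start..end) else {
            continue;
        };
        let same = if case_sensitive {
            candidate == needle
        } else {
            candidate.eq_ignore_ascii_case(needle)
        };
        if !same {
            continue;
        }
        // A whole-word hit must not touch a word character on either side.
        let before_ok = haystack[..start]
            .chars()
            .next_back()
            .map_or(true, |c| !is_word_char(c));
        let after_ok = haystack[end..]
            .chars()
            .next()
            .map_or(true, |c| !is_word_char(c));
        if before_ok && after_ok {
            hits.push(start);
        }
    }
    hits
}

fn main() {
    let text = "Total: 10. Subtotal: 8. total due. Totals";

    // Case-sensitive: only the leading "Total" qualifies.
    println!("{:?}", find_whole_word(text, "Total", true));

    // Case-insensitive: the standalone "total" qualifies as well, while
    // "Subtotal" and "Totals" are rejected by the word-boundary check.
    println!("{:?}", find_whole_word(text, "Total", false));
}
```

The point of the sketch is the narrowing: without whole-word matching, "Subtotal" and "Totals" would also count as hits for "Total".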
5

Search on a single page

For large documents, searching page by page avoids loading the full text index at once.

rust
let page = doc.page(0)?;
let matches = page.search("signature")?;
for m in &matches {
    println!("Found at {:?}", m.rect);
}

Notes and tips

  • Search results reference page coordinates (origin bottom-left). If you need screen coordinates, invert the y-axis relative to the page height.
  • PDFluent applies Unicode NFKC normalization before comparison. Ligatures like ﬁ (U+FB01) are decomposed into their component letters, so searching "fi" will match the ligature glyph.
  • Encrypted PDFs must be decrypted before text extraction. Open with Document::open_with_password first.
  • For regex search, the pattern is matched against the Unicode text stream, not the raw PDF content bytes.
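
The y-axis inversion mentioned in the first note can be sketched as follows. The Rect struct here is a stand-in that mirrors the field names used in this guide, with f64 fields assumed; to_screen_rect and its scale parameter are hypothetical. The sketch also assumes an unrotated page whose visible area starts at the page origin.

```rust
// Stand-in for the library's Rect: field names follow this guide,
// the f64 field type is an assumption.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Rect {
    x_min: f64,
    y_min: f64,
    x_max: f64,
    y_max: f64,
}

// Hypothetical helper: convert a rect from PDF page space (origin bottom-left,
// y grows upward) to screen space (origin top-left, y grows downward).
// `scale` is pixels per page unit.
fn to_screen_rect(r: Rect, page_height: f64, scale: f64) -> Rect {
    Rect {
        x_min: r.x_min * scale,
        // The top edge in page space becomes the smaller y on screen.
        y_min: (page_height - r.y_max) * scale,
        x_max: r.x_max * scale,
        y_max: (page_height - r.y_min) * scale,
    }
}

fn main() {
    // A match near the top of a US Letter page (612 x 792 page units).
    let hit = Rect { x_min: 72.0, y_min: 700.0, x_max: 172.0, y_max: 712.0 };

    let on_screen = to_screen_rect(hit, 792.0, 1.0);
    println!("{:?}", on_screen);

    // The same rect rendered at twice the size.
    let zoomed = to_screen_rect(hit, 792.0, 2.0);
    println!("{:?}", zoomed);
}
```

Width and height are preserved by the flip (up to the scale factor); only the vertical position changes. Rotated pages or pages with an offset crop box would need an additional transform.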

Why PDFluent for this

Pure Rust

No JVM, no runtime, no DLL dependencies. Ships as a single native binary or WASM module.

Memory safe

Rust's ownership model and bounds checking prevent use-after-free and buffer overflows in safe code, which rules out that class of crashes in PDF parsing.

Runs anywhere

Same code runs server-side, in Docker, on AWS Lambda, on Cloudflare Workers, or in the browser via WASM.

Frequently asked questions