Redact text matching a regex pattern from a PDF in Rust

Use regular expressions to find and permanently remove credit card numbers, SSNs, email addresses, or any other structured data from a PDF.

rust
use pdfluent::{PdfDocument, RedactOptions};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut doc = PdfDocument::open("report.pdf")?;

    // Redact all US Social Security Numbers
    let ssn_pattern = r"\b\d{3}-\d{2}-\d{4}\b";
    doc.redact_pattern(ssn_pattern, RedactOptions::default())?;

    doc.apply_redactions()?;
    doc.save("report_redacted.pdf")?;
    Ok(())
}

Install: cargo add pdfluent

Step by step

1. Add PDFluent to your project

Add the pdfluent crate to Cargo.toml.

toml
[dependencies]
pdfluent = "0.9"

2. Open the PDF

Load the file from disk or from an in-memory buffer.

rust
use pdfluent::PdfDocument;

let mut doc = PdfDocument::open("customer_data.pdf")?;

3. Define your regex patterns

Write patterns for the data types you want to remove. PDFluent uses the Rust regex crate syntax.

rust
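// Credit cards: Visa (13 or 16 digits), Mastercard (prefixes 51-55), Amex (15 digits).
// Matches unbroken digit runs only; numbers written with spaces or hyphens need their own pattern.
let cc_pattern = r"\b(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|3[47][0-9]{13})\b";

// Email addresses
let email_pattern = r"[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}";

// US phone numbers. The word boundary sits inside the alternation: placed before an
// optional "(", it would leave the opening parenthesis of "(555) 123-4567" unmatched.
let phone_pattern = r"(?:\(\d{3}\)|\b\d{3})[\s.\-]?\d{3}[\s.\-]?\d{4}\b";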
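
The credit card pattern matches any digit run with the right prefix and length, so order numbers or account IDs of the same shape can match too. As a minimal sketch of one way to gauge false positives, assuming you have already pulled candidate strings out of the page text, each candidate can be run through the Luhn checksum that payment card numbers satisfy. The helper name `luhn_valid` is hypothetical and not part of PDFluent; the sketch uses only the standard library and does not change what redact_pattern() marks.

```rust
/// Returns true if the digits in `candidate` pass the Luhn checksum.
/// Spaces and hyphens are skipped; any other non-digit fails the check.
/// (Hypothetical helper for illustration, not a PDFluent API.)
fn luhn_valid(candidate: &str) -> bool {
    let mut sum = 0u32;
    let mut digit_count = 0usize;

    // Walk the digits right to left, doubling every second one.
    for ch in candidate.chars().rev() {
        if ch == ' ' || ch == '-' {
            continue;
        }
        let mut d = match ch.to_digit(10) {
            Some(d) => d,
            None => return false,
        };
        if digit_count % 2 == 1 {
            d *= 2;
            if d > 9 {
                d -= 9;
            }
        }
        sum += d;
        digit_count += 1;
    }

    // Payment card numbers are 13 to 19 digits long.
    (13..=19).contains(&digit_count) && sum % 10 == 0
}

fn main() {
    // 4111111111111111 is a widely used Visa test number.
    for candidate in ["4111111111111111", "4111111111111112"] {
        println!("{candidate}: luhn_valid = {}", luhn_valid(candidate));
    }
}
```

A candidate that fails the check is probably not a card number, which may be a reason to tighten the pattern before redacting.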

4. Apply pattern-based redaction

Call redact_pattern() once for each regex. Each call only marks the matches; nothing is removed until you call apply_redactions(), so you can register several patterns first.

rust
use pdfluent::RedactOptions;

let opts = RedactOptions::default()
    .fill_color(pdfluent::Color::black())
    .overlay_text("REDACTED");

doc.redact_pattern(cc_pattern, opts.clone())?;
doc.redact_pattern(email_pattern, opts.clone())?;
doc.redact_pattern(phone_pattern, opts)?;

5. Apply and save

apply_redactions() permanently removes all marked text from the content stream rather than just covering it. Save the result under a new file name so the original stays intact.

rust
doc.apply_redactions()?;
doc.save("customer_data_clean.pdf")?;

println!("Pattern redaction complete.");

Notes and tips

  • PDFluent uses the Rust regex crate. Patterns are case-sensitive by default. Use (?i) for case-insensitive matching.
  • Text in PDFs may include ligatures or kerning gaps. If a pattern does not match expected text, extract the raw text first to inspect the actual character sequence.
  • Pattern redaction works on text content streams only. Text inside images requires OCR before redaction.
  • Call get_redaction_marks() after redact_pattern() to preview what will be removed before applying.
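
To illustrate the ligature note above: a single ligature character such as U+FB03 ("ffi") is not in the email pattern's character class, so a match would start after it and leave the beginning of the address unredacted. The following is a minimal sketch, using only the standard library, of how extracted text might be inspected for such characters. The function names and the sample string are hypothetical, and the text extraction step itself is not shown.

```rust
/// Replaces the Latin ligature code points U+FB00..=U+FB04 with plain letters.
/// (Hypothetical helper for illustration, not a PDFluent API.)
fn expand_ligatures(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for ch in text.chars() {
        match ch {
            '\u{FB00}' => out.push_str("ff"),
            '\u{FB01}' => out.push_str("fi"),
            '\u{FB02}' => out.push_str("fl"),
            '\u{FB03}' => out.push_str("ffi"),
            '\u{FB04}' => out.push_str("ffl"),
            other => out.push(other),
        }
    }
    out
}

/// Lists every non-ASCII character with its byte offset and code point,
/// which makes ligatures and unusual spaces easy to spot.
fn non_ascii_chars(text: &str) -> Vec<(usize, String)> {
    text.char_indices()
        .filter(|(_, ch)| !ch.is_ascii())
        .map(|(offset, ch)| (offset, format!("U+{:04X}", ch as u32)))
        .collect()
}

fn main() {
    // Hypothetical extracted text: "office" typeset with the "ffi" ligature.
    let extracted = "o\u{FB03}ce@example.com";

    for (offset, code_point) in non_ascii_chars(extracted) {
        println!("byte {offset}: {code_point}");
    }
    println!("{}", expand_ligatures(extracted));
}
```

If the inspection turns up ligatures, one option is to widen the pattern's character classes to include them, since the pattern has to match the characters actually stored in the PDF.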

Why PDFluent for this

Pure Rust

No JVM, no runtime, no DLL dependencies. Ships as a single native binary or WASM module.

Memory safe

Rust's ownership model and bounds checks prevent buffer overflows and use-after-free in safe code, so a malformed PDF cannot corrupt memory during parsing.

Runs anywhere

Same code runs server-side, in Docker, on AWS Lambda, on Cloudflare Workers, or in the browser via WASM.

Frequently asked questions