Permanently remove personal data from PDF content streams, not just from view. A black-box overlay leaves the underlying text extractable; PDFluent removes it from the object layer.
use pdfluent::{Document, RedactionOptions};
fn main() -> pdfluent::Result<()> {
    let mut doc = Document::open("contract_with_pii.pdf")?;
    // Mark all occurrences of a pattern for redaction
    doc.mark_redactions_by_pattern(r"\b\d{3}-\d{2}-\d{4}\b")?; // SSN pattern
    // Names (naive: matches any two consecutive capitalized words, so review the marks)
    doc.mark_redactions_by_pattern(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b")?;
    // Apply: permanently removes content from the PDF object layer.
    // This is NOT a visual overlay; text cannot be extracted after this.
    let options = RedactionOptions {
        fill_color: [0.0, 0.0, 0.0], // black
        overlay_text: None,
    };
    doc.apply_redactions(options)?;
    doc.save("contract_redacted.pdf")?;
    Ok(())
}

Determine which personal data is present in the document. PDFluent supports regex-based pattern matching, manual region selection, and full-text search. Common patterns include social security numbers, names, email addresses, phone numbers, and financial identifiers.
use pdfluent::Document;
let mut doc = Document::open("contract_with_pii.pdf")?;
// Regex pattern matching: mark all occurrences across all pages
doc.mark_redactions_by_pattern(r"\b[A-Za-z0-9._%+\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,}\b")?; // email
doc.mark_redactions_by_pattern(r"\b\d{3}-\d{2}-\d{4}\b")?; // US SSN
doc.mark_redactions_by_pattern(r"\b\d{4}[\s\-]?\d{4}[\s\-]?\d{4}[\s\-]?\d{4}\b")?; // credit card

Marking creates redaction annotations over the identified content. At this stage the content is still present; marks are reversible until you call apply_redactions(). This lets you review and adjust the selection before committing.
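One use of that review window is discarding false positives. The 16-digit credit card pattern matches any 16-digit run, including order or tracking numbers, so a checksum filter can help narrow the candidates. The sketch below is a plain Luhn check on candidate strings using only the standard library; `passes_luhn` is a hypothetical helper, not part of PDFluent, and how matches are enumerated or unmarked in PDFluent is not covered here.

```rust
/// Number of digits matched by the 16-digit card pattern above.
const CARD_DIGITS: usize = 16;

/// Returns true if `candidate` holds exactly 16 digits that satisfy the
/// Luhn checksum. Spaces and hyphens are ignored; any other character
/// makes the check fail.
fn passes_luhn(candidate: &str) -> bool {
    let mut digits: Vec<u32> = Vec::new();
    for c in candidate.chars() {
        if c == ' ' || c == '-' {
            continue; // separators allowed by the pattern
        }
        match c.to_digit(10) {
            Some(d) => digits.push(d),
            None => return false,
        }
    }
    if digits.len() != CARD_DIGITS {
        return false;
    }
    // Walk from the rightmost digit; double every second digit and
    // subtract 9 when the doubled value exceeds 9.
    let sum: u32 = digits
        .iter()
        .rev()
        .enumerate()
        .map(|(i, &d)| {
            if i % 2 == 1 {
                let doubled = d * 2;
                if doubled > 9 { doubled - 9 } else { doubled }
            } else {
                d
            }
        })
        .sum();
    sum % 10 == 0
}
```

A candidate such as `4111 1111 1111 1111` (a widely used test number) passes, while `1234-5678-9012-3456` does not. A failed checksum only suggests the match is not a card number; whether to redact it anyway remains a policy decision.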
use pdfluent::{Document, PageRect};
// Alternatively: mark a specific region by coordinates (page index, x1, y1, x2, y2)
doc.mark_redaction_region(0, PageRect::new(72.0, 600.0, 300.0, 620.0))?;
// Or mark by exact text string
doc.mark_redactions_by_text("Jan de Vries")?;

apply_redactions() modifies the PDF content stream directly. Text and image data in marked regions is removed from the object layer, not just hidden. After this step the content cannot be recovered, even by a PDF parser operating on the raw bytes.
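To make the difference between hiding and removing concrete, here is a minimal sketch on hand-written, uncompressed content streams. It does not use PDFluent. `extract_tj_strings` is a hypothetical, deliberately naive helper, and the three streams are made-up examples built from standard PDF operators (`Tj` shows text, `re` and `f` draw a filled rectangle).

```rust
/// Original page content: one line of text.
const ORIGINAL: &str = "BT /F1 12 Tf 72 700 Td (SSN: 123-45-6789) Tj ET";

/// Visual overlay: the text is untouched, a black rectangle is painted on top.
const OVERLAID: &str =
    "BT /F1 12 Tf 72 700 Td (SSN: 123-45-6789) Tj ET\n0 0 0 rg 72 695 140 16 re f";

/// True redaction: the text-showing operator is gone, only the fill remains.
const REDACTED: &str = "BT /F1 12 Tf 72 700 Td ET\n0 0 0 rg 72 695 140 16 re f";

/// Collects the literal strings shown by `Tj` operators in a simplified
/// stream. A real extractor needs a full tokenizer (escapes, nested
/// parentheses, hex strings, TJ arrays, stream filters).
fn extract_tj_strings(stream: &str) -> Vec<String> {
    let mut found = Vec::new();
    let mut rest = stream;
    while let Some(open) = rest.find('(') {
        let after_open = &rest[open + 1..];
        let close = match after_open.find(')') {
            Some(pos) => pos,
            None => break, // unterminated string: stop scanning
        };
        let literal = &after_open[..close];
        let tail = &after_open[close + 1..];
        // Keep the string only if the next operator is Tj
        if tail.trim_start().starts_with("Tj") {
            found.push(literal.to_string());
        }
        rest = tail;
    }
    found
}
```

Running the helper over `OVERLAID` still yields the SSN string, because the rectangle only changes what is painted. Running it over `REDACTED` yields nothing, because the string no longer exists in the stream.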
use pdfluent::RedactionOptions;
// WARNING: this operation is irreversible. Test on a copy first.
let options = RedactionOptions {
fill_color: [0.0, 0.0, 0.0], // fill redacted area with black
overlay_text: None, // or Some("REDACTED") to add a label
};
doc.apply_redactions(options)?;

After applying redactions, extract the text and check that the personal data is absent. This step is important for audit purposes and to confirm the redaction worked as intended before delivering or storing the document.
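An exact substring check can miss a value that text extraction splits across a line break or pads with extra spaces. As a hedged sketch, the comparison can be done on normalized text instead. `normalize` and `find_leaks` are hypothetical helpers that use only the standard library; they assume the extracted text is available as a string.

```rust
/// Lowercases and drops all whitespace, so line breaks or extra spaces
/// introduced by text extraction do not hide a match.
fn normalize(s: &str) -> String {
    s.chars()
        .filter(|c| !c.is_whitespace())
        .flat_map(|c| c.to_lowercase())
        .collect()
}

/// Returns every sensitive value that still appears in `extracted`
/// after both sides have been normalized.
fn find_leaks<'a>(extracted: &str, sensitive: &[&'a str]) -> Vec<&'a str> {
    let haystack = normalize(extracted);
    sensitive
        .iter()
        .copied()
        .filter(|value| haystack.contains(normalize(value).as_str()))
        .collect()
}
```

For example, `find_leaks("Signed by Jan de\nVries", &["Jan de Vries"])` reports the name even though a line break splits it. Dropping whitespace can also produce matches that span word boundaries; for a verification step, an occasional false alarm is usually preferable to a missed leak.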
// Verify: extracted text must not contain the redacted values
let text = doc.extract_text()?;
assert!(!text.contains("Jan de Vries"), "Redaction failed: name still present");
assert!(!text.contains("123-45-6789"), "Redaction failed: SSN still present");
println!("Verification passed: PII removed from content stream");

PDF metadata (Author, Title, Subject, Keywords) and XMP metadata may contain PII independently of the page content. Clear it before saving. Keep the original file for your audit trail if required by your data retention policy.
// Clear document-level metadata that may contain PII
doc.clear_metadata()?;
// Save to a new path; keep the original for the audit trail if needed
doc.save("contract_redacted.pdf")?;
println!("Redacted document saved. Original preserved at contract_with_pii.pdf");

No JVM, no runtime, no DLL dependencies. Ships as a single native binary or WASM module.
Rust's ownership model and bounds checking prevent buffer overflows and use-after-free in safe code, so malformed PDFs should not cause memory-corruption crashes in the parser.
Same code runs server-side, in Docker, on AWS Lambda, on Cloudflare Workers, or in the browser via WASM.