
GDPR-compliant PDF redaction in Rust

Permanently remove personal data from PDF content streams instead of only hiding it visually. A black box drawn over text leaves that text extractable; PDFluent removes it from the object layer.

rust
use pdfluent::{Document, RedactionOptions};

fn main() -> pdfluent::Result<()> {
    let mut doc = Document::open("contract_with_pii.pdf")?;

    // Mark all occurrences of a pattern for redaction
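    doc.mark_redactions_by_pattern(r"\b\d{3}-\d{2}-\d{4}\b")?; // US SSN format, e.g. 123-45-6789
    // Naive name pattern: matches any two adjacent capitalized words, so expect
    // false positives ("Service Agreement") and misses ("Jan de Vries"). Review marks before applying.
    doc.mark_redactions_by_pattern(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b")?;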

    // Apply: permanently removes content from PDF object layer
    // This is NOT a visual overlay — text cannot be extracted after this
    let options = RedactionOptions {
        fill_color: [0.0, 0.0, 0.0],  // black
        overlay_text: None,
    };
    doc.apply_redactions(options)?;

    doc.save("contract_redacted.pdf")?;
    Ok(())
}
Install: cargo add pdfluent

Step by step

1

Identify what needs to be redacted

Determine which personal data the document contains. PDFluent supports regex-based pattern matching, manual region selection, and full-text search. Common patterns include US Social Security numbers, names, email addresses, phone numbers, and financial identifiers.

rust
use pdfluent::Document;

let mut doc = Document::open("contract_with_pii.pdf")?;

// Regex pattern matching — mark all occurrences across all pages
doc.mark_redactions_by_pattern(r"\b[A-Za-z0-9._%+\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,}\b")?; // email
doc.mark_redactions_by_pattern(r"\b\d{3}-\d{2}-\d{4}\b")?; // US SSN
doc.mark_redactions_by_pattern(r"\b\d{4}[\s\-]?\d{4}[\s\-]?\d{4}[\s\-]?\d{4}\b")?; // 16-digit card numbers only
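
The card-number pattern above matches any 16-digit group. It will therefore also mark order numbers or account references of the same shape. One option is to check each candidate against the Luhn checksum that payment card numbers carry, and mark only the ones that pass. The sketch below uses only the Rust standard library. `luhn_valid` is a hypothetical helper, not part of PDFluent, and the sketch does not show how you would obtain the matched strings from PDFluent.

```rust
/// Hypothetical helper (not part of PDFluent): returns true if the digits in
/// `candidate` pass the Luhn checksum. Spaces and hyphens are skipped;
/// any other non-digit character rejects the candidate.
fn luhn_valid(candidate: &str) -> bool {
    let mut digits: Vec<u32> = Vec::new();
    for c in candidate.chars() {
        match c {
            ' ' | '-' => continue,
            '0'..='9' => digits.push(c as u32 - '0' as u32),
            _ => return false,
        }
    }
    if digits.is_empty() {
        return false;
    }
    // Walk from the rightmost digit; double every second digit and
    // subtract 9 whenever the doubled value exceeds 9.
    let sum: u32 = digits
        .iter()
        .rev()
        .enumerate()
        .map(|(i, &d)| {
            if i % 2 == 1 {
                let doubled = d * 2;
                if doubled > 9 { doubled - 9 } else { doubled }
            } else {
                d
            }
        })
        .sum();
    sum % 10 == 0
}
```

About one in ten random digit strings passes the Luhn check, so this reduces false positives rather than eliminating them. Whether to filter at all is a policy choice: over-redacting an order number is usually cheaper than leaving a real card number in place.
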
2

Mark regions for redaction

Marking creates redaction annotations over the identified content. At this stage the content is still present — marks are reversible until you call apply_redactions(). This lets you review and adjust the selection before committing.

rust
use pdfluent::{Document, PageRect};

// Alternatively: mark a specific region by coordinates (page index, x1, y1, x2, y2)
doc.mark_redaction_region(0, PageRect::new(72.0, 600.0, 300.0, 620.0))?;

// Or mark by exact text string
doc.mark_redactions_by_text("Jan de Vries")?;
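
Region coordinates are easy to get wrong. PDF default user space measures in points (1/72 inch), with the origin at the bottom-left corner of the page. A selection made on a rendered page image, by contrast, is usually in pixels measured from the top-left. This guide does not confirm which convention `PageRect::new` expects; check the PDFluent documentation. Assuming it expects default user-space points, a pixel selection could be converted with a helper like the hypothetical `pixels_to_points` below. It ignores page rotation, CropBox offsets and `/UserUnit` scaling.

```rust
/// Rectangle in PDF user-space points: (x1, y1) bottom-left, (x2, y2) top-right.
#[derive(Debug, PartialEq)]
struct PointsRect {
    x1: f64,
    y1: f64,
    x2: f64,
    y2: f64,
}

/// Hypothetical helper: converts a selection on a page image rendered at `dpi`
/// (pixels, origin top-left, y grows downward) into PDF points
/// (origin bottom-left, y grows upward). Corner order does not matter.
/// Ignores rotation, CropBox offsets and /UserUnit.
fn pixels_to_points(
    x1_px: f64,
    y1_px: f64,
    x2_px: f64,
    y2_px: f64,
    dpi: f64,
    page_height_pt: f64,
) -> PointsRect {
    let scale = 72.0 / dpi; // points per pixel
    let top_px = y1_px.min(y2_px);
    let bottom_px = y1_px.max(y2_px);
    PointsRect {
        x1: x1_px.min(x2_px) * scale,
        // Flip the y axis: distance from the top becomes distance from the bottom.
        y1: page_height_pt - bottom_px * scale,
        x2: x1_px.max(x2_px) * scale,
        y2: page_height_pt - top_px * scale,
    }
}
```

For a US Letter page (792 pt tall) rendered at 144 DPI, a selection from pixel (144, 288) to (600, 328) becomes x 72 to 300 and y 628 to 648 in points.
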
3

Apply redactions — permanently removes content from the content stream

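apply_redactions() modifies the PDF content stream directly. Text and image data in marked regions is removed from the object layer, not just hidden. Once the result is saved, the content cannot be recovered, even by a PDF parser operating on the raw bytes. This holds only if the file is written as a full rewrite. An incremental update appends changes and leaves the original objects in the file.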

rust
use pdfluent::RedactionOptions;

// WARNING: this operation is irreversible. Test on a copy first.
let options = RedactionOptions {
    fill_color: [0.0, 0.0, 0.0],  // fill redacted area with black
    overlay_text: None,            // or Some("REDACTED") to add a label
};

doc.apply_redactions(options)?;
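
Conceptually, applying a mark means two things: find every piece of page content whose bounds overlap the marked area, and delete it from the content stream before painting the fill. The sketch below models only that selection step. `Rect`, `TextRun` and `apply_marks` are hypothetical simplifications, not PDFluent internals. The sketch also removes whole runs. A real implementation additionally has to split runs that are only partly covered, and it has to handle images and vector paths.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Rect {
    x1: f64,
    y1: f64,
    x2: f64,
    y2: f64,
}

impl Rect {
    /// True if the two rectangles share interior area (touching edges do not count).
    fn intersects(&self, other: &Rect) -> bool {
        self.x1 < other.x2 && other.x1 < self.x2 && self.y1 < other.y2 && other.y1 < self.y2
    }
}

/// A positioned piece of page text (hypothetical model).
#[derive(Debug, Clone, PartialEq)]
struct TextRun {
    text: String,
    bbox: Rect,
}

/// Splits runs into (kept, removed): any run overlapping at least one mark is removed.
fn apply_marks(runs: Vec<TextRun>, marks: &[Rect]) -> (Vec<TextRun>, Vec<TextRun>) {
    runs.into_iter()
        .partition(|run| !marks.iter().any(|m| m.intersects(&run.bbox)))
}
```

The strict inequalities mean a run that merely touches the edge of a mark is kept. If you prefer to err toward over-redaction, use `<=` instead.
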
4

Verify: confirm the content is gone

After applying redactions, extract the text and check that the personal data is absent. This step is important for audit purposes and to confirm the redaction worked as intended before delivering or storing the document.

rust
// Verify: extracted text must not contain the redacted values
let text = doc.extract_text()?;
assert!(!text.contains("Jan de Vries"), "Redaction failed: name still present");
assert!(!text.contains("123-45-6789"), "Redaction failed: SSN still present");

println!("Verification passed — PII removed from content stream");
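
A plain `contains` check can pass even when redaction failed. Text extraction often inserts line breaks or extra spaces inside a phrase, so the name can come out as "Jan de" and "Vries" on separate lines. A stricter variant normalizes whitespace on both sides. It also reports every leaked value instead of stopping at the first failed assertion. `find_leaks` is a hypothetical standard-library helper; pass it the extracted text.

```rust
/// Collapses every run of whitespace (spaces, tabs, newlines) into one space.
fn normalize_ws(s: &str) -> String {
    s.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Hypothetical helper: returns each forbidden value still present in `extracted`,
/// comparing whitespace-normalized forms so line breaks cannot hide a leak.
fn find_leaks(extracted: &str, forbidden: &[&str]) -> Vec<String> {
    let haystack = normalize_ws(extracted);
    forbidden
        .iter()
        .filter(|v| {
            let needle = normalize_ws(v);
            !needle.is_empty() && haystack.contains(needle.as_str())
        })
        .map(|v| v.to_string())
        .collect()
}
```

For example: `let leaks = find_leaks(&text, &["Jan de Vries", "123-45-6789"]); assert!(leaks.is_empty(), "PII still present: {leaks:?}");`. Extraction only sees text objects. PII that exists only as pixels in a scanned image will not show up in this check.
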
5

Clear metadata and save to a new file

PDF metadata (Author, Title, Subject, Keywords) and XMP metadata may contain PII independently of the page content. Clear it before saving. Keep the original file for your audit trail if required by your data retention policy.

rust
// Clear document-level metadata that may contain PII
doc.clear_metadata()?;

// Save to a new path — keep the original for audit trail if needed
doc.save("contract_redacted.pdf")?;

println!("Redacted document saved. Original preserved at contract_with_pii.pdf");
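
`clear_metadata()` is a single call. If you need finer control over what stays, one option is an allowlist over the document Information dictionary: keep only keys you have decided are safe and drop everything else. That includes custom keys, which the PDF format permits and which can hold arbitrary data. The sketch models the dictionary as a `BTreeMap`. `scrub_info` and the chosen safe keys are assumptions to adapt to your own policy. XMP metadata, which is stored in a separate stream, is not covered.

```rust
use std::collections::BTreeMap;

/// Info-dictionary keys this sketch treats as safe to keep
/// (an assumption: adjust to your own data-protection policy).
const SAFE_INFO_KEYS: &[&str] = &["Producer", "CreationDate", "ModDate"];

/// Hypothetical helper: removes every Info entry not on the allowlist,
/// including custom keys, and returns the removed key names in sorted order.
fn scrub_info(info: &mut BTreeMap<String, String>) -> Vec<String> {
    // Collect first so the map is not borrowed while entries are removed.
    let removed: Vec<String> = info
        .keys()
        .filter(|k| !SAFE_INFO_KEYS.contains(&k.as_str()))
        .cloned()
        .collect();
    for key in &removed {
        info.remove(key);
    }
    removed
}
```

An allowlist fails safe: a key you did not anticipate is removed rather than silently kept.
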

Notes and tips

  • Redaction via apply_redactions() is irreversible. Always work on a copy of the original document.
  • A black rectangle drawn as a PDF annotation or content stream overlay does NOT constitute GDPR-compliant redaction. The underlying text remains in the content stream and can be extracted with any PDF parser.
  • Annotations, bookmarks, form fields, and JavaScript may also contain PII. Review these separately after redacting page content.
  • GDPR Article 17 requires erasure "without undue delay". Article 12(3) requires the controller to respond to a data subject's request within one month of receipt, extendable by two further months for complex or numerous requests.
  • For CCPA compliance (California Consumer Privacy Act), the same principle applies: visual hiding is not deletion. Use apply_redactions() to permanently remove data from the content layer.
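
The overlay problem described in these notes is easy to see in a content stream. In the simplified stream below, `0 0 0 rg 70 596 110 18 re f` paints a black rectangle over the name. The string operand of `Tj` is still present, though, for any extractor to read. The toy extractor is a hypothetical illustration, not how PDFluent or a real parser works. It only handles literal strings shown with `Tj`. It ignores escapes, nested parentheses, hex strings, `TJ` arrays, compressed streams and font encodings.

```rust
/// Toy extractor: returns every literal string "(...)" that is followed by the
/// Tj text-showing operator. Ignores escapes, nesting, hex strings, TJ arrays,
/// compression and font encodings.
fn extract_tj_strings(stream: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut rest = stream;
    while let Some(open) = rest.find('(') {
        let after_open = &rest[open + 1..];
        let Some(close) = after_open.find(')') else { break };
        let literal = &after_open[..close];
        // Only count the string if the next operator is Tj.
        if after_open[close + 1..].trim_start().starts_with("Tj") {
            out.push(literal.to_string());
        }
        rest = &after_open[close + 1..];
    }
    out
}
```

Run against the overlaid stream, it returns `["Jan de Vries"]`. Against a stream where the `(Jan de Vries) Tj` operation has been removed, it returns nothing.
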

Why PDFluent for this

Pure Rust

No JVM, no runtime, no DLL dependencies. Ships as a single native binary or WASM module.

Memory safe

Rust's ownership model prevents buffer overflows and use-after-free errors in safe code. No segfaults in PDF parsing.

Runs anywhere

Same code runs server-side, in Docker, on AWS Lambda, on Cloudflare Workers, or in the browser via WASM.

Frequently asked questions