Automate PDF document processing at scale with PDFluent: extract text, split/merge pages, apply redactions, and convert to PDF/A — all in pure Rust.
use pdfluent::{Sdk, extract::TextOptions};

// Note: `?` requires an enclosing function that returns a Result.
let sdk = Sdk::init_with_license("license.json")?;
let doc = sdk.open("contract.pdf")?;

// Ask for block coordinates and reading-order-preserving extraction.
let opts = TextOptions::builder()
    .include_coordinates(true)
    .preserve_reading_order(true)
    .build();
let text = doc.text(opts)?;

// Print every text block together with its position on the page.
for page in text.pages() {
    for block in page.blocks() {
        println!("[{:.0},{:.0}] {}", block.x(), block.y(), block.text());
    }
}

Run cargo add pdfluent to get started.
Extract text with full layout context: font names, sizes, coordinates, reading order, and paragraph boundaries. Useful for search indexing, redaction detection, and content migration.
Local OCR via ocrs (WASM-compatible, no server upload). Cloud adapters for Mistral OCR, Google Document AI, AWS Textract, and Azure Form Recognizer. Same API, swappable backend.
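To make "same API, swappable backend" concrete, here is a minimal sketch of the general pattern in plain Rust. Every name in it (OcrBackend, LocalStub, CloudStub, run_ocr, OcrError) is hypothetical and chosen for illustration; none of it is PDFluent's actual API, and the two backends are stubs that do no real recognition.

```rust
// Hypothetical sketch of a swappable OCR backend; not PDFluent's API.
use std::fmt;

/// Error type shared by every backend in this sketch.
#[derive(Debug)]
pub struct OcrError(pub String);

impl fmt::Display for OcrError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "OCR failed: {}", self.0)
    }
}

impl std::error::Error for OcrError {}

/// One interface for every engine, local or cloud.
pub trait OcrBackend {
    fn name(&self) -> &str;
    fn recognize(&self, image: &[u8]) -> Result<String, OcrError>;
}

/// Stand-in for a local engine. A real one would run a model on the bytes.
pub struct LocalStub;

impl OcrBackend for LocalStub {
    fn name(&self) -> &str {
        "local-stub"
    }

    fn recognize(&self, image: &[u8]) -> Result<String, OcrError> {
        if image.is_empty() {
            return Err(OcrError("empty image".to_string()));
        }
        Ok(format!("{} bytes recognized locally", image.len()))
    }
}

/// Stand-in for a cloud adapter. A real one would call a remote service.
pub struct CloudStub {
    pub endpoint: String,
}

impl OcrBackend for CloudStub {
    fn name(&self) -> &str {
        "cloud-stub"
    }

    fn recognize(&self, image: &[u8]) -> Result<String, OcrError> {
        if image.is_empty() {
            return Err(OcrError("empty image".to_string()));
        }
        Ok(format!("{} bytes sent to {}", image.len(), self.endpoint))
    }
}

/// Calling code depends only on the trait, so the backend can be chosen at runtime.
pub fn run_ocr(backend: &dyn OcrBackend, image: &[u8]) -> Result<String, OcrError> {
    backend.recognize(image)
}
```

Because run_ocr takes a &dyn OcrBackend, switching from the local stub to the cloud stub is a change at the call site only; the code that consumes the recognized text stays the same.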
Find and permanently remove sensitive content — PII, account numbers, legal references. Redaction burns through to the content stream, not just the visual layer.
Split by page range, bookmark, or content pattern. Merge multiple PDFs with bookmark and page label preservation. Works on linearised and encrypted files.
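Splitting by page range usually starts from a specification string such as "1-3,5,8-10". The sketch below shows one way such a string could be validated and turned into inclusive, 1-based ranges. The function names and the specification syntax are assumptions made for illustration and are not taken from PDFluent.

```rust
// Hypothetical helper: the "1-3,5,8-10" syntax and these names are assumptions.

/// Parse a 1-based page specification into inclusive (start, end) pairs,
/// rejecting anything that falls outside 1..=page_count.
pub fn parse_page_ranges(spec: &str, page_count: u32) -> Result<Vec<(u32, u32)>, String> {
    let mut ranges = Vec::new();
    for part in spec.split(',') {
        let part = part.trim();
        if part.is_empty() {
            return Err("empty range in specification".to_string());
        }
        // "a-b" is a range; a bare number is a single-page range.
        let (start, end) = match part.split_once('-') {
            Some((a, b)) => (parse_page(a)?, parse_page(b)?),
            None => {
                let page = parse_page(part)?;
                (page, page)
            }
        };
        if start == 0 || start > end || end > page_count {
            return Err(format!("range {part} is outside 1..={page_count}"));
        }
        ranges.push((start, end));
    }
    Ok(ranges)
}

/// Parse one page number, reporting the offending text on failure.
fn parse_page(s: &str) -> Result<u32, String> {
    s.trim()
        .parse::<u32>()
        .map_err(|e| format!("invalid page number {s:?}: {e}"))
}
```

Validating against the document's page count up front means a bad specification fails before any output file is written.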
Rotate, crop, resize, and reorder pages. Add watermarks, headers, footers, and overlays. Flatten annotations and form fields.
Convert PDF to DOCX, XLSX, and PPTX with layout fidelity. Convert Office documents to PDF. Render pages to PNG, JPEG, or SVG at any DPI.
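Rendering "at any DPI" comes down to simple arithmetic: PDF page sizes are expressed in points, and with the default user-space unit one point is 1/72 inch. The sketch below computes the raster size for a given DPI. The function name is hypothetical, and it assumes the page does not override the default unit.

```rust
// Hypothetical helper; assumes the default PDF user-space unit of 1/72 inch.

/// Convert a page size in points to pixel dimensions at the given DPI,
/// rounding to the nearest whole pixel.
pub fn pixel_size(width_pt: f64, height_pt: f64, dpi: f64) -> (u32, u32) {
    const POINTS_PER_INCH: f64 = 72.0;
    let to_pixels = |points: f64| (points / POINTS_PER_INCH * dpi).round() as u32;
    (to_pixels(width_pt), to_pixels(height_pt))
}
```

For example, an A4 page of 595 x 842 points comes out at 2479 x 3508 pixels at 300 DPI, and a US Letter page of 612 x 792 points at 1275 x 1650 pixels at 150 DPI.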