PDFluent is a pure-Rust PDF SDK. No C++ at the core, no PDFium under the hood, no memory corruption bugs. It runs natively and in the browser — same code, same behaviour.
I got frustrated. I was building a document-processing pipeline that needed to handle XFA forms — the kind of PDF forms that governments and banks have been generating for the past 20 years. Every library I tried either ignored XFA entirely, crashed on anything slightly malformed, or dragged in a full PDFium build that weighed 80 MB and came with its own set of C++ vulnerabilities.
So in 2024 I started writing a PDF parser in Rust. Just for myself, just to see if it was possible. It worked. Then I added an XFA engine. Then a renderer. Then a PDF/A converter. At some point it became a proper SDK — one I would actually want to use in production.
PDFluent is the result. It's been tested against 20,000 real-world PDFs: the kind that come from governments, banks, and enterprise software, not PDF spec examples. XFA forms parse and flatten reliably. Visual accuracy is actively improving, and we test against a golden set derived from Adobe output. The PDF/A converter passes the 20,000-PDF corpus with zero false negatives.
The parser, XFA engine, and renderer are written in safe Rust. No FFI boundary to cross, no C++ exceptions to catch, no need to ship a platform binary for every OS.
Compile to WebAssembly and the same code runs in the browser without a server. Useful for client-side previews, offline processing, or reducing API round-trips.
XFA (XML Forms Architecture) is a 1,000-page spec that almost every modern PDF library ignores. PDFluent parses XFA forms, evaluates their FormCalc scripts, and flattens them to standard PDF.
Not just the spec examples: the test corpus is 20,000 PDFs from the wild (government forms, invoices, scanned documents, certified PDFs), with a MuPDF-rendered oracle for SSIM comparison.
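For readers unfamiliar with SSIM, here is a minimal sketch of how a rendered page could be scored against an oracle image. It is not taken from the PDFluent source: the function name `global_ssim` is hypothetical, and it computes a simplified single-window SSIM over whole 8-bit grayscale buffers, whereas production comparisons usually average SSIM over small local windows.

```rust
/// Simplified, single-window SSIM between two 8-bit grayscale images.
/// Returns `None` if the buffers are empty or differ in length.
/// Hypothetical helper for illustration; not part of any published API.
pub fn global_ssim(a: &[u8], b: &[u8]) -> Option<f64> {
    if a.is_empty() || a.len() != b.len() {
        return None;
    }
    let n = a.len() as f64;

    // Mean intensity of each image.
    let mean = |p: &[u8]| p.iter().map(|&v| v as f64).sum::<f64>() / n;
    let (mu_a, mu_b) = (mean(a), mean(b));

    // Variances and covariance (population form, divided by n).
    let mut var_a = 0.0;
    let mut var_b = 0.0;
    let mut cov = 0.0;
    for (&x, &y) in a.iter().zip(b) {
        let (dx, dy) = (x as f64 - mu_a, y as f64 - mu_b);
        var_a += dx * dx;
        var_b += dy * dy;
        cov += dx * dy;
    }
    var_a /= n;
    var_b /= n;
    cov /= n;

    // Conventional stabilising constants for a dynamic range of L = 255:
    // C1 = (0.01 * L)^2 and C2 = (0.03 * L)^2.
    let c1 = (0.01 * 255.0_f64).powi(2);
    let c2 = (0.03 * 255.0_f64).powi(2);

    let numerator = (2.0 * mu_a * mu_b + c1) * (2.0 * cov + c2);
    let denominator = (mu_a * mu_a + mu_b * mu_b + c1) * (var_a + var_b + c2);
    Some(numerator / denominator)
}
```

Identical buffers score 1.0, and the score drops as structure diverges, so a regression test can fail any page whose score falls below a chosen threshold.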
Rust's ownership model and bounds-checked indexing rule out entire classes of bugs, most of them at compile time. No buffer overflows, no use-after-free, no null pointer dereferences. The kind of bugs that fill PDF-library CVE lists.
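As an illustration of what that means for a parser, the sketch below reads a 4-byte big-endian integer at an offset taken from untrusted input. It is an assumption-laden example, not PDFluent code: `read_u32_be` is a hypothetical helper. In C, a bad offset here is a classic out-of-bounds read; in safe Rust, the slice lookup simply returns `None`.

```rust
/// Read a big-endian u32 starting at `offset`.
/// Returns `None` when the offset is out of range or would overflow,
/// instead of reading past the end of the buffer.
/// Hypothetical helper for illustration only.
pub fn read_u32_be(data: &[u8], offset: usize) -> Option<u32> {
    // checked_add guards against integer overflow on hostile offsets.
    let end = offset.checked_add(4)?;
    // `get` performs the bounds check and yields None on failure.
    let s = data.get(offset..end)?;
    Some(u32::from_be_bytes([s[0], s[1], s[2], s[3]]))
}
```

The caller is forced by the `Option` return type to handle the malformed case explicitly, which is how a truncated or corrupt file becomes a recoverable error rather than undefined behaviour.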
Available from Rust, Python (PyO3), Node.js (napi-rs), Java (JNI), C (cdylib header), and WebAssembly. Same underlying engine everywhere.
The core engine is open source under the MIT licence. The commercial parts fund continued development and keep the lights on.
| Component | Notes |
|---|---|
| PDF parser (pdf-syntax) | Read-only, zero-copy parsing |
| PDF/A validator | ISO 19005 compliance checks |
| XFA engine | Parse, flatten, FormCalc |
| Renderer | Page-to-image, tested on 20,000 PDFs |
| Language bindings | Python, Node.js, Java, C |
| WASM bundles | Browser-ready, tree-shaken |
| PDF/A converter | Production-grade compliance pipeline |
| Test corpus & oracle DB | 20,000 real-world PDFs + ground truth |
| Commercial support | SLA, priority fixes, private Slack |
If you want to read the engine source, the GitHub repo is public. If you want to use PDFluent in production, you need a licence key — see pricing.
It's one person for now. I prefer to be upfront about that.
Spent years processing enterprise PDFs in production. Got tired of the tools. Built something better.
PDFluent is built in the open. If you find a bug, have a feature request, or want to submit a fix — open an issue or PR on GitHub. The test corpus is large enough that even a single PDF that exposes a new edge case is genuinely useful.
Contact to request access

Questions about the SDK, a custom integration, enterprise pricing, or something PDFluent doesn't support yet? I reply to everything.