Benchmarks are easy to manipulate and hard to interpret. This post attempts to do a fair comparison between PDFluent and iText 9 for server-side PDF processing. We explain exactly how we measured, what the numbers mean, and where iText is legitimately better.
One upfront note: iText has no rendering engine. It cannot rasterize a PDF to an image. Any benchmark involving rendering is PDFluent-only, so we exclude those entirely and focus on the operations both libraries support.
Test environment
Instance: AWS c6i.2xlarge
CPU: Intel Xeon Ice Lake, 8 vCPU @ 3.5 GHz
RAM: 16 GB
OS: Ubuntu 22.04 LTS
PDFluent: v1.0 (current release)
iText: 9.1.0 (Java), OpenJDK 21.0.2 (GraalVM)
JVM flags: -Xms512m -Xmx4g -XX:+UseG1GC
Warmup: 10 runs discarded
Measurement: median of 100 runs
PDF corpus: 100 files, 1–50 pages, varied content

We used GraalVM (not standard OpenJDK) to give iText the best possible JVM performance. Standard OpenJDK JIT typically runs 15–25% slower than GraalVM for this type of workload.
Cold start
"Cold start" means: from process launch to first byte of output. This matters for serverless functions, short-lived containers, and any deployment where you can't keep a warm process pool running.
| | PDFluent | iText (GraalVM) | iText (std. JDK 21) |
|---|---|---|---|
| Process start → ready | 8ms | 820ms | 1,100ms |
| First document parsed | +12ms | +35ms | +40ms |
| Total cold start | ~20ms | ~855ms | ~1,140ms |
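The "process launch to first byte of output" definition can be measured directly by timing how long a fresh process takes to emit its first stdout byte. A minimal sketch (the trivial Python child stands in for the real benchmark binaries):

```python
import subprocess
import sys
import time

def cold_start_ms(cmd):
    """Launch a fresh process and time until the first byte appears on stdout."""
    t0 = time.perf_counter()
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    first = proc.stdout.read(1)   # blocks until the first output byte arrives
    elapsed = (time.perf_counter() - t0) * 1000
    proc.stdout.read()            # drain remaining output
    proc.wait()
    return first, elapsed

# Trivial stand-in child process; real runs point at the benchmark binaries:
byte, ms = cold_start_ms([sys.executable, "-c", "print('x')"])
```

Measuring to the first byte, rather than to process exit, is what makes the number meaningful for serverless latency: it captures interpreter/JVM startup plus first-document parse, but not the rest of the workload.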
The JVM startup time dominates iText's cold start. GraalVM's ahead-of-time compilation (GraalVM Native Image) can reduce this significantly — we tested that separately:
| | PDFluent | iText (GraalVM Native Image) |
|---|---|---|
| Process start → ready | 8ms | 85ms |
| First document parsed | +12ms | +45ms |
| Total cold start | ~20ms | ~130ms |
GraalVM Native Image brings iText much closer. If you compile iText to a native binary, the cold start gap shrinks from ~40× to ~6×. The tradeoff: GraalVM Native Image has restrictions (no dynamic class loading, reflection requires configuration) that can be difficult to satisfy with a complex library like iText.
PDF parsing throughput
Parse 1,000 PDFs (varied sizes, 1–50 pages each). Measure total wall time and peak RSS.
| | PDFluent | iText 9 |
|---|---|---|
| Total time (sequential) | 18.2s | 61.4s |
| Total time (parallel, 8 threads) | 3.1s | 9.8s |
| Peak RSS | 480 MB | 1,940 MB |
| Throughput (sequential) | 54.9 docs/s | 16.3 docs/s |
| Throughput (parallel) | 322 docs/s | 102 docs/s |
PDFluent's memory advantage is significant: 480 MB vs 1,940 MB peak RSS for the same workload. This is partly JVM overhead (heap metadata, GC bookkeeping, class metadata) and partly iText's object model, which keeps more of the document in memory during parsing.
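The throughput rows are derived directly from the wall times (1,000 documents divided by total seconds). As a quick sanity check on the table's arithmetic:

```python
docs = 1000
# (wall time in seconds, reported docs/s) pairs from the table above
rows = {
    "pdfluent sequential": (18.2, 54.9),
    "pdfluent parallel":   (3.1, 322),
    "itext sequential":    (61.4, 16.3),
    "itext parallel":      (9.8, 102),
}
for name, (secs, reported) in rows.items():
    computed = docs / secs
    # each reported throughput matches its wall time to within 1%
    assert abs(computed - reported) / reported < 0.01, name
```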
Memory profile
We also measured peak memory per document, not just batch totals:
| Document size | PDFluent peak | iText peak |
|---|---|---|
| Simple, 5 pages | 8 MB | 85 MB |
| Complex, 20 pages | 42 MB | 210 MB |
| Large, 100 pages | 180 MB | 780 MB |
| Scanned (image-heavy), 50 pages | 320 MB | 1,200 MB |
The scanned document case shows the biggest gap: PDFluent decodes image streams lazily and frees them after processing; iText's parser holds more of the object graph in the Java heap.
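Per-document peak RSS can be measured from a parent process without instrumenting either library, using the OS accounting for waited-for children. A sketch of that approach (the `bytearray` child is a stand-in for a real parse job; note `resource` is POSIX-only):

```python
import resource
import subprocess
import sys

def peak_rss_mb(cmd):
    """Run `cmd` to completion and return the peak RSS of child processes in MB.

    ru_maxrss is reported in kilobytes on Linux (bytes on macOS); this sketch
    assumes a Linux host, matching the Ubuntu 22.04 test environment.
    """
    subprocess.run(cmd, check=True, capture_output=True)
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss / 1024.0

# Stand-in child that allocates ~10 MB; real runs invoke the parse binaries:
rss = peak_rss_mb([sys.executable, "-c", "x = bytearray(10_000_000)"])
```

One caveat: `RUSAGE_CHILDREN` reports the maximum across all children waited for so far, so per-document numbers need one child process per document, as in the table above.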
Text extraction
| | PDFluent | iText 9 |
|---|---|---|
| 100 simple docs (text-only) | 12.1s | 41.8s |
| 100 complex docs (mixed) | 19.4s | 68.2s |
| 100 CJK docs (multi-byte fonts) | 24.7s | 89.3s |
| Output quality — simple | Good | Good |
| Output quality — complex layout | Moderate | Good |
iText's text extraction quality for complex layouts (multi-column, tables, mixed writing directions) is better than ours. iText uses a spatial clustering algorithm that produces more accurate reading-order reconstruction. Our extractor works well for simple layouts but can produce out-of-order text for documents with overlapping text blocks or unusual column structures.
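The failure mode is easy to demonstrate. Sorting text spans by position alone (top-to-bottom, then left-to-right) interleaves the columns of a two-column page, while clustering spans into columns first preserves reading order. This toy example illustrates the difference; it is not the actual algorithm of either library:

```python
# Text spans as (x, y, text); y grows downward. Two columns side by side.
spans = [
    (50, 100, "Left line 1"),  (300, 100, "Right line 1"),
    (50, 120, "Left line 2"),  (300, 120, "Right line 2"),
]

# Naive reading order: sort by y, then x -- interleaves the two columns.
naive = [t for _, _, t in sorted(spans, key=lambda s: (s[1], s[0]))]

def by_column(spans, gap=100):
    """Cluster spans into columns by x position, then read each column
    top to bottom. A crude stand-in for spatial clustering."""
    cols = {}
    for x, y, t in spans:
        cols.setdefault(x // gap, []).append((y, x, t))
    return [t for key in sorted(cols) for _, _, t in sorted(cols[key])]

clustered = by_column(spans)
```

`naive` alternates left and right lines; `clustered` reads the left column in full before the right one. Real spatial clustering must also handle variable column widths, overlapping blocks, and vertical or right-to-left scripts, which is where the quality gap in the table comes from.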
PDF/A validation
iText is the reference implementation for PDF/A processing. Their validation engine is co-developed with the veraPDF team and is considered the most accurate PDF/A validator available. We validate against the same veraPDF conformance checker to compare our results.
| Test set | PDFluent (speed) | iText (speed) | PDFluent accuracy | iText accuracy |
|---|---|---|---|---|
| 100 conformant docs | 3.5s | 28.4s | 100% | 100% |
| isartor test suite (400 docs) | 14.2s | 112.8s | 98.1% | 99.9% |
| Custom corpus (1,000 docs) | 36.1s | 287.2s | 99.5% | 99.8% |
We're faster; they're more accurate. Our 98.1% accuracy on the isartor test suite corresponds to 7 documents where we produce a false negative (we say conformant; the document isn't) or a false positive (we flag a violation that isn't there). iText has one false result on this corpus.
For most production use cases, 98.1% accuracy is acceptable. For a workflow where you're certifying documents for legal archiving, that 1.9% error rate matters; in that case, route the edge cases through iText's validation or use veraPDF directly.
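The routing suggestion above can be expressed as a small two-tier check: trust the fast validator when it is confident, and fall back to the slower, more accurate one otherwise. The validator callables here are injected stubs, not real library APIs:

```python
def validate_with_fallback(doc, fast_validate, authoritative_validate):
    """Two-tier PDF/A check.

    `fast_validate` returns (conformant: bool, confident: bool);
    `authoritative_validate` returns a bool. Both are injected so the
    routing logic stays independent of any particular library.
    """
    conformant, confident = fast_validate(doc)
    if confident:
        return conformant          # fast path: keep the ~8x speed advantage
    return authoritative_validate(doc)  # edge case: pay for accuracy

# Demo with stub validators standing in for the real engines:
verdict = validate_with_fallback(
    "invoice.pdf",
    fast_validate=lambda d: (True, False),   # fast pass, low confidence
    authoritative_validate=lambda d: False,  # slow checker disagrees
)
```

With this split, only the small fraction of low-confidence documents pays the slower validator's cost, so the blended throughput stays close to the fast path.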
ZUGFeRD / Factur-X
iText has good ZUGFeRD/Factur-X support via its pdfHTML add-on and ZUGFeRD-specific API. PDFluent's pdf-invoice crate covers the same ground.
| | PDFluent | iText 9 |
|---|---|---|
| Generate ZUGFeRD EN16931 | 14ms | 48ms |
| Validate ZUGFeRD embedding | 8ms | 22ms |
| Extract XML from Factur-X PDF | 6ms | 18ms |
| Correctness (schema + schematron) | 100% | 100% |
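For context on the extraction row: a ZUGFeRD/Factur-X PDF embeds a UN/CEFACT Cross-Industry Invoice (CII) XML file, and a cheap first-pass sanity check is to confirm the extracted XML has the expected root element, before running the full schema and Schematron validation that the correctness row refers to. A sketch, assuming the standard CII namespace:

```python
import xml.etree.ElementTree as ET

# Namespace of the UN/CEFACT Cross-Industry Invoice root element
CII_NS = "urn:un:unece:uncefact:data:standard:CrossIndustryInvoice:100"

def looks_like_cii(xml_bytes):
    """Cheap sanity check: is this a Cross-Industry Invoice document?

    Full conformance still requires XSD schema plus Schematron validation.
    """
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError:
        return False
    return root.tag == f"{{{CII_NS}}}CrossIndustryInvoice"

sample = f'<rsm:CrossIndustryInvoice xmlns:rsm="{CII_NS}"/>'.encode()
ok = looks_like_cii(sample)
```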
Reproducing these benchmarks
The benchmark code is on GitHub. To run it yourself:

```sh
# Public benchmark corpus is not yet open-source.
# Mail [email protected] for the methodology and corpus access.
# Once published, the snippet below is the entry point:
cd benchmarks

# Install dependencies
cargo build --release
mvn -f itext-bench/pom.xml package

# Generate test corpus (requires ~2GB disk)
./scripts/generate-corpus.sh

# Run benchmarks
./scripts/run-all.sh --output results.json

# View results
./scripts/report.py results.json
```

The corpus generator creates PDFs using a mix of open-source tools (Ghostscript, LibreOffice, LaTeX) so the test set is reproducible. If you get substantially different results on your hardware, open an issue — we want to know.
Summary
| Category | Winner | Notes |
|---|---|---|
| Cold start | PDFluent | ~40× faster on standard JDK; ~6× on GraalVM Native Image |
| Parse throughput | PDFluent | ~3.4× faster sequential, ~3.2× parallel |
| Memory usage | PDFluent | ~4× less RAM on batch workloads; 4–10× per document |
| Text extraction quality | iText | Better multi-column / complex layout handling |
| PDF/A accuracy | iText | Co-developed with veraPDF; marginally more accurate |
| PDF/A speed | PDFluent | ~8× faster validation |
| PDF rendering | PDFluent | iText has no rendering engine |
| XFA processing | PDFluent | iText only flattens via pdfXFA add-on |
| ZUGFeRD/Factur-X | PDFluent | Both correct; PDFluent ~3× faster |
| Java/JVM integration | iText | 25 years of ecosystem; Maven-native |
When to use PDFluent: You need fast cold starts (serverless), low memory footprint (high concurrency), PDF rendering, XFA forms, WebAssembly deployment, or non-JVM language bindings.