Benchmarks are easy to manipulate and hard to interpret. This post attempts a fair comparison between PDFluent and iText 9 for server-side PDF processing. We explain exactly how we measured, what the numbers mean, and where iText is genuinely better.

One upfront note: iText has no rendering engine. It cannot rasterize a PDF to an image. Any benchmark involving rendering is PDFluent-only, so we exclude those entirely and focus on the operations both libraries support.

Test environment

Instance:    AWS c6i.2xlarge
CPU:         Intel Xeon Ice Lake, 8 vCPU @ 3.5 GHz
RAM:         16 GB
OS:          Ubuntu 22.04 LTS
PDFluent:    v1.0 (current release)
iText:       9.1.0 (Java), OpenJDK 21.0.2 (GraalVM)
JVM flags:   -Xms512m -Xmx4g -XX:+UseG1GC
Warmup:      10 runs discarded
Measurement: median of 100 runs
PDF corpus:  100 files, 1–50 pages, varied content

We used GraalVM (not standard OpenJDK) to give iText the best possible JVM performance. Standard OpenJDK JIT typically runs 15–25% slower than GraalVM for this type of workload.
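The warmup-and-median protocol listed above can be sketched as a small Python harness. This is a minimal illustration, not the benchmark code itself: `measure` and its parameters are hypothetical names, and the defaults simply mirror the 10 discarded warmup runs and 100 measured runs from the environment listing.

```python
import statistics
import time
from typing import Callable


def measure(fn: Callable[[], object], warmup: int = 10, runs: int = 100) -> float:
    """Return the median wall-clock time of `fn` in seconds.

    Hypothetical helper: `warmup` runs are executed and discarded first,
    then `runs` timed executions are collected and their median returned.
    """
    for _ in range(warmup):
        fn()  # discarded: lets caches and any JIT settle

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)

    return statistics.median(samples)


if __name__ == "__main__":
    # Stand-in workload; a real run would parse a PDF here.
    median_s = measure(lambda: sum(range(10_000)))
    print(f"median: {median_s * 1e3:.3f} ms")
```

The median is less sensitive than the mean to occasional outliers such as a GC pause or a noisy neighbour on a shared instance.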

Cold start

"Cold start" means: from process launch to first byte of output. This matters for serverless functions, short-lived containers, and any deployment where you can't keep a warm process pool running.
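One way to measure "process launch to first byte of output" is to spawn the process and time how long the first byte takes to reach stdout. The sketch below assumes the tool under test writes its result to stdout; `cold_start_seconds` is a hypothetical helper, and the command in the demo is a stand-in, not either library's CLI.

```python
import subprocess
import sys
import time
from typing import List


def cold_start_seconds(cmd: List[str]) -> float:
    """Time from process launch until the first byte appears on stdout."""
    start = time.perf_counter()
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    try:
        first = proc.stdout.read(1)  # blocks until one byte arrives or EOF
        elapsed = time.perf_counter() - start
        if not first:
            raise RuntimeError("process exited without writing any output")
        return elapsed
    finally:
        # Only the time to first byte matters, so stop the child here.
        proc.kill()
        proc.stdout.close()
        proc.wait()


if __name__ == "__main__":
    # Stand-in command: a child Python interpreter that prints one line.
    # A real measurement would launch the PDF tool under test instead.
    cmd = [sys.executable, "-c", "print('ready')"]
    print(f"cold start: {cold_start_seconds(cmd) * 1e3:.1f} ms")
```

Each call launches a fresh process, so repeated calls give independent cold-start samples, although the OS file cache will still be warm after the first run.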

	PDFluent	iText (GraalVM)	iText (std. JDK 21)
Process start → ready	8ms	820ms	1,100ms
First document parsed	+12ms	+35ms	+40ms
Total cold start	~20ms	~855ms	~1,140ms

The JVM startup time dominates iText's cold start. GraalVM's ahead-of-time compilation (GraalVM Native Image) can reduce this significantly — we tested that separately:

	PDFluent	iText (GraalVM Native Image)
Process start → ready	8ms	85ms
First document parsed	+12ms	+45ms
Total cold start	~20ms	~130ms

GraalVM Native Image brings iText much closer. If you compile iText to a native binary, the cold start gap shrinks from ~43× (GraalVM JIT, 855ms vs 20ms) to ~6.5× (130ms vs 20ms). The tradeoff: GraalVM Native Image has restrictions (no dynamic class loading, reflection requires configuration) that can be difficult to satisfy with a complex library like iText.

PDF parsing throughput

Parse 1,000 PDFs (varied sizes, 1–50 pages each). Measure total wall time and peak RSS.

	PDFluent	iText 9
Total time (sequential)	18.2s	61.4s
Total time (parallel, 8 threads)	3.1s	9.8s
Peak RSS	480 MB	1,940 MB
Throughput (sequential)	54.9 docs/s	16.3 docs/s
Throughput (parallel)	322 docs/s	102 docs/s

PDFluent's memory advantage is significant: 480 MB vs 1,940 MB for the same workload, roughly a 4× difference. This is partly JVM overhead (heap metadata, GC bookkeeping, class metadata) and partly iText's object model, which keeps more of the document in memory during parsing.
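For readers who want to take the same two measurements on their own workload, here is a rough Python sketch of a batch run that reports wall time, throughput, and peak RSS. `run_batch`, `peak_rss_mb`, and the `parse` callback are hypothetical names; the callback stands in for whichever library binding is being tested.

```python
import resource
import sys
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence, Tuple


def run_batch(
    paths: Sequence[str], parse: Callable[[str], object], workers: int = 1
) -> Tuple[float, float]:
    """Parse every path and return (elapsed_seconds, docs_per_second)."""
    start = time.perf_counter()
    if workers == 1:
        for p in paths:
            parse(p)
    else:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            list(pool.map(parse, paths))  # drain the iterator so errors surface
    elapsed = time.perf_counter() - start
    return elapsed, len(paths) / elapsed


def peak_rss_mb() -> float:
    """Peak resident set size of this process in MB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor
```

The throughput rows in the table follow the same formula: 1,000 documents / 18.2 s ≈ 54.9 docs/s. Note that threads only give a real speedup here if `parse` releases the GIL (for example, a native binding); a pure-Python parser would need a process pool instead.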

Memory profile

We also measured peak memory per document, not just batch totals:

Document size	PDFluent peak	iText peak
Simple, 5 pages	8 MB	85 MB
Complex, 20 pages	42 MB	210 MB
Large, 100 pages	180 MB	780 MB
Scanned (image-heavy), 50 pages	320 MB	1,200 MB

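The scanned document case shows the biggest absolute gap (880 MB), although the ratio is largest for the simple 5-page document (about 10×) and smallest for the scanned one (about 3.8×). PDFluent decodes image streams lazily and frees them after processing; iText's parser holds more of the object graph in the Java heap.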
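The effect of lazy decoding on peak memory can be illustrated with a generic Python sketch. This is not PDFluent's or iText's implementation: zlib-compressed buffers stand in for image streams, and all function names are hypothetical. The point is only that decoding one stream at a time keeps a single decoded buffer alive, while decoding everything up front keeps all of them alive.

```python
import zlib
from typing import Iterable, Iterator, List


def make_streams(count: int, size: int) -> List[bytes]:
    """Build `count` compressed stand-ins for PDF image streams."""
    return [zlib.compress(bytes([i % 256]) * size) for i in range(count)]


def decode_eager(streams: Iterable[bytes]) -> List[bytes]:
    """Decode everything up front: all decoded buffers are alive at once."""
    return [zlib.decompress(s) for s in streams]


def decode_lazy(streams: Iterable[bytes]) -> Iterator[bytes]:
    """Decode one stream at a time; each buffer can be freed after use."""
    for s in streams:
        yield zlib.decompress(s)


def total_bytes(decoded: Iterable[bytes]) -> int:
    """Stand-in for 'processing': touch each buffer, keep only a counter."""
    return sum(len(buf) for buf in decoded)


if __name__ == "__main__":
    import tracemalloc

    streams = make_streams(count=20, size=1_000_000)
    for name, decode in (("eager", decode_eager), ("lazy", decode_lazy)):
        tracemalloc.start()
        total = total_bytes(decode(streams))
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        print(f"{name}: {total} bytes processed, peak {peak / 1e6:.1f} MB")
```

Both variants process the same bytes; only the number of decoded buffers held at the same time differs.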

Text extraction

	PDFluent	iText 9
100 simple docs (text-only)	12.1s	41.8s
100 complex docs (mixed)	19.4s	68.2s
100 CJK docs (multi-byte fonts)	24.7s	89.3s
Output quality — simple	Good	Good
Output quality — complex layout	Moderate	Good

iText's text extraction quality for complex layouts (multi-column, tables, mixed writing directions) is better than ours. iText uses a spatial clustering algorithm that produces more accurate reading-order reconstruction. Our extractor works well for simple layouts but can produce out-of-order text for documents with overlapping text blocks or unusual column structures.
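To show why reading order is hard, here is a deliberately simple gap-based column heuristic in Python. It is not iText's algorithm and not ours; `Fragment`, `reading_order`, and the `gap` threshold are hypothetical. Fragments are grouped into columns wherever a horizontal gap appears, then each column is read top to bottom.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Fragment:
    text: str
    x0: float  # left edge
    x1: float  # right edge
    y: float   # distance from the top of the page


def reading_order(fragments: List[Fragment], gap: float = 20.0) -> List[str]:
    """Group fragments into columns by horizontal gaps, then read each
    column top to bottom, columns left to right."""
    columns: List[List[Fragment]] = []
    right_edge = float("-inf")
    for frag in sorted(fragments, key=lambda f: f.x0):
        if not columns or frag.x0 > right_edge + gap:
            columns.append([])  # horizontal gap: start a new column
            right_edge = frag.x1
        else:
            right_edge = max(right_edge, frag.x1)
        columns[-1].append(frag)

    ordered: List[str] = []
    for column in columns:
        ordered.extend(f.text for f in sorted(column, key=lambda f: f.y))
    return ordered


if __name__ == "__main__":
    page = [
        Fragment("right-1", 320, 560, 100),
        Fragment("left-2", 40, 280, 130),
        Fragment("left-1", 40, 280, 100),
        Fragment("right-2", 320, 560, 130),
    ]
    print(reading_order(page))
```

A plain top-to-bottom sort would interleave the two columns on this page. The heuristic itself breaks as soon as a heading spans both columns or text blocks overlap, which is exactly the class of layouts where a more careful spatial approach pays off.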

PDF/A validation

iText is the reference implementation for PDF/A processing. Their validation engine is co-developed with the veraPDF team and is considered the most accurate PDF/A validator available. We validate against the same veraPDF conformance checker to compare our results.

Test set	PDFluent (speed)	iText (speed)	PDFluent accuracy	iText accuracy
100 conformant docs	3.5s	28.4s	100%	100%
Isartor test suite (400 docs)	14.2s	112.8s	98.1%	99.9%
Custom corpus (1,000 docs)	36.1s	287.2s	99.5%	99.8%

We're faster; they're more accurate. On the Isartor test suite the gap is 1.8 percentage points: we get 7 documents wrong, either as a false negative (we say conformant; the document isn't) or as a false positive (we flag a violation that isn't there). iText has one false result on this corpus.

For most production use cases, 98.1% accuracy is acceptable. For a workflow where you're certifying documents for legal archiving, that 1.9% matters — in that case, route the edge cases through iText's validation or use veraPDF directly.
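Accuracy figures like these come from comparing each validator verdict against a ground-truth label. A minimal Python sketch of that scoring, using the false negative / false positive definitions above, might look like this (the `score` function and the toy data are hypothetical, not our benchmark code):

```python
from typing import Dict, Sequence


def score(verdicts: Sequence[bool], truth: Sequence[bool]) -> Dict[str, float]:
    """Compare validator verdicts with ground truth.

    True means "conformant". A false negative is a missed violation
    (verdict conformant, truth non-conformant); a false positive is a
    spurious violation (verdict non-conformant, truth conformant).
    """
    if not truth or len(verdicts) != len(truth):
        raise ValueError("verdicts and truth must be non-empty and equal length")
    false_neg = sum(v and not t for v, t in zip(verdicts, truth))
    false_pos = sum(t and not v for v, t in zip(verdicts, truth))
    correct = len(truth) - false_neg - false_pos
    return {
        "accuracy": correct / len(truth),
        "false_negatives": false_neg,
        "false_positives": false_pos,
    }


if __name__ == "__main__":
    # Toy data: four documents, one missed violation, one spurious one.
    truth = [True, True, False, False]
    verdicts = [True, False, True, False]
    print(score(verdicts, truth))
```

For an archiving workflow the two error types are not equally costly: a false negative lets a non-conformant file through, while a false positive only triggers an unnecessary second check.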

ZUGFeRD / Factur-X

iText has good ZUGFeRD/Factur-X support via their pdfHTML and ZUGFeRD-specific API. PDFluent's pdf-invoice crate covers the same ground.

	PDFluent	iText 9
Generate ZUGFeRD EN16931	14ms	48ms
Validate ZUGFeRD embedding	8ms	22ms
Extract XML from Factur-X PDF	6ms	18ms
Correctness (schema + schematron)	100%	100%

Reproducing these benchmarks

The benchmark code is on GitHub. To run it yourself:

# Public benchmark corpus is not yet open-source.
# Mail [email protected] for the methodology and corpus access.
# Once published, the snippet below is the entry point:
cd benchmarks

# Install dependencies
cargo build --release
mvn -f itext-bench/pom.xml package

# Generate test corpus (requires ~2GB disk)
./scripts/generate-corpus.sh

# Run benchmarks
./scripts/run-all.sh --output results.json

# View results
./scripts/report.py results.json

The corpus generator creates PDFs using a mix of open-source tools (Ghostscript, LibreOffice, LaTeX) so the test set is reproducible. If you get substantially different results on your hardware, open an issue — we want to know.

Summary

Category	Winner	Notes
Cold start	PDFluent	~43× faster vs. GraalVM JIT, ~57× vs. standard JDK; ~6.5× vs. GraalVM Native Image
Parse throughput	PDFluent	~3.4× faster sequential, ~3.2× parallel
Memory usage	PDFluent	~4× lower peak RSS in batch; ~4–10× less RAM per document
Text extraction quality	iText	Better multi-column / complex layout handling
PDF/A accuracy	iText	Co-developed with veraPDF; marginally more accurate
PDF/A speed	PDFluent	~8× faster validation
PDF rendering	PDFluent	iText has no rendering engine
XFA processing	PDFluent	iText only flattens via pdfXFA add-on
ZUGFeRD/Factur-X	PDFluent	Both correct; PDFluent ~3× faster
Java/JVM integration	iText	25 years of ecosystem; Maven-native
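The multipliers in the Notes column are plain ratios of the measurements reported earlier. A short Python sketch for recomputing them from the tables (the `speedup` helper and the dictionary layout are just one hypothetical way to organise this):

```python
def speedup(itext: float, pdfluent: float) -> float:
    """How many times larger the iText figure is than the PDFluent figure."""
    return itext / pdfluent


# (iText value, PDFluent value), copied from the tables in this post.
measurements = {
    "cold start vs GraalVM JIT (ms)": (855, 20),
    "cold start vs standard JDK (ms)": (1140, 20),
    "cold start vs Native Image (ms)": (130, 20),
    "parse, sequential (s)": (61.4, 18.2),
    "parse, 8 threads (s)": (9.8, 3.1),
    "batch peak RSS (MB)": (1940, 480),
    "PDF/A, 100 conformant docs (s)": (28.4, 3.5),
    "ZUGFeRD generation (ms)": (48, 14),
}

if __name__ == "__main__":
    for label, (itext, pdfluent) in measurements.items():
        print(f"{label}: {speedup(itext, pdfluent):.1f}x")
```

Swapping in your own measurements gives a quick check on whether the same ratios hold on your hardware.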

When to use iText: You're on the JVM, you need best-in-class PDF/A accuracy, or you need complex text extraction from multi-column documents. The AGPL license is also genuinely useful if you're building open-source software.

When to use PDFluent: You need fast cold starts (serverless), low memory footprint (high concurrency), PDF rendering, XFA forms, WebAssembly deployment, or non-JVM language bindings.

PDFluent vs. iText: A Performance Comparison