
Migrate from a legacy PDF stack

Many teams accumulate a mix of old tools: PDFBox for reading, iText for writing, custom scripts for forms, and shell wrappers around Ghostscript. This guide shows how to consolidate them into a single Rust crate, PDFluent.

Install PDFluent with: cargo add pdfluent

Migration steps

Step 1: Audit your current PDF toolchain

Before replacing anything, list every library and tool touching PDFs in your system. Common legacy stacks include PDFBox for reading, iText for writing, a separate form library, and Ghostscript invoked via Runtime.exec() or shell scripts. Each one has its own dependency, license, and failure mode.

legacy PDF stack (before)
# Typical legacy PDF stack
# pom.xml: iText 5 (AGPL), PDFBox 2.0 (Apache)
# scripts/fill_form.sh: calls pdftk or ghostscript
# lib/pdf_util.py: Python wrapper around pdfminer
# Dockerfile: installs ghostscript, pdftk, java, python
#
# Result: 4 runtimes, 3 licenses, 500+ MB image
PDFluent (after)
# PDFluent replaces all of the above
# Cargo.toml: pdfluent = "0.9"  (MIT/commercial)
#
# One crate for: reading, writing, text extraction,
# form filling, XFA, annotations, and metadata.
# Dockerfile: copies a single static binary.
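The audit is easier to act on when it is written down as data rather than kept in someone's head. The sketch below is one way to do that, assuming nothing about PDFluent: it uses only the Rust standard library, and the Tool struct, its fields, and the summarize helper are hypothetical names for illustration. The example rows follow the stack described above; check each license and invocation path against your own systems.

```rust
use std::collections::BTreeSet;

/// One row of the PDF toolchain audit: a library or binary that touches PDFs.
/// (Hypothetical structure; record whatever your audit actually needs.)
struct Tool {
    name: &'static str,
    runtime: &'static str,     // what has to be installed for it to run
    license: &'static str,
    invoked_via: &'static str, // in-process call, shell script, Runtime.exec(), ...
}

/// Count the distinct runtimes and distinct licenses across the inventory.
fn summarize(tools: &[Tool]) -> (usize, usize) {
    let runtimes: BTreeSet<&str> = tools.iter().map(|t| t.runtime).collect();
    let licenses: BTreeSet<&str> = tools.iter().map(|t| t.license).collect();
    (runtimes.len(), licenses.len())
}

fn main() {
    // Example rows based on the legacy stack above; verify them for your system.
    let tools = [
        Tool { name: "iText 5", runtime: "JVM", license: "AGPL", invoked_via: "library (pom.xml)" },
        Tool { name: "PDFBox 2.0", runtime: "JVM", license: "Apache-2.0", invoked_via: "library (pom.xml)" },
        Tool { name: "pdfminer", runtime: "Python", license: "MIT", invoked_via: "lib/pdf_util.py" },
        Tool { name: "Ghostscript", runtime: "native binary", license: "AGPL", invoked_via: "scripts/fill_form.sh" },
    ];
    for t in &tools {
        println!("{:<12} {:<14} {:<11} {}", t.name, t.runtime, t.license, t.invoked_via);
    }
    let (runtimes, licenses) = summarize(&tools);
    println!("{runtimes} runtimes, {licenses} licenses");
}
```

Each row you remove during the migration should shrink both counts; when only the Rust binary is left, the audit is done.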
Step 2: Replace reading and text extraction first

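Start with the lowest-risk part of the stack: reading documents and extracting text. This is typically handled by PDFBox or pdfminer, and because it is read-only it can usually be swapped without changing downstream logic. Extracted text rarely matches byte for byte across libraries, though (whitespace, line breaks, and reading order all vary), so verify output parity before moving to the next operation.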

legacy PDF stack (before)
// PDFBox text extraction
PDDocument document = PDDocument.load(new File("report.pdf"));
PDFTextStripper stripper = new PDFTextStripper();
stripper.setStartPage(1);
stripper.setEndPage(document.getNumberOfPages());
String text = stripper.getText(document);
document.close();
PDFluent (after)
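use pdfluent::Document;

// Open the same input the PDFBox code was reading.
let doc = Document::open("report.pdf")?;
let pages = doc.page_count();
let mut all_text = String::new();
// Pages are addressed from 0 here, unlike PDFBox's 1-based setStartPage/setEndPage.
for i in 0..pages {
    all_text.push_str(&doc.page(i)?.extract_text()?);
}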
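One way to check parity is to compare the two extractions after collapsing whitespace, so that layout-only differences (line breaks, doubled spaces, CRLF vs LF) are ignored while changed words or reading order still show up. The sketch below uses only the Rust standard library and does not call PDFluent; normalize and first_divergence are hypothetical helper names, and it assumes you have saved the PDFBox output and the PDFluent output as strings.

```rust
/// Collapse every run of whitespace (spaces, tabs, CR, LF) into one space and
/// trim both ends, so layout-only differences do not count as mismatches.
fn normalize(text: &str) -> String {
    text.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Returns None when both extractions match after normalization; otherwise the
/// byte offset (in the normalized text) where they first diverge.
fn first_divergence(legacy: &str, new: &str) -> Option<usize> {
    let (a, b) = (normalize(legacy), normalize(new));
    if a == b {
        return None;
    }
    let offset = a
        .char_indices()
        .zip(b.chars())
        // First position where the characters differ.
        .find(|((_, ca), cb)| ca != cb)
        .map(|((i, _), _)| i)
        // No differing character: one string is a prefix of the other.
        .unwrap_or_else(|| a.len().min(b.len()));
    Some(offset)
}

fn main() {
    // Hypothetical inputs: saved PDFBox output vs. output of the PDFluent loop.
    let legacy = "Quarterly report\r\nRevenue:  42\n";
    let new = "Quarterly report Revenue: 42";
    match first_divergence(legacy, new) {
        None => println!("parity OK"), // prints "parity OK" for these inputs
        Some(at) => println!("outputs diverge at byte {at} of the normalized text"),
    }
}
```

Reporting the first divergence point rather than a plain boolean makes it quicker to see whether a mismatch is a genuine extraction difference or something like a reordered column.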
Step 3: Consolidate form filling and document writing

Legacy stacks often use a different library for writing than for reading. Replace iText or pdftk form-filling with PDFluent's acroform API. Replace Ghostscript shell invocations with PDFluent's document manipulation methods. Each replacement removes a runtime dependency from your Docker image.

legacy PDF stack (before)
// iText 5 form filling (AGPL)
PdfReader reader = new PdfReader("template.pdf");
PdfStamper stamper = new PdfStamper(
    reader,
    new FileOutputStream("filled.pdf")
);
AcroFields form = stamper.getAcroFields();
form.setField("company_name", "Acme Corp");
form.setField("invoice_date", "2024-04-14");
stamper.setFormFlattening(true);
stamper.close();
reader.close();
PDFluent (after)
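use pdfluent::Document;

let mut doc = Document::open("template.pdf")?;
let mut form = doc.acroform()?;
form.set_field("company_name", "Acme Corp")?;
form.set_field("invoice_date", "2024-04-14")?;
// Flatten the fields, as stamper.setFormFlattening(true) did in the iText version.
form.flatten()?;
doc.save("filled.pdf")?;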
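A common silent failure when porting form filling is a field name that does not exist in the template: in iText 5, setField simply returns false. A small pre-flight check that compares the names you intend to fill against the names the template defines catches this regardless of which library does the filling. The sketch below is standard-library Rust only; unknown_fields is a hypothetical helper, and it assumes you already have the template's field names as strings (for example from iText 5's AcroFields.getFields().keySet() during the transition).

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Returns the field names in `values` that the template does not define,
/// in sorted order (BTreeMap keys are already sorted).
fn unknown_fields<'a>(template_fields: &[&str], values: &BTreeMap<&'a str, &str>) -> Vec<&'a str> {
    let known: BTreeSet<&str> = template_fields.iter().copied().collect();
    values
        .keys()
        .copied()
        .filter(|name| !known.contains(*name))
        .collect()
}

fn main() {
    // Hypothetical: field names read from template.pdf and the values to write.
    let template_fields = ["company_name", "invoice_date"];
    let mut values = BTreeMap::new();
    values.insert("company_name", "Acme Corp");
    values.insert("invoice_date", "2024-04-14");
    values.insert("invoice_no", "INV-001"); // hypothetical field the template lacks
    let unknown = unknown_fields(&template_fields, &values);
    println!("{unknown:?}"); // prints ["invoice_no"]
}
```

Failing the job when this list is non-empty turns a silently unfilled field into an explicit error.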

Things to watch out for

  • iText 5 is AGPL-licensed. If your team has been quietly ignoring this, migrating the iText code to PDFluent removes that dependency and the license risk that comes with it.
  • Shell wrappers around pdftk or Ghostscript are fragile: they depend on specific installed versions and can break silently when the binary is missing. Moving that work into PDFluent removes the external process invocations.
  • If you are using PDFBox 2.x, note that PDFBox 3.0 changed several APIs (for example, PDDocument.load was replaced by Loader.loadPDF). Rather than upgrading PDFBox, migrating to PDFluent at this point may be less work.
  • Migrate one operation type at a time, and run the old and new code side by side on the same documents to verify output parity before decommissioning the old tool.
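Running the old and new code side by side is easiest with a small harness that feeds the same documents to both implementations and reports, per document, whether they agree. The sketch below is standard-library Rust; compare_all and Outcome are hypothetical names, and the two closures are stand-ins for your real calls (for example, a wrapper around the legacy JVM job and a function built on PDFluent). It compares outputs exactly, so you may want to normalize them first.

```rust
/// Result of running the legacy and new implementations on one document.
#[derive(Debug, PartialEq)]
enum Outcome {
    Match,
    Mismatch,
    Failed(String), // one side returned an error
}

/// Run `legacy` and `new` on every document and compare their outputs.
fn compare_all<L, N>(docs: &[&str], legacy: L, new: N) -> Vec<(String, Outcome)>
where
    L: Fn(&str) -> Result<String, String>,
    N: Fn(&str) -> Result<String, String>,
{
    docs.iter()
        .map(|&doc| {
            // Both sides always run, so one failure does not hide the other result.
            let outcome = match (legacy(doc), new(doc)) {
                (Ok(a), Ok(b)) if a == b => Outcome::Match,
                (Ok(_), Ok(_)) => Outcome::Mismatch,
                (Err(e), _) => Outcome::Failed(format!("legacy: {e}")),
                (_, Err(e)) => Outcome::Failed(format!("new: {e}")),
            };
            (doc.to_string(), outcome)
        })
        .collect()
}

fn main() {
    let docs = ["report.pdf", "invoice.pdf"];
    // Stand-in closures; replace them with calls into the old and new stacks.
    let legacy = |doc: &str| -> Result<String, String> { Ok(format!("text of {doc}")) };
    let new = |doc: &str| -> Result<String, String> {
        if doc == "invoice.pdf" {
            Ok("different".to_string())
        } else {
            Ok(format!("text of {doc}"))
        }
    };
    for (doc, outcome) in compare_all(&docs, legacy, new) {
        println!("{doc}: {outcome:?}");
    }
}
```

Once a document set produces only Match outcomes for an operation type over a representative period, that part of the legacy stack can be decommissioned.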

Frequently asked questions