Migrate from Apache PDFBox to PDFluent

A step-by-step guide for replacing Apache PDFBox with PDFluent. Covers dependency setup, document loading, text extraction, form filling, and saving.

Install PDFluent with cargo add pdfluent.

Migration steps

1

Replace the dependency

Remove PDFBox from pom.xml or build.gradle and add pdfluent to Cargo.toml.

Apache PDFBox (before)
<!-- pom.xml -->
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>3.0.1</version>
</dependency>
PDFluent (after)
# Cargo.toml
[dependencies]
pdfluent = "0.9"
2

Open a document

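PDFBox 2.x loads documents with PDDocument.load(). PDFBox 3.x, the version shown in step 1, removed that method in favor of Loader.loadPDF(). Both accept a File or a byte array. PDFluent uses Document::open, which returns a Result.

Apache PDFBox (before)
import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import java.io.File;

// PDFBox 3.x; on 2.x use PDDocument.load(new File("contract.pdf"))
PDDocument doc = Loader.loadPDF(new File("contract.pdf"));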
PDFluent (after)
use pdfluent::Document;

let doc = Document::open("contract.pdf")?;
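
Because Document::open returns a Result, the checked IOException that Java raises at load time becomes a value the caller has to handle. The sketch below uses std::fs::File::open purely as a stand-in for Document::open (an assumption; the only property relied on is that both return a Result) to show the two usual patterns: propagating with ? and branching with match. The helper names file_len and describe_open are illustrative only.

```rust
use std::fs::File;
use std::io;

/// Propagate the error to the caller with `?`, the closest analogue of
/// declaring `throws IOException` in Java.
/// `File::open` is a stand-in here for `Document::open`.
fn file_len(path: &str) -> io::Result<u64> {
    let file = File::open(path)?; // returns early with Err on failure
    Ok(file.metadata()?.len())
}

/// Handle the error locally with `match`, the analogue of try/catch.
fn describe_open(path: &str) -> String {
    match File::open(path) {
        Ok(_) => format!("opened {path}"),
        Err(err) => format!("could not open {path}: {err}"),
    }
}
```

Use ? inside functions that already return a Result, and match (or if let) at the boundary where the error should be reported to the user.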
3

Extract text

PDFBox requires a PDFTextStripper instance and returns a single string covering the configured page range, which defaults to the whole document. PDFluent extracts text one page at a time.

Apache PDFBox (before)
import org.apache.pdfbox.text.PDFTextStripper;

PDFTextStripper stripper = new PDFTextStripper();
stripper.setStartPage(1);
stripper.setEndPage(1);
String text = stripper.getText(doc);
PDFluent (after)
// First page only (page indices start at 0)
let text = doc.page(0)?.extract_text()?;

// All pages
for i in 0..doc.page_count() {
    let text = doc.page(i)?.extract_text()?;
    println!("{}", text);
}
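
Code that relied on PDFTextStripper returning one string for the whole document can rebuild that string from the per-page results. A minimal sketch, assuming nothing about PDFluent beyond the loop above: the closure extract stands in for the per-page extraction call, and join_pages is a hypothetical helper, not part of either library.

```rust
/// Concatenate per-page text into one string, separated by newlines.
/// `extract` is a stand-in for a closure wrapping the per-page
/// extraction call; the first error aborts the loop and is returned.
fn join_pages<E>(
    page_count: usize,
    mut extract: impl FnMut(usize) -> Result<String, E>,
) -> Result<String, E> {
    let mut pages = Vec::with_capacity(page_count);
    for i in 0..page_count {
        pages.push(extract(i)?);
    }
    Ok(pages.join("\n"))
}
```

Taking a closure keeps the helper independent of the concrete document type, so it can be unit-tested without a PDF on disk.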
4

Fill AcroForm fields

PDFBox accesses fields through PDDocumentCatalog and PDAcroForm. PDFluent uses a direct acroform() handle.

Apache PDFBox (before)
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;

// getAcroForm() returns null when the document has no AcroForm
PDAcroForm acroForm = doc.getDocumentCatalog().getAcroForm();
acroForm.getField("first_name").setValue("Jane");
acroForm.getField("last_name").setValue("Smith");
acroForm.flatten();
PDFluent (after)
let mut form = doc.acroform()?;
form.set_field("first_name", "Jane")?;
form.set_field("last_name", "Smith")?;
form.flatten()?;
5

Save and close

PDFBox requires an explicit close() call or a try-with-resources block, since PDDocument implements Closeable. PDFluent releases the document when it goes out of scope; call save() first to write your changes.

Apache PDFBox (before)
doc.save("output.pdf");
doc.close(); // must call close() to release file handles
PDFluent (after)
doc.save("output.pdf")?;
// doc drops automatically at end of scope
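
The automatic cleanup comes from Rust's ownership rules rather than anything specific to PDFluent: a value's Drop implementation runs when its owner goes out of scope. The sketch below demonstrates this with a hypothetical Handle type standing in for an open document. It is not PDFluent's implementation, only an illustration of the mechanism.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical stand-in for an open document that owns a resource.
/// The shared flag lets the caller observe when `drop` has run.
struct Handle {
    released: Rc<Cell<bool>>,
}

impl Drop for Handle {
    fn drop(&mut self) {
        // A real type would close file handles or free buffers here.
        self.released.set(true);
    }
}

/// Opens a handle in an inner scope and reports whether the resource was
/// released (while still inside the scope, after leaving the scope).
fn released_before_and_after_scope() -> (bool, bool) {
    let released = Rc::new(Cell::new(false));
    let before;
    {
        let _handle = Handle { released: Rc::clone(&released) };
        before = released.get(); // handle is still alive here
    } // `_handle` is dropped here, with no explicit close() call
    (before, released.get())
}
```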

Things to watch out for

  • PDFBox page numbers are 1-based in PDFTextStripper but 0-based in PDPageTree. PDFluent always uses 0-based page indices.
  • PDFBox does not support XFA forms. If your PDFs use XFA, PDFluent handles them natively.
  • A PDFBox PDDocument must be closed explicitly, or resources leak. PDFluent documents are released automatically under Rust's ownership rules.
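
When porting calls such as setStartPage(1), the 1-based page number has to be converted to a 0-based index. A small sketch of a checked conversion follows; to_page_index is a hypothetical helper, not a PDFluent function.

```rust
/// Convert a 1-based page number (as used by PDFTextStripper) into a
/// 0-based index, returning None for page 0 or a page past the end.
fn to_page_index(page_number: usize, page_count: usize) -> Option<usize> {
    let index = page_number.checked_sub(1)?; // page number 0 is invalid
    if index < page_count {
        Some(index)
    } else {
        None
    }
}
```

Returning an Option forces call sites to decide what an out-of-range page means instead of failing later with an off-by-one error.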

Frequently asked questions