Extract data from an XFA PDF form in Rust

Read the XFA data packet from a static or dynamic XFA form and access field values as typed Rust data.

rust
use pdfluent::Document;

fn main() -> pdfluent::Result<()> {
    let doc = Document::open("form.pdf")?;

    let xfa = doc.xfa().ok_or(pdfluent::Error::NoXfa)?;
    let datasets = xfa.datasets()?;

    // Read a field value by its fully-qualified name
    let first_name = datasets.field_value("form1.subform1.firstName")?;
    println!("First name: {}", first_name.as_str().unwrap_or(""));

    Ok(())
}
Install: cargo add pdfluent

Step by step

1

Open the PDF and access the XFA root

doc.xfa() returns an Option<XfaDocument>. If the PDF does not contain an XFA structure, it returns None.

rust
let doc = Document::open("form.pdf")?;
let xfa = doc.xfa().ok_or(pdfluent::Error::NoXfa)?;
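
When processing many files, it can help to flag likely XFA documents before opening them. The sketch below is independent of PDFluent and uses only the standard library. It relies on the fact that a PDF references its XFA content through an /XFA key in the AcroForm dictionary. The function name looks_like_xfa is hypothetical, and the check is only a heuristic: it misses documents whose AcroForm dictionary sits inside a compressed object stream, and it can match stray bytes.

```rust
/// True if `b` is PDF whitespace or a delimiter, i.e. a byte that ends a name token.
fn ends_name(b: u8) -> bool {
    matches!(
        b,
        0 | b'\t' | b'\n' | 0x0C | b'\r' | b' '
            | b'(' | b')' | b'<' | b'>' | b'[' | b']' | b'{' | b'}' | b'/' | b'%'
    )
}

/// Heuristic pre-check (hypothetical helper, not part of PDFluent):
/// scans raw bytes for the name token `/XFA`.
fn looks_like_xfa(bytes: &[u8]) -> bool {
    const KEY: &[u8] = b"/XFA";
    bytes.windows(KEY.len()).enumerate().any(|(i, window)| {
        // The byte after the match must end the name, so `/XFAData` is rejected.
        let next = bytes.get(i + KEY.len());
        window == KEY && next.map_or(true, |&b| ends_name(b))
    })
}

fn main() {
    // In-memory samples stand in for file contents so the sketch runs as-is.
    let with_xfa = b"<< /AcroForm << /Fields [] /XFA 12 0 R >> >>";
    let without_xfa = b"<< /AcroForm << /Fields [] >> >>";
    println!("with /XFA key: {}", looks_like_xfa(with_xfa));
    println!("without /XFA key: {}", looks_like_xfa(without_xfa));
}
```

For a real file, pass the result of std::fs::read to looks_like_xfa, and treat doc.xfa() as the authoritative answer.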
2

Access the datasets packet

XFA forms store their field data in the xfa:datasets XML packet. xfa.datasets() parses that XML into a queryable tree.

rust
let datasets = xfa.datasets()?;
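
For orientation, a datasets packet usually looks something like the sample embedded below. The element names (form1, subform1, firstName, lastName) and the values are made up to match the path used in this guide, and the helpers element_text and resolve are hypothetical. They use plain string search, not XML parsing, so this is only a sketch of how nesting corresponds to a path, not a description of how PDFluent reads the packet.

```rust
// Illustrative datasets packet; names and values are invented for this sketch.
const DATASETS: &str = r#"<xfa:datasets xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/">
  <xfa:data>
    <form1>
      <subform1>
        <firstName>Ada</firstName>
        <lastName>Lovelace</lastName>
      </subform1>
    </form1>
  </xfa:data>
</xfa:datasets>"#;

/// Returns the text between the first `<tag>` and the next `</tag>`.
/// Naive string search: ignores attributes, entities, and repeated elements.
fn element_text<'a>(xml: &'a str, tag: &str) -> Option<&'a str> {
    let open = format!("<{tag}>");
    let close = format!("</{tag}>");
    let start = xml.find(&open)? + open.len();
    let end = xml[start..].find(&close)? + start;
    Some(&xml[start..end])
}

/// Narrows the search scope one path segment at a time.
fn resolve<'a>(xml: &'a str, path: &str) -> Option<&'a str> {
    let mut scope = xml;
    for segment in path.split('.') {
        scope = element_text(scope, segment)?;
    }
    Some(scope)
}

fn main() {
    match resolve(DATASETS, "form1.subform1.firstName") {
        Some(value) => println!("firstName = {value}"),
        None => println!("path not found"),
    }
}
```

Each path segment corresponds to one level of element nesting under xfa:data, which is why a lookup can fail at any segment along the way.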
3

Read a field value by name

Field names are dot-separated paths from the root node. The path mirrors the XFA form template hierarchy.

rust
let val = datasets.field_value("form1.subform1.firstName")?;
match val {
    pdfluent::xfa::FieldValue::Str(s)  => println!("string: {}", s),
    pdfluent::xfa::FieldValue::Date(d) => println!("date: {:?}", d),
    pdfluent::xfa::FieldValue::Num(n)  => println!("number: {}", n),
    pdfluent::xfa::FieldValue::Empty   => println!("(empty)"),
}
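
The datasets packet itself holds text, so a typed value implies some classification step. The sketch below shows one way that step could work. The Value enum, classify, and parse_date are hypothetical stand-ins, and the rules (blank means empty, YYYY-MM-DD means date, a finite number means number, anything else is a string) are assumptions made for illustration. PDFluent's actual rules may differ.

```rust
/// Stand-in for a typed field value; not PDFluent's `FieldValue`.
#[derive(Debug, PartialEq)]
enum Value {
    Str(String),
    Date { year: u16, month: u8, day: u8 },
    Num(f64),
    Empty,
}

/// Parses a strict `YYYY-MM-DD` date. Does not check days per month.
fn parse_date(s: &str) -> Option<(u16, u8, u8)> {
    let parts: Vec<&str> = s.split('-').collect();
    let lengths_ok = parts.len() == 3
        && parts[0].len() == 4
        && parts[1].len() == 2
        && parts[2].len() == 2;
    let digits_ok = parts.iter().all(|p| p.bytes().all(|b| b.is_ascii_digit()));
    if !lengths_ok || !digits_ok {
        return None;
    }
    let year = parts[0].parse::<u16>().ok()?;
    let month = parts[1].parse::<u8>().ok()?;
    let day = parts[2].parse::<u8>().ok()?;
    ((1..=12).contains(&month) && (1..=31).contains(&day)).then_some((year, month, day))
}

/// Classifies raw packet text under the assumed rules described above.
fn classify(raw: &str) -> Value {
    let text = raw.trim();
    if text.is_empty() {
        return Value::Empty;
    }
    if let Some((year, month, day)) = parse_date(text) {
        return Value::Date { year, month, day };
    }
    // `f64::from_str` also accepts "NaN" and "inf"; keep those as strings.
    match text.parse::<f64>() {
        Ok(n) if n.is_finite() => Value::Num(n),
        _ => Value::Str(raw.to_string()),
    }
}

fn main() {
    for raw in ["Ada", "42.5", "2024-01-15", "", "NaN"] {
        println!("{raw:?} -> {:?}", classify(raw));
    }
}
```

The is_finite guard is there because a text field containing "NaN" or "inf" would otherwise be read as a number.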
4

Iterate all field nodes

Use datasets.fields() to get every leaf node in the data tree.

rust
for field in datasets.fields() {
    println!("{} = {:?}", field.path(), field.value());
}
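
To make "every leaf node" concrete, the sketch below walks a small hand-built tree depth-first and emits one dotted path per leaf. Node, collect_leaves, and sample_tree are hypothetical stand-ins written for this example, not PDFluent types, and the sketch ignores repeated sibling names, which XFA addresses with an index such as row[1].

```rust
/// Stand-in for a parsed data tree; not a PDFluent type.
enum Node {
    Leaf(String),
    Branch(Vec<(String, Node)>),
}

/// Depth-first walk that appends a (dotted path, value) pair for every leaf.
fn collect_leaves(name: &str, node: &Node, prefix: &str, out: &mut Vec<(String, String)>) {
    let path = if prefix.is_empty() {
        name.to_string()
    } else {
        format!("{prefix}.{name}")
    };
    match node {
        Node::Leaf(value) => out.push((path, value.clone())),
        Node::Branch(children) => {
            for (child_name, child) in children {
                collect_leaves(child_name, child, &path, out);
            }
        }
    }
}

/// Invented sample data: the content of a root node named form1.
fn sample_tree() -> Node {
    Node::Branch(vec![(
        "subform1".to_string(),
        Node::Branch(vec![
            ("firstName".to_string(), Node::Leaf("Ada".to_string())),
            ("lastName".to_string(), Node::Leaf("Lovelace".to_string())),
        ]),
    )])
}

fn main() {
    let mut leaves = Vec::new();
    collect_leaves("form1", &sample_tree(), "", &mut leaves);
    for (path, value) in &leaves {
        println!("{path} = {value:?}");
    }
}
```

Only leaves produce output. Container nodes such as subform1 contribute a path segment but no value of their own.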
5

Export the raw datasets XML

If you need the raw XML for custom processing, access the bytes directly.

rust
let xml_bytes = datasets.to_xml_bytes()?;
std::fs::write("form_data.xml", &xml_bytes)?;

Notes and tips

  • XFA forms come in two variants: static XFA (fixed layout) and dynamic XFA (auto-layout). Both use the same data model. PDFluent parses both.
  • The XFA template and datasets are separate XML streams. Changing the datasets without a matching change to the template can make the rendered form inconsistent with its data.
  • Adobe Acrobat and Acrobat Reader are the primary renderers for dynamic XFA. Most other viewers, including Foxit and Chrome's built-in PDF viewer, support it only partially or not at all.
  • XFA is deprecated in PDF 2.0. New forms should use AcroForm instead. PDFluent supports both for reading existing documents.

Why PDFluent for this

Pure Rust

No JVM, no runtime, no DLL dependencies. Ships as a single native binary or WASM module.

Memory safe

Rust's ownership model rules out buffer overflows and use-after-free in safe code, the usual causes of segfaults in PDF parsers.

Runs anywhere

Same code runs server-side, in Docker, on AWS Lambda, on Cloudflare Workers, or in the browser via WASM.

Frequently asked questions