How-to guides/E-Invoicing

Extract ZUGFeRD or Factur-X XML from a PDF in Rust

Read the structured invoice XML embedded inside a ZUGFeRD or Factur-X PDF. Parse it for automated accounting import.

rust
use pdfluent::PdfDocument;

fn main() -> pdfluent::Result<()> {
    let doc = PdfDocument::open("invoice.pdf")?;

    if let Some(einvoice) = doc.extract_einvoice()? {
        println!("Profile: {:?}", einvoice.profile());
        println!("XML length: {} bytes", einvoice.xml().len());
        std::fs::write("extracted-invoice.xml", einvoice.xml())?;
    } else {
        println!("No e-invoice XML found in this PDF.");
    }

    Ok(())
}
Install:cargo add pdfluentDownload SDK →

Step by step

1

Add PDFluent to Cargo.toml

XML extraction works with the base crate. The einvoice feature adds profile detection and validation helpers.

rust
# Cargo.toml
[dependencies]
pdfluent = { version = "0.9", features = ["einvoice"] }
2

Check whether the PDF contains an e-invoice

Use has_einvoice() to quickly check before attempting extraction. This reads only the PDF attachment table.

rust
use pdfluent::PdfDocument;

let doc = PdfDocument::open("invoice.pdf")?;

if doc.has_einvoice() {
    println!("E-invoice XML found.");
} else {
    println!("No e-invoice embedded.");
}
3

Extract the XML and detect the profile

extract_einvoice() returns an EInvoiceData struct. The profile() method identifies ZUGFeRD MINIMUM, BASIC, EN16931, or EXTENDED.

rust
use pdfluent::EInvoiceProfile;

let einvoice = doc.extract_einvoice()?.unwrap();

println!("Profile: {:?}", einvoice.profile());

match einvoice.profile() {
    EInvoiceProfile::Minimum      => println!("Basic routing data only"),
    EInvoiceProfile::BasicWl      => println!("Line items without account data"),
    EInvoiceProfile::EN16931      => println!("Full invoice, EU compliant"),
    EInvoiceProfile::Extended     => println!("Extended German profile"),
    EInvoiceProfile::XRechnung    => println!("German public sector"),
    _ => {}
}
4

Parse the XML with quick-xml or roxmltree

The extracted XML is a standard Rust String. Use any XML parser to read the invoice fields.

rust
use roxmltree::Document as XmlDoc;

let xml_str = einvoice.xml();
let xml = XmlDoc::parse(&xml_str)?;

// Read invoice number
let invoice_id = xml
    .descendants()
    .find(|n| n.has_tag_name("ID") && n.parent().map_or(false, |p| p.has_tag_name("ExchangedDocument")))
    .and_then(|n| n.text());

println!("Invoice ID: {:?}", invoice_id);
5

Batch extract XML from a folder of PDFs

Combine with the batch processing pattern to extract XML from many invoices at once.

rust
use std::fs;
use pdfluent::PdfDocument;

let dir = fs::read_dir("./invoices")?;

for entry in dir.filter_map(|e| e.ok()) {
    let path = entry.path();
    if path.extension().map_or(false, |e| e == "pdf") {
        let doc = PdfDocument::open(&path)?;
        if let Some(inv) = doc.extract_einvoice()? {
            let xml_path = path.with_extension("xml");
            fs::write(&xml_path, inv.xml())?;
            println!("Extracted: {}", xml_path.display());
        }
    }
}

Notes and tips

  • The XML attachment in Factur-X PDFs is always named factur-x.xml. In ZUGFeRD 1.0 PDFs, it may be named ZUGFeRD-invoice.xml. PDFluent checks both names.
  • PDF/A-3 attachments have an AFRelationship entry set to Alternative. PDFluent searches for this to locate the invoice XML.
  • If the PDF has multiple XML attachments, extract_einvoice() returns the first one matching a known e-invoice profile.

Why PDFluent for this

Pure Rust

No JVM, no runtime, no DLL dependencies. Ships as a single native binary or WASM module.

Memory safe

Rust's ownership model prevents buffer overflows and use-after-free. No segfaults in PDF parsing.

Runs anywhere

Same code runs server-side, in Docker, on AWS Lambda, on Cloudflare Workers, or in the browser via WASM.

Frequently asked questions