
Attempt to recover and repair a corrupted PDF in Rust

Use PDFluent's recovery parser to rebuild the cross-reference table and salvage as many objects as possible from a damaged file.

rust
use pdfluent::{Document, OpenOptions};

fn main() -> pdfluent::Result<()> {
    let doc = Document::open_with_options(
        "corrupted.pdf",
        OpenOptions::default().recovery_mode(true),
    )?;

    let report = doc.repair_report();
    println!("Objects recovered: {}", report.objects_recovered());
    println!("Xref rebuilt:      {}", report.xref_rebuilt());

    doc.save("repaired.pdf")?;
    Ok(())
}
Install: cargo add pdfluent

Step by step

1. Try to open with standard parsing first

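Standard parsing is faster, so try it first. Fall back to recovery mode only when the standard open fails with a structural error, such as a broken cross-reference table or an unexpected end of file.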

rust
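use pdfluent::{Document, Error, OpenOptions};

// Try the fast standard parser first and fall back to recovery mode
// only when the file is structurally damaged.
fn open_with_fallback(path: &str) -> pdfluent::Result<Document> {
    match Document::open(path) {
        Ok(doc) => {
            println!("Opened normally.");
            Ok(doc)
        }
        Err(Error::BrokenXref | Error::UnexpectedEof | Error::InvalidStructure(_)) => {
            println!("Standard open failed, trying recovery mode...");
            Document::open_with_options(path, OpenOptions::default().recovery_mode(true))
        }
        // Any other error is not something recovery mode can fix.
        Err(e) => Err(e),
    }
}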

2. Open in recovery mode

Recovery mode uses a linear scan of the file to find all PDF objects rather than relying on the cross-reference table.
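To make the idea of a linear scan concrete, here is a minimal, std-only sketch of how objects can be located without a cross-reference table. It is not PDFluent's implementation. The function names `scan_object_headers` and `parse_header` are hypothetical. The sketch assumes every object header has the common `N G obj` form (object number, generation number, keyword) at the start of a line.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: scan raw PDF bytes for `N G obj` headers and map
/// each (object number, generation) pair to the byte offset of its header.
/// A later definition of the same object overwrites an earlier one, which
/// mirrors how incremental updates supersede older objects.
fn scan_object_headers(data: &[u8]) -> BTreeMap<(u32, u16), usize> {
    let mut offsets = BTreeMap::new();
    let mut line_start = 0;
    // Walk the bytes line by line; headers are assumed to start a line.
    for i in 0..=data.len() {
        let at_eol = i == data.len() || data[i] == b'\n' || data[i] == b'\r';
        if !at_eol {
            continue;
        }
        if let Some(key) = parse_header(&data[line_start..i]) {
            offsets.insert(key, line_start);
        }
        line_start = i + 1;
    }
    offsets
}

/// Hypothetical helper: parse a line such as `12 0 obj`.
/// Returns None for anything that is not an object header.
fn parse_header(line: &[u8]) -> Option<(u32, u16)> {
    // Non-UTF-8 lines (e.g. binary stream data) cannot be headers here.
    let text = std::str::from_utf8(line).ok()?;
    let mut parts = text.split_whitespace();
    let num = parts.next()?.parse::<u32>().ok()?;
    let gen = parts.next()?.parse::<u16>().ok()?;
    // The third token must be `obj` or begin with it (e.g. `obj<<`).
    if parts.next()?.starts_with("obj") {
        Some((num, gen))
    } else {
        None
    }
}
```

For the input `%PDF-1.7\n1 0 obj\n...`, object `1 0` maps to byte 9, immediately after the 9-byte header line. A real recovery parser would then read each object from its offset and rebuild the xref table from this map.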

rust
use pdfluent::{Document, OpenOptions};

let doc = Document::open_with_options(
    "damaged.pdf",
    OpenOptions::default().recovery_mode(true),
)?;
3. Read the repair report

The repair report describes what was found and what could not be recovered.

rust
let report = doc.repair_report();
println!("Objects recovered:  {}", report.objects_recovered());
println!("Objects missing:    {}", report.objects_missing());
println!("Xref rebuilt:       {}", report.xref_rebuilt());
println!("Truncated at byte:  {:?}", report.truncated_at());
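You may want a single number to log or to gate an automated pipeline. If so, the recovered and missing counts can be combined into a ratio. The sketch below takes plain integers because this guide does not state the exact return types of `objects_recovered()` and `objects_missing()`. The names `recovery_ratio` and `is_acceptable`, and any threshold you pass, are illustrative assumptions.

```rust
/// Hypothetical helper: fraction of known objects that were recovered.
/// Returns None when no objects were found at all, so an empty result is
/// never mistaken for a perfect recovery.
fn recovery_ratio(recovered: u64, missing: u64) -> Option<f64> {
    let total = recovered + missing;
    if total == 0 {
        None
    } else {
        Some(recovered as f64 / total as f64)
    }
}

/// Hypothetical policy: accept the repaired file only if the ratio meets
/// the given threshold (for example 0.9 for 90%).
fn is_acceptable(recovered: u64, missing: u64, threshold: f64) -> bool {
    recovery_ratio(recovered, missing).map_or(false, |r| r >= threshold)
}
```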
4. Verify page count and content

Check that the expected pages are present. Some pages may be unrecoverable if their stream data was overwritten.

rust
println!("Pages recovered: {}", doc.page_count());
for (i, page) in doc.pages().enumerate() {
    let text = page.extract_text().unwrap_or_default();
    println!("Page {}: {} chars", i + 1, text.len());
}
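A page that yields no text is a useful but imperfect sign that its content stream was lost. Scanned or image-only pages also produce no text, so treat the result as a list to inspect rather than proof of damage. The std-only sketch below takes the per-page character counts collected in the loop above and returns the 1-based page numbers that came back empty. The name `suspect_pages` is hypothetical.

```rust
/// Hypothetical helper: given the extracted character count of each page
/// (in page order), return the 1-based numbers of pages with no text.
fn suspect_pages(char_counts: &[usize]) -> Vec<usize> {
    char_counts
        .iter()
        .enumerate()
        .filter_map(|(i, &count)| if count == 0 { Some(i + 1) } else { None })
        .collect()
}
```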
5. Save the recovered file

Write the repaired document. The output is a structurally valid PDF even if some content was lost.

rust
doc.save("repaired.pdf")?;
println!("Saved repaired.pdf");

Notes and tips

  • Recovery mode cannot reconstruct content streams that are physically absent or overwritten. It can only find objects that are present in the file bytes.
  • A truncated file (for example, a download that was cut short) is one of the most common causes of corruption. Recovery mode handles truncation by treating the end of the file as the end of the last recoverable object.
  • Encrypted streams cannot be recovered without the password. If the file was encrypted before it was corrupted, supply the password or decrypt the file first, if possible.
  • Recovery mode is significantly slower than standard parsing because it reads the entire file byte by byte.
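The sketch below illustrates the truncation case. It finds the byte offset just past the last complete `endobj` keyword. Any data after that point belongs to an object that was cut off. This shows one naive way to locate that boundary and is not a description of PDFluent's internals. `last_complete_object_end` is a hypothetical name. A real parser must also avoid matching `endobj` inside binary stream data.

```rust
/// Hypothetical helper: return the byte offset just past the last complete
/// `endobj` keyword. Bytes after it belong to a truncated object. Returns
/// None if the data contains no complete object.
fn last_complete_object_end(data: &[u8]) -> Option<usize> {
    const KEYWORD: &[u8] = b"endobj";
    // Search backwards over every 6-byte window for the keyword.
    data.windows(KEYWORD.len())
        .rposition(|w| w == KEYWORD)
        .map(|start| start + KEYWORD.len())
}
```

Slicing the input with `&data[..end]` then keeps only the complete objects.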

Why PDFluent for this

Pure Rust

No JVM, no runtime, no DLL dependencies. Ships as a single native binary or WASM module.

Memory safe

Rust's ownership model and bounds checking prevent use-after-free and buffer overflows. No segfaults in PDF parsing.

Runs anywhere

Same code runs server-side, in Docker, on AWS Lambda, on Cloudflare Workers, or in the browser via WASM.

Frequently asked questions