What causes PDF corruption?

Common causes: incomplete file transfer, storage device errors, partial overwrite, software crash during save, and antivirus quarantine that truncates files.

Can PDFluent recover a PDF with a broken encryption dictionary?

If the encryption dictionary itself is damaged, decryption is not possible and content streams cannot be decoded. PDFluent reports the affected objects as unrecoverable.

Is the recovered file guaranteed to be a valid PDF?

PDFluent writes a structurally valid file. However, fonts, images, or pages that could not be recovered from the original remain absent. Run validate_pdf_a() to check conformance after recovery.

How do I recover a file with duplicate object numbers?

Recovery mode keeps the last occurrence of each object by default. Pass OpenOptions::recovery_keep_first(true) to prefer earlier objects instead.
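
The two policies can be illustrated with a small, library-independent sketch. The `Found` type and `resolve_duplicates` function are hypothetical names used only for illustration; they are not part of PDFluent's API, and PDFluent's internal logic may differ.

```rust
use std::collections::HashMap;

/// One object header found in the file: its object number and byte offset.
/// (Hypothetical type for illustration; not part of PDFluent's API.)
#[derive(Debug, Clone, Copy, PartialEq)]
struct Found {
    number: u32,
    offset: usize,
}

/// Resolve duplicate object numbers among headers listed in file order.
/// `keep_first = false` keeps the last occurrence of each number;
/// `keep_first = true` keeps the earliest one.
fn resolve_duplicates(found: &[Found], keep_first: bool) -> HashMap<u32, usize> {
    let mut chosen = HashMap::new();
    for f in found {
        if keep_first {
            // Record only the first offset seen for this object number.
            chosen.entry(f.number).or_insert(f.offset);
        } else {
            // Later occurrences overwrite earlier ones.
            chosen.insert(f.number, f.offset);
        }
    }
    chosen
}
```

Keeping the last occurrence is a sensible default because PDFs saved with incremental updates append newer versions of an object after the older ones. Preferring the first occurrence can help when the tail of the file is the damaged part.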


Attempt to recover and repair a corrupted PDF in Rust

Use PDFluent's recovery parser to rebuild the cross-reference table and salvage as many objects as possible from a damaged file.

rust

use pdfluent::{PdfDocument, OpenOptions};

fn main() -> pdfluent::Result<()> {
    let doc = PdfDocument::open_with(
        "corrupted.pdf",
        OpenOptions::new().recovery_mode(true),
    )?;

    let report = doc.repair_report();
    println!("Objects recovered: {}", report.objects_recovered());
    println!("Xref rebuilt:      {}", report.xref_rebuilt());

    doc.save("repaired.pdf")?;
    Ok(())
}

Install: cargo add pdfluent

Step by step

Try to open with standard parsing first

Standard parsing is faster. Only fall back to recovery mode if the standard open returns an error.

rust

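use pdfluent::{PdfDocument, Error};

// Assumes an enclosing function that returns pdfluent::Result<()>.
let result = PdfDocument::open("damaged.pdf");
match result {
    Ok(_doc) => println!("Opened normally."),
    // Structural errors are the ones recovery mode is meant to handle.
    Err(Error::BrokenXref | Error::UnexpectedEof | Error::InvalidStructure(_)) => {
        println!("Standard open failed, trying recovery mode...");
    }
    // Other errors are propagated unchanged.
    Err(e) => return Err(e),
}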

Open in recovery mode

Recovery mode uses a linear scan of the file to find all PDF objects rather than relying on the cross-reference table.

rust

use pdfluent::{PdfDocument, OpenOptions};

let doc = PdfDocument::open_with(
    "damaged.pdf",
    OpenOptions::new().recovery_mode(true),
)?;
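
To make the idea of a linear scan concrete, here is a minimal standalone sketch that walks a byte buffer and collects every `<number> <generation> obj` header it finds, ignoring the cross-reference table entirely. It is an illustration under simplifying assumptions, not PDFluent's implementation: the `ObjectHeader` type and helper names are hypothetical, it treats only ASCII whitespace as a separator, and it does not filter out false positives that happen to occur inside stream data.

```rust
/// A candidate object header located by the scan (hypothetical type).
#[derive(Debug, PartialEq)]
struct ObjectHeader {
    number: u32,
    generation: u16,
    /// Byte offset of the first digit of the object number.
    offset: usize,
}

/// Return the run of ASCII digits ending just before `end`, with its start index.
fn digits_before(data: &[u8], end: usize) -> Option<(usize, &str)> {
    let mut start = end;
    while start > 0 && data[start - 1].is_ascii_digit() {
        start -= 1;
    }
    if start == end {
        return None;
    }
    Some((start, std::str::from_utf8(&data[start..end]).ok()?))
}

/// Skip whitespace backwards from `end`; at least one whitespace byte is required.
fn whitespace_before(data: &[u8], end: usize) -> Option<usize> {
    let mut start = end;
    while start > 0 && data[start - 1].is_ascii_whitespace() {
        start -= 1;
    }
    if start == end { None } else { Some(start) }
}

/// Scan the whole buffer for "<number> <generation> obj" headers.
fn scan_objects(data: &[u8]) -> Vec<ObjectHeader> {
    let mut found = Vec::new();
    for i in 0..data.len() {
        if !data[i..].starts_with(b"obj") {
            continue;
        }
        // Walk backwards over "<number> <generation> " before the keyword.
        // "endobj" is rejected here because "obj" is preceded by 'd', not whitespace.
        let header = whitespace_before(data, i)
            .and_then(|p| digits_before(data, p))
            .and_then(|(gen_start, gen_text)| {
                let p = whitespace_before(data, gen_start)?;
                let (num_start, num_text) = digits_before(data, p)?;
                Some(ObjectHeader {
                    number: num_text.parse().ok()?,
                    generation: gen_text.parse().ok()?,
                    offset: num_start,
                })
            });
        found.extend(header);
    }
    found
}
```

Calling `scan_objects` on the raw file bytes yields headers in file order, which is the information needed to locate objects without a usable cross-reference table.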

Read the repair report

The repair report describes what was found and what could not be recovered.

rust

let report = doc.repair_report();
println!("Objects recovered:  {}", report.objects_recovered());
println!("Objects missing:    {}", report.objects_missing());
println!("Xref rebuilt:       {}", report.xref_rebuilt());
println!("Truncated at byte:  {:?}", report.truncated_at());
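
As background for the "Xref rebuilt" line: in a classic (non-stream) cross-reference table, each entry is a fixed 20-byte record holding a 10-digit byte offset, a 5-digit generation number, and an `n` (in use) or `f` (free) flag. The sketch below shows how object offsets found during recovery could be written out as such a table. The function names are hypothetical and this is not PDFluent's internal code; it assumes objects are numbered contiguously from 1, all with generation 0.

```rust
/// Format one in-use entry of a classic cross-reference table.
/// Layout: 10-digit offset, space, 5-digit generation, space, 'n', CR LF (20 bytes).
fn xref_entry(offset: u64, generation: u16) -> String {
    format!("{:010} {:05} n\r\n", offset, generation)
}

/// Build an xref section for objects 1..=offsets.len(), all at generation 0.
/// (Simplifying assumption for illustration: contiguous object numbers.)
fn xref_section(offsets: &[u64]) -> String {
    // The subsection header gives the first object number and the entry count.
    let mut out = format!("xref\n0 {}\n", offsets.len() + 1);
    // Object 0 is the head of the free list and has generation 65535.
    out.push_str("0000000000 65535 f\r\n");
    for &offset in offsets {
        out.push_str(&xref_entry(offset, 0));
    }
    out
}
```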

Verify page count and content

Check that the expected pages are present. Some pages may be unrecoverable if their stream data was overwritten.

rust

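println!("Pages recovered: {}", doc.page_count());
for (i, page) in doc.pages().enumerate() {
    // Treat a page whose text cannot be extracted as empty.
    let text = page.text().unwrap_or_default();
    // Count characters, not bytes: len() would overcount non-ASCII text.
    println!("Page {}: {} chars", i + 1, text.chars().count());
}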

Save the recovered file

Write the repaired document. The output is a structurally valid PDF even if some content was lost.

rust

doc.save("repaired.pdf")?;
println!("Saved repaired.pdf");

Notes and tips

Recovery mode cannot reconstruct content streams that are physically absent or overwritten. It can only find objects that are present in the file bytes.
A truncated file (download cut short) is one of the most common corruption causes. Recovery mode handles truncation by treating the end-of-file as the end of the last recoverable object.
Encrypted streams cannot be recovered without the password. If the file was encrypted before it was corrupted, decrypt it first if that is still possible.
Recovery mode is significantly slower than standard parsing because it reads the entire file byte by byte.
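
Before choosing recovery mode, it can be useful to check cheaply whether a file looks truncated. The sketch below is a standalone heuristic, not part of PDFluent: it looks for the `%%EOF` marker near the end of the file and finds where the last complete `endobj` keyword ends. The function names and the size of the search window are assumptions made for illustration.

```rust
/// Return true if the `%%EOF` marker appears within the last `window` bytes.
/// A missing marker is a strong hint that the file was cut short.
fn has_eof_marker(data: &[u8], window: usize) -> bool {
    let tail_start = data.len().saturating_sub(window);
    data[tail_start..].windows(5).any(|w| w == b"%%EOF")
}

/// Return the byte offset just past the last `endobj` keyword, if any.
/// Everything after this point belongs to an object that was never finished.
fn last_complete_object_end(data: &[u8]) -> Option<usize> {
    const KEYWORD: &[u8] = b"endobj";
    data.windows(KEYWORD.len())
        .rposition(|w| w == KEYWORD)
        .map(|pos| pos + KEYWORD.len())
}
```

A file where `has_eof_marker(&bytes, 1024)` is false is a likely candidate for recovery mode, and `last_complete_object_end` gives a rough idea of how much intact data to expect.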

Why PDFluent for this

Pure Rust

No JVM, no runtime, no DLL dependencies. Ships as a single native binary or WASM module.

Memory safe

Rust's ownership model prevents buffer overflows and use-after-free. No segfaults in PDF parsing.

Runs anywhere

Same code runs server-side, in Docker, on AWS Lambda, on Cloudflare Workers, or in the browser via WASM.

Frequently asked questions
