Run PDF processing on AWS Lambda with Rust

PDFluent has no native dependencies, so it deploys to Lambda as a single statically linked binary. Cold starts are typically under 50 ms for a simple PDF handler.

rust
use lambda_runtime::{service_fn, LambdaEvent, Error};
use aws_sdk_s3::Client as S3Client;
use pdfluent::PdfDocument;
use serde_json::Value;

#[tokio::main]
async fn main() -> Result<(), Error> {
    lambda_runtime::run(service_fn(handler)).await
}

async fn handler(event: LambdaEvent<Value>) -> Result<Value, Error> {
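    // Fail fast on a malformed event instead of sending an empty bucket or key to S3.
    let bucket = event.payload["bucket"].as_str().ok_or("event is missing \"bucket\"")?;
    let key    = event.payload["key"].as_str().ok_or("event is missing \"key\"")?;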

    let config = aws_config::load_from_env().await;
    let s3 = S3Client::new(&config);

    let resp = s3.get_object().bucket(bucket).key(key).send().await?;
    let bytes = resp.body.collect().await?.into_bytes();

    let doc = PdfDocument::from_bytes(&bytes)?;
    let page_count = doc.page_count();
    let text = doc.extract_text()?;

    Ok(serde_json::json!({
        "pages": page_count,
        // chars().count() counts characters; len() would count UTF-8 bytes
        "chars": text.chars().count()
    }))
}
Install: cargo add pdfluent

Step by step

1

Set up your Cargo project for Lambda

Use the lambda_runtime crate from AWS. Build a binary named bootstrap, which is the required name for Lambda custom runtimes.

toml
# Cargo.toml
[package]
name = "pdf-lambda"
version = "0.1.0"
edition = "2021"

[[bin]]
name = "bootstrap"
path = "src/main.rs"

[dependencies]
pdfluent = "0.9"
lambda_runtime = "0.11"
aws-config = { version = "1", features = ["behavior-version-latest"] }
aws-sdk-s3 = "1"
tokio = { version = "1", features = ["full"] }
serde_json = "1"
2

Cross-compile for Amazon Linux

Lambda runs on Amazon Linux 2 (x86_64 or arm64). Use cargo-lambda to cross-compile without a Linux machine.

bash
# Install cargo-lambda
cargo install cargo-lambda

# Build for x86_64 Lambda
cargo lambda build --release --target x86_64-unknown-linux-musl

# Or build for arm64 Lambda (Graviton2, cheaper)
cargo lambda build --release --target aarch64-unknown-linux-musl
3

Write the Lambda handler

Read the S3 bucket and key from the event payload. Download the PDF bytes from S3 and pass them to PdfDocument::from_bytes.

rust
async fn handler(event: LambdaEvent<Value>) -> Result<Value, Error> {
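    // Fail fast on a malformed event instead of sending an empty bucket or key to S3.
    let bucket = event.payload["bucket"].as_str().ok_or("event is missing \"bucket\"")?;
    let key    = event.payload["key"].as_str().ok_or("event is missing \"key\"")?;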

    let config = aws_config::load_from_env().await;
    let s3 = S3Client::new(&config);

    let resp = s3.get_object()
        .bucket(bucket)
        .key(key)
        .send()
        .await?;

    let bytes = resp.body.collect().await?.into_bytes();
    let doc = PdfDocument::from_bytes(&bytes)?;
    // ... process the document
    Ok(serde_json::json!({ "pages": doc.page_count() }))
}
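
The handler above builds a fresh AWS config and S3 client on every invocation. Lambda keeps the process alive between warm invocations, so a common pattern is to build expensive state once and reuse it. The sketch below illustrates the idea with std::sync::OnceLock. FakeClient and shared_client are hypothetical stand-ins (not part of PDFluent or the AWS SDK) so that the example compiles without external crates. In a real function the stored value would be the S3 client; because aws_config::load_from_env() is async, building the client in main and passing it to the handler, or using tokio::sync::OnceCell, is probably the closer fit.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

// Counts how many times the hypothetical client is constructed.
static INIT_COUNT: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for an expensive-to-build client such as the S3 client.
struct FakeClient {
    region: String,
}

// Process-wide slot: filled on first use, reused by later invocations.
static CLIENT: OnceLock<FakeClient> = OnceLock::new();

fn shared_client() -> &'static FakeClient {
    CLIENT.get_or_init(|| {
        INIT_COUNT.fetch_add(1, Ordering::SeqCst);
        FakeClient {
            region: "eu-west-1".to_string(),
        }
    })
}

fn main() {
    // Simulate three warm invocations inside the same process.
    for _ in 0..3 {
        let client = shared_client();
        println!("using client for {}", client.region);
    }
    println!("client built {} time(s)", INIT_COUNT.load(Ordering::SeqCst));
}
```

Only the first call pays the construction cost; later calls in the same execution environment get the cached value.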
4

Deploy with cargo-lambda

cargo lambda deploy uploads the binary as a Lambda function with the provided.al2 runtime.

bash
cargo lambda deploy \
  --region eu-west-1 \
  --memory 512 \
  --timeout 30 \
  pdf-lambda
5

Set the Lambda execution role

The function needs GetObject permission on the S3 bucket. Attach an inline policy or a managed policy to the execution role.

json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::my-pdf-bucket/*"
    }
  ]
}

Notes and tips

  • Built for a musl target, the function is statically linked. The final bootstrap binary is about 4-6 MB, well under Lambda's 50 MB limit for zipped uploads.
  • Set Lambda memory to at least 256 MB for small PDFs. Large PDFs (100+ pages) may need 512-1024 MB.
  • Lambda /tmp storage is 512 MB by default (up to 10 GB configurable). Write temporary output there if you need to save before uploading to S3.
  • Cold start for a musl-linked Rust binary is typically 20-50 ms, significantly faster than JVM or Python-based PDF libraries.
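
As a rough illustration of the /tmp tip above, the following sketch writes output to a scratch file and removes it afterwards, using only the standard library. The helper write_scratch and the file name are hypothetical, and the placeholder bytes stand in for whatever the handler produces. On Unix, std::env::temp_dir() falls back to /tmp when TMPDIR is unset.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

// Hypothetical helper: writes `data` to a scratch file and returns its path.
// On Unix, temp_dir() resolves to $TMPDIR, or /tmp when TMPDIR is unset.
fn write_scratch(name: &str, data: &[u8]) -> io::Result<PathBuf> {
    let path = std::env::temp_dir().join(name);
    fs::write(&path, data)?;
    Ok(path)
}

fn main() -> io::Result<()> {
    // Placeholder bytes; in the handler this would be the processed output.
    let path = write_scratch("pdf-lambda-output.bin", b"%PDF-1.7 placeholder")?;
    let size = fs::metadata(&path)?.len();
    println!("wrote {} bytes to {}", size, path.display());

    // Files in /tmp can survive across warm invocations, so remove them
    // once they are no longer needed to avoid filling the allotted space.
    fs::remove_file(&path)?;
    Ok(())
}
```

Cleaning up explicitly matters more on Lambda than on a short-lived process, because the same execution environment may serve many invocations.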

Why PDFluent for this

Pure Rust

No JVM, no runtime, no DLL dependencies. Ships as a single native binary or WASM module.

Memory safe

Rust's ownership model prevents buffer overflows and use-after-free. No segfaults in PDF parsing.

Runs anywhere

Same code runs server-side, in Docker, on AWS Lambda, on Cloudflare Workers, or in the browser via WASM.

Frequently asked questions