PDF Processing in the Browser with WebAssembly

Most PDF SDKs assume server-side processing. You send the file to your server, the server processes it, you send the result back. That model has three costs: latency (the round trip), infrastructure (a server that scales with load), and privacy (the file leaves the user's machine).

WebAssembly changes this. A PDF SDK compiled to WASM runs directly in the browser — no server, no upload, no round trip. PDFluent's WASM bundle is ~3MB compressed (9.8MB raw). It handles parsing, rendering, XFA forms, PDF/A validation, and text extraction entirely client-side.

This tutorial walks through a real React integration: render a PDF page, extract text, validate PDF/A compliance, and flatten an XFA form — all in the browser.

Why WASM works for PDF processing

Compiling to WASM is simplest when a library has zero native dependencies. A C++ PDF library typically links against libpng, libjpeg, zlib, freetype, ICU, and half a dozen other system libraries. None of those exist as system libraries in the WASM sandbox, so each one would have to be cross-compiled and bundled as well.

PDFluent is pure Rust. All image codecs, font rendering, compression, and internationalization are Rust crates that compile to WASM. The only things we pull from the browser are a monotonic timer (for profiling) and a random seed (for unique IDs).

Installation

bash

npm install @pdfluent/sdk-wasm

The package includes the WASM binary (pdfluent_bg.wasm), the generated JS glue layer, and TypeScript type definitions. The binary is loaded lazily on first use; you control when initialization happens.

Setup in React

Create a hook that initializes the WASM module once and makes it available to your components:

typescript

// hooks/usePdfluent.ts
import { useState, useEffect } from 'react';
import init, { PdfluentWasm } from '@pdfluent/sdk-wasm';

let initialized = false;

export function usePdfluent() {
  const [ready, setReady] = useState(initialized);

  useEffect(() => {
    if (initialized) return;
    init().then(() => {
      initialized = true;
      setReady(true);
    });
  }, []);

  return ready ? PdfluentWasm : null;
}

Calling init() fetches and compiles the WASM binary. This happens once — subsequent calls return immediately. The hook re-renders when initialization completes.
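
The extraction, validation, and flattening examples later in this tutorial run outside React components and call an initPdfluent() helper. One way to write such a helper, sketched here as an assumption rather than the SDK's own API, is to memoize the initialization promise so that concurrent callers share a single load. createOnceInit is a hypothetical name; only init and PdfluentWasm come from the package.

```typescript
// Hypothetical helper (not part of @pdfluent/sdk-wasm): wraps an async loader
// so it runs at most once, even when several callers start at the same time.
function createOnceInit<T>(load: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | null = null;

  return () => {
    if (pending === null) {
      pending = load().catch((err) => {
        // Reset on failure so a later call can retry the load.
        pending = null;
        throw err;
      });
    }
    return pending;
  };
}

// Assumed wiring, using the imports from hooks/usePdfluent.ts:
//
//   export const initPdfluent = createOnceInit(async () => {
//     await init();
//     return PdfluentWasm;
//   });
```

Caching the promise rather than a boolean flag means two callers that start before the first load finishes both wait on the same request instead of each starting their own.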

Vite / webpack config: WASM files must be served with Content-Type: application/wasm for streaming compilation to work. In Vite, add assetsInclude: ['**/*.wasm'] to your config. In webpack 5, set experiments: { asyncWebAssembly: true }.

Rendering a PDF page

The most common operation: take a PDF file, render a specific page to a canvas element.

typescript

// components/PdfViewer.tsx
import React, { useEffect, useRef, useState } from 'react';
import { usePdfluent } from '../hooks/usePdfluent';

interface PdfViewerProps {
  file: File;
  page?: number;
  dpi?: number;
}

export function PdfViewer({ file, page = 0, dpi = 150 }: PdfViewerProps) {
  const Pdf = usePdfluent();
  const canvasRef = useRef<HTMLCanvasElement>(null);
  const [error, setError] = useState<string | null>(null);

  useEffect(() => {
    if (!Pdf || !canvasRef.current) return;

    file.arrayBuffer().then((buf) => {
      try {
        const doc = Pdf.Document.fromBytes(new Uint8Array(buf));
        try {
          const pageObj = doc.page(page);
          const bitmap = pageObj.render(dpi);

          const canvas = canvasRef.current!;
          canvas.width = bitmap.width;
          canvas.height = bitmap.height;

          const ctx = canvas.getContext('2d')!;
          const imageData = new ImageData(
            new Uint8ClampedArray(bitmap.rgba()),
            bitmap.width,
            bitmap.height,
          );
          ctx.putImageData(imageData, 0, 0);

          pageObj.free();
        } finally {
          // Free the document even if rendering throws
          doc.free();
        }
      } catch (e) {
        setError(String(e));
      }
    });
  }, [Pdf, file, page, dpi]);

  if (error) return <div className="text-red-400 text-sm">{error}</div>;
  return <canvas ref={canvasRef} className="max-w-full" />;
}

A few things to note: bitmap.rgba() returns a Uint8Array of raw RGBA bytes — no additional decoding needed. doc.free() is explicit memory management; WASM memory is not garbage collected, so you should free documents when done.
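
Because nothing frees WASM objects automatically, an exception thrown between creating a document and calling free() leaks that memory. A small wrapper can make the cleanup unconditional. The sketch below is one possible shape; withFreed and the Freeable interface are hypothetical names, and the only assumption about the SDK is that its objects expose free() as shown above.

```typescript
// Minimal shape shared by the SDK objects in this tutorial that expose free().
interface Freeable {
  free(): void;
}

// Hypothetical helper: runs `fn` with a resource and always frees it afterwards,
// whether `fn` returns normally or throws.
function withFreed<T extends Freeable, R>(resource: T, fn: (resource: T) => R): R {
  try {
    return fn(resource);
  } finally {
    resource.free();
  }
}

// Assumed usage with the Document API shown above:
//
//   const count = withFreed(Pdf.Document.fromBytes(bytes), (doc) => doc.pageCount());
```

This version is synchronous. If the callback returns a promise, the resource is freed before the promise settles, so asynchronous work needs an async variant that awaits the callback inside the try block.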

Text extraction

typescript

async function extractText(file: File): Promise<string> {
  // initPdfluent: one-shot init helper that awaits init() once and resolves to PdfluentWasm
  const Pdf = await initPdfluent();
  const buf = await file.arrayBuffer();
  const doc = Pdf.Document.fromBytes(new Uint8Array(buf));

  const pages = [];
  for (let i = 0; i < doc.pageCount(); i++) {
    const page = doc.page(i);
    pages.push(page.extractText({ preserveLayout: true }));
    page.free();
  }

  doc.free();
  return pages.join('\n\n--- page break ---\n\n');
}

preserveLayout: true attempts to reconstruct the reading order based on text block positions. Without it, you get raw character streams in PDF drawing order, which is often wrong for multi-column documents.
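
To see why drawing order can be wrong, consider a two-column page whose content stream alternates between columns. The toy sketch below is not PDFluent's algorithm; it only illustrates the general idea of grouping blocks into columns by horizontal position and then reading each column top to bottom. TextBlock, readingOrder, and the columnGap threshold are all hypothetical.

```typescript
// A positioned piece of text; y grows downward from the top of the page.
interface TextBlock {
  x: number;
  y: number;
  text: string;
}

// Toy reading-order reconstruction: group blocks into columns by their x
// position, then read columns left to right and each column top to bottom.
function readingOrder(blocks: TextBlock[], columnGap = 50): string[] {
  const byX = [...blocks].sort((a, b) => a.x - b.x);
  const columns: { startX: number; blocks: TextBlock[] }[] = [];

  for (const block of byX) {
    const current = columns[columns.length - 1];
    if (current && block.x - current.startX < columnGap) {
      current.blocks.push(block);
    } else {
      columns.push({ startX: block.x, blocks: [block] });
    }
  }

  const ordered: string[] = [];
  for (const column of columns) {
    const topToBottom = [...column.blocks].sort((a, b) => a.y - b.y);
    for (const block of topToBottom) {
      ordered.push(block.text);
    }
  }
  return ordered;
}
```

Real layouts need far more than this (headers that span columns, tables, rotated text), which is why layout reconstruction is described above as an attempt rather than a guarantee.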

PDF/A validation

typescript

async function validatePdfa(file: File) {
  const Pdf = await initPdfluent();
  const buf = await file.arrayBuffer();
  const doc = Pdf.Document.fromBytes(new Uint8Array(buf));

  try {
    const result = doc.validatePdfa({
      conformance: 'PDF/A-2b', // or the 1b, 3b, or 4 conformance levels
    });

    // Read everything out of the result before the document is freed.
    return {
      isConformant: result.isConformant(),
      failures: result.failures().map(f => ({
        rule: f.rule(),       // e.g. "§6.2.11.5:1"
        message: f.message(),
        page: f.page(),       // null if not page-specific
      })),
    };
  } finally {
    doc.free();
  }
}

XFA form flattening

Flattening converts an interactive XFA form into a static PDF. The form data is baked in, the XFA stream is removed, and the result is a plain PDF that any reader can display.

typescript

async function flattenXfa(file: File): Promise<Uint8Array> {
  const Pdf = await initPdfluent();
  const buf = await file.arrayBuffer();
  const doc = Pdf.Document.fromBytes(new Uint8Array(buf));

  if (!doc.hasXfa()) {
    doc.free();
    throw new Error('Document does not contain an XFA form');
  }

  const xfa = doc.xfa();
  const flat = xfa.flatten();
  const output = flat.toBytes();

  flat.free();
  xfa.free();
  doc.free();

  return output; // Uint8Array — save to disk or upload
}

// Usage: trigger download in the browser
async function downloadFlattened(file: File) {
  const bytes = await flattenXfa(file);
  const blob = new Blob([bytes], { type: 'application/pdf' });
  const url = URL.createObjectURL(blob);
  const a = document.createElement('a');
  a.href = url;
  a.download = 'flattened.pdf';
  a.click();
  URL.revokeObjectURL(url);
}

Putting it together: a file drop component

typescript

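// components/PdfDropzone.tsx
import React, { useCallback, useState } from 'react';
import { PdfViewer } from './PdfViewer';
// initPdfluent and downloadFlattened are the same helpers used in the earlier
// examples; import them from wherever you keep them.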

export function PdfDropzone() {
  const [file, setFile] = useState<File | null>(null);
  const [isXfa, setIsXfa] = useState(false);

  const onDrop = useCallback(async (e: React.DragEvent) => {
    e.preventDefault();
    const dropped = e.dataTransfer.files[0];
    if (!dropped?.name.endsWith('.pdf')) return;

    setFile(dropped);

    // Peek at the file to check for XFA
    const Pdf = await initPdfluent();
    const buf = await dropped.arrayBuffer();
    const doc = Pdf.Document.fromBytes(new Uint8Array(buf));
    setIsXfa(doc.hasXfa());
    doc.free();
  }, []);

  return (
    <div
      onDrop={onDrop}
      onDragOver={e => e.preventDefault()}
      className="border-2 border-dashed border-gray-300 rounded-lg p-8"
    >
      {file ? (
        <>
          {isXfa && (
            <button onClick={() => downloadFlattened(file)}>
              Flatten XFA form
            </button>
          )}
          <PdfViewer file={file} dpi={150} />
        </>
      ) : (
        <p>Drop a PDF here</p>
      )}
    </div>
  );
}

Performance: what WASM can and can't do

A common question: is WASM fast enough for real workloads?

Operation	WASM (browser)	Native (server)	Ratio
Render page at 150 DPI	45–90ms	20–45ms	~2×
Text extraction (10 pages)	80–150ms	40–80ms	~2×
PDF/A validation (simple)	30–60ms	15–35ms	~2×
XFA flatten (static form)	120–250ms	60–120ms	~2×
PDF parse only	5–15ms	2–8ms	~2×

WASM runs at roughly half the speed of native code for CPU-bound tasks. That's fast enough for most interactive use cases. Rendering a single page in 90ms is imperceptible. Validating a 100-page document in 3 seconds is acceptable for a one-off operation.
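
Timings like these vary with hardware and document complexity, so it is worth measuring on your own target devices. The helper below is a minimal sketch using the standard performance.now() timer; timed is a hypothetical name, and the usage comment assumes the render call shown earlier.

```typescript
// Hypothetical helper: measures how long a synchronous or async operation takes.
async function timed<T>(fn: () => T | Promise<T>): Promise<{ result: T; ms: number }> {
  const start = performance.now();
  const result = await fn();
  return { result, ms: performance.now() - start };
}

// Assumed usage with the rendering API from this tutorial:
//
//   const { ms } = await timed(() => doc.page(0).render(150));
//   console.log(`render took ${ms.toFixed(1)}ms`);
```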

Where WASM falls short: batch processing. If you're converting 1,000 PDFs per minute, run that on a server. WASM is for the interactive, per-user operations where client-side execution gives you privacy, offline capability, and zero server costs.

Memory limits

The module starts with 16MB of linear memory and can grow up to 4GB, the ceiling for 32-bit WASM addressing (in practice some browsers and devices cap this lower, around 2GB). A 10-page PDF with embedded fonts and images might use 50–100MB during rendering. A 200-page document with embedded images could use 500MB–1GB.

For large documents, use page-at-a-time rendering rather than loading the entire document into memory at once. The API supports this via doc.page(i) — you can render each page, write the result to a canvas, then release the page before moving to the next.
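
The page-at-a-time pattern can be sketched as a loop that frees each page before requesting the next. The interfaces below mirror the method names used earlier in this tutorial (pageCount, page, render, free) but are assumptions made to keep the sketch self-contained, not the SDK's published types; renderEachPage is a hypothetical name.

```typescript
// Minimal shapes mirroring the method names used earlier in this tutorial.
interface BitmapLike {
  width: number;
  height: number;
  rgba(): Uint8Array;
}
interface PageLike {
  render(dpi: number): BitmapLike;
  free(): void;
}
interface DocumentLike {
  pageCount(): number;
  page(index: number): PageLike;
}

// Renders pages one at a time so at most one page object is live at once.
function renderEachPage(
  doc: DocumentLike,
  dpi: number,
  onBitmap: (bitmap: BitmapLike, pageIndex: number) => void,
): void {
  for (let i = 0; i < doc.pageCount(); i++) {
    const page = doc.page(i);
    try {
      onBitmap(page.render(dpi), i);
    } finally {
      page.free(); // release this page before touching the next one
    }
  }
}
```

The callback is where you would copy the bitmap to a canvas, as in the PdfViewer component; once it returns, the page is released regardless of whether drawing succeeded.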

Multi-threading

WASM threads require SharedArrayBuffer, which requires a cross-origin isolated context (COOP and COEP headers). If you have that configured, you can use Pdf.setThreadCount(4) to enable parallel rendering. Without it, processing is single-threaded.
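
Cross-origin isolation is normally enabled by serving the page with Cross-Origin-Opener-Policy: same-origin and Cross-Origin-Embedder-Policy: require-corp, and the browser reports the result through the standard crossOriginIsolated global. A minimal sketch of picking a thread count from that flag follows; chooseThreadCount is a hypothetical helper, and the cap of 4 simply echoes the example value above.

```typescript
// Hypothetical helper: picks a thread count, or 1 when threads are unavailable.
function chooseThreadCount(isolated: boolean, cores: number, max = 4): number {
  if (!isolated) return 1; // no SharedArrayBuffer without cross-origin isolation
  return Math.max(1, Math.min(Math.floor(cores), max));
}

// Assumed usage in the browser:
//
//   const threads = chooseThreadCount(
//     globalThis.crossOriginIsolated === true,
//     navigator.hardwareConcurrency || 1,
//   );
//   if (threads > 1) Pdf.setThreadCount(threads);
```

Checking the flag at runtime lets the same bundle run single-threaded on hosts where the headers are not configured.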

Bundle size breakdown

The 9.8MB WASM binary (after Brotli: ~2.7MB, gzip: ~3.5MB) breaks down roughly as follows. The listed components add up to about 6.7 MB; the remainder is not itemized here.

- PDF parser and object model: ~1.2 MB
- Font rendering (FreeType-equivalent in Rust): ~1.6 MB
- Image codecs (JPEG, PNG, JBIG2, CCITT): ~1.3 MB
- XFA engine (DOM resolver, FormCalc, layout): ~1.5 MB
- Compression (zlib, LZW, ASCII85): ~0.6 MB
- PDF/A validator: ~0.5 MB

If you only need rendering (no XFA, no PDF/A validator), you can use the render-only variant of @pdfluent/sdk-wasm for a smaller bundle (see the package documentation for the exact import).

The WASM binary is cacheable. On first load, the browser compiles it to native code and caches the compilation result. Subsequent page loads use the cached native binary — no recompilation. For most users, the ~3MB download only happens once.

Deploying

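If you're serving the WASM binary from the same origin as your app, no special configuration is needed beyond the application/wasm content type. If you're serving it from a CDN on a different origin, the response also has to allow cross-origin use (replace https://app.example.com with your app's origin):

http

Content-Type: application/wasm
Access-Control-Allow-Origin: https://app.example.com
Cross-Origin-Resource-Policy: cross-origin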

For offline support, add the WASM binary to your service worker's precache list. Once cached, PDF processing works with no network connection at all — which is useful for document review apps, field data collection, and anywhere with unreliable connectivity.
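
If you write the service worker by hand rather than using a precaching library, the install step can be sketched with the standard Cache API as below. The cache name and asset paths are hypothetical and depend on your build output, and the two interfaces are local stand-ins for the browser's CacheStorage and Cache types so the sketch stays self-contained.

```typescript
// Minimal slices of the browser Cache API, declared locally for this sketch.
interface CacheLike {
  addAll(urls: string[]): Promise<void>;
}
interface CacheStorageLike {
  open(name: string): Promise<CacheLike>;
}

// Hypothetical cache name and asset paths; adjust to your build output.
const PDF_CACHE = 'pdf-tools-v1';
const PRECACHE_URLS = ['/', '/assets/pdfluent_bg.wasm'];

// Stores the listed URLs in a named cache so they are available offline.
async function precache(
  storage: CacheStorageLike,
  urls: string[] = PRECACHE_URLS,
): Promise<void> {
  const cache = await storage.open(PDF_CACHE);
  await cache.addAll(urls);
}

// Assumed wiring inside the service worker script:
//
//   self.addEventListener('install', (event) => {
//     event.waitUntil(precache(caches));
//   });
```

This covers only the install step. The worker still needs a fetch handler that answers requests from the cache before the app works offline.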