PDF Security Blog

Batch PDF Verification at Scale: Async Queue Patterns for the HTPBE? API

HTPBE Team·15.06.2026·24 min read

This article is a snapshot — content was accurate as of June 2026 (code examples tested against the API as of June 2026). The product evolves actively; specific counts, examples, and detection rules may have changed since publication — see the changelog for the current state.

A single PDF check is a request/response call. A thousand of them is an architecture problem. The moment your platform processes a stream of documents — nightly batches of loan applications, a queue of insurance claims, every payslip uploaded during an onboarding window — the question stops being “how do I call the API” and becomes “how do I call it ten thousand times without melting either side of the connection.”

This guide is the scale companion to the Node.js integration guide, which covers the single-call surface, and the Next.js App Router guide, which covers per-request integration. Here the concern is throughput: wrapping the PDF tamper detection API in your own job runner, controlling concurrency, surviving 429 capacity signals, making each job idempotent, and routing verdicts (intact / modified / inconclusive) into the right downstream lane. Code below is end-to-end TypeScript using BullMQ and Redis — adapt it for your own ops profile (metrics, alerting, secrets, persistence), then map the same primitives onto SQS, Cloud Tasks, or any other queue.

The contract you are building on top of

Before designing a queue, fix one fact about the API: it is synchronous. POST /analyze blocks until the verdict is computed, then returns 201 Created with a check ID and a Location header pointing at the result. There is no job to poll on our side and no webhook callback to register — by the time analyze returns, the result is already retrievable from GET /result/{id}.

This matters for your design. You are not orchestrating an async external service with its own completion events. You are wrapping a plain blocking call — one that takes 2–5 seconds for a typical PDF, up to 15 for a complex one — inside your own asynchronous infrastructure. The queue, the workers, the retries, the backoff: all of that is yours. The API is just one slow synchronous step inside each job.

That framing keeps the architecture honest. Every pattern below exists to manage your concurrency against a blocking dependency, not to wait on an external job system that does not exist.

Why a naive loop fails

The first thing most teams write is a for loop over an array of URLs with await client.verify(url) inside it. It works for ten documents and falls over at ten thousand for three separate reasons.

It is serial. Each call blocks for several seconds. Ten thousand documents at four seconds each is over eleven hours of wall-clock time. You need controlled parallelism.

It has no recovery. One thrown error — a transient 500, a 429 capacity signal, a malformed PDF — and the loop either dies or silently skips. A batch job that loses documents is worse than one that runs slowly.

It ignores backpressure. Fire a hundred analyze calls in parallel with Promise.all and you will collect a wall of 429 SERVER_AT_CAPACITY responses. That status is server-wide concurrency, not per-key rate limiting — the server is telling you it is already running the maximum number of simultaneous analyses. Hammering it harder makes things worse.

A job queue solves all three: it parallelises with a fixed worker count, retries failed jobs with backoff, and lets you tune concurrency to match what the server will accept.

The queue topology

The design has four moving parts:

A producer that enqueues one job per document (with a deterministic job id for idempotency).
A Redis-backed queue (BullMQ) holding pending jobs.
A pool of workers with bounded concurrency, each running the synchronous analyze → result round-trip.
A verdict router that takes the result and pushes it into the correct downstream lane — auto-accept, an issuer re-request, or a human review queue. A structural verdict routes the case; it is never the sole basis for an automatic rejection.

producer ──> [ Redis queue ] ──> worker pool (concurrency = N)
                                       │
                                       ├─ analyze + result  (the API call)
                                       │
                                       └─> verdict router ──> accept / reject / review

Step 1: The job payload and the queue

Each job carries the document URL, an idempotency key from your own domain (the application id, claim id, candidate id — whatever the document belongs to), and the original filename so the audit trail is readable.

import { Queue } from 'bullmq';
import IORedis from 'ioredis';

export const connection = new IORedis(process.env.REDIS_URL!, {
  maxRetriesPerRequest: null, // required by BullMQ workers
});

export interface VerifyJobData {
  /** The publicly reachable URL the API will fetch the PDF from. */
  pdfUrl: string;
  /** Your domain id — loan application, claim, candidate. Used as the job id. */
  entityId: string;
  /** Human-readable name, stored in the check's audit trail. */
  originalFilename: string;
}

export const verifyQueue = new Queue<VerifyJobData>('pdf-verify', {
  connection,
  defaultJobOptions: {
    attempts: 5,
    backoff: { type: 'exponential', delay: 2_000 },
    removeOnComplete: { age: 86_400 }, // keep 24h for inspection
    removeOnFail: false, // keep failures for triage
  },
});

Two defaults do the heavy lifting. attempts: 5 means a job that throws is retried up to four more times. backoff: exponential spaces those retries out at roughly 2s, 4s, 8s, 16s — which is exactly the behaviour you want when the server returns 429 or a transient 500.

Step 2: Idempotency — the part everyone skips

A batch job that runs twice — because a deploy restarted mid-run, because someone re-uploaded a CSV, because a network blip made the producer retry — must not double-charge you and must not create duplicate verdicts. Credits are real money here: every live-key call to analyze debits one credit from your pool. Verifying the same document twice burns two credits for one answer.

The cheapest defence is a deterministic job id. BullMQ refuses to enqueue a second job with an id that already exists, so re-running the producer is a no-op for documents already in flight.

import { verifyQueue, VerifyJobData } from './queue';
import { createHash } from 'node:crypto';

/** Hashes (entityId, pdfUrl). NOT a content hash — see tradeoff note below. */
export function deterministicJobId(data: VerifyJobData): string {
  return createHash('sha256').update(`${data.entityId}:${data.pdfUrl}`).digest('hex').slice(0, 32);
}

export async function enqueueDocument(data: VerifyJobData): Promise<void> {
  await verifyQueue.add('verify', data, { jobId: deterministicJobId(data) });
}

// Enqueue a whole batch — re-running this is safe.
// Chunked `addBulk` instead of `Promise.all`: a 50k-row import fires one Redis
// pipeline per chunk instead of 50k parallel ops, which is what saves Redis
// from a producer-side pressure spike.
export async function enqueueBatch(documents: VerifyJobData[]): Promise<void> {
  const CHUNK = 500;
  for (let i = 0; i < documents.length; i += CHUNK) {
    const slice = documents.slice(i, i + CHUNK);
    await verifyQueue.addBulk(
      slice.map((data) => ({
        name: 'verify',
        data,
        opts: { jobId: deterministicJobId(data) },
      }))
    );
  }
}

Honest tradeoff in the dedup key. This hashes entityId + pdfUrl, not the file’s actual bytes. Two cases that key cannot see:

The same physical document served from a rotating URL (signed S3 link with a fresh token on every retry, presigned upload reissue) — same file, different URL, two jobs, two credits debited.
A new file uploaded against the same entityId + pdfUrl — different bytes, same key, only the first analysis runs.

For the rotating-URL case, either canonicalise the URL before hashing (strip query params and tokens) or compute a content hash before enqueueing and use that in the key. For the “same URL, new file” case, version your entityId so the upload event mints a new id. The worker’s layer-two idempotency check below catches a subset of these — reuses an existing stored verdict on the same entity — but it is a backstop, not a substitute for a key that matches your real uniqueness contract.

Job-level de-duplication covers the producer side. The worker adds a second layer: before spending a credit, it checks whether this entity already has a stored verdict in your database. If the result endpoint already holds an answer for this document, reuse it. Store the HTPBE? check uid against your entity the moment a verdict lands, and you get a permanent, re-readable record — GET /result/{uid} returns the full forensic payload at any later date without re-analysing.

Step 3: The worker — concurrency and the API round-trip

The worker is where the synchronous API call lives. Its concurrency setting is the single most important dial in the whole system: too low and your batch crawls, too high and you collect 429s. Start conservative — the server runs a small fixed number of concurrent analyses, so a worker concurrency of 2–4 per process is a sane starting point. Raise it only if you are not seeing capacity signals.

import { Worker, Job, UnrecoverableError } from 'bullmq';
import { connection, VerifyJobData } from './queue';
import { analyzeDocument, HTPBEError } from './htpbe';
import { routeVerdict } from './router';

const WORKER_CONCURRENCY = Number(process.env.VERIFY_CONCURRENCY ?? 3);

export const worker = new Worker<VerifyJobData>(
  'pdf-verify',
  async (job: Job<VerifyJobData>) => {
    const { pdfUrl, entityId, originalFilename } = job.data;

    // Layer-two idempotency: skip if we already have a verdict on file.
    const existing = await lookupStoredVerdict(entityId);
    if (existing) return existing;

    let result;
    try {
      // The synchronous API round-trip: analyze -> result.
      result = await analyzeDocument(pdfUrl, originalFilename);
    } catch (err) {
      // Classify permanent input failures (422 invalid PDF, 413 too large) so
      // BullMQ skips the remaining retries instead of churning attempts. 402
      // is handled at the queue level in the `failed` listener below — it
      // pauses the whole queue, so it should still flow through normal retry
      // semantics here and surface as a failed event once attempts are done.
      if (err instanceof HTPBEError && !err.isTransient && err.code !== 'PAYMENT_REQUIRED') {
        throw new UnrecoverableError(`${err.code}: ${err.message}`);
      }
      throw err; // transient — let BullMQ retry with the configured backoff
    }

    // Persist the check uid so the full payload is re-readable forever.
    await storeVerdict(entityId, result.id, result.status, result.modification_markers);

    // Route into the correct downstream lane.
    await routeVerdict(entityId, result);

    return { checkId: result.id, status: result.status };
  },
  { connection, concurrency: WORKER_CONCURRENCY }
);

// Replace these two with your own persistence layer.
declare function lookupStoredVerdict(entityId: string): Promise<unknown | null>;
declare function storeVerdict(
  entityId: string,
  checkId: string,
  status: string,
  markers: string[]
): Promise<void>;

The worker function is deliberately thin: idempotency check, API call, persist, route. Transient errors propagate so BullMQ applies the backoff from Step 1; permanent input errors are re-thrown as UnrecoverableError, which is BullMQ’s explicit signal to skip the remaining attempts. UnrecoverableError is the right primitive here — calling moveToFailed from inside a failed listener is brittle (it races BullMQ’s own state machine and needs the job token, which is not always safe to access from outside the processor).

Step 4: The API call — honouring `Retry-After`

This is the function the worker calls. It runs the two-step synchronous flow and translates HTTP status codes into the right behaviour: throw-to-retry on transient failures, throw-to-fail-permanently on input errors, and a special case for 429 that reads the Retry-After header instead of relying on blind exponential backoff.

const BASE = 'https://api.htpbe.tech/v1';

export class HTPBEError extends Error {
  constructor(
    message: string,
    public readonly status: number,
    public readonly code: string,
    public readonly retryAfterMs?: number
  ) {
    super(message);
    this.name = 'HTPBEError';
  }
  /** Transient errors should be retried; permanent ones should not. */
  get isTransient(): boolean {
    return this.status === 429 || this.status >= 500;
  }
}

export interface HTPBEResult {
  id: string;
  status: 'intact' | 'modified' | 'inconclusive';
  status_reason?: string;
  modification_confidence: 'certain' | 'high' | 'none' | null;
  modification_markers: string[];
  producer: string | null;
}

const auth = () => ({ Authorization: `Bearer ${process.env.HTPBE_API_KEY}` });

export async function analyzeDocument(
  pdfUrl: string,
  originalFilename: string
): Promise<HTPBEResult> {
  // Step 1: submit. Synchronous — 201 + Location, the result is ready on return.
  const submit = await fetch(`${BASE}/analyze`, {
    method: 'POST',
    headers: { ...auth(), 'Content-Type': 'application/json' },
    body: JSON.stringify({ url: pdfUrl, original_filename: originalFilename }),
  });

  if (!submit.ok) throw await toError(submit);
  const { id } = (await submit.json()) as { id: string };

  // Step 2: retrieve the full flat result.
  const fetched = await fetch(`${BASE}/result/${id}`, { headers: auth() });
  if (!fetched.ok) throw await toError(fetched);

  return (await fetched.json()) as HTPBEResult;
}

async function toError(res: Response): Promise<HTPBEError> {
  let body: { error?: string; code?: string } = {};
  try {
    body = await res.json();
  } catch {
    /* non-JSON body */
  }

  if (res.status === 429) {
    // Server-wide capacity, NOT per-key throttling. Respect Retry-After.
    const retryAfterSec = Number(res.headers.get('Retry-After') ?? '5');
    return new HTPBEError(
      'Server at analysis capacity',
      429,
      'SERVER_AT_CAPACITY',
      retryAfterSec * 1_000
    );
  }

  if (res.status === 402) {
    // No credit source left: monthly quota + paid batch + welcome credits all
    // exhausted, or no active plan on a live key. Retrying NEVER helps.
    return new HTPBEError(
      'No credits available — top up at htpbe.tech/pricing',
      402,
      'PAYMENT_REQUIRED'
    );
  }

  return new HTPBEError(body.error ?? res.statusText, res.status, body.code ?? 'UNKNOWN');
}

Stopping the whole batch on `402`

A 402 PAYMENT_REQUIRED is categorically different from every other error. It does not mean this document failed — it means your credit pool is empty, and every remaining job in the batch will fail identically. Retrying it wastes time and floods your logs.

The correct behaviour is to pause the queue the instant a 402 appears, alert your billing owner, and resume once topped up. BullMQ makes this a one-liner inside the worker’s failure handler:

import { verifyQueue } from './queue';

worker.on('failed', async (job, err) => {
  if (err instanceof HTPBEError && err.code === 'PAYMENT_REQUIRED') {
    // The pool is dry — stop burning attempts on doomed jobs.
    await verifyQueue.pause();
    await alertBillingOwner('HTPBE credits exhausted — verification queue paused');
  }
  // Permanent input errors are already routed via UnrecoverableError thrown
  // from the processor (see Step 3) — BullMQ skips remaining attempts and
  // fires this listener once. Nothing left to do here for that class.
});

declare function alertBillingOwner(message: string): Promise<void>;

This is the practical payoff of the universal-credits model. A subscription’s monthly quota, any one-time top-up batches, and the welcome credits all draw from one pool, and every live-key call — whether it originates from a web upload or this worker — debits from it. A 402 is a single, unambiguous signal that the whole pool is dry. You handle it once, at the queue level, instead of per request. When you size a batch, size your plan or top-up to match: one credit per document, one pool for everything.

Custom backoff that reads `Retry-After`

BullMQ’s built-in exponential backoff is fine, but for 429 the server tells you exactly how long to wait. A custom backoff strategy lets you honour that header precisely instead of guessing.

import { Worker } from 'bullmq';

// Register a named strategy on the worker's queue settings.
export const worker429Aware = new Worker('pdf-verify', processor, {
  connection,
  concurrency: WORKER_CONCURRENCY,
  settings: {
    backoffStrategy: (attemptsMade: number, _type, err) => {
      if (err instanceof HTPBEError && err.retryAfterMs) {
        return err.retryAfterMs; // exact wait the server asked for
      }
      // Fall back to exponential for transient 5xx.
      return Math.min(2_000 * 2 ** (attemptsMade - 1), 30_000);
    },
  },
});

declare const processor: Parameters<typeof Worker>[1];

Set the job’s backoff to { type: 'custom' } and BullMQ calls this strategy with the thrown error in hand. A 429 waits precisely as long as the server requested; a 500 falls back to exponential. This is the polite, correct way to be a high-volume client — you never push harder than the server has told you it can take.

Step 5: Routing the three verdicts

Detection is only useful if the verdict lands somewhere that acts on it. The router takes a result and pushes it into one of three lanes. The single most common mistake at scale is collapsing inconclusive into a failure bucket — it is not a failure, it is a finding that the document was created with consumer software, an online editor, an HTML renderer, or a scanner, so there is no institutional original to verify integrity against.

import { HTPBEResult } from './htpbe';

export async function routeVerdict(entityId: string, result: HTPBEResult): Promise<void> {
  switch (result.status) {
    case 'intact':
      // No post-creation modification, institutional origin. Auto-advance.
      await advanceEntity(entityId, 'verified');
      return;

    case 'modified': {
      // Post-creation change detected. modification_confidence tells you how firmly to escalate.
      const action =
        result.modification_confidence === 'certain'
          ? 'request_reissue' // conclusive — hold and require a fresh issuer-delivered copy
          : 'manual_review'; // 'high' — strong but route to a human
      await advanceEntity(entityId, action, result.modification_markers);
      return;
    }

    case 'inconclusive':
      // NOT an error. For documents that CLAIM institutional origin
      // (a bank statement, a payslip), treat this like 'modified':
      // route to a reviewer. status_reason tells you why it is unverifiable.
      await advanceEntity(entityId, 'manual_review', [
        result.status_reason ?? 'consumer_software_origin',
      ]);
      return;
  }
}

declare function advanceEntity(
  entityId: string,
  state: 'verified' | 'request_reissue' | 'manual_review',
  markers?: string[]
): Promise<void>;

Branch on the stable HTPBE_* marker ids in modification_markers — for example HTPBE_SIGNATURE_REMOVED, HTPBE_DATES_DISAGREE, HTPBE_POST_SIGNATURE_EDIT — not on the human-readable labels. The ids are part of the public contract and never change once shipped; the labels live in the dictionary on how PDF tamper detection works. The modification_confidence field (certain vs high) is the lever that decides how firmly to escalate: certain markers cannot be false positives, so a conclusive finding warrants holding the document and requesting a fresh issuer-delivered copy, while high markers warrant a manual glance in unusual legitimate workflows like linearization or batch re-processing. Both lanes end at a human or an issuer re-request — every result also carries a usage_caution object as a reminder that a structural verdict is never the sole basis for an automatic rejection.

Step 6: Backpressure and the producer

One subtlety closes the loop. If your producer can enqueue documents faster than the workers drain them — a 50,000-row nightly import, say — Redis memory grows unbounded. BullMQ’s rate limiter caps how fast jobs start, which protects the API; pairing it with a bounded producer protects Redis.

export const verifyQueueLimited = new Queue('pdf-verify', {
  connection,
  // Across all workers, start at most 30 jobs per 10s — a global throttle
  // that keeps you comfortably under the server's concurrency ceiling.
});

// On the worker side, the limiter belongs in worker options:
// new Worker('pdf-verify', processor, {
//   connection,
//   concurrency: 3,
//   limiter: { max: 30, duration: 10_000 },
// });

The limiter is a belt-and-braces companion to worker concurrency. Concurrency bounds how many jobs run at once; the limiter bounds how many start per time window. Together they make it nearly impossible for a sudden batch to trigger a 429 storm — and on the rare occasion one slips through, the Retry-After-aware backoff from Step 4 absorbs it cleanly.

Step 7: Testing the whole pipeline without spending credits

Every plan, including free, ships a test key. Test keys accept a fixed set of mock URLs and return deterministic verdicts without debiting credits or downloading real files. Point your worker at them and you can exercise the entire queue — producer, concurrency, retry, routing — in CI for free.

import { describe, it, expect, beforeAll, afterAll } from 'vitest';
import { enqueueDocument } from './producer';
import { worker } from './worker';

const TEST = 'https://api.htpbe.tech/v1/test';

describe('verification pipeline', () => {
  beforeAll(() => {
    process.env.HTPBE_API_KEY = process.env.HTPBE_TEST_API_KEY;
  });

  it('routes a modified document to rejection', async () => {
    const done = new Promise((resolve) => worker.once('completed', resolve));
    await enqueueDocument({
      pdfUrl: `${TEST}/modified-high.pdf`,
      entityId: 'loan-app-9001',
      originalFilename: 'statement.pdf',
    });
    const result = (await done) as { status: string };
    expect(result.status).toBe('modified');
  });

  it('routes an inconclusive document to manual review', async () => {
    const done = new Promise((resolve) => worker.once('completed', resolve));
    await enqueueDocument({
      pdfUrl: `${TEST}/inconclusive.pdf`,
      entityId: 'loan-app-9002',
      originalFilename: 'payslip.pdf',
    });
    const result = (await done) as { status: string };
    expect(result.status).toBe('inconclusive');
  });

  afterAll(async () => {
    await worker.close();
  });
});

The mock URLs follow the pattern https://api.htpbe.tech/v1/test/{name}.pdf — clean.pdf (intact), modified-high.pdf (modified), signature-removed.pdf (modified), inconclusive.pdf (inconclusive), and more. A live key, by contrast, accepts any public URL and debits one credit per call. Keep the two keys in separate environment files and never commit either.

Step 8: Operating the pipeline

Concurrency and backoff make the pipeline correct at volume. The next layer of work is what makes it operable: how you observe it, how you survive a deploy, what happens when one of your dependencies is the broken one. None of this is BullMQ-specific — the same primitives apply to SQS, Cloud Tasks, or any worker pool. Treat the following as a checklist of operational surface area, not a fixed recipe.

Metrics and alerts

BullMQ emits events for every state transition (completed, failed, stalled, progress). Wire them into your metrics backend so you can see what the queue is actually doing.

import type { Job } from 'bullmq';

worker.on('completed', (job: Job) => incr('htpbe.completed', { status: job.returnvalue?.status }));
worker.on('failed', (job, err) => incr('htpbe.failed', { code: errorCode(err) }));
worker.on('stalled', (jobId) => incr('htpbe.stalled', { jobId }));

// Sample queue depth every 30s — a cheap leading indicator of producer/worker imbalance.
setInterval(async () => {
  const counts = await verifyQueue.getJobCounts('waiting', 'active', 'failed', 'delayed');
  gauge('htpbe.queue_depth', counts.waiting + counts.delayed);
  gauge('htpbe.queue_active', counts.active);
  gauge('htpbe.queue_failed', counts.failed);
}, 30_000);

declare function incr(name: string, tags?: Record<string, unknown>): void;
declare function gauge(name: string, value: number): void;
declare function errorCode(err: Error | undefined): string;

Three alerts cover most real incidents:

Queue depth above a sane ceiling for more than a few minutes — the producer is winning, raise worker concurrency or pause upstream.
Failure rate above ~5% sustained — either the server is degraded or your URLs are unreachable; check the dominant error code.
PAYMENT_REQUIRED event — already covered by the failed listener (pauses the queue and pages billing); make sure the page actually routes to a human.

A dead-letter queue for the unrecoverable

The configuration in Step 1 sets removeOnFail: false, which preserves failed jobs in the failed set so you can inspect them. That is the minimum. The next step up is moving everything that exhausts its retries (and everything thrown as UnrecoverableError) into a separate dead-letter queue — a queue your normal workers do not consume, watched instead by a reviewer or a daily triage process.

import { Queue } from 'bullmq';

export const verifyDLQ = new Queue('pdf-verify-dlq', { connection });

worker.on('failed', async (job, err) => {
  // Same 402 short-circuit as Step 4 — left out here for brevity.

  if (!job) return;
  const attemptsLeft = (job.opts.attempts ?? 1) - (job.attemptsMade ?? 0);
  if (attemptsLeft > 0 && !(err instanceof Error && err.name === 'UnrecoverableError')) {
    return; // BullMQ will retry — not dead-letter material yet
  }

  await verifyDLQ.add('triage', {
    originalJob: job.data,
    failedAt: Date.now(),
    error: { message: err.message, name: err.name, code: (err as HTPBEError).code },
    attemptsMade: job.attemptsMade,
  });
});

A 5-minute job that drains the DLQ into a spreadsheet, a Slack channel, or a ticket in your case-management system is enough for most teams. The point is that no document silently vanishes — every failure ends up somewhere a human can look at it.

Graceful shutdown

A deploy that yanks the process while a worker is mid-analyze either lies to BullMQ (the job is reported as stalled and re-queued, double-charging credits) or interrupts a downstream write half-applied. Listen to SIGTERM, stop accepting new jobs, wait for the in-flight ones, then exit.

async function shutdown(signal: string): Promise<void> {
  console.log(`Received ${signal}, draining worker…`);
  // Stop pulling new jobs immediately; finish the currently active ones.
  await worker.close();
  await verifyQueue.close();
  await connection.quit();
  process.exit(0);
}

process.on('SIGTERM', () => void shutdown('SIGTERM'));
process.on('SIGINT', () => void shutdown('SIGINT'));

worker.close() blocks until active jobs finish. Pair it with a Kubernetes terminationGracePeriodSeconds (or your orchestrator’s equivalent) that comfortably exceeds your worst-case job time. A common starting point is terminationGracePeriodSeconds: 60 against a typical 5-second job — you exit cleanly even when a complex PDF is on the longer end of the distribution.

Fetch timeouts and hung-URL protection

The API call already has its own internal limits, but the network in front of it does not. If the URL you submit points at a slow or hung host, your job sits in active for as long as your HTTP client will wait — potentially forever, on a default Node fetch. Always pin a timeout.

async function fetchWithTimeout(input: string, init: RequestInit, ms: number): Promise<Response> {
  const ctl = new AbortController();
  const t = setTimeout(() => ctl.abort(), ms);
  try {
    return await fetch(input, { ...init, signal: ctl.signal });
  } finally {
    clearTimeout(t);
  }
}

// Use it in analyzeDocument from Step 4:
//   const submit = await fetchWithTimeout(`${BASE}/analyze`, { ... }, 60_000);
//   const fetched = await fetchWithTimeout(`${BASE}/result/${id}`, { headers: auth() }, 30_000);

A 60-second cap on analyze covers the upper end of legitimate analyses with margin; 30 seconds on result is generous for what is normally an immediate read. A timeout fires an AbortError, which BullMQ classifies as transient and retries — exactly the behaviour you want.

Queue size limits

Memory in Redis is not free. If your producer can run unbounded against a stuck worker pool, you can fill Redis and break every other queue sharing the same instance. Cap the queue before it caps you.

async function enqueueWithCap(data: VerifyJobData, cap = 100_000): Promise<void> {
  const { waiting, delayed } = await verifyQueue.getJobCounts('waiting', 'delayed');
  if (waiting + delayed >= cap) {
    throw new Error(
      `verify queue at capacity (${waiting + delayed}/${cap}) — back off the producer`
    );
  }
  await enqueueDocument(data);
}

The cap belongs in the producer, not the worker. When it trips, the right reaction is almost never to enqueue anyway — it is to slow the producer, alert the operator, and let the workers catch up. Treat the cap as load-shedding, not as a queue overflow.

Dashboard

A read-only dashboard makes triage cheap. bull-board plugs into any Express or Koa app and shows queues, jobs, errors, retries, and per-job payloads, with no extra wiring beyond the BullMQ instance you already have. Pin it behind your existing internal auth — never expose it publicly, because the job payloads contain document URLs.

import { createBullBoard } from '@bull-board/api';
import { BullMQAdapter } from '@bull-board/api/bullMQAdapter';
import { ExpressAdapter } from '@bull-board/express';
import { verifyQueue, verifyDLQ } from './queue';

const adapter = new ExpressAdapter();
adapter.setBasePath('/admin/queues');
createBullBoard({
  queues: [new BullMQAdapter(verifyQueue), new BullMQAdapter(verifyDLQ)],
  serverAdapter: adapter,
});

// In your app:
// app.use('/admin/queues', requireInternalAuth, adapter.getRouter());

That is the operational floor: metrics, alerts, a DLQ, graceful shutdown, fetch timeouts, a producer cap, and a dashboard. None of it is HTPBE-specific — it is the same kit you would put around any worker pool wrapping a slow synchronous dependency.

What this architecture does not solve

The queue makes the API call reliable and cheap at volume. It does not change what the analysis itself can and cannot see. Structural PDF forensics detects changes made after a file was created — it cannot catch a document that was fabricated whole-cloth in Word and exported once (structurally intact, because nothing was edited in) or a convincing rebuild of an original in the same software the issuer used. Those are content-truth problems, not structural ones, and they belong to a different layer alongside amount validation, issuer lookups, and sender authentication. For where this fits in a layered defence, see KYC vs. document forensics.

What the queue does buy you is control over the failure modes. When ten thousand documents arrive, you bound the rate hitting the server (no 429 storm), debit credits at most once per document (idempotency on both producer and worker), keep transient errors retrying with backoff (without burning attempts on permanent ones), and surface anything that cannot be recovered — a dry credit pool, a hung URL, a malformed PDF — into a dead-letter queue, an alert, or a reviewer’s queue instead of a silent drop. Failures still happen; the architecture makes them visible and survivable. That is the difference between a script that works on a demo and a pipeline that holds up in production.

To build it against real documents, generate a test key, run the Step 7 tests against the mock URLs, then point a live key at your batch. The API reference has every field and error code, and the pricing page shows how credits and one-time top-up batches map onto the volume you are planning to push.

Batch PDF Verification at Scale: Async Queue Patterns for the HTPBE? API

The contract you are building on top of

Why a naive loop fails

The queue topology

Step 1: The job payload and the queue

Step 2: Idempotency — the part everyone skips

Step 3: The worker — concurrency and the API round-trip

Step 4: The API call — honouring `Retry-After`

Stopping the whole batch on `402`

Custom backoff that reads `Retry-After`

Step 5: Routing the three verdicts

Step 6: Backpressure and the producer

Step 7: Testing the whole pipeline without spending credits

Step 8: Operating the pipeline

Metrics and alerts

A dead-letter queue for the unrecoverable

Graceful shutdown

Fetch timeouts and hung-URL protection

Queue size limits

Dashboard

What this architecture does not solve

Share This Article

Secure your workflow

The contract you are building on top of

Why a naive loop fails

The queue topology

Step 1: The job payload and the queue

Step 2: Idempotency — the part everyone skips

Step 3: The worker — concurrency and the API round-trip

Step 4: The API call — honouring Retry-After

Stopping the whole batch on 402

Custom backoff that reads Retry-After

Step 5: Routing the three verdicts

Step 6: Backpressure and the producer

Step 7: Testing the whole pipeline without spending credits

Step 8: Operating the pipeline

Metrics and alerts

A dead-letter queue for the unrecoverable

Graceful shutdown

Fetch timeouts and hung-URL protection

Queue size limits

Dashboard

What this architecture does not solve

Share This Article

Secure your workflow

Step 4: The API call — honouring `Retry-After`

Stopping the whole batch on `402`

Custom backoff that reads `Retry-After`