AI-Generated Document Detection — Catch Generator-Tool PDFs
AI-generated receipts, payslips, and bank statements are passing visual review — your existing tools check only the text, not how the file was made. Fraud-ops, AP, claims, and HR teams started seeing AI-generated PDFs at scale in 2024. The documents look right. The text passes OCR. Content classifiers are inconsistent. What's missing is a check on the file's structural layer: where did this PDF actually come from? AI tools render through standard toolchains (Chrome Headless, Puppeteer, wkhtmltopdf, ReportLab) that leave recognisable producer fingerprints — fingerprints that institutional billing systems and payroll engines never produce. We don't classify AI-written text; we check for the rendering-toolchain fingerprint the institutional source would have left. Read the honest scope below.
htpbe? analyzes the structural layer of the PDF file — the producer/creator metadata, the xref chain, the digital signature state, font subsets, image streams. We do NOT run an AI content classifier on the text inside the PDF. We do NOT decide whether words were 'written by AI'. What we DO is detect when a PDF lacks the institutional-issuer fingerprint real documents carry — which catches the high-volume, technically unsophisticated AI-rendered PDFs that pass visual review today.
We will NOT catch: an AI-generated PDF that has been printed, scanned, and re-saved through a real institutional workflow (the AI fingerprints are gone); a sophisticated AI tool that successfully spoofs a real institutional producer string in metadata; AI-generated text that a human pasted into Word and exported (we cannot tell that text is AI-written from file structure alone). For those scenarios, defence-in-depth means pairing htpbe? with content classifiers and manual review.
One REST call, one deterministic verdict
Upload the PDF. The API returns INTACT, MODIFIED, or INCONCLUSIVE with named markers — in about three seconds.
How AI-rendered PDFs typically look at the file layer
Three real fraud mechanics and how each looks at the structural PDF layer.
AI tool renders a 'receipt' or 'invoice' through a headless browser
A user prompts an AI tool to 'generate a hotel folio for Marriott Times Square'. The tool outputs an HTML render and exports through Chrome Headless or Puppeteer. The producer field is the headless browser; there is no Marriott PMS metadata in the file. Real Marriott folios carry the PMS producer signature.
AI tool exports through a PDF library
AI assistants use libraries like wkhtmltopdf, ReportLab, jsPDF, or PDFKit to produce PDFs. These leave recognisable producer strings — distinct from any payroll, banking, or government issuer. Single-session, no incremental update, no institutional metadata.
AI-generated text pasted into Word and exported
Honest answer: we typically cannot distinguish this from a human-typed Word document. Both produce a Microsoft Word producer signature, single-session export. The verdict is INCONCLUSIVE — same as any Word-authored document. Whether INCONCLUSIVE is a fraud signal depends on document context (a Word 'W-2' is suspicious; a Word reference letter from a small employer might be legitimate).
Why your existing checks miss this
Content classifiers see the text. They don't see how the PDF was rendered.
And content classifiers tuned for AI text don't transfer cleanly to PDFs.
OCR and rule-based document platforms extract data — they cannot tell whether the underlying PDF was issued by a real merchant or rendered by an AI tool. AI text classifiers (GPTZero and similar) are inconsistent on PDF documents because the structural layer carries different signals than free-text. htpbe? inspects the file structure — producer, metadata, xref, image streams — and reports what it sees. Pair us with a content classifier for full coverage: classifiers handle the language layer, we handle the file layer.
Five forensic layers, one deterministic verdict
Every PDF we receive passes through the same structural pipeline — no model training, no thresholds to tune.
Metadata analysis
Creation and modification timestamps, producer and creator fields, XMP metadata — the first layer exposes basic tampering.
File structure
Xref tables, trailer chain, incremental updates. Any edit after export leaves a structural fingerprint here.
Digital signatures
Signature chain integrity and post-signature modifications produce deterministic markers. These are certainty-level signals, not probabilistic scores.
Content integrity
Fonts, objects, embedded content, page assembly. Multi-session edits and inserted objects are visible at this layer.
Verdict with markers
Deterministic output: INTACT / MODIFIED / INCONCLUSIVE, with named markers for every finding — suitable for audit trail.
AI-rendered PDFs we typically flag (via producer/metadata)
Every type listed below is analyzed at the structural file layer — not the rendered image.
Detection capabilities
Deterministic structural signals. No probabilistic scores, no model training.
Producer signature reveals the rendering toolchain
AI-generated PDFs typically render through a headless browser (Chrome Headless, Puppeteer, Playwright) or a PDF library (wkhtmltopdf, ReportLab, PDFKit, jsPDF). These leave producer strings that are distinct from authentic issuer producers (payroll engines, EHR billing, banking portals, government systems). We surface the producer field; you interpret it against the document type.
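A minimal sketch of how an integrating application might classify the producer field against known rendering toolchains (the function name and the toolchain list below are illustrative, not an official htpbe? list):

```python
# Sketch: classify the "producer" field returned in the API response.
# The substring list is a starting point -- extend it from your own cases.
RENDER_TOOLCHAINS = (
    "chrome", "chromium", "headless", "puppeteer", "playwright",
    "wkhtmltopdf", "reportlab", "pdfkit", "jspdf",
)

def is_render_toolchain(producer):
    """True when the producer string matches a known rendering toolchain."""
    if not producer:
        return False
    p = producer.lower()
    return any(tool in p for tool in RENDER_TOOLCHAINS)

print(is_render_toolchain("Puppeteer (Chrome 124.0)"))   # True
print(is_render_toolchain("ADP Payroll Engine 11.2"))    # False (hypothetical issuer string)
```

A match is not a verdict on its own — interpret it against the document type, as the paragraph above describes.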
INCONCLUSIVE verdict is the typical signal
AI-rendered PDFs almost never trigger MODIFIED — there is no edit trail to confirm. They trigger INCONCLUSIVE: 'this PDF does not have institutional-issuer fingerprints'. In context (a 'receipt' that should come from a real POS, a 'bank statement' that should come from a banking system), INCONCLUSIVE is a strong fraud-positive signal.
Single-session creation pattern
AI-generated PDFs are produced in one shot — CreationDate equals ModDate, single xref table, no incremental update history. Real institutional production systems often carry richer history with incremental updates.
AI-edited regions in real PDFs trigger MODIFIED
When an AI-generated region is pasted into a real PDF, the file shows an incremental update trail and the verdict becomes MODIFIED. The detection works regardless of whether the inserted content was AI-made or human-typed — we detect the EDIT, not the AI origin.
Image-stream artefacts in pasted AI logos and headers
AI tools that paste merchant logos or letterhead images leave compression artefacts that differ from authentic embedded headers. Image-stream metadata exposes the difference where the AI tool reused stock images.
Font subset divergence across pages
Multi-page AI-rendered documents often show font subset prefix shifts between pages — a fingerprint of multi-call generation rather than single-session institutional export.
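Subset fonts embed a six-letter prefix ('ABCDEF+Helvetica'). A sketch of the prefix-shift check, assuming you have already extracted the base font names per page by some means:

```python
# Sketch: spot font-subset prefix shifts across pages. A single-session
# export tends to reuse one prefix per face; multi-call generation shifts it.
def subset_prefix_shifts(fonts_per_page):
    seen = {}  # base face -> set of subset prefixes observed
    for page_fonts in fonts_per_page:
        for name in page_fonts:
            if "+" in name:
                prefix, face = name.split("+", 1)
                seen.setdefault(face, set()).add(prefix)
    # Faces that appear under more than one prefix are the fingerprint.
    return {face for face, prefixes in seen.items() if len(prefixes) > 1}

pages = [["ABCDEF+Helvetica"], ["GHIJKL+Helvetica"], ["ABCDEF+Courier"]]
print(subset_prefix_shifts(pages))  # {'Helvetica'}
```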
Two HTTP calls — and you read the producer field yourself
Buyers can skip this section. Developers: the integration is two HTTP calls.
Step 1 — submit the PDF
curl -X POST https://api.htpbe.tech/v1/analyze \
-H "Authorization: Bearer $HTPBE_API_KEY" \
-H "Content-Type: application/json" \
-d '{"url": "https://your-storage/suspicious-receipt.pdf"}'

Step 2 — read the verdict and the producer field
{
"id": "a1i2g3e4-5n6e7r-8a9t-0e0d-z1z2z3z4z5z6",
"status": "inconclusive",
"modification_confidence": "none",
"modification_markers": [
"Headless-browser producer detected (Puppeteer)",
"Single-session creation — no institutional metadata",
"No incremental update trail"
],
"producer": "Puppeteer (Chrome 124.0)",
"creator": "Puppeteer (Chrome 124.0)",
"creation_date": 1707350400,
"modification_date": 1707350400,
"has_digital_signature": false,
"xref_count": 1,
"has_incremental_updates": false
}

htpbe? returns inconclusive — there is no edit trail (so MODIFIED isn't justified), and the file lacks institutional metadata (so INTACT isn't either). The producer field shows Puppeteer — a headless-browser rendering toolchain commonly used by AI tools. For a receipt that should have come from a hotel PMS or POS system, this combination is a strong fraud signal. Your application reads the producer field and applies the rule: 'receipts from this issuer must have producer X; everything else is suspect'. We don't make that judgement for you — we surface the data.
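A sketch of that application-side rule (the allowlist entries are hypothetical placeholders — fill them with producer strings you have observed from each real issuer):

```python
# Sketch: map an htpbe? response to an ops decision per document type.
# Allowlist values below are hypothetical examples, not real issuer strings.
EXPECTED_PRODUCERS = {
    "hotel_folio": ("opera pms",),
    "bank_statement": ("crystal reports",),
}

def review(doc_type, resp):
    """'receipts from this issuer must have producer X; else suspect'."""
    if resp["status"] == "modified":
        return "reject"
    producer = (resp.get("producer") or "").lower()
    expected = EXPECTED_PRODUCERS.get(doc_type, ())
    if any(e in producer for e in expected):
        return "accept"
    return "manual_review"  # e.g. INCONCLUSIVE with a Puppeteer producer

print(review("hotel_folio", {"status": "inconclusive",
                             "producer": "Puppeteer (Chrome 124.0)"}))
# -> manual_review
```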
Customer Stories
Teams that stopped document fraud
Compliance, finance, and risk teams use htpbe? to catch manipulated PDFs before they become costly mistakes.
Caught an invoice where the total had been changed by less than a thousand dollars. Without this I would have approved it without a second look.
Sarah M.
AP Manager
United States
We had three applicants in the same week with bank statements that looked completely fine. Two of them were flagged as modified. You simply cannot see this by reading the document — it is in the file structure.
Lars V.
Risk Analyst, Online Lending
Netherlands
Salary slips were coming with altered figures. We identified two problematic files before the placement was finalised.
Priya K.
HR Operations Lead
India
Since we started checking documents this way, we stopped two applications early in the process that would have been very difficult to reverse later.
Julien R.
Fraud Analyst, Fintech
France
Some applicants were sending PDFs that looked authentic but had been edited in ways not visible to the eye. We now ask for verified originals when something is flagged. Already saved us from a few bad decisions.
Marta S.
Compliance Coordinator
Spain
One invoice was caught because there was a mismatch between the document dates and structure. That particular case would have cost us significantly.
Tariq A.
Finance Manager
United Arab Emirates
Related solutions and guides
Fake Receipt Detection
Receipts are the highest-volume AI-rendered PDF category — focused treatment of receipts specifically.
Medical Bill Tamper Detection
Same claims cluster — forensics for tampered medical bill PDFs in insurance and expense workflows.
Invoice Fraud
AI-rendered vendor invoices entering AP pipelines — fraud-ops angle.
Insurance Claims
AI-rendered supporting documents in property and travel claims — claims-ops angle.
Secure your workflow
Create your account — API key on signup, free test environment on every plan.
From $15/mo. No sales call. Cancel any time.