Updated 17 May 2026, 04:20 UTC

PDF Fraud & Tampering Statistics

Real-world tamper rates and modification patterns from 14,418 PDFs analyzed by HTPBE? users — refreshed automatically every six hours from production traffic.

Reading these numbers

What this dataset is — and what it is not

Every number on this page comes from real production traffic. Documents arrive via the public API, are analyzed by the same engine described in “How it works,” and are aggregated into this view. No synthetic samples, no benchmark corpus.

The tamper rate counts only PDFs for which the engine reached a confident verdict. Files from consumer-grade origins that return an inconclusive result are excluded from the percentage, so it does not mix “unknown” with “modified.”
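As a rough illustration of that exclusion rule, the sketch below computes a tamper rate over conclusive verdicts only and compares it with a naive rate over all documents. The verdict labels (`intact`, `modified`, `inconclusive`) and the function name are assumptions made for this example, not the engine's actual output format.

```python
from collections import Counter

def tamper_rate(verdicts):
    """Share of modified documents among conclusive verdicts, in percent.

    The labels "intact", "modified" and "inconclusive" are hypothetical.
    """
    counts = Counter(verdicts)
    conclusive = counts["intact"] + counts["modified"]
    if conclusive == 0:
        return None  # no confident verdicts, so no rate can be reported
    return 100.0 * counts["modified"] / conclusive

# Made-up sample: 2 modified, 3 intact, 2 inconclusive.
sample = ["modified", "intact", "inconclusive", "modified",
          "inconclusive", "intact", "intact"]

rate = tamper_rate(sample)
# The naive rate keeps inconclusive files in the denominator.
naive = 100.0 * sample.count("modified") / len(sample)
print(f"conclusive-only: {rate:.1f}%  naive: {naive:.1f}%")
```

Leaving inconclusive files in the denominator would understate the rate among documents the engine could actually judge, which is why the two figures differ.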

Every PDF carries a Creator (the application that produced the original document) and a Producer (the engine that wrote the PDF). They differ even in legitimate files — Word + Print to PDF is the common case.
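To make the Creator and Producer distinction concrete, here is a minimal sketch that pulls both entries out of raw PDF bytes with a regular expression. It is a deliberate simplification: it only handles unencrypted literal-string entries, and it ignores hex strings, UTF-16 text, compressed object streams and XMP metadata. The function name and the sample bytes are made up for illustration.

```python
import re

# Matches simple literal-string entries such as "/Producer (Some Engine 1.0)".
_INFO_ENTRY = re.compile(rb"/(Creator|Producer)\s*\(((?:[^()\\]|\\.)*)\)")

def read_creator_producer(pdf_bytes):
    """Return a dict with the Creator and Producer strings, or None if absent."""
    found = {"Creator": None, "Producer": None}
    for match in _INFO_ENTRY.finditer(pdf_bytes):
        key = match.group(1).decode("ascii")
        # Later matches overwrite earlier ones, so the last occurrence wins.
        found[key] = match.group(2).decode("latin-1")
    return found

# Synthetic fragment, not a complete PDF.
sample = (
    b"%PDF-1.7\n"
    b"1 0 obj\n"
    b"<< /Creator (Example Word Processor) /Producer (Example PDF Printer) >>\n"
    b"endobj\n"
)

info = read_creator_producer(sample)
print(info)
print("differ:", info["Creator"] != info["Producer"])
```

A mismatch between the two values is normal, as the paragraph above notes, so it is better treated as context than as a finding.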

14,418 PDFs analyzed
47.1% of conclusive analyses were modified
5.1% carried a digital signature
24 years: oldest creation date among analyzed PDFs

Scale of the dataset

What the engine has read so far

Volume processed since aggregation began. Each unit corresponds to bytes the engine actually parsed end-to-end, not requests received.

56,216 pages analyzed across all documents
1,728,197 PDF objects parsed at the binary level
6.66 GB cumulative size of all submitted PDFs

Security findings

What stood out as risk in production

signed & modified

144

PDFs that carried a digital signature and were modified after it was applied. The only fraud pattern HTPBE? returns with certain confidence.
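One common way to spot content appended after signing is to compare the signature's `/ByteRange` with the file length. The sketch below assumes that approach; the source does not confirm it is how HTPBE? reaches its verdict. It does not verify the cryptographic digest, and the sample bytes are synthetic rather than a valid signed PDF.

```python
import re

_BYTE_RANGE = re.compile(
    rb"/ByteRange\s*\[\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*\]"
)

def bytes_after_signature(pdf_bytes):
    """Bytes that follow the widest signed range, or None if no /ByteRange."""
    covered_end = None
    for m in _BYTE_RANGE.finditer(pdf_bytes):
        # The second (offset, length) pair marks where signed coverage ends.
        end = int(m.group(3)) + int(m.group(4))
        covered_end = end if covered_end is None else max(covered_end, end)
    if covered_end is None:
        return None
    return max(0, len(pdf_bytes) - covered_end)

# Build a synthetic "signed" blob whose ByteRange reaches the end of the file.
head = b"%PDF-1.7\n<< /Type /Sig /ByteRange [0 10 20 "
tail = b"] /Contents <00> >>\n%%EOF\n"
total = len(head) + 6 + len(tail)           # 6 = width of the padded number
signed = head + b"%06d" % (total - 20) + tail

# Simulate an incremental update appended after signing.
appended = b"4 0 obj\n<< >>\nendobj\n%%EOF\n"
tampered = signed + appended

print(bytes_after_signature(signed))
print(bytes_after_signature(tampered))
```

A non-zero result only says that something was written after the signature. Additional signatures or validation data can be appended legitimately, so a real check also has to inspect what the trailing bytes contain.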

embedded javascript

92 (0.6%)

Files containing embedded JavaScript. Legitimate in interactive forms, suspicious in financial documents and credentials — we surface it for manual review.
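A first-pass screen for embedded scripts can look for the `/JS` and `/JavaScript` name tokens in the raw bytes. This is a minimal sketch under that assumption, with a hypothetical function name. It misses names hidden inside compressed object streams or written with `#xx` escapes, so a negative result is not proof of absence.

```python
import re

# /JS and /JavaScript are the name tokens PDF uses for script actions.
# The lookahead avoids matching longer names such as /JSON.
_JS_NAME = re.compile(rb"/(?:JavaScript|JS)(?![A-Za-z0-9])")

def has_javascript_marker(pdf_bytes):
    """True if a /JS or /JavaScript name token appears in the raw bytes."""
    return _JS_NAME.search(pdf_bytes) is not None

# Synthetic fragments, not complete PDFs.
with_script = b"<< /S /JavaScript /JS (app.alert\\(1\\)) >>"
plain_page = b"<< /Type /Page /Contents 5 0 R >>"

print(has_javascript_marker(with_script))
print(has_javascript_marker(plain_page))
```

A hit is a prompt for manual review, in line with the text above, since interactive forms use scripts legitimately.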

longest update chain

9 revisions

Maximum number of incremental updates observed in a single PDF. Each revision is a save event after the original; deep chains usually mean iterative editing, not legitimate workflow.
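A rough way to estimate the length of an update chain is to count end-of-file markers, since each incremental save appends a new section ending in `%%EOF`. The sketch below uses that heuristic. It is an approximation, not the engine's method: linearized files can carry an extra marker, and the marker can also appear inside stream data.

```python
def count_revisions(pdf_bytes):
    """Estimate incremental updates as the number of %%EOF markers minus one."""
    eof_markers = pdf_bytes.count(b"%%EOF")
    return max(0, eof_markers - 1)

# Synthetic blobs, not complete PDFs.
original = b"%PDF-1.7\n1 0 obj\n<< >>\nendobj\n%%EOF\n"
update = b"2 0 obj\n<< >>\nendobj\n%%EOF\n"

print(count_revisions(original))
print(count_revisions(original + update + update))
```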

Tool fingerprints

Which software shows up most often

Producer is the engine that wrote the final PDF. Creator is the app that produced the original document.

Top Producers

App that converted or last saved the PDF

Top Creators

App that created the original document before PDF conversion

Seasonality

When tampering peaks during the year

Modification rate by the month in which the document was last modified. Monthly rates swing widely, from 13.9% in September to 85.7% in July, but the pattern is irregular and monthly volume is uneven, so fraud is not seasonal in any strong sense. Tax and fiscal-year cycles do show up.

January 80.3%
February 71.1%
March 30.8%
April 64.1%
May 74.6%
June 62.2%
July 85.7%
August 81.1%
September 13.9%
October 38.8%
November 66.4%
December 63.5%

What the calendar says

Across the year, the tamper rate averages about 61.0%, with 71.8 pp between the lowest and highest months. That spread is wide, but adjacent months differ sharply with no consistent pattern, so month alone is not a useful fraud signal. Volume and document type matter far more.
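The summary figures can be reproduced from the monthly rates listed above. The sketch below assumes the average is an unweighted mean of the twelve monthly rates, which matches the published 61.0%, and that the spread is the peak rate minus the lowest.

```python
# Monthly modification rates in percent, copied from the list above.
monthly = {
    "January": 80.3, "February": 71.1, "March": 30.8, "April": 64.1,
    "May": 74.6, "June": 62.2, "July": 85.7, "August": 81.1,
    "September": 13.9, "October": 38.8, "November": 66.4, "December": 63.5,
}

mean = sum(monthly.values()) / len(monthly)   # unweighted: ignores volume
peak = max(monthly, key=monthly.get)
low = min(monthly, key=monthly.get)
spread = monthly[peak] - monthly[low]         # in percentage points

print(f"mean {mean:.1f}%, spread {spread:.1f} pp, peak {peak}, lowest {low}")
# prints: mean 61.0%, spread 71.8 pp, peak July, lowest September
```

Because the mean is unweighted, a low-volume month moves it as much as a high-volume one, which is one reason to read the monthly figures alongside volume.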

Peak month: July 85.7%
Lowest month: September 13.9%
Highest volume: April 51.6% (share of documents verified this year)

Long tail

Distributions, anomalies, and extremes

Smaller cuts of the same dataset — useful for understanding what “normal” looks like before treating an outlier as a signal.

PDF versions in the wild

PDF 1.7 38.0%
PDF 1.4 29.9%
PDF 1.5 13.6%
PDF 1.3 9.9%
PDF 1.6 6.7%
PDF 1.2 1.0%
PDF 1.1 1.0%
PDF 2.0 0.0%
PDF unknown 0.0%
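A version breakdown like this one can be approximated by reading the `%PDF-x.y` header at the start of each file. The sketch below assumes that approach; how the page actually derives its version figures is not stated. Note that a `/Version` entry in the document catalog can override the header, which this sketch ignores.

```python
import re

_HEADER = re.compile(rb"%PDF-(\d\.\d)")

def header_version(pdf_bytes):
    """Return the version from the %PDF-x.y header, or "unknown"."""
    # Search only the first 1024 bytes, where readers expect the header.
    m = _HEADER.search(pdf_bytes[:1024])
    return m.group(1).decode("ascii") if m else "unknown"

print(header_version(b"%PDF-1.7\n%\xe2\xe3\xcf\xd3\n"))
print(header_version(b"not a pdf"))
```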

Structural anomalies

  • Without creation date: 2,524
  • With embedded files: 105
  • With incremental updates: 2,162

Extremes

  • Largest document: 145 pages
  • Largest file: 10.28 MB
  • Total PDF objects: 1,728,197
  • Oldest analyzed PDF: created 24 years ago
