PDF Fraud & Tampering Statistics
Real-world tamper rates and modification patterns from 14,418 PDFs analyzed by HTPBE? users — refreshed automatically every six hours from production traffic.
Reading these numbers
What this dataset is — and what it is not
Every number on this page comes from real production traffic. Documents arrive via the public API, are analyzed by the same engine described in "How it works", and are aggregated into this view. There are no synthetic samples and no benchmark corpus.
The tamper rate reflects only PDFs where the engine could reach a confident verdict — consumer-grade origins that return inconclusive are excluded from the percentage so it does not mix “unknown” with “modified.”
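The exclusion rule above can be illustrated with a small calculation. This is a minimal sketch, not the production aggregation: the verdict labels `"intact"`, `"modified"`, and `"inconclusive"` and the function name `tamper_rate` are assumptions chosen for illustration.

```python
from collections import Counter

def tamper_rate(verdicts):
    """Return the share of modified PDFs among conclusive verdicts, in percent.

    `verdicts` is an iterable of hypothetical labels: "intact", "modified",
    or "inconclusive". Inconclusive results are left out of the denominator.
    """
    counts = Counter(verdicts)
    conclusive = counts["intact"] + counts["modified"]
    if conclusive == 0:
        return None  # no confident verdicts, so there is no rate to report
    return 100.0 * counts["modified"] / conclusive

# Illustrative sample: two modified, one intact, two inconclusive.
sample = ["modified", "intact", "inconclusive", "modified", "inconclusive"]
rate = tamper_rate(sample)  # computed over 3 conclusive documents, not 5
```

Dividing by all five documents instead would count "unknown" as "not modified" and understate the rate.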
Every PDF carries a Creator (the application that produced the original document) and a Producer (the engine that wrote the PDF). They differ even in legitimate files — Word + Print to PDF is the common case.
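To make the two fields concrete, here is a rough sketch of reading them from raw bytes. It assumes the Info dictionary is stored uncompressed and uses plain literal strings; real files may use hex strings, UTF-16 text, nested parentheses, or object streams, all of which this ignores. The helper name `info_field` and the sample values are illustrative only.

```python
import re

def info_field(pdf_bytes, key):
    """Return the literal-string value of an Info key such as "Creator", or None."""
    # Matches "/Key (value)", allowing backslash escapes inside the parentheses.
    pattern = rb"/" + key.encode("ascii") + rb"\s*\(((?:\\.|[^\\()])*)\)"
    match = re.search(pattern, pdf_bytes, re.DOTALL)
    return match.group(1).decode("latin-1") if match else None

# Illustrative Info dictionary for a document written in Word and printed to PDF.
sample = b"<< /Creator (Microsoft Word) /Producer (macOS Quartz PDFContext) >>"
creator = info_field(sample, "Creator")
producer = info_field(sample, "Producer")
```

Here the two values differ even though nothing was edited after export, which is why a mismatch alone is not treated as a tamper signal.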
Scale of the dataset
What the engine has read so far
Volume processed since aggregation began. Each unit corresponds to bytes the engine actually parsed end-to-end, not requests received.
Security findings
What stood out as risk in production
signed & modified: 144
PDFs that carried a digital signature and were modified after it was applied. The only fraud pattern HTPBE? returns with certain confidence.
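One common way to spot this pattern, sketched here as an assumption rather than as the HTPBE? engine's actual method, relies on the signature's `/ByteRange` entry, which lists the byte spans the signature covers. If the file continues past the end of the widest covered span, something was appended after signing. The function name `bytes_after_signature` is hypothetical, and the scan ignores compressed object streams.

```python
import re

# "/ByteRange [offset1 length1 offset2 length2]" as written in a signature dictionary.
BYTE_RANGE = re.compile(rb"/ByteRange\s*\[\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*\]")

def bytes_after_signature(pdf_bytes):
    """Return how many bytes lie beyond the widest signed range, or None if unsigned."""
    ends = []
    for match in BYTE_RANGE.finditer(pdf_bytes):
        _, _, offset2, length2 = (int(group) for group in match.groups())
        ends.append(offset2 + length2)  # end of the second signed span
    if not ends:
        return None
    return len(pdf_bytes) - max(ends)

# Toy data: a 50-byte "file" whose signature covers it through byte 50.
signed = b"/ByteRange [0 10 20 30]".ljust(50, b" ")
tampered = signed + b"extra"  # bytes appended after signing
```

A positive result is only a starting point: legitimate later signatures or validation data are also appended, so a real check has to inspect what the extra bytes contain and verify the signature cryptographically.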
embedded javascript: 92 (0.6%)
Files containing embedded JavaScript. Legitimate in interactive forms, suspicious in financial documents and credentials — we surface it for manual review.
longest update chain: 9 revisions
Maximum number of incremental updates observed in a single PDF. Each revision is a save event after the original; deep chains usually mean iterative editing, not legitimate workflow.
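Because each incremental update appends a new cross-reference section that ends with its own `%%EOF` marker, a rough revision count can be read from the raw bytes. The sketch below is a heuristic under that assumption and not necessarily how the engine counts: linearized files can carry an extra marker, and the marker text could also occur inside stream data. The function name is hypothetical.

```python
def count_incremental_updates(pdf_bytes):
    """Estimate the number of saves made after the original write."""
    eof_markers = pdf_bytes.count(b"%%EOF")
    # The first marker closes the original document; each later one closes an update.
    return max(eof_markers - 1, 0)

# Toy data: an original body followed by two appended updates.
original = b"%PDF-1.4\n...original objects...\n%%EOF\n"
updated = original + b"...update 1...\n%%EOF\n" + b"...update 2...\n%%EOF\n"
```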
Tool fingerprints
Which software shows up most often
Producer is the engine that wrote the final PDF. Creator is the app that produced the original document.
Seasonality
When tampering peaks during the year
Modification rate by document modification month. Month-to-month variance is small — fraud is not seasonal in any strong sense, but tax and fiscal-year cycles do show up.
What the calendar says
Across the year, tamper rate sits around 61.0% on average, with 71.8 pp between the lowest and highest months. That spread is small enough that month alone is not a useful fraud signal — volume and document type matter far more.
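For readers unfamiliar with the two summary figures, the sketch below shows how an average and a spread in percentage points (pp) can be derived from monthly rates. The monthly values are invented for illustration and are not taken from the dataset, and an unweighted mean across months is an assumption about how the average is formed.

```python
from statistics import mean

# Hypothetical monthly tamper rates in percent (not real data).
monthly_rates = {"Jan": 59.0, "Apr": 63.5, "Jul": 57.5, "Oct": 62.0}

# Unweighted average across months.
average_rate = mean(monthly_rates.values())

# Spread in percentage points: highest month minus lowest month.
spread_pp = max(monthly_rates.values()) - min(monthly_rates.values())
```

The spread is a difference between two percentages, so it is reported in percentage points rather than as a percentage of the average.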
Share of documents verified this year
Long tail
Distributions, anomalies, and extremes
Smaller cuts of the same dataset — useful for understanding what “normal” looks like before treating an outlier as a signal.
PDF versions in the wild
Structural anomalies
- Without creation date: 2,524
- With embedded files: 105
- With incremental updates: 2,162
Extremes
- Largest document: 145 pages
- Largest file: 10.28 MB
- Total PDF objects: 1,728,197
- Oldest analyzed PDF: created 24 years ago