PDF Fraud & Tampering Statistics
Real-world tamper rates and modification patterns from 15,219 PDFs analyzed by HTPBE? users — refreshed automatically every six hours from production traffic.
Reading these numbers
What this dataset is — and what it is not
Every number on this page comes from real production traffic. Documents arrive via the public API, get analyzed by the same engine described under "how it works", and aggregate into this view. No synthetic samples, no benchmark corpus.
The tamper rate reflects only PDFs where the engine could reach a confident verdict — consumer-grade origins that return inconclusive are excluded from the percentage so it does not mix “unknown” with “modified.”
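The exclusion rule above can be sketched in a few lines. This is only an illustration: the verdict labels (`modified`, `intact`, `inconclusive`) and the function name `tamper_rate` are assumptions made for the example, not the engine's actual output values.

```python
from collections import Counter

# Hypothetical verdict labels; the engine's real verdict names are not given on this page.
MODIFIED = "modified"
INTACT = "intact"
INCONCLUSIVE = "inconclusive"


def tamper_rate(verdicts):
    """Return the share of modified PDFs among conclusive verdicts, or None if there are none."""
    counts = Counter(verdicts)
    # Inconclusive results are left out of the denominator entirely.
    conclusive = counts[MODIFIED] + counts[INTACT]
    if conclusive == 0:
        return None
    return counts[MODIFIED] / conclusive


sample = [MODIFIED, INTACT, INCONCLUSIVE, MODIFIED, INTACT, INTACT, INCONCLUSIVE]
rate = tamper_rate(sample)
print(f"{rate:.1%}")  # 2 modified out of 5 conclusive verdicts
```

Counting the two inconclusive files in the denominator would have given 2 out of 7 instead, which is the mixing of "unknown" with "modified" that the page says it avoids.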
Every PDF carries a Creator (the application that produced the original document) and a Producer (the engine that wrote the PDF). They differ even in legitimate files — Word + Print to PDF is the common case.
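As a rough illustration of where these two fields live, the sketch below pulls `/Creator` and `/Producer` out of raw PDF bytes with a regular expression. It assumes the values are stored as plain literal strings in an unencrypted, uncompressed Info dictionary; hex strings, UTF-16 values, escaped parentheses, XMP metadata, and object streams are not handled, so a real parser is needed for anything beyond a demo. The function name and the sample bytes are made up for the example.

```python
import re

# Matches "/Creator (...)" or "/Producer (...)" literal strings without nested parentheses.
_INFO_FIELD = re.compile(rb"/(Creator|Producer)\s*\(([^()]*)\)")


def read_creator_producer(pdf_bytes):
    """Return a dict with whichever of Creator/Producer appear as plain literal strings."""
    found = {}
    for key, value in _INFO_FIELD.findall(pdf_bytes):
        # Later matches overwrite earlier ones, so the last occurrence in the file wins.
        found[key.decode("ascii")] = value.decode("latin-1")
    return found


# Illustrative fragment only, not a complete PDF file.
sample = b"1 0 obj\n<< /Creator (Microsoft Word) /Producer (Microsoft: Print To PDF) >>\nendobj\n"
info = read_creator_producer(sample)
print(info["Creator"], "|", info["Producer"])
```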
Scale of the dataset
What the engine has read so far
Volume processed since aggregation began. Each unit corresponds to bytes the engine actually parsed end-to-end, not requests received.
Security findings
What stood out as risk in production
signed & modified: 145
PDFs that carried a digital signature and were modified after it was applied. This is the only fraud pattern HTPBE? reports with certainty.
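One common heuristic for spotting changes made after signing, which is not necessarily how HTPBE? does it, is to compare the signature's `/ByteRange` with the file length. The array `[offset1 length1 offset2 length2]` names the two byte spans the signature covers, so bytes beyond `offset2 + length2` were appended later. The sketch below assumes the array is visible in the raw bytes and does not verify the cryptographic digest. The function name and the synthetic buffer are hypothetical.

```python
import re

_BYTE_RANGE = re.compile(rb"/ByteRange\s*\[\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*\]")


def bytes_after_signature(pdf_bytes):
    """Return how many bytes lie beyond the widest signature ByteRange, or None if unsigned."""
    matches = _BYTE_RANGE.findall(pdf_bytes)
    if not matches:
        return None
    # Each match is (offset1, length1, offset2, length2); coverage ends at offset2 + length2.
    covered_end = max(int(m[2]) + int(m[3]) for m in matches)
    return len(pdf_bytes) - covered_end


# Synthetic 100-byte buffer (not a valid PDF) whose ByteRange claims coverage up to byte 100.
signed = b"/ByteRange [0 10 20 80]".ljust(100, b" ")
tampered = signed + b"\n% appended revision\n"

print(bytes_after_signature(signed))        # nothing after the covered range
print(bytes_after_signature(tampered) > 0)  # extra bytes were appended after signing
```

A positive result only shows that something was appended. Some appended revisions are benign (for example, validation data added for long-term archiving), so the content of the trailing bytes still has to be inspected.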
embedded javascript: 93 (0.6%)
Files containing embedded JavaScript. Legitimate in interactive forms, suspicious in financial documents and credentials — we surface it for manual review.
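A minimal sketch of how such files can be flagged for review, assuming the `/JS` or `/JavaScript` name keys appear uncompressed in the raw bytes. Names written with hex escapes or stored inside compressed object streams would be missed, so this is a first-pass filter rather than a complete detector. The function name is hypothetical.

```python
import re

# /JS and /JavaScript are the name keys PDF uses for JavaScript actions and name trees.
# The lookahead avoids matching longer names such as /JSON.
_JS_NAME = re.compile(rb"/(?:JavaScript|JS)(?![A-Za-z0-9])")


def has_javascript_marker(pdf_bytes):
    """Return True if a /JS or /JavaScript name appears in the raw bytes."""
    return _JS_NAME.search(pdf_bytes) is not None


form_like = b"<< /S /JavaScript /JS (app.alert(1)) >>"
plain = b"<< /Type /Page /JSON 1 >>"
print(has_javascript_marker(form_like), has_javascript_marker(plain))
```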
longest update chain: 24 revisions
Maximum number of incremental updates observed in a single PDF. Each revision is a save event after the original; deep chains usually mean iterative editing, not a legitimate workflow.
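Because each incremental update appends a new cross-reference section ending in its own `%%EOF` marker, the chain length can be roughly estimated by counting those markers. The sketch below is an approximation and not the engine's method: linearized files can carry an extra `%%EOF` without any update, and the marker can also occur inside stream data. The function name and sample bytes are made up for the example.

```python
def count_incremental_updates(pdf_bytes):
    """Estimate incremental updates as the number of %%EOF markers beyond the first."""
    eof_markers = pdf_bytes.count(b"%%EOF")
    return max(eof_markers - 1, 0)


# Illustrative fragments only, not complete PDF files.
original = b"%PDF-1.7\n1 0 obj\n<<>>\nendobj\nstartxref\n9\n%%EOF\n"
edited = original + b"2 0 obj\n<<>>\nendobj\nstartxref\n40\n%%EOF\n"

print(count_incremental_updates(original), count_incremental_updates(edited))
```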
Tool fingerprints
Which software shows up most often
Producer is the engine that wrote the final PDF. Creator is the app that produced the original document.
Top Producers
App that converted or last saved the PDF
Top Creators
App that created the original document before PDF conversion
Seasonality
When tampering peaks during the year
Modification rate by document modification month. Month-to-month variance is small — fraud is not seasonal in any strong sense, but tax and fiscal-year cycles do show up.
What the calendar says
Across the year, tamper rate sits around 58.7% on average, with 68.3 pp between the lowest and highest months. That spread is small enough that month alone is not a useful fraud signal — volume and document type matter far more.
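For readers who want to reproduce this kind of summary on their own data, the sketch below computes an average monthly rate and the spread in percentage points between the lowest and highest months. It assumes an unweighted mean across months, which this page does not confirm, and the monthly counts are invented for the example.

```python
def monthly_summary(monthly_counts):
    """monthly_counts maps month -> (modified, conclusive); returns (mean rate, spread) in percent."""
    rates = [
        100 * modified / conclusive
        for modified, conclusive in monthly_counts.values()
        if conclusive  # skip months with no conclusive verdicts
    ]
    if not rates:
        raise ValueError("no month has conclusive verdicts")
    mean_rate = sum(rates) / len(rates)  # unweighted: every month counts equally
    spread_pp = max(rates) - min(rates)  # percentage points between extremes
    return mean_rate, spread_pp


# Hypothetical counts, not figures from this dataset.
counts = {"Jan": (50, 100), "Feb": (60, 100), "Mar": (55, 100)}
mean_rate, spread_pp = monthly_summary(counts)
print(f"mean {mean_rate:.1f}%, spread {spread_pp:.1f} pp")
```

With an unweighted mean, a month with very few documents moves the average and the spread as much as a busy month, so per-month volume is worth checking alongside the rates.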
Share of documents verified this year
Long tail
Distributions, anomalies, and extremes
Smaller cuts of the same dataset — useful for understanding what “normal” looks like before treating an outlier as a signal.
PDF versions in the wild
Structural anomalies
- Without creation date: 2,705
- With embedded files: 105
- With incremental updates: 2,237
Extremes
- Largest document: 248 pages
- Largest file: 10.28 MB
- Total PDF objects: 1,811,996
- Oldest analyzed PDF: created 24 years ago
For journalists & researchers
Cite this dataset
These are anonymized aggregate figures — counts and rates only, never per-document or personal data — refreshed automatically every six hours from production traffic. Free to cite and reuse under CC BY 4.0 with attribution to HTPBE?.
Suggested citation
HTPBE (2026). PDF Tampering & Modification Statistics. Aggregate dataset of 15,219 PDFs analyzed in production. Retrieved from https://htpbe.tech/statistics (updated 13 Jun 2026, 01:59 UTC).
Permalink: https://htpbe.tech/statistics · Last updated 13 Jun 2026, 01:59 UTC
Quotable findings
- 47.4% of conclusively analyzed PDFs showed structural modification markers.
- 15,219 PDFs analyzed at the binary level in production.
- 145 digitally signed PDFs were modified after signing, the highest-confidence fraud pattern.
Background on the method: how HTPBE? detects tampered documents, and the patterns by document type.