PDF Integrity Report: February 2026
Every month we look at aggregate, anonymized data from checks processed through the HTPBE web interface and publish what we find. No file contents, no personally identifiable information — only the structural and metadata signals our algorithm uses to detect modifications.
February 2026: 418 PDFs analyzed through the website, 28 calendar days, steady daily volume.
The Top Line
| Metric | Value |
|---|---|
| Total PDFs analyzed | 418 |
| Flagged as modified | 169 (40.4%) |
| Clean | 249 (59.6%) |
| Average risk score | 27 / 100 |
| Total data volume | 210.3 MB |
| Total pages analyzed | 1,902 |
Two in five PDFs submitted through the website in February showed signs of post-creation modification. That is a higher rate than cross-industry averages suggest — but it reflects the selection bias of verification workflows: people check documents when they have a reason to be concerned.
Risk Score Distribution
Risk scores run 0–100, combining all detected signals into a single weighted metric.
| Risk band | Count | Share |
|---|---|---|
| 0–20 (clean) | 231 | 55.3% |
| 21–40 (low suspicion) | 22 | 5.3% |
| 41–60 (medium) | 93 | 22.2% |
| 61–80 (high) | 16 | 3.8% |
| 81–100 (critical) | 55 | 13.2% |
The average risk score of 27 is pulled upward by the tail: 55 files (13.2%) scored in the 81–100 critical band. These documents carry stacked forensic signals — a date mismatch combined with incremental update artifacts combined with tool-signature inconsistencies. For files in this range, the algorithm assigns 100% modification confidence.
The medium band (41–60) deserves attention: 93 files, 22.2% of the total. These documents show something anomalous — a suspicious field, a questionable timestamp — but no single finding is unambiguous enough for a definitive verdict. In a compliance workflow, these warrant manual review.
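To make the stacking concrete, here is a toy version of a weighted scoring scheme. The signal names and weights are hypothetical (the production weighting is not published); the point is that independent signals compound, pushing a file up through the bands:

```python
# Hypothetical signal weights -- illustrative only, not the production values.
WEIGHTS = {
    "date_mismatch": 30,        # creation vs. modification timestamp differs
    "incremental_updates": 25,  # multiple cross-reference sections appended
    "suspicious_pattern": 25,   # anomalous update chain
    "tool_mismatch": 20,        # creator/producer inconsistency
}

def risk_score(signals):
    """Sum the weights of detected signals, capped at 100."""
    return min(100, sum(WEIGHTS[s] for s in signals))

def risk_band(score):
    """Map a 0-100 score to the report's five bands."""
    if score <= 20:
        return "clean"
    if score <= 40:
        return "low"
    if score <= 60:
        return "medium"
    if score <= 80:
        return "high"
    return "critical"
```

A single signal lands in the low band; several together saturate at 100, mirroring the stacked-signal files in the critical band.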
Modification confidence breakdown:
| Confidence | Count | Share of total |
|---|---|---|
| None (clean) | 211 | 50.5% |
| 100% (definitively modified) | 145 | 34.7% |
| High (strong evidence) | 24 | 5.7% |
More than a third of all uploaded PDFs carried 100% modification confidence — meaning the evidence was unambiguous, not probabilistic. (The bands above account for 380 files; the remaining 38, 9.1% of the total, showed evidence too weak to flag and are counted among the 249 clean files in the top-line figures.)
How Modifications Are Detected
Among the 169 flagged files, the algorithm identified the following signals:
| Detection signal(s) | Files | % of modified |
|---|---|---|
| Modification date differs (only) | 58 | 34.3% |
| Incremental updates + modification date differs | 31 | 18.3% |
| Incremental updates (only) | 15 | 8.9% |
| Incremental updates + suspicious update pattern | 15 | 8.9% |
| No explicit signal (rule-based verdict) | 15 | 8.9% |
| All three: incremental + suspicious + date | 8 | 4.7% |
| Tool signature mismatch combinations | 7 | 4.1% |
| Invalid date sequence + anomalies + date differs | 6 | 3.6% |
The single most common detection signal — appearing in 62% of flagged files — is a discrepancy between the embedded creation and modification timestamps. A document edited in an external tool will often have its modification date updated while the original creation date remains as set by the authoring software. This divergence, when combined with other signals, becomes a strong forensic indicator.
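As a sketch of that check: PDF timestamps are stored as strings of the form `D:YYYYMMDDHHmmSS` with an optional timezone suffix. The helper below is a minimal standard-library version assumed for illustration, not the production parser:

```python
import re
from datetime import datetime

# PDF timestamps look like D:20260214093015+01'00'; the timezone suffix
# is ignored here for brevity.
_PDF_DATE = re.compile(r"D:(\d{4})(\d{2})(\d{2})(\d{2})?(\d{2})?(\d{2})?")

def parse_pdf_date(raw):
    """Parse a /CreationDate or /ModDate string; None if malformed."""
    m = _PDF_DATE.match(raw)
    if not m:
        return None
    y, mo, d, h, mi, s = (int(g) if g else 0 for g in m.groups())
    return datetime(y, mo, d, h, mi, s)

def dates_differ(creation, modification):
    """True when both timestamps parse and diverge."""
    c, m = parse_pdf_date(creation), parse_pdf_date(modification)
    return c is not None and m is not None and c != m
```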
Incremental updates were detected in 97 files (23.2% of all February checks). This is the PDF mechanism that allows appending content — annotations, form data, revised pages — without rewriting the file. Among those 97 files, the average update chain length was 2.6 revisions. Crucially, 59 of those 97 files (60.8%) were also classified as modified. The remaining 40% showed incremental updates consistent with legitimate workflows: annotations, digital signatures, or form completion.
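One common way to approximate the revision count, assumed here purely for illustration, is to count `%%EOF` markers in the raw bytes, since every incremental update appends a new cross-reference section terminated by its own `%%EOF`:

```python
def count_revisions(pdf_bytes):
    """Count %%EOF markers: the original file has one, and every
    incremental update appends another. A raw byte count is a rough
    heuristic (a %%EOF inside a stream would inflate it), but it
    matches the common case."""
    return pdf_bytes.count(b"%%EOF")

def has_incremental_updates(pdf_bytes):
    return count_revisions(pdf_bytes) > 1
```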
Critical modification markers across all flagged files:
- Different creation and modification dates — 113 files
- Multiple cross-reference tables (incremental updates) — 40 files
- Known PDF editing tool detected — 15 files
The Software Ecosystem
PDF metadata reveals which software created and last touched a document. February showed a clearly Microsoft-centric picture, with significant freelance-platform presence.
Top producers (the application that last wrote the file):
| Producer | Files | Share |
|---|---|---|
| Microsoft: Print To PDF | 24 | 5.7% |
| PDFium | 20 | 4.8% |
| mPDF 8.2.5 | 18 | 4.3% |
| Upwork | 16 | 3.8% |
| Microsoft® Word for Microsoft 365 | 12 | 2.9% |
| iLovePDF | 11 | 2.6% |
| Style Report | 11 | 2.6% |
| OpenPDF 1.3.26 | 11 | 2.6% |
| PDFsharp 1.50 | 10 | 2.4% |
Top creators (the original authoring application):
| Creator | Files | Share |
|---|---|---|
| PDFium | 21 | 5.0% |
| Upwork | 16 | 3.8% |
| Microsoft® Word 2016 | 14 | 3.3% |
| Microsoft® Word for Microsoft 365 | 13 | 3.1% |
| Style Report | 11 | 2.6% |
| PDFsharp 1.50 | 10 | 2.4% |
| PScript5.dll Version 5.2.2 | 9 | 2.2% |
| Chromium | 9 | 2.2% |
| Microsoft Word | 8 | 1.9% |
| Microsoft® Word 2019 | 6 | 1.4% |
Several patterns worth noting.
Microsoft Word fragments into multiple entries. Word 2016, Word 2019, Word for Microsoft 365, and the generic "Microsoft Word" string together account for 41 files — the single largest authoring platform if consolidated. Organizations upgrading their Office installations leave version-heterogeneous document archives, and all of those versions end up in verification queues.
iLovePDF in the producer field signals documents that were processed through an online PDF manipulation service after their original creation. When a file lists iLovePDF as producer but names Microsoft Word or Chromium as creator, the document went through an intermediate editing step that the creator field does not acknowledge. Eleven files carried this pattern in February.
Upwork appears in both creator and producer (16 files each). The Upwork platform generates its own PDFs — contracts, payment statements, work history reports — and they are being submitted for authenticity verification by counterparties before acting on them. This reflects a real-world use case: recipients checking freelance platform documents before releasing funds or signing agreements.
mPDF 8.2.5 (18 files as producer) is a PHP PDF library used by web applications to generate invoices, receipts, and reports programmatically. These are application-generated documents, not user-authored files — which makes any structural inconsistency more notable, since they should be templated and uniform.
PDFium appearing in both creator and producer (21 and 20 files respectively) reflects Chrome-based PDF generation — printouts from web applications, saved browser pages, Google Docs exports.
PDF Version Landscape
| PDF Version | Files | Share |
|---|---|---|
| 1.7 | 154 | 36.8% |
| 1.4 | 113 | 27.0% |
| 1.5 | 66 | 15.8% |
| 1.3 | 36 | 8.6% |
| 1.6 | 35 | 8.4% |
| 2.0 | 3 | 0.7% |
| 1.2 | 3 | 0.7% |
| Invalid/missing | 7 | 1.7% |
PDF 1.7 leads at 36.8%, with 1.4 a strong second at 27%. Together they account for nearly two thirds of the sample. PDF 2.0 — the ISO 32000-2 standard from 2017 — appears in just 3 files (0.7%), reflecting how slowly the ecosystem adopts new specifications.
Seven files had an invalid or unparseable version string. A well-formed PDF declares its version in the file header; a missing or mangled declaration points to corruption or to aggressive editing that stripped the header.
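Reading that declaration is trivial when the file is intact. A minimal check, scanning the first kilobyte since viewers tolerate junk before the header, might look like:

```python
import re

def pdf_version(pdf_bytes):
    """Return the declared version from the %PDF-x.y header, or None.

    The header is normally the first line, but viewers tolerate
    leading junk, so scan the first kilobyte rather than offset 0."""
    m = re.search(rb"%PDF-(\d\.\d)", pdf_bytes[:1024])
    return m.group(1).decode("ascii") if m else None
```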
Digital Signatures: Present but Not Protective
11 PDFs carried embedded digital signatures (2.6% of the total). Of those, 3 had been modified after the signature was applied — a 27.3% post-signature modification rate among signed documents.
The mechanism most commonly exploited here is incremental updates. The PDF specification permits content to be appended after a signature is applied, provided the additions are limited to explicitly permitted operations. Some editors exploit the ambiguity of what constitutes a "permitted" change to introduce substantive content modifications — revised figures, changed dates, altered party names — while preserving a signature that remains cryptographically valid within its original scope.
The result: a document that displays a valid signature indicator in a viewer, but whose content has changed since signing. The signature covers what it covered when it was applied; it does not cover what was added afterward.
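A structural check for exactly this situation is to compare the signature's `/ByteRange` against the file length. The sketch below is a rough approximation, not a substitute for cryptographic validation, and assumes a plainly encoded, unencrypted signature dictionary:

```python
import re

_BYTERANGE = re.compile(rb"/ByteRange\s*\[\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*\]")

def signature_covers_file(pdf_bytes):
    """Check whether the last signature's /ByteRange spans the file.

    /ByteRange is [offset1 length1 offset2 length2]: two signed runs
    bracketing the signature value itself. If offset2 + length2 falls
    short of the file length, bytes were appended after signing.
    Returns None when no /ByteRange is present. Structural check only;
    it does not validate the signature cryptographically."""
    ranges = _BYTERANGE.findall(pdf_bytes)
    if not ranges:
        return None
    _o1, _l1, o2, l2 = (int(v) for v in ranges[-1])
    return o2 + l2 == len(pdf_bytes)
```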
In practice, most organizations treat the presence of a signature field as sufficient verification. Active signature validation — which would surface these post-signature modifications — is rarely performed outside of legal and financial workflows with formal verification requirements.
Document Profile
The average PDF checked through the website in February:
- Average size: 0.50 MB
- Largest file: 9.70 MB
- Average page count: 4 pages
- Total pages analyzed: 1,902
The half-megabyte average is consistent with the document types typically submitted for verification: invoices, contracts, bank statements, certificates. Short documents with specific numerical or legal content — where a changed figure or date has real financial or legal consequence.
Metadata completeness averaged 76 out of 100. The score measures how many of the eight standard PDF metadata fields (title, author, creator, producer, creation date, modification date, subject, keywords) are populated. Missing creation dates affected 53 files (12.7%) — removing one of the cleaner forensic signals and increasing reliance on structural analysis.
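The completeness score itself is simple arithmetic over those eight fields. A sketch, with key names chosen for illustration rather than the literal PDF `/Key` spellings:

```python
# The eight standard information-dictionary fields the score counts;
# key names here are illustrative labels, not the PDF /Key spellings.
STANDARD_FIELDS = (
    "title", "author", "creator", "producer",
    "creation_date", "modification_date", "subject", "keywords",
)

def completeness_score(metadata):
    """Share of the eight standard fields populated, as 0-100."""
    filled = sum(1 for f in STANDARD_FIELDS if metadata.get(f))
    return round(filled / len(STANDARD_FIELDS) * 100)
```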
Daily Volume
Usage was steady throughout February, without dramatic spikes:
| Date | Checks | Date | Checks | Date | Checks |
|---|---|---|---|---|---|
| Feb 06 | 28 | Feb 14 | 11 | Feb 22 | 1 |
| Feb 07 | 10 | Feb 15 | 20 | Feb 23 | 24 |
| Feb 08 | 1 | Feb 16 | 25 | Feb 24 | 12 |
| Feb 09 | 11 | Feb 17 | 28 | Feb 25 | 47 |
| Feb 10 | 25 | Feb 18 | 11 | Feb 26 | 20 |
| Feb 11 | 16 | Feb 19 | 20 | Feb 27 | 24 |
| Feb 12 | 32 | Feb 20 | 14 | Feb 28 | 4 |
| Feb 13 | 20 | Feb 21 | 14 | | |
The peak day was February 25 with 47 checks — roughly three times the daily average of about 15 (418 checks over 28 days). No batch processing, no anomalous spikes. The distribution reflects organic usage: higher on weekdays, quieter on weekends, with no checks recorded before February 6.
Other Signals
JavaScript in PDFs: zero across all 418 files. No embedded JavaScript was detected in February. This is consistent with the document types: invoices, contracts, and certificates do not use interactive scripting.
Embedded files: 4 (less than 1%). PDFs can contain binary attachments. Four documents carried embedded content. Not unusual, but worth flagging in any workflow where file attachments introduce compliance risk.
Suspicious tool patterns: 50 files (12.0%). This flag indicates that the creator–producer metadata combination is internally inconsistent in ways that suggest an unacknowledged intermediate processing step. The file claims a creation toolchain that does not match its structural fingerprint.
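A simplistic version of such a consistency check is an allow-list of creator-to-producer pairings. The table below is invented for illustration; a production rule set would be far larger and fuzzier:

```python
# Hypothetical creator-to-producer allow-list -- invented for
# illustration, not the production rule set.
EXPECTED_PRODUCERS = {
    "Microsoft Word": {"Microsoft: Print To PDF",
                       "Microsoft® Word for Microsoft 365"},
    "Chromium": {"PDFium", "Skia/PDF"},
}

def tool_mismatch(creator, producer):
    """Flag pairs implying an unacknowledged intermediate step,
    e.g. a Word creator with an online editor as producer."""
    expected = EXPECTED_PRODUCERS.get(creator)
    if expected is None:
        return False  # unknown creator: no basis to flag
    return producer not in expected
```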
Summary
February 2026 by the numbers:
- 40.4% of submitted PDFs showed modification signals — significantly above the commonly cited 25–30% industry baseline, consistent with the self-selection of verification workflows
- Modification date discrepancy is the leading forensic indicator, present in 62% of flagged files
- Microsoft Office ecosystem (Word across multiple versions, Print to PDF) is the primary authoring environment in this sample
- iLovePDF and online editors leave traceable producer-field evidence in files that subsequently pass through verification
- Upwork documents are a recurring verification target — freelance contracts and payment records being checked by counterparties
- Digital signatures do not guarantee post-signature integrity — 27% of signed files in this sample were modified after signing
- PDF 2.0 adoption remains below 1% despite being available for nearly a decade
The 40.4% modification rate is the most important number from February. It means that when someone uploads a PDF to check its authenticity, there is roughly a two-in-five chance the document will come back flagged. That is not a marginal outcome — it is why verification workflows exist.
Data covers all checks submitted through the HTPBE web interface in February 2026 (UTC). File contents are not stored or analyzed; only structural metadata signals are retained. All figures are aggregate and anonymized.