PDF Integrity Report: March 2026

Every month we look at aggregate, anonymized data from checks processed through the HTPBE web interface and publish what we find. No file contents, no personally identifiable information — only the structural and metadata signals our algorithm uses to detect modifications.
March 2026: 866 PDFs analyzed through the website, 31 calendar days, and a pace that was more than twice February's.
The Top Line
| Metric | Value |
|---|---|
| Total PDFs analyzed | 866 |
| Flagged as modified | 420 (48.5%) |
| Not flagged | 446 (51.5%) |
| Total data volume | 522.9 MB |
| Total pages analyzed | 4,274 |
Two things stand out immediately: the volume, and the rate.
Volume: 866 checks in March versus 418 in February. That is not a small uptick — it is a doubling. More on what drove that below.
Modification rate: 48.5% is the highest we have seen since launching the public interface. February was 40.4%. The month before that was lower still. Something in what people are submitting in March is different, and the data below gives some hints at what.
Confidence Distribution
| Confidence level | Count | Share |
|---|---|---|
| None (no modification detected) | 446 | 51.5% |
| Certain (definitive structural evidence) | 263 | 30.4% |
| High (strong structural evidence) | 157 | 18.1% |
The "Certain" category — where the evidence is unambiguous, not probabilistic — accounts for nearly a third of all files submitted. These are documents where multiple forensic signals converge: a date mismatch, an incremental update trail, and a tool-signature inconsistency in the same file.
"High" confidence is the more nuanced finding: 157 files showing strong structural evidence of modification without the full convergence of signals. In a compliance workflow, these are not dismissible. A document that looks clean to a human reviewer but carries high-confidence structural anomalies is the definition of a sophisticated edit.
The total — 420 flagged files — means that if you submitted a PDF through HTPBE in March, there was roughly a coin-flip’s chance it came back flagged. That is not a comfortable number for anyone relying on document verification as a workflow step.
What the Algorithm Was Looking For
March was the most active month for detection development since the tool launched. Seventeen algorithm updates shipped between March 8 and March 31, adding entirely new detection layers and fixing gaps that had existed since the earliest versions.
The new detection layers, and the forgery patterns they found in the wild:
Template-assembly forgeries (Layer 4.5 / 4.6): Documents built by importing pages from independently generated PDFs into a single container — a forgery pattern used to assemble invoices, certificates, and contracts from components with different origins. The structural fingerprint is distinct: font subset prefixes that do not match across pages, or assembly-tool signatures in the producer field. 36 files carried this signal as their primary finding in March, making it the third most common detection in the month.
Scan-replace page forgeries (Layer 4.8): A document is printed, the target page is physically altered, rescanned, and reinserted alongside the original programmatic pages. The structural evidence is a full-page raster image mixed with text-bearing programmatic content, corroborated by a page-import tool signature or incremental update trail. 18 files matched this pattern.
Anti-forensic strip rasterization (Layer 4.8 strip variant): Pages converted into dozens of narrow horizontal image strips — a technique designed to destroy text extractability while preserving visual appearance in a PDF viewer. No legitimate document workflow produces this structure. Detection is automatic — the pattern is unambiguous enough that no corroboration is required. 5 files carried this pattern in March.
Multi-session document assembly (Layer 4.7): Documents assembled from pages rendered in independent sessions, identified by the same typeface appearing with different font subset prefixes across page groups. 7 files matched.
These four patterns account for 66 files — 15.7% of all flagged documents in March. None of them were detectable before March 8.
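The font-subset fingerprint behind the template-assembly and multi-session detections can be sketched in a few lines. This is a minimal illustration of the logic, not HTPBE's implementation: it assumes a PDF parser has already extracted the per-page font names, and that subset-embedded fonts follow the standard `ABCDEF+BaseName` naming convention.

```python
from collections import defaultdict

def find_subset_conflicts(page_fonts):
    """page_fonts: (page_number, font_name) pairs taken from each page's
    resource dictionary. Subset-embedded fonts are named like
    'ABCDEF+Helvetica': a six-letter prefix, '+', then the base name."""
    prefixes = defaultdict(set)
    for _page, name in page_fonts:
        if "+" in name:
            prefix, base = name.split("+", 1)
            prefixes[base].add(prefix)
    # The same base font carrying different subset prefixes on different
    # pages is the fingerprint described above: each rendering session
    # generates its own random prefix, so a single-session document
    # should reuse one prefix per base font throughout.
    return {base: sorted(p) for base, p in prefixes.items() if len(p) > 1}
```

A document whose pages report `ABCDEF+Helvetica` and `GHIJKL+Helvetica` for the same typeface was assembled from at least two independent rendering sessions.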
Modification Signals: What the Evidence Looks Like
Among the 420 flagged files, these were the most common primary signals (the long tail of rarer combinations, 45 files, is omitted):
| Primary signal(s) | Files | % of modified |
|---|---|---|
| Creation/modification date mismatch | 144 | 34.3% |
| Incremental updates only | 41 | 9.8% |
| Design-tool template assembly | 36 | 8.6% |
| Date mismatch + incremental updates | 32 | 7.6% |
| Known PDF editing tool detected | 22 | 5.2% |
| Scan-replace raster pattern | 18 | 4.3% |
| Date mismatch + template assembly | 16 | 3.8% |
| Mandatory metadata fields removed | 15 | 3.6% |
| Creator/producer present, creation date removed | 13 | 3.1% |
| Soft-mask alpha channel on page images | 11 | 2.6% |
| Date + incremental + XMP/Info disagreement | 11 | 2.6% |
| Multi-session page assembly | 7 | 1.7% |
| Anti-forensic strip rasterization | 5 | 1.2% |
| Text rendered as vector outlines, fonts absent | 4 | 1.0% |
The date mismatch signal — a discrepancy between the embedded creation and modification timestamps — remains the single most common indicator, present in 34.3% of flagged files as the sole signal. Combined with other signals, date mismatch appears in roughly 55% of all flagged files.
The "Mandatory metadata fields removed" finding (15 files) is worth pausing on. A file that has had its creation date or producer field stripped is a file that has been deliberately processed to reduce forensic surface area. Legitimate workflows do not strip metadata from finished documents. When this appears alongside a known editing tool or incremental update trail, it is one of the cleaner indicators of intentional evasion.
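The metadata signals above can be derived from a parsed document-information dictionary. A minimal sketch, assuming the Info dictionary has been extracted as plain strings; the field names are standard PDF Info keys, but the rules here are illustrative simplifications, not HTPBE's production logic:

```python
def classify_metadata(info: dict) -> list:
    """info: a PDF Info dictionary as plain strings, e.g.
    {'CreationDate': 'D:20260301120000Z', 'ModDate': ..., 'Producer': ...}."""
    signals = []
    created, modified = info.get("CreationDate"), info.get("ModDate")
    # Timestamps that disagree mean the file was written again after creation.
    if created and modified and created != modified:
        signals.append("date mismatch")
    # A tool identity without a creation date suggests selective stripping.
    if not created and (info.get("Creator") or info.get("Producer")):
        signals.append("creation date removed")
    # Neither a creation date nor a producer: forensic surface area reduced.
    if not created and not info.get("Producer"):
        signals.append("mandatory metadata removed")
    return signals
```

Each rule on its own is weak; as the report notes, it is the convergence of signals in one file that moves a finding from "High" to "Certain".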
Incremental Updates: The Rate Jumped
234 files in March had incremental updates (27.0% of the total). Of those, 196 were also flagged as modified — an 83.8% modification rate among files with incremental updates.
February’s figure was 60.8%. That is a significant shift.
The mechanism has not changed: PDF incremental updates allow appending content after the original write — annotations, revised pages, form data — without rewriting the file. A legitimate chain of three updates might be: original document, digital signature applied, annotation added by reviewer. An illegitimate chain might be: original document, page content replaced, date adjusted.
The jump in modification rate among incremental-update files suggests that the population of such files in March skewed toward the latter. It may also reflect improved detection: several updates shipped in March specifically targeting update-chain analysis, meaning some files that would have been missed in February were caught in March.
The average update chain length was 2.5 revisions, unchanged from February.
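Estimating chain length does not require a full parse. A minimal sketch, relying on the fact that every save of a PDF ends with an `%%EOF` marker; this is a heuristic, not HTPBE's update-chain analysis, and it can miscount on linearized files:

```python
def count_write_sessions(pdf_bytes: bytes) -> int:
    # An incremental update appends a new body, cross-reference section,
    # and a fresh %%EOF after the original one, so the marker count
    # approximates the number of write sessions.
    return pdf_bytes.count(b"%%EOF")
```

A count of 1 means a single write; anything higher means the file carries an update trail worth examining, which is where the 83.8% figure above comes from.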
Document Origin: A New Layer
March data includes a breakdown by document origin type, made possible by the origin-detection capabilities introduced in late February. This classifies each PDF by the type of tool that produced it.
| Origin type | Count | Share |
|---|---|---|
| Institutional (server-side tools, enterprise systems) | 385 | 44.5% |
| Consumer software ("Cannot Verify") | 195 | 22.5% |
| Legacy (pre-origin-detection records) | 170 | 19.6% |
| Scanned document ("Cannot Verify") | 47 | 5.4% |
| Unknown origin | 38 | 4.4% |
| Online editor ("Cannot Verify") | 31 | 3.6% |
Institutional-origin documents (those produced by server-side systems such as wkhtmltopdf, iText, PDFlib, and the Pdftools SDK) make up the plurality at 44.5%. These are the documents where modification detection is most informative: they should be structurally uniform, and anomalies stand out.
Consumer-software documents (22.5%) receive a "Cannot Verify" result rather than a binary modified/intact verdict. Microsoft Word, LibreOffice, Apple Pages, and similar tools produce files with structural characteristics that overlap significantly with legitimate editing workflows, making false positives unacceptably high. The algorithm is conservative here by design.
Scanned documents (5.4%) and online-editor documents (3.6%) also receive "Cannot Verify" — scans because there is no structural text data to analyze, online editors because their output patterns are indistinguishable from certain editing artifacts.
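At its simplest, origin classification is a mapping from producer strings to tool families. The groupings below are illustrative guesses based on the tables in this report, not HTPBE's actual taxonomy:

```python
# Tool-name groupings are assumptions for illustration only.
INSTITUTIONAL = ("wkhtmltopdf", "itext", "pdflib", "pdftools")
ONLINE = ("ilovepdf", "canva")
CONSUMER = ("microsoft", "libreoffice", "pages")

def classify_origin(producer: str) -> str:
    p = (producer or "").lower()
    if any(tool in p for tool in INSTITUTIONAL):
        return "institutional"
    if any(tool in p for tool in ONLINE):
        return "online editor (cannot verify)"
    if any(tool in p for tool in CONSUMER):
        return "consumer software (cannot verify)"
    return "unknown"
```

Real classification has to handle version suffixes, vendor rebrandings, and missing fields, which is why a fifth of the March sample lands in legacy or unknown buckets.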
Digital Signatures: Getting Worse
16 PDFs in March carried embedded digital signatures (1.8% of the total). Of those, 7 had been modified after the signature was applied — a 43.75% post-signature modification rate.
February’s figure was 27.3%.
This matters because digital signatures are widely treated as an integrity guarantee. They are not. A PDF signature covers exactly the bytes it covered when it was applied. Incremental updates appended after signing are not covered by the original signature — and the signature remains technically valid because the bytes it originally signed are still present and unaltered. The new content simply sits outside the signed scope.
Seven out of sixteen signed documents in March had been through this process: new content appended after signing, signature still displaying as valid in any PDF viewer that does not explicitly check the signed range. In workflows where a human looks for the green checkmark and moves on, these documents pass.
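The signed-range check described above can be sketched directly. A signature's `/ByteRange` entry lists the two spans of the file the signature actually covers; bytes appended after the second span are outside the signed scope. This is a minimal sketch that assumes the array appears literally in the file, which real signatures do not guarantee (the array can be split across lines or held differently):

```python
import re

def signature_covers_whole_file(pdf_bytes: bytes) -> bool:
    # /ByteRange is [offset1 length1 offset2 length2]: the signed bytes
    # are the two spans, with a gap left for the signature value itself.
    matches = re.findall(
        rb"/ByteRange\s*\[\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*\]", pdf_bytes)
    if not matches:
        return False  # no signature found
    _o1, _l1, offset2, length2 = map(int, matches[-1])
    # Anything after offset2 + length2 was appended post-signing, yet the
    # signature still validates because its original bytes are untouched.
    return offset2 + length2 >= len(pdf_bytes)
```

A `False` result on a file that does carry a signature is exactly the pattern behind the 7 of 16 figure: valid signature, uncovered tail.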
The Software Ecosystem
Top producers (the application that last wrote the file):
| Producer | Files | Share |
|---|---|---|
| Microsoft: Print To PDF | 59 | 6.8% |
| PDFium | 52 | 6.0% |
| iLovePDF | 41 | 4.7% |
| iText 2.1.7 by 1T3XT | 32 | 3.7% |
| LibreOffice 5.1 | 32 | 3.7% |
| Microsoft® Word for Microsoft 365 | 28 | 3.2% |
| Canva | 22 | 2.5% |
| PDFlib+PDI 8.0.2p1 | 19 | 2.2% |
| Pdftools SDK | 16 | 1.8% |
| iTextSharp.LGPLv2.Core 3.7.1.0 | 12 | 1.4% |
Top creators (the original authoring application):
| Creator | Files | Share |
|---|---|---|
| PDFium | 55 | 6.4% |
| Microsoft® Word for Microsoft 365 | 29 | 3.4% |
| Microsoft® Word 2016 | 24 | 2.8% |
| Chromium | 24 | 2.8% |
| Canva | 23 | 2.7% |
| VCTransaction | 19 | 2.2% |
| PScript5.dll Version 5.2.2 | 17 | 2.0% |
| Draw | 12 | 1.4% |
| Dropbox Sign | 9 | 1.0% |
| wkhtmltopdf 0.12.6.1 | 8 | 0.9% |
Several patterns worth attention.
iLovePDF nearly quadrupled. February showed 11 files with iLovePDF as producer. March shows 41. The pattern is the same: a document created in Word or Chrome, then run through an online PDF manipulation service that overwrites the producer field while leaving the creator field intact. When iLovePDF appears as producer and Microsoft Word as creator, the document went through an intermediate step that the creator field does not capture. Whether that step was compression, merging, or content editing is what the structural analysis determines.
Canva is now in the top tier. 23 files list Canva as creator, 22 as producer. Canva is a graphic design platform — not the typical authoring environment for a business document. Its presence in both fields means Canva-originated documents are being submitted for verification as contracts, certificates, or financial records. Design tools are powerful enough to produce convincing documents; they are also trivially easy to edit after the fact.
iText and PDFlib+PDI in the producer field. Together, iText 2.1.7 (32 files) and PDFlib+PDI (19 files) account for 51 files that were processed through document manipulation libraries after their original creation. These libraries are used legitimately for merging, watermarking, and signing — but they are also the same tools used for page-import and template-assembly forgeries. Context determines which it is; that is what the structural analysis resolves.
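The creator/producer comparison running through these observations reduces to a small heuristic. A sketch, with the caveat baked in that legitimate pairs exist and must be whitelisted, as the March updates did for Chrome:

```python
def rewritten_after_authoring(creator: str, producer: str) -> bool:
    # Heuristic: a producer from a different tool family than the creator
    # means a second application rewrote the file after authoring.
    # Legitimate pairs exist (Chrome reports creator "Chromium" with a
    # Skia/PDF producer), so this flags candidates, not forgeries.
    if not creator or not producer:
        return False
    c, p = creator.lower(), producer.lower()
    return c not in p and p not in c
```

A hit says only that an intermediate step occurred; whether that step was benign compression or content editing is what the structural signals elsewhere in this report resolve.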
Dropbox Sign appeared as creator in 9 files. These are e-signature contracts being submitted for verification by counterparties — the same use case that drove Upwork’s presence in February. Recipients checking documents generated by third-party platforms before acting on them.
PDF Version Landscape
| PDF Version | Files | Share |
|---|---|---|
| 1.7 | 340 | 39.3% |
| 1.4 | 262 | 30.3% |
| 1.5 | 100 | 11.5% |
| 1.6 | 85 | 9.8% |
| 1.3 | 70 | 8.1% |
| 1.2 | 7 | 0.8% |
| Other / missing | 2 | 0.2% |
PDF 1.7 and 1.4 together account for 69.6% of the sample — the same two-version dominance as February. PDF 2.0 did not appear in a single file this month, despite being available for nearly a decade.
Notable: no files with invalid or unparseable version strings this month, compared to 7 in February. The improved PDF 1.5+ parsing shipped in March likely accounts for part of this.
JavaScript: A First
Two files in March contained embedded JavaScript. February had zero.
This is a small number, but it is the first time JavaScript has appeared in the monthly sample. PDF JavaScript is used for interactive forms and scripting, but it is also a vector for malicious behavior — drive-by execution, data exfiltration, sandbox escapes in older viewers. Files with embedded JavaScript in business documents (contracts, invoices, bank statements) warrant extra scrutiny regardless of modification status.
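A first-pass screen for embedded scripting is a byte scan for the relevant name keys. This is a triage sketch, not a parser: the tokens can also occur inside compressed streams or string literals, so a hit means "needs review", not proof of active JavaScript:

```python
def contains_javascript(pdf_bytes: bytes) -> bool:
    # PDF actions attach scripts under the /JavaScript and /JS name keys.
    return b"/JavaScript" in pdf_bytes or b"/JS" in pdf_bytes
```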
Document Profile
The average PDF checked in March:
- Average size: 0.60 MB
- Largest file: 9.71 MB (vs 9.70 MB in February — essentially the same ceiling)
- Average page count: 4.9 pages
- Total pages analyzed: 4,274
Metadata completeness averaged 78 out of 100, up slightly from 76 in February. The score measures how many of the eight standard PDF metadata fields are populated.
Missing creation dates affected 141 files (16.3%) — up from 12.7% in February. A document without a creation date has lost one of the cleaner forensic anchors. The March algorithm updates improved detection for files in this state; the "Creator or producer present, creation date removed" signal (13 files) is one result of that improvement.
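The completeness score described above is a simple ratio over the standard document-information fields. A sketch, assuming the eight fields are the standard Info dictionary keys (which eight the score actually uses is an assumption here):

```python
# Assumed field set: the standard PDF document-information keys.
STANDARD_FIELDS = ("Title", "Author", "Subject", "Keywords",
                   "Creator", "Producer", "CreationDate", "ModDate")

def completeness_score(info: dict) -> int:
    # Score = populated fields / 8, scaled to 0-100.
    populated = sum(1 for field in STANDARD_FIELDS if info.get(field))
    return round(100 * populated / len(STANDARD_FIELDS))
```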
Daily Volume
Usage in March accelerated week-over-week, with the final week carrying the heaviest load:
| Date | Checks | Date | Checks | Date | Checks | Date | Checks |
|---|---|---|---|---|---|---|---|
| Mar 01 | 9 | Mar 09 | 40 | Mar 17 | 9 | Mar 25 | 21 |
| Mar 02 | 40 | Mar 10 | 15 | Mar 18 | 23 | Mar 26 | 57 |
| Mar 03 | 11 | Mar 11 | 33 | Mar 19 | 44 | Mar 27 | 40 |
| Mar 04 | 27 | Mar 12 | 55 | Mar 20 | 28 | Mar 28 | 22 |
| Mar 05 | 42 | Mar 13 | 27 | Mar 21 | 5 | Mar 29 | 31 |
| Mar 06 | 23 | Mar 14 | 15 | Mar 22 | 5 | Mar 30 | 29 |
| Mar 07 | 12 | Mar 15 | 11 | Mar 23 | 49 | Mar 31 | 55 |
| Mar 08 | 22 | Mar 16 | 24 | Mar 24 | 42 | | |
The peak day was March 26 with 57 checks. Three days crossed 50: March 12, 26, and 31. The overall daily average was 27.9 checks, up from February’s 14.9 (though February had only 28 days).
The acceleration is visible: the first seven days averaged 23.4 checks per day; the final seven averaged 36.4. No single event explains it; the growth was gradual, which suggests organic expansion rather than a spike from one source.
Other Signals
Embedded files: 3 (less than 1%). PDFs can contain binary attachments — another compliance risk vector in document workflows.
Suspicious tool patterns: Zero files flagged for creator–producer inconsistency. This is partly a detection refinement: the March updates narrowed several signals that had previously fired on legitimate tool combinations (Chrome printing via Skia/PDF, institutional PDFium usage), reducing noise in this category.
Summary
March 2026 by the numbers:
- 866 PDFs analyzed — more than double February — reflecting growing adoption of document verification workflows
- 48.5% modification rate, the highest recorded month, up from 40.4% in February
- New detection signals found in the wild: template assembly (36 files), scan-replace rasterization (18 files), multi-session assembly (7 files), anti-forensic strip rasterization (5 files) — patterns that were undetectable before March 8
- Incremental update modification rate jumped to 83.8%, from 60.8% in February — files with update chains are now overwhelmingly associated with tampering
- Digital signatures less protective than ever: 43.75% of signed documents had been modified after signing, up from 27.3%
- Canva entered both creator and producer charts — design-tool documents are now a meaningful share of the verification queue
- iLovePDF nearly quadrupled as producer (11 → 41), indicating more documents being processed through online manipulation pipelines before submission
- Two files with JavaScript — the first time embedded scripting has appeared in the monthly sample
- PDF 2.0 adoption: still zero
The 48.5% rate is a milestone, but the more significant number is what drove it: the new detection layers that shipped in March caught patterns that were previously invisible. Those 66 files — template assemblies, scan replacements, strip-rasterized pages — are not newly created frauds. They existed in February too. We just could not see them.
Data covers all checks submitted through the HTPBE web interface in March 2026 (UTC). File contents are not stored or analyzed; only structural metadata signals are retained. All figures are aggregate and anonymized.