PDF Integrity Report: March 2026

Every month we look at aggregate, anonymized data from checks processed through the HTPBE web interface and publish what we find. No file contents, no personally identifiable information — only the structural and metadata signals our algorithm uses to detect modifications.
March 2026: 866 PDFs analyzed through the website, 31 calendar days, and a pace that was more than twice February's.
The Top Line
| Metric | Value |
|---|---|
| Total PDFs analyzed | 866 |
| Flagged as modified | 420 (48.5%) |
| Not flagged | 446 (51.5%) |
| Total data volume | 522.9 MB |
| Total pages analyzed | 4,274 |
Two things stand out immediately: the volume, and the rate.
Volume: 866 checks in March versus 418 in February. That is not a small uptick — it is a doubling. More on what drove that below.
Modification rate: 48.5% is the highest we have seen since launching the public interface. February was 40.4%. The month before that was lower still. Something in what people are submitting in March is different, and the data below gives some hints at what.
Confidence Distribution
| Confidence level | Count | Share |
|---|---|---|
| None (no modification detected) | 446 | 51.5% |
| Certain (definitive structural evidence) | 263 | 30.4% |
| High (strong structural evidence) | 157 | 18.1% |
The "Certain" category — where the evidence is unambiguous, not probabilistic — accounts for nearly a third of all files submitted. These are documents where multiple forensic signals converge: a date mismatch, an incremental update trail, and a tool-signature inconsistency in the same file.
"High" confidence is the more nuanced finding: 157 files showing strong structural evidence of modification without the full convergence of signals. In a compliance workflow, these are not dismissible. A document that looks clean to a human reviewer but carries high-confidence structural anomalies is the definition of a sophisticated edit.
The total — 420 flagged files — means that if you submitted a PDF through HTPBE in March, there was roughly a coin-flip’s chance it came back flagged. That is not a comfortable number for anyone relying on document verification as a workflow step.
What the Algorithm Was Looking For
March was the most active month for detection development since the tool launched. Seventeen algorithm updates shipped between March 8 and March 31, adding entirely new detection layers and fixing gaps that had existed since the earliest versions.
The new detection layers, and the forgery patterns they found in the wild:
Template-assembly forgeries (Layer 4.5 / 4.6): Documents built by importing pages from independently generated PDFs into a single container — a forgery pattern used to assemble invoices, certificates, and contracts from components with different origins. The structural fingerprint is distinct: font subset prefixes that do not match across pages, or assembly-tool signatures in the producer field. 36 files carried this signal as their primary finding in March, making it the third most common detection in the month.
Scan-replace page forgeries (Layer 4.8): A document is printed, the target page is physically altered, rescanned, and reinserted alongside the original programmatic pages. The structural evidence is a full-page raster image mixed with text-bearing programmatic content, corroborated by a page-import tool signature or incremental update trail. 18 files matched this pattern.
Anti-forensic strip rasterization (Layer 4.8 strip variant): Pages converted into dozens of narrow horizontal image strips — a technique designed to destroy text extractability while preserving visual appearance in a PDF viewer. No legitimate document workflow produces this structure. Detection is automatic — the pattern is unambiguous enough that no corroboration is required. 5 files carried this pattern in March.
Multi-session document assembly (Layer 4.7): Documents assembled from pages rendered in independent sessions, identified by the same typeface appearing with different font subset prefixes across page groups. 7 files matched.
These four patterns account for 66 files — 15.7% of all flagged documents in March. None of them were detectable before March 8.
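The font-subset fingerprint behind the template-assembly and multi-session detections can be sketched in a few lines. This is a minimal illustration of the logic, not HTPBE's implementation: it assumes a PDF parser has already extracted the per-page font names, and that subset-embedded fonts follow the standard `ABCDEF+BaseName` naming convention.

```python
from collections import defaultdict

def find_subset_conflicts(page_fonts):
    """page_fonts: (page_number, font_name) pairs taken from each page's
    resource dictionary. Subset-embedded fonts are named like
    'ABCDEF+Helvetica': a six-letter prefix, '+', then the base name."""
    prefixes = defaultdict(set)
    for _page, name in page_fonts:
        if "+" in name:
            prefix, base = name.split("+", 1)
            prefixes[base].add(prefix)
    # The same base font carrying different subset prefixes on different
    # pages is the fingerprint described above: each rendering session
    # generates its own random prefix, so a single-session document
    # should reuse one prefix per base font throughout.
    return {base: sorted(p) for base, p in prefixes.items() if len(p) > 1}
```

A document whose pages report `ABCDEF+Helvetica` and `GHIJKL+Helvetica` for the same typeface was assembled from at least two independent rendering sessions.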
Modification Signals: What the Evidence Looks Like
Among the 420 flagged files, these were the most common primary signals (the long tail of rarer combinations, 45 files, is omitted):
| Primary signal(s) | Files | % of modified |
|---|---|---|
| Creation/modification date mismatch | 144 | 34.3% |
| Incremental updates only | 41 | 9.8% |
| Design-tool template assembly | 36 | 8.6% |
| Date mismatch + incremental updates | 32 | 7.6% |
| Known PDF editing tool detected | 22 | 5.2% |
| Scan-replace raster pattern | 18 | 4.3% |
| Date mismatch + template assembly | 16 | 3.8% |
| Mandatory metadata fields removed | 15 | 3.6% |
| Creator/producer present, creation date removed | 13 | 3.1% |
| Soft-mask alpha channel on page images | 11 | 2.6% |
| Date + incremental + XMP/Info disagreement | 11 | 2.6% |
| Multi-session page assembly | 7 | 1.7% |
| Anti-forensic strip rasterization | 5 | 1.2% |
| Text rendered as vector outlines, fonts absent | 4 | 1.0% |
The date mismatch signal — a discrepancy between the embedded creation and modification timestamps — remains the single most common indicator, present in 34.3% of flagged files as the sole signal. Combined with other signals, date mismatch appears in roughly 55% of all flagged files.
The "Mandatory metadata fields removed" finding (15 files) is worth pausing on. A file that has had its creation date or producer field stripped is a file that has been deliberately processed to reduce forensic surface area. Legitimate workflows do not strip metadata from finished documents. When this appears alongside a known editing tool or incremental update trail, it is one of the cleaner indicators of intentional evasion.
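The metadata signals above can be derived from a parsed document-information dictionary. A minimal sketch, assuming the Info dictionary has been extracted as plain strings; the field names are standard PDF Info keys, but the rules here are illustrative simplifications, not HTPBE's production logic:

```python
def classify_metadata(info: dict) -> list:
    """info: a PDF Info dictionary as plain strings, e.g.
    {'CreationDate': 'D:20260301120000Z', 'ModDate': ..., 'Producer': ...}."""
    signals = []
    created, modified = info.get("CreationDate"), info.get("ModDate")
    # Timestamps that disagree mean the file was written again after creation.
    if created and modified and created != modified:
        signals.append("date mismatch")
    # A tool identity without a creation date suggests selective stripping.
    if not created and (info.get("Creator") or info.get("Producer")):
        signals.append("creation date removed")
    # Neither a creation date nor a producer: forensic surface area reduced.
    if not created and not info.get("Producer"):
        signals.append("mandatory metadata removed")
    return signals
```

Each rule on its own is weak; as the report notes, it is the convergence of signals in one file that moves a finding from "High" to "Certain".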
Incremental Updates: The Rate Jumped
234 files in March had incremental updates (27.0% of the total). Of those, 196 were also flagged as modified — an 83.8% modification rate among files with incremental updates.
February’s figure was 60.8%. That is a significant shift.
The mechanism has not changed: PDF incremental updates allow appending content after the original write — annotations, revised pages, form data — without rewriting the file. A legitimate chain of three updates might be: original document, digital signature applied, annotation added by reviewer. An illegitimate chain might be: original document, page content replaced, date adjusted.
The jump in modification rate among incremental-update files suggests that the population of such files in March skewed toward the latter. It may also reflect improved detection: several updates shipped in March specifically targeting update-chain analysis, meaning some files that would have been missed in February were caught in March.
The average update chain length was 2.5 revisions, unchanged from February.
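Estimating chain length does not require a full parse. A minimal sketch, relying on the fact that every save of a PDF ends with an `%%EOF` marker; this is a heuristic, not HTPBE's update-chain analysis, and it can miscount on linearized files:

```python
def count_write_sessions(pdf_bytes: bytes) -> int:
    # An incremental update appends a new body, cross-reference section,
    # and a fresh %%EOF after the original one, so the marker count
    # approximates the number of write sessions.
    return pdf_bytes.count(b"%%EOF")
```

A count of 1 means a single write; anything higher means the file carries an update trail worth examining, which is where the 83.8% figure above comes from.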
Document Origin: A New Layer
March data includes a breakdown by document origin type, made possible by the origin-detection capabilities introduced in late February. This classifies each PDF by the type of tool that produced it.
| Origin type | Count | Share |
|---|---|---|
| Institutional (server-side tools, enterprise systems) | 385 | 44.5% |
| Consumer software ("Cannot Verify") | 195 | 22.5% |
| Legacy (pre-origin-detection records) | 170 | 19.6% |
| Scanned document ("Cannot Verify") | 47 | 5.4% |
| Unknown origin | 38 | 4.4% |
| Online editor ("Cannot Verify") | 31 | 3.6% |
Institutional-origin documents (those produced by server-side systems such as wkhtmltopdf, iText, PDFlib, and the Pdftools SDK) make up the plurality at 44.5%. These are the documents where modification detection is most informative: they should be structurally uniform, and anomalies stand out.
Consumer-software documents (22.5%) receive a "Cannot Verify" result rather than a binary modified/intact verdict. Microsoft Word, LibreOffice, Apple Pages, and similar tools produce files with structural characteristics that overlap significantly with legitimate editing workflows, making false positives unacceptably high. The algorithm is conservative here by design.
Scanned documents (5.4%) and online-editor documents (3.6%) also receive "Cannot Verify" — scans because there is no structural text data to analyze, online editors because their output patterns are indistinguishable from certain editing artifacts.
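At its simplest, origin classification is a mapping from producer strings to tool families. The groupings below are illustrative guesses based on the tables in this report, not HTPBE's actual taxonomy:

```python
# Tool-name groupings are assumptions for illustration only.
INSTITUTIONAL = ("wkhtmltopdf", "itext", "pdflib", "pdftools")
ONLINE = ("ilovepdf", "canva")
CONSUMER = ("microsoft", "libreoffice", "pages")

def classify_origin(producer: str) -> str:
    p = (producer or "").lower()
    if any(tool in p for tool in INSTITUTIONAL):
        return "institutional"
    if any(tool in p for tool in ONLINE):
        return "online editor (cannot verify)"
    if any(tool in p for tool in CONSUMER):
        return "consumer software (cannot verify)"
    return "unknown"
```

Real classification has to handle version suffixes, vendor rebrandings, and missing fields, which is why a fifth of the March sample lands in legacy or unknown buckets.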
Digital Signatures: Getting Worse
16 PDFs in March carried embedded digital signatures (1.8% of the total). Of those, 7 had been modified after the signature was applied — a 43.75% post-signature modification rate.
February’s figure was 27.3%.
This matters because digital signatures are widely treated as an integrity guarantee. They are not. A PDF signature covers exactly the bytes it covered when it was applied. Incremental updates appended after signing are not covered by the original signature — and the signature remains technically valid because the bytes it originally signed are still present and unaltered. The new content simply sits outside the signed scope.
Seven out of sixteen signed documents in March had been through this process: new content appended after signing, signature still displaying as valid in any PDF viewer that does not explicitly check the signed range. In workflows where a human looks for the green checkmark and moves on, these documents pass.
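The signed-range check described above can be sketched directly. A signature's `/ByteRange` entry lists the two spans of the file the signature actually covers; bytes appended after the second span are outside the signed scope. This is a minimal sketch that assumes the array appears literally in the file, which real signatures do not guarantee (the array can be split across lines or held differently):

```python
import re

def signature_covers_whole_file(pdf_bytes: bytes) -> bool:
    # /ByteRange is [offset1 length1 offset2 length2]: the signed bytes
    # are the two spans, with a gap left for the signature value itself.
    matches = re.findall(
        rb"/ByteRange\s*\[\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*\]", pdf_bytes)
    if not matches:
        return False  # no signature found
    _o1, _l1, offset2, length2 = map(int, matches[-1])
    # Anything after offset2 + length2 was appended post-signing, yet the
    # signature still validates because its original bytes are untouched.
    return offset2 + length2 >= len(pdf_bytes)
```

A `False` result on a file that does carry a signature is exactly the pattern behind the 7 of 16 figure: valid signature, uncovered tail.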
The Software Ecosystem
Top producers (the application that last wrote the file):
| Producer | Files | Share |
|---|---|---|
| Microsoft: Print To PDF | 59 | 6.8% |
| PDFium | 52 | 6.0% |
| iLovePDF | 41 | 4.7% |
| iText 2.1.7 by 1T3XT | 32 | 3.7% |
| LibreOffice 5.1 | 32 | 3.7% |
| Microsoft® Word for Microsoft 365 | 28 | 3.2% |
| Canva | 22 | 2.5% |
| PDFlib+PDI 8.0.2p1 | 19 | 2.2% |
| Pdftools SDK | 16 | 1.8% |
| iTextSharp.LGPLv2.Core 3.7.1.0 | 12 | 1.4% |
Top creators (the original authoring application):
| Creator | Files | Share |
|---|---|---|
| PDFium | 55 | 6.4% |
| Microsoft® Word for Microsoft 365 | 29 | 3.4% |
| Microsoft® Word 2016 | 24 | 2.8% |
| Chromium | 24 | 2.8% |
| Canva | 23 | 2.7% |
| VCTransaction | 19 | 2.2% |
| PScript5.dll Version 5.2.2 | 17 | 2.0% |
| Draw | 12 | 1.4% |
| Dropbox Sign | 9 | 1.0% |
| wkhtmltopdf 0.12.6.1 | 8 | 0.9% |
Several patterns worth attention.
iLovePDF nearly quadrupled. February showed 11 files with iLovePDF as producer. March shows 41. The pattern is the same: a document created in Word or Chrome, then run through an online PDF manipulation service that overwrites the producer field while leaving the creator field intact. When iLovePDF appears as producer and Microsoft Word as creator, the document went through an intermediate step that the creator field does not capture. Whether that step was compression, merging, or content editing is what the structural analysis determines.
Canva is now in the top tier. 23 files list Canva as creator, 22 as producer. Canva is a graphic design platform — not the typical authoring environment for a business document. Its presence in both fields means Canva-originated documents are being submitted for verification as contracts, certificates, or financial records. Design tools are powerful enough to produce convincing documents; they are also trivially easy to edit after the fact.
iText and PDFlib+PDI in the producer field. Together, iText 2.1.7 (32 files) and PDFlib+PDI (19 files) account for 51 files that were processed through document manipulation libraries after their original creation. These libraries are used legitimately for merging, watermarking, and signing — but they are also the same tools used for page-import and template-assembly forgeries. Context determines which it is; that is what the structural analysis resolves.
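The creator/producer comparison running through these observations reduces to a small heuristic. A sketch, with the caveat baked in that legitimate pairs exist and must be whitelisted, as the March updates did for Chrome:

```python
def rewritten_after_authoring(creator: str, producer: str) -> bool:
    # Heuristic: a producer from a different tool family than the creator
    # means a second application rewrote the file after authoring.
    # Legitimate pairs exist (Chrome reports creator "Chromium" with a
    # Skia/PDF producer), so this flags candidates, not forgeries.
    if not creator or not producer:
        return False
    c, p = creator.lower(), producer.lower()
    return c not in p and p not in c
```

A hit says only that an intermediate step occurred; whether that step was benign compression or content editing is what the structural signals elsewhere in this report resolve.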
Dropbox Sign appeared as creator in 9 files. These are e-signature contracts being submitted for verification by counterparties — the same use case that drove Upwork’s presence in February. Recipients checking documents generated by third-party platforms before acting on them.
PDF Version Landscape
| PDF Version | Files | Share |
|---|---|---|
| 1.7 | 340 | 39.3% |
| 1.4 | 262 | 30.3% |
| 1.5 | 100 | 11.5% |
| 1.6 | 85 | 9.8% |
| 1.3 | 70 | 8.1% |
| 1.2 | 7 | 0.8% |
| Other / missing | 2 | 0.2% |
PDF 1.7 and 1.4 together account for 69.6% of the sample — the same two-version dominance as February. PDF 2.0 did not appear in a single file this month, despite being available for nearly a decade.
Notable: no files with invalid or unparseable version strings this month, compared to 7 in February. The improved PDF 1.5+ parsing shipped in March likely accounts for part of this.
JavaScript: A First
Two files in March contained embedded JavaScript. February had zero.
This is a small number, but it is the first time JavaScript has appeared in the monthly sample. PDF JavaScript is used for interactive forms and scripting, but it is also a vector for malicious behavior — drive-by execution, data exfiltration, sandbox escapes in older viewers. Files with embedded JavaScript in business documents (contracts, invoices, bank statements) warrant extra scrutiny regardless of modification status.
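A first-pass screen for embedded scripting is a byte scan for the relevant name keys. This is a triage sketch, not a parser: the tokens can also occur inside compressed streams or string literals, so a hit means "needs review", not proof of active JavaScript:

```python
def contains_javascript(pdf_bytes: bytes) -> bool:
    # PDF actions attach scripts under the /JavaScript and /JS name keys.
    return b"/JavaScript" in pdf_bytes or b"/JS" in pdf_bytes
```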
Document Profile
The average PDF checked in March:
- Average size: 0.60 MB
- Largest file: 9.71 MB (vs 9.70 MB in February — essentially the same ceiling)
- Average page count: 4.9 pages
- Total pages analyzed: 4,274
Metadata completeness averaged 78 out of 100, up slightly from 76 in February. The score measures how many of the eight standard PDF metadata fields are populated.
Missing creation dates affected 141 files (16.3%) — up from 12.7% in February. A document without a creation date has lost one of the cleaner forensic anchors. The March algorithm updates improved detection for files in this state; the "Creator or producer present, creation date removed" signal (13 files) is one result of that improvement.
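The completeness score described above is a simple ratio over the standard document-information fields. A sketch, assuming the eight fields are the standard Info dictionary keys (which eight the score actually uses is an assumption here):

```python
# Assumed field set: the standard PDF document-information keys.
STANDARD_FIELDS = ("Title", "Author", "Subject", "Keywords",
                   "Creator", "Producer", "CreationDate", "ModDate")

def completeness_score(info: dict) -> int:
    # Score = populated fields / 8, scaled to 0-100.
    populated = sum(1 for field in STANDARD_FIELDS if info.get(field))
    return round(100 * populated / len(STANDARD_FIELDS))
```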
Daily Volume
Usage in March accelerated week-over-week, with the final week carrying the heaviest load:
| Date | Checks | Date | Checks | Date | Checks | Date | Checks |
|---|---|---|---|---|---|---|---|
| Mar 01 | 9 | Mar 09 | 40 | Mar 17 | 9 | Mar 25 | 21 |
| Mar 02 | 40 | Mar 10 | 15 | Mar 18 | 23 | Mar 26 | 57 |
| Mar 03 | 11 | Mar 11 | 33 | Mar 19 | 44 | Mar 27 | 40 |
| Mar 04 | 27 | Mar 12 | 55 | Mar 20 | 28 | Mar 28 | 22 |
| Mar 05 | 42 | Mar 13 | 27 | Mar 21 | 5 | Mar 29 | 31 |
| Mar 06 | 23 | Mar 14 | 15 | Mar 22 | 5 | Mar 30 | 29 |
| Mar 07 | 12 | Mar 15 | 11 | Mar 23 | 49 | Mar 31 | 55 |
| Mar 08 | 22 | Mar 16 | 24 | Mar 24 | 42 | | |
The peak day was March 26 with 57 checks. Three days crossed 50: March 12, 26, and 31. The overall daily average was 27.9 checks, up from February’s 14.9 (though February had only 28 days).
The acceleration is visible: the first seven days averaged 23.4 checks per day; the final seven averaged 36.4. No single event explains it; the growth was gradual, which suggests organic expansion rather than a spike from one source.
Other Signals
Embedded files: 3 (less than 1%). PDFs can contain binary attachments — another compliance risk vector in document workflows.
Suspicious tool patterns: Zero files flagged for creator–producer inconsistency. This is partly a detection refinement: the March updates narrowed several signals that had previously fired on legitimate tool combinations (Chrome printing via Skia/PDF, institutional PDFium usage), reducing noise in this category.
Summary
March 2026 by the numbers:
- 866 PDFs analyzed — more than double February — reflecting growing adoption of document verification workflows
- 48.5% modification rate, the highest recorded month, up from 40.4% in February
- New detection signals found in the wild: template assembly (36 files), scan-replace rasterization (18 files), multi-session assembly (7 files), anti-forensic strip rasterization (5 files) — patterns that were undetectable before March 8
- Incremental update modification rate jumped to 83.8%, from 60.8% in February — files with update chains are now overwhelmingly associated with tampering
- Digital signatures less protective than ever: 43.75% of signed documents had been modified after signing, up from 27.3%
- Canva entered both creator and producer charts — design-tool documents are now a meaningful share of the verification queue
- iLovePDF nearly quadrupled as producer (11 → 41), indicating more documents being processed through online manipulation pipelines before submission
- Two files with JavaScript — the first time embedded scripting has appeared in the monthly sample
- PDF 2.0 adoption: still zero
The 48.5% rate is a milestone, but the more significant number is what drove it: the new detection layers that shipped in March caught patterns that were previously invisible. Those 66 files — template assemblies, scan replacements, strip-rasterized pages — are not newly created frauds. They existed in February too. We just could not see them.
Data covers all checks submitted through the HTPBE web interface in March 2026 (UTC). File contents are not stored or analyzed; only structural metadata signals are retained. All figures are aggregate and anonymized.