Invoice Fraud in Accounts Payable: Detect Altered PDFs Before Payment

This article is a snapshot — content was accurate as of June 2026 (code examples tested against the API as of April 2026). The product evolves actively; specific counts, examples, and detection rules may have changed since publication — see the changelog for the current state.
The FBI’s Internet Crime Complaint Center (IC3) reported over $2.9 billion in losses from business email compromise and invoice fraud in 2023 alone. The money leaves through accounts payable. The mechanism is almost always the same: a PDF invoice that looks correct but was modified after it left the issuing system.
AP automation has made this worse, not better. When humans stop looking at individual invoices, the file structure stops being examined at all.
Three Fraud Vectors, One Common Trail
Invoice fraud reaching AP teams falls into three patterns. All three leave structural evidence in the PDF file.
BEC bank-detail swap. A legitimate invoice from a real vendor is intercepted — through a compromised email account, a spoofed domain, or a hijacked thread. The fraudster opens the PDF, replaces the bank account number in the remittance section, and sends it along. The invoice number is real. The vendor name matches the PO. Only the payment destination is wrong.
The edit leaves two markers: the producer field is overwritten by whatever tool the fraudster used to save the file, and the xref table gains a new incremental update entry. A QuickBooks invoice is not normally re-saved in Microsoft Word in any documented AP workflow, so when the creator and producer fields disagree, something rewrote the file between issuance and AP. Most of the time that something is benign (a DMS, an e-signature stamp, a mailbox sanitiser). Some of the time it is the fraudster. Structural forensics surfaces the anomaly; deciding which it is belongs to your reviewer or your vendor verification call.
Fabricated supplier invoice. An entirely invented vendor, or a cloned identity of a real one, submits invoices for services not rendered. These documents are typically built from scratch in Word, Excel, or an online template tool. They arrive with no institutional producer signature — no accounting software metadata, no consistent xref structure.
Line-item inflation on a real invoice. A real vendor invoice is modified to increase quantities or unit prices before AP receives it. This follows the same pattern as the BEC swap: incremental update in the xref chain, producer field inconsistency, and a modification timestamp after the creation date.
In each case, the PDF contains a forensic record of what happened.
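The two markers described above can be approximated directly from the raw bytes of a file. The following is a minimal sketch, not the HTPBE? implementation: it assumes the document information dictionary is stored uncompressed with literal (parenthesised) strings, which many real PDFs do not guarantee, and the function names and the synthetic sample are hypothetical. It counts `%%EOF` markers (each incremental save normally appends one) and collects every `/Producer` value left behind across revisions.

```python
import re
from typing import Dict, List


def count_revisions(pdf_bytes: bytes) -> int:
    """Count %%EOF markers; each incremental save normally appends one."""
    return pdf_bytes.count(b"%%EOF")


def info_values(pdf_bytes: bytes, key: str) -> List[str]:
    """Return every literal-string value stored under /<key>, in file order.

    Assumption: values are uncompressed literal strings such as
    /Producer (Some Tool). Hex strings and object streams are not handled.
    """
    pattern = re.compile(
        rb"/" + re.escape(key.encode("ascii")) + rb"\s*\(((?:\\.|[^\\)])*)\)"
    )
    return [m.decode("latin-1") for m in pattern.findall(pdf_bytes)]


def summarize(pdf_bytes: bytes) -> Dict[str, object]:
    """Summarise revision count and producer drift for one file."""
    producers = info_values(pdf_bytes, "Producer")
    return {
        "revisions": count_revisions(pdf_bytes),
        "producers": producers,
        "producer_changed": len(set(producers)) > 1,
    }


# Synthetic two-revision file (not a valid PDF, just the relevant tokens).
SAMPLE = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Creator (QuickBooks Online) /Producer (QuickBooks Online) >>\nendobj\n"
    b"trailer\n<< /Info 1 0 R >>\nstartxref\n0\n%%EOF\n"
    b"1 0 obj\n<< /Creator (QuickBooks Online) /Producer (Microsoft Word) >>\nendobj\n"
    b"trailer\n<< /Info 1 0 R >>\nstartxref\n0\n%%EOF\n"
)

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

A tool that rewrites the whole file instead of appending leaves a single revision and a single producer, so this byte-level view only catches incremental edits; comparing the surviving producer against the vendor's expected toolchain covers the other case.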
Why OCR-Based AP Automation Does Not Catch This
Three-way match — purchase order, goods receipt, invoice — is the standard AP control. It catches amount mismatches when the inflated total does not match the PO. It does not catch bank account swaps, because the bank account number is not on the PO.
OCR extraction reads what the PDF displays. It does not examine how the PDF was produced. A fraudster who edits an invoice and saves it produces a file that displays correctly. The extracted fields — vendor name, invoice number, line items, total — all match expectations. The OCR layer has no visibility into the producer field, the xref count, or the modification timestamp.
AP automation platforms — Coupa, Ariba, Tipalti, Bill.com — route and approve based on extracted content. None of them perform structural PDF forensics. The gap between email receipt and payment approval is precisely where invoice fraud lives.
What the API Returns
Here is the HTPBE? response on a BEC-modified invoice. The vendor runs QuickBooks Online. The fraudster intercepted the invoice, changed the bank account in Microsoft Word, and forwarded it through a spoofed email thread:
{
  "id": "ck_4f2a9c1e-8b7d-4a3f-b5e2-1d9c6f8a2b4e",
  "status": "modified",
  "modification_confidence": "high",
  "modification_markers": ["PRODUCER_MISMATCH", "INCREMENTAL_UPDATES"],
  "creator": "QuickBooks Online",
  "producer": "Microsoft Word",
  "xref_count": 2,
  "has_digital_signature": false,
  "creation_date": 1752624000,
  "modification_date": 1752796800
}
creator: "QuickBooks Online" alongside producer: "Microsoft Word" is structurally inconsistent with the issuing software's normal pipeline: QuickBooks emits its own PDF, and the documented vendor workflow does not include a Word re-export. A 48-hour gap between creation_date and modification_date rules out the same-second xref artefacts some renderers produce. Together these are high-confidence signals that the file was opened and re-saved between issuance and arrival.
The verdict is modified with two named markers, and that justifies holding the invoice and verifying out-of-band with the vendor, through a phone number you already have on file rather than by replying to the inbound email. It does not justify automatic rejection: a small fraction of these files have a legitimate explanation (an intermediary accountant who exports through Word to add a cover letter, an aggressive mailbox PDF rewriter), and that small share matters because wrongly rejected invoices delay good payments and frustrate vendors.
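The 48-hour figure follows directly from the two timestamps in the response. A short sketch of that arithmetic, assuming creation_date and modification_date are Unix epoch seconds in UTC (the sample values are consistent with that, but the response format above does not state it); the helper names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def to_utc(ts: int) -> datetime:
    """Interpret an epoch-seconds timestamp as a UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)


def modification_gap_hours(
    creation_ts: Optional[int], modification_ts: Optional[int]
) -> Optional[float]:
    """Hours between creation and last modification, or None if either is missing."""
    if creation_ts is None or modification_ts is None:
        return None
    return (modification_ts - creation_ts) / 3600


if __name__ == "__main__":
    created, modified = 1752624000, 1752796800  # values from the response above
    print(to_utc(created).isoformat(), to_utc(modified).isoformat())
    print(modification_gap_hours(created, modified))
```

Returning None when a timestamp is absent keeps the caller from treating a missing modification_date as a zero-hour gap.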
Now consider a fabricated invoice — a document built from scratch in Excel by someone posing as a supplier:
{
  "id": "ck_9e3b7f1a-2c5d-4b8e-a6f3-7d0e4c9b1a5f",
  "status": "inconclusive",
  "modification_markers": [],
  "creator": null,
  "producer": "Microsoft Excel",
  "xref_count": 1,
  "has_digital_signature": false,
  "creation_date": 1752710400,
  "modification_date": null
}
This returns inconclusive, not modified, because there is no prior structure to compare against. The document was built entirely in consumer software. inconclusive is not a failure. For an invoice submitted by a vendor your system records as using institutional accounting software, producer: "Microsoft Excel" is a red flag. The document origin is inconsistent with the vendor’s known toolchain.
Routing Logic
Three verdicts, three actions:
- intact: file structure consistent with claimed origin. Proceed to PO matching and three-way match as normal.
- modified: post-creation edit detected. Hold for manual review before payment approval. Do not release funds until a human examines the file and contacts the vendor through an out-of-band channel.
- inconclusive + known institutional vendor: document origin inconsistent with vendor profile. Escalate. This pattern covers both fabricated invoices and cases where a vendor genuinely submits documents in unexpected formats (in which case, vendor onboarding data needs updating).
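The three rules above can be sketched as a small routing function. This is an illustration under stated assumptions: the VendorProfile type, its institutional flag, and the action names are hypothetical and stand in for whatever your onboarding records and approval queue actually use. An inconclusive verdict for a vendor not recorded as institutional is passed through to matching, and any unrecognised status is held rather than released.

```python
from dataclasses import dataclass

# Hypothetical action names; map these to your own workflow states.
PROCEED = "proceed_to_matching"
HOLD = "hold_for_manual_review"
ESCALATE = "escalate"


@dataclass
class VendorProfile:
    name: str
    institutional: bool  # onboarding says the vendor uses accounting software


def route(result: dict, vendor: VendorProfile) -> str:
    """Map an API verdict plus vendor profile to an AP action."""
    status = result.get("status")
    if status == "intact":
        return PROCEED
    if status == "modified":
        return HOLD
    if status == "inconclusive":
        # Only a signal when the vendor is expected to use institutional tooling.
        return ESCALATE if vendor.institutional else PROCEED
    # Unknown or missing verdict: fail closed.
    return HOLD


if __name__ == "__main__":
    acme = VendorProfile(name="Acme Supplies", institutional=True)
    print(route({"status": "inconclusive", "producer": "Microsoft Excel"}, acme))
```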
Integration at Invoice Intake
Run the forensic check at the moment the PDF arrives: before PO matching, before three-way match, before any approval workflow begins. A held invoice costs a few minutes of manual review. A released fraudulent payment costs, on average, $137,132 per BEC incident (FBI IC3, 2023).
The check works via a simple cURL call against any accessible PDF URL:
curl -X POST https://api.htpbe.tech/v1/analyze \
  -H "Authorization: Bearer $HTPBE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://your-storage.example.com/invoices/inv-2024-0891.pdf"}'
The response includes a check_id that links permanently to the forensic report. Store it against the invoice record. If a payment is disputed, GET /api/v1/result/{check_id} retrieves the full analysis, an immutable audit trail showing which markers triggered the hold and when.
For AP platforms that process email attachments, the pattern is: save attachment to temporary storage, pass the URL to the API, evaluate the verdict before routing the invoice into the approval queue. The same pattern works for portal uploads in Coupa, Ariba, Tipalti, and Bill.com via their webhook or integration layer.
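For intake code written in Python, the same call can be sketched with the standard library. The endpoint, headers, and request body mirror the cURL example; the function names, the timeout, and the assumption that the response body is JSON are illustrative rather than taken from the API reference.

```python
import json
import urllib.request

ANALYZE_URL = "https://api.htpbe.tech/v1/analyze"  # endpoint from the cURL example


def build_analyze_request(pdf_url: str, api_key: str) -> urllib.request.Request:
    """Build the POST request without sending it (useful for testing)."""
    body = json.dumps({"url": pdf_url}).encode("utf-8")
    return urllib.request.Request(
        ANALYZE_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def analyze(pdf_url: str, api_key: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response (assumed format)."""
    request = build_analyze_request(pdf_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

In an intake worker this would be called once per saved attachment, with the returned verdict evaluated before the invoice enters the approval queue; network errors and non-2xx responses raise exceptions from urlopen and should be treated as a hold, not as a pass.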
The AP fraud detection solution guide covers vendor risk profiling, producer allowlists by vendor, and integration patterns for common AP platforms.
False Positives in a Real AP Pipeline
A modified verdict in an enterprise AP environment is not a fraud verdict; it is a structural anomaly that requires interpretation against the vendor's known toolchain. Five benign sources of modified verdicts and producer drift come up often enough that any production deployment needs to plan for them:
- E-signature platforms (DocuSign, Adobe Sign): adding a signature block is, structurally, an incremental update with a new producer line.
- DMS and ERP middleware (SAP Ariba, Coupa, Tipalti, OpenText, M-Files): re-emit PDFs on ingest with overwritten metadata.
- Mailbox sanitisers and security gateways (Mimecast, Proofpoint, Microsoft Defender): re-encode incoming PDFs to strip active content, leaving new producer fingerprints.
- Print-to-PDF roundtrips on the vendor side: a vendor who prints the invoice, scans it, and re-attaches submits a structurally different file with no content change.
- Cross-platform re-exports: a vendor on macOS opens a Windows-generated invoice in Preview and re-saves to add a comment — different producer, same content.
The right operational posture for the first few weeks is hold and verify out-of-band, never auto-reject. Log every modified and every institutional-vendor inconclusive against the reviewer's ground-truth outcome (real fraud / false positive / out-of-policy-but-legitimate). After several weeks of labelled data, you will know which producer and verdict combinations are reliably benign in your inbound flow and can tighten the auto-routing rules accordingly; before that, the system runs as a structured signal into human review, not as enforcement.
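The labelling step can be as simple as a tally keyed by producer and verdict. The sketch below is one possible shape, with hypothetical class, method, and outcome names; it records each reviewer decision and reports the false-positive share for a given combination, which is the number you would consult before relaxing or tightening a routing rule.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional, Tuple

# Outcome labels taken from the three reviewer categories described above.
OUTCOMES = {"real_fraud", "false_positive", "out_of_policy_legitimate"}


class ReviewLog:
    """In-memory tally of reviewer outcomes per (producer, status) pair."""

    def __init__(self) -> None:
        self._counts: Dict[Tuple[str, str], Counter] = defaultdict(Counter)

    def record(self, producer: str, status: str, outcome: str) -> None:
        if outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {outcome}")
        self._counts[(producer, status)][outcome] += 1

    def false_positive_rate(self, producer: str, status: str) -> Optional[float]:
        """Share of reviewed cases labelled false_positive, or None if unseen."""
        counts = self._counts.get((producer, status))
        if not counts:
            return None
        return counts["false_positive"] / sum(counts.values())


if __name__ == "__main__":
    log = ReviewLog()
    for outcome in ["false_positive", "false_positive", "false_positive", "real_fraud"]:
        log.record("Example Mail Gateway", "modified", outcome)
    print(log.false_positive_rate("Example Mail Gateway", "modified"))
```

A production version would persist these records alongside the stored check ID so that each rate can be traced back to individual reviewed invoices.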
How This Compares to Existing PDF-Validation Channels
A few existing controls cover adjacent ground; none replace structural forensics for the BEC-style edit:
- PDF digital signatures and S/MIME-signed email: prove the file came from a specific signer and has not been altered since. Few vendors actually sign invoice PDFs, and signed-email coverage in AP inboxes is sparse. Where signatures exist, they are stronger than structural inference — verify them first. Where they do not, structural analysis is the next line.
- ERP-native invoice channels (Ariba Network, Coupa Supplier Network, EDI 810, PEPPOL): bypass the PDF entirely and exchange invoice data over an authenticated channel. These are the right long-term answer; structural forensics is the control for the share of vendors who continue to send PDF attachments over email.
- Standalone PDF validation tools (Acrobat's "compare files", forensic suites): are designed for legal discovery on a per-document basis, not for automatic gating on every inbound invoice at AP scale. They share part of the underlying signal model.
Structural PDF forensics is one layer in an AP fraud prevention stack — the layer that catches the specific attack of editing a legitimate invoice between issuance and AP, which the others above do not cover at machine speed.
What This Cannot Detect
Structural forensics detects modifications to existing documents and inconsistencies in document origin. Two scenarios fall outside its scope.
Fabricated invoices that mimic institutional origin. If a fraudster clones a vendor's QuickBooks account and generates a fraudulent invoice directly through QuickBooks, the file structure will be consistent with a legitimate document. The forensic check returns intact. This attack pattern requires vendor verification through an out-of-band channel: a phone call to a known number, not a reply to the invoice email.
Legitimate vendors using consumer software. Some small vendors genuinely create invoices in Word or Google Docs. For these vendors, inconclusive is the expected result and carries no fraud signal. Vendor onboarding should capture the expected document origin so routing logic applies the signal correctly. Vendors known to use consumer software should not trigger escalation on inconclusive.
Forensic PDF analysis closes the structural gap that OCR and three-way match leave open. It does not replace vendor fraud detection or payment controls — and the controls above (signatures, ERP-native channels, payment-side verification) are complementary, not redundant.
The Cost Case
The average BEC incident reported to the FBI's IC3 cost $137,132 in 2023. The per-document cost of a structural check is several orders of magnitude lower than that, which makes the math obvious if the layer actually catches a meaningful share of attempts in your specific inbound flow. That is what a sampling exercise (running a back-cohort of disputed and known-good invoices through the API before integration) establishes.
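The break-even arithmetic can be made explicit. In the sketch below only the average loss comes from the figure cited earlier in this article; the invoice volume, per-check price, hold rate, and review cost are placeholder inputs, not published pricing or measured rates, and should be replaced with numbers from your own sampling exercise.

```python
AVG_BEC_LOSS_USD = 137_132  # average loss per incident cited above (FBI IC3, 2023)


def annual_check_cost(
    invoices_per_year: int,
    cost_per_check: float,
    hold_rate: float = 0.0,
    review_cost_per_hold: float = 0.0,
) -> float:
    """API spend plus reviewer time for the share of invoices that get held."""
    return invoices_per_year * (cost_per_check + hold_rate * review_cost_per_hold)


def break_even_incidents(annual_cost: float, avg_loss: float = AVG_BEC_LOSS_USD) -> float:
    """Incidents per year the check must prevent to pay for itself."""
    return annual_cost / avg_loss


if __name__ == "__main__":
    # Placeholder inputs: 50,000 invoices, $0.10 per check,
    # 2% held for review at $15 of reviewer time each.
    cost = annual_check_cost(50_000, 0.10, hold_rate=0.02, review_cost_per_hold=15.0)
    print(round(cost, 2), round(break_even_incidents(cost), 3))
```

Including reviewer time matters: with these placeholder inputs the manual review of held invoices, not the API call, is the larger part of the annual cost.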
The forensic layer does not close the AP fraud surface; it adds an automated check on a layer that has historically not been examined at AP scale. That is a useful addition, not a complete defence. The other layers — vendor verification, payment-side controls, signed-document channels — still do the work they always did.
For the technical layer behind what the analysis detects, see the invoice tampering detection deep-dive and the how criminals modify invoice PDFs breakdown. The API reference lists the full request/response contract.
Frequently Asked Questions
How can I tell if a vendor invoice was altered before it reached AP?
Three structural fields carry most of the signal: the producer field (which records the software that last saved the file), the cross-reference (xref) count (every save session appends a new entry), and the gap between CreationDate and ModDate. An invoice generated by QuickBooks Online that arrives with producer: "Microsoft Word" and xref_count of 2 was opened and re-saved between issuance and AP. Whether that re-save is a fraudster swapping bank details, a mailbox sanitiser, or a legitimate intermediary needs human follow-up: the structural layer surfaces the anomaly, the reviewer decides.
What is a BEC bank-detail swap?
In a business email compromise (BEC) bank-detail swap, an attacker intercepts a legitimate vendor invoice (through a compromised mailbox, spoofed domain, or hijacked thread), then edits the remittance bank account number before forwarding it to AP. The vendor name, invoice number, line items, and total all match the PO. Only the payment destination is wrong. Three-way match cannot see this because the bank account is not on the PO; OCR cannot see it because the displayed content is internally consistent. The edit shows up structurally as producer mismatch plus an incremental update.
Do Coupa, Ariba, Tipalti, or Bill.com catch altered invoices?
Not at the structural layer. AP automation platforms route and approve based on extracted content (OCR, three-way match, approval workflows). None of them examine the PDF’s producer chain, xref geometry, or modification timestamps. A forensic check at intake is complementary to those platforms, not a replacement for them.
Can structural forensics catch a fabricated supplier invoice?
Indirectly. A document built from scratch in Word or Excel returns inconclusive, not modified, because there is no prior structure to compare against. For an invoice claimed to be from a vendor whose onboarding record says they use QuickBooks Online or Xero, inconclusive is structurally inconsistent with the claim and warrants vendor verification through an out-of-band channel.