PDF Security Blog

KYC vs Document Forensics: Why KYC Platforms Miss PDF Fraud

HTPBE Team·05.05.2026·11 min read

This article is a snapshot — content was accurate as of May 2026 (code examples tested against the API as of April 2026). The product evolves actively; specific counts, examples, and detection rules may have changed since publication — see the changelog for the current state.

Bank statement fraud is the most common fraudulent document type in lending. Inscribe’s 2025 fraud report puts it at 59% of all fraudulent documents detected across fintech and lending platforms. Most of those platforms already use Persona, Onfido, or an equivalent KYC provider. And yet bank statement fraud detection remains a persistent gap in most document review stacks.

The fraud still happens.

This is not a failure of KYC. KYC platforms do exactly what they were designed to do. The problem is that the industry has treated document fraud as a single problem solvable with a single tool — when it is actually two distinct problems requiring two separate layers.

What KYC Platforms Are Actually Checking

KYC platforms — Persona, Onfido, Jumio, iDenfy, Ondato — are built to answer one question: is this person who they claim to be?

That question involves several sub-checks that these platforms have refined over years:

Liveness detection — Is there a real human in front of the camera, not a photograph or video replay?
Face match — Does the live face match the face on the submitted ID document?
Identity document template analysis — Does this passport, driver’s licence, or national ID conform to the layout, fonts, security features, and design patterns of the issuing jurisdiction?
Sanctions and watchlist screening — Does this identity appear on AML or sanctions lists?
OCR field extraction — Do the extracted name, date of birth, and document number match the applicant’s stated details?

These checks are sophisticated and well-executed. When Persona clears an applicant, it means: this appears to be a real person, the identity document looks authentic, and the person in front of the camera matches the document.

That is a meaningful, valuable answer. It is also an incomplete one.

Why KYC Platforms Miss PDF Fraud

There is a second question that KYC platforms are not designed to address: was this specific PDF file modified after it was originally created? This is the core reason KYC platforms miss PDF fraud at scale — not because of a defect, but because of scope.

Those two questions sound related. They are structurally different.

Consider the standard bank statement fraud pattern. An applicant downloads their real bank statement from their bank’s online portal as a PDF. The template is 100% legitimate. The account number is real. The bank logo and formatting are correct. Their name is on it.

They open the file in a PDF editor or export it through Microsoft Excel. They change the balance from $3,200 to $32,000. They re-export it as a PDF and upload it to your application portal.

When Persona or Onfido inspects this document, what do they see? A bank statement with a real template, real branding, and a name that matches the applicant. The visual checks pass. The field extraction picks up the name and address. Cleared.

What they do not see — because no KYC platform reads it — is the file’s internal structure. The Producer field in the PDF metadata now reads “Microsoft Excel.” The creation timestamp was reset to the moment of export. The structural fingerprint that a genuine bank-generated PDF carries is absent. The file has declared its own provenance, in plain text, inside the binary. No one in the KYC layer reads this.

This is not a flaw in Persona or Onfido. It is a scope boundary. OCR reads what is rendered on screen. Forensic analysis reads what is encoded in the file structure. These are different things.
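To make "declared its own provenance, in plain text" concrete, here is a minimal Python sketch that pulls the Producer value out of a file's raw bytes. It is an illustration only, not how HTPBE? works internally: the helper name `read_producer` is hypothetical, and the regex assumes an unencrypted Info dictionary that stores the value as a literal string. Real files may hold it in compressed object streams, hex strings, or XMP metadata.

```python
import re
from typing import Optional

# Hypothetical helper for illustration; not part of any HTPBE? SDK.
# Matches /Producer (literal string), allowing backslash-escaped characters.
PRODUCER_RE = re.compile(rb"/Producer\s*\(((?:\\.|[^\\()])*)\)")


def read_producer(pdf_bytes: bytes) -> Optional[str]:
    """Return the last /Producer literal string found in the raw bytes."""
    matches = PRODUCER_RE.findall(pdf_bytes)
    if not matches:
        return None
    # Later revisions are appended to the end of a PDF, so the last match
    # is the most recent declaration.
    raw = matches[-1]
    # Undo backslash escapes for parentheses and backslashes.
    raw = re.sub(rb"\\([()\\])", rb"\1", raw)
    return raw.decode("latin-1")


# Synthetic fragment standing in for a re-exported statement.
sample = b"1 0 obj\n<< /Producer (Microsoft Excel) /CreationDate (D:20260214) >>\nendobj\n"
print(read_producer(sample))
```

A production parser would resolve the trailer's Info reference rather than scanning bytes, but the point stands: the value sits in the file whether or not anyone looks.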

The Structural PDF Layer KYC Cannot See

A legitimate bank statement is generated by core banking software — systems built on platforms like SAP, Oracle, or Temenos, or the bank’s own document generation engine. When those systems produce a PDF, the file carries a consistent set of internal signals:

A Producer field identifying the PDF generation library (e.g., iText, Aspose.PDF, Oracle BI Publisher)
A Creator field identifying the application
A CreationDate timestamp consistent with the account period
A cross-reference table structure consistent with programmatic single-pass generation
No incremental update chain — bank statements are generated once, not edited

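When a person opens that file in Excel and re-exports it, the identifying signals change: the Producer and Creator fields no longer name the bank's generation stack, and CreationDate is reset to the moment of export. The file is structurally a new document. The original institutional fingerprint is replaced by a consumer software fingerprint.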

This is what document forensics reads. Not what the page looks like. What the file is.
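One of these signals, the incremental update chain, can be approximated with very little code. Each saved revision of a PDF ends with its own `startxref` keyword, so counting that keyword gives a rough revision count. The sketch below is a heuristic under stated assumptions, not HTPBE?'s detection logic: the function names are hypothetical, and linearized ("fast web view") files contain two `startxref` keywords after a single save, so a real check would need to account for that.

```python
def count_revisions(pdf_bytes: bytes) -> int:
    """Count 'startxref' keywords as a rough proxy for saved revisions."""
    return pdf_bytes.count(b"startxref")


def looks_incrementally_updated(pdf_bytes: bytes) -> bool:
    """True when more than one revision appears to be present."""
    return count_revisions(pdf_bytes) > 1


# Synthetic byte strings that only mimic the overall layout of a PDF.
original = (
    b"%PDF-1.7\n1 0 obj\n<<>>\nendobj\n"
    b"xref\n0 2\ntrailer\n<<>>\nstartxref\n30\n%%EOF\n"
)
# An incremental save appends new objects plus a fresh xref section and trailer.
edited = original + (
    b"1 0 obj\n<< /Edited true >>\nendobj\n"
    b"xref\n1 1\ntrailer\n<<>>\nstartxref\n95\n%%EOF\n"
)

print(count_revisions(original), looks_incrementally_updated(original))
print(count_revisions(edited), looks_incrementally_updated(edited))
```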

Bank Statement Fraud Detection: What the File Structure Reveals

Case 1: Edited in Excel and Re-exported

Here is a realistic HTPBE? response for a bank statement that was downloaded from a bank portal, edited in Excel, and re-uploaded.

{
  "id": "f2c1a890-3d47-11ef-b456-426614174000",
  "status": "inconclusive",
  "status_reason": "consumer_software_origin",
  "producer": "Microsoft Excel",
  "creator": null,
  "creation_date": 1771060931,
  "modification_date": 1771060931,
  "origin": {
    "type": "consumer_software",
    "software": "Microsoft Excel"
  },
  "xref_count": 1,
  "has_incremental_updates": false,
  "has_digital_signature": false,
  "modification_markers": []
}

The verdict is inconclusive rather than modified because HTPBE? cannot prove what the original values were. What it can prove is unambiguous: this file was produced by Microsoft Excel. No core banking system uses Excel to generate customer statements, so “produced by Excel” and “issued by a bank” cannot both be true of the same file. For any document presented as a bank statement, this signal is operationally equivalent to modified.

Case 2: Edited in a PDF Editor with Incremental Updates

Now compare that to a more direct attack — an applicant who opened their genuine bank statement in a PDF editor, made targeted changes, and saved it using incremental updates:

{
  "id": "a3b2c890-4e58-12ef-c567-537725285111",
  "status": "modified",
  "modification_confidence": "high",
  "producer": "Adobe Acrobat Pro 2024",
  "creator": "Adobe Acrobat Pro 2024",
  "creation_date": 1769000000,
  "modification_date": 1771060931,
  "origin": {
    "type": "consumer_software",
    "software": "Adobe Acrobat Pro"
  },
  "xref_count": 3,
  "has_incremental_updates": true,
  "modification_markers": ["HTPBE_MULTIPLE_REVISION_LAYERS", "HTPBE_DATES_DISAGREE"]
}

The structural evidence is direct: an xref_count of 3 means three cross-reference sections, which is the original write plus two later incremental saves. The timestamp gap between creation_date and modification_date is flagged. The verdict is modified with modification_confidence: "high".
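The two signals in this response can be turned into plain numbers. The sketch below assumes that creation_date and modification_date are Unix epoch seconds, and that xref_count includes the original write; Case 1 reports xref_count: 1 with no incremental updates, which is what suggests that reading. The helper name `summarize_history` is hypothetical.

```python
from datetime import datetime, timezone

# Fields copied from the Case 2 response above.
case_2 = {
    "creation_date": 1769000000,
    "modification_date": 1771060931,
    "xref_count": 3,
}


def summarize_history(resp: dict) -> dict:
    """Derive save count and date gap from a response (assumed field semantics)."""
    created = datetime.fromtimestamp(resp["creation_date"], tz=timezone.utc)
    modified = datetime.fromtimestamp(resp["modification_date"], tz=timezone.utc)
    return {
        # One xref section belongs to the original write (assumption).
        "saves_after_creation": max(resp["xref_count"] - 1, 0),
        # Whole days between the two timestamps.
        "days_between_dates": (modified - created).days,
    }


print(summarize_history(case_2))
```

Under those assumptions the file was saved twice after creation, with the latest change landing a little over three weeks after the original timestamp.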

A KYC visual inspection sees a bank statement that looks identical to a legitimate one. HTPBE? reads the file’s edit history.

What “Inconclusive” Means for Fintech Lending Teams

The inconclusive verdict is operationally significant for fraud operations teams. It does not mean the tool could not decide. It means the document’s origin cannot be confirmed as institutional.

For most consumer-facing document types, inconclusive is a neutral outcome — consumer software produces many legitimate documents. For bank statements, pay stubs, and formal financial certificates, inconclusive is a flag.

A bank statement produced by Microsoft Excel is not a bank statement. It is a spreadsheet formatted to look like one. Approving the application pending further review is the wrong response, and so is asking the applicant to re-submit the PDF. The correct response is to request the statement through an alternative channel: Open Banking API, direct bank portal login, or a secure document request.

The policy is straightforward: if you expected a bank-generated PDF and received status_reason: "consumer_software_origin", treat it the same as modified.
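That policy fits in a few lines. The sketch below is one possible encoding, not an HTPBE? feature: the document-type labels and the `effective_status` helper are hypothetical names, and the status and status_reason values are the ones shown in the Case 1 response.

```python
from typing import Optional

# Document types that should only ever come from an institution's own systems.
# These labels are illustrative; use whatever your intake pipeline assigns.
INSTITUTIONAL_DOC_TYPES = {"bank_statement", "pay_stub", "financial_certificate"}


def effective_status(response: dict, doc_type: str) -> Optional[str]:
    """Upgrade consumer-software 'inconclusive' to 'modified' where it matters."""
    status = response.get("status")
    reason = response.get("status_reason")
    if (
        status == "inconclusive"
        and reason == "consumer_software_origin"
        and doc_type in INSTITUTIONAL_DOC_TYPES
    ):
        return "modified"
    # Every other combination keeps the verdict the API returned.
    return status


excel_export = {"status": "inconclusive", "status_reason": "consumer_software_origin"}
print(effective_status(excel_export, "bank_statement"))
print(effective_status(excel_export, "cover_letter"))
```

Keeping the document type as an explicit input preserves the neutral reading of inconclusive for documents that consumers legitimately produce themselves.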

KYC vs Document Forensics: The Right Architecture Is Both

The question is not “KYC or document forensics?” It is how to sequence them.

These tools solve different problems and can run in parallel or in sequence depending on your workflow. Here is the model that covers both fraud vectors:

Layer	Tool	Question answered
Identity fraud detection	KYC platform (Persona, Onfido, etc.)	Is this person real and who they claim to be?
Document visual inspection	KYC platform	Does this document look like a legitimate template?
File-level integrity analysis	HTPBE?	Was this PDF file modified or consumer-produced?

Neither layer is redundant. KYC catches identity fraud and template forgery. HTPBE? catches file-level modification and consumer-software origin. A fraudulent bank statement that passes KYC (because the template is real) fails HTPBE? (because the file structure is wrong). A fabricated identity document that passes visual inspection but uses a stolen identity fails KYC liveness and face-match checks. Each tool catches what the other does not.

The financial case is clear. KYC platforms cost between $0.50 and $5.00 per check. At volume, running full KYC first and HTPBE? second is the logical sequence. For teams where the primary risk is financial document fraud rather than identity fraud, running HTPBE? as an inexpensive first filter and escalating flagged documents to human review before KYC costs are incurred is a reasonable alternative.

At 300 applications per month with one bank statement each:

Configuration	Monthly cost
KYC only ($1.50/check × 300)	$450
HTPBE? first filter + KYC for the non-flagged 80% ($1.50 × 240 = $360, plus the HTPBE? fee)	$509
Same configuration, with manual review of the flagged 20%	$509, with 60 flagged cases reviewed rather than funded

The cost difference is marginal. The risk difference is not. Sixty fraudulent applications that reach funding decisions carry average exposure of $250,000–$500,000 each in consumer lending. HTPBE? does not prevent all fraud. It closes the specific structural gap that bank statement fraud exploits — the gap that accounts for the majority of document fraud volume.
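The arithmetic behind the table can be reproduced in a short sketch. One input is an assumption: the table does not state the HTPBE? fee, so the $149 used here is simply the difference between the $509 total and the $360 of KYC checks. The 20% flag rate and the $1.50 KYC price come from the table.

```python
APPLICATIONS = 300         # applications per month, one bank statement each
KYC_PRICE = 1.50           # dollars per KYC check (from the table)
FLAG_RATE = 0.20           # share of documents flagged by the first filter
HTPBE_MONTHLY_FEE = 149.0  # assumption: implied by the $509 total, not stated

# Configuration 1: every application goes through KYC.
kyc_only = APPLICATIONS * KYC_PRICE

# Configuration 2: flagged documents stop before KYC costs are incurred.
flagged = round(APPLICATIONS * FLAG_RATE)
kyc_after_filter = (APPLICATIONS - flagged) * KYC_PRICE
filtered_total = kyc_after_filter + HTPBE_MONTHLY_FEE

print(f"KYC only:        ${kyc_only:.2f}")
print(f"Flagged cases:   {flagged}")
print(f"Filter + KYC:    ${filtered_total:.2f}")
print(f"Monthly delta:   ${filtered_total - kyc_only:.2f}")
```

Swapping in your own volume, flag rate, and plan price shows quickly whether the first-filter ordering still holds for your mix.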

Integrating PDF Forensics via API: Where It Fits in Your Stack

The integration point is document intake. After an applicant uploads a supporting document — bank statement, pay stub, income letter, tax return — and before that document enters underwriting review, send the document URL to the PDF forensics API.

curl -X POST https://api.htpbe.tech/v1/analyze \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://your-storage.com/applicant-docs/bank-statement.pdf"}'
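
The same request can be sketched in Python using only the standard library. The endpoint, headers, and payload mirror the curl call above; the function names, the 30-second timeout, and the assumption that the response body is a single JSON object are illustrative choices rather than documented behaviour.

```python
import json
import urllib.request

API_URL = "https://api.htpbe.tech/v1/analyze"  # endpoint from the curl example


def build_request(document_url: str, api_key: str) -> urllib.request.Request:
    """Build the POST request without sending it."""
    body = json.dumps({"url": document_url}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def analyze(document_url: str, api_key: str, timeout: float = 30.0) -> dict:
    """Send the request and parse the JSON body (assumed to be one object)."""
    request = build_request(document_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Usage (performs a network call, so it is left commented out):
# result = analyze("https://your-storage.com/applicant-docs/bank-statement.pdf", "YOUR_API_KEY")
```

Separating request construction from sending keeps the payload easy to unit-test without touching the network.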

The response arrives in 2–5 seconds. The routing logic is a single three-way branch:

status: "modified" → reject or route to manual fraud review
status_reason: "consumer_software_origin" → request alternative document sourcing
status: "intact" → pass to standard underwriting queue
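
The three rules above translate into a small routing function. This is a sketch: the queue names are hypothetical, and the final fallback for any response the three rules do not cover is an assumption, since the rules do not say what to do with it.

```python
def route(response: dict) -> str:
    """Map an analysis response to a queue name (queue names are hypothetical)."""
    status = response.get("status")
    if status == "modified":
        return "manual_fraud_review"
    if response.get("status_reason") == "consumer_software_origin":
        return "request_alternative_source"
    if status == "intact":
        return "underwriting_queue"
    # Not covered by the three rules: send to a human (assumption).
    return "manual_review"


print(route({"status": "modified", "modification_confidence": "high"}))
print(route({"status": "inconclusive", "status_reason": "consumer_software_origin"}))
print(route({"status": "intact"}))
```

An explicit fallback means an unexpected or new status value lands in front of a reviewer instead of silently passing to underwriting.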

This is not a replacement for your existing KYC flow. It runs alongside it. The KYC layer confirms the person. HTPBE? confirms the document.

What This Approach Does Not Catch

Document forensics is not a complete fraud detection system. It catches modifications that leave structural traces — which covers the majority of consumer-tool fraud. It does not catch:

Fabricated documents built from scratch — If a fraudster builds a fake bank statement from a blank InDesign template rather than editing a real one, there is no original document structure to diverge from. KYC template analysis and Open Banking data sourcing are the correct tools here.
High-end professional forgeries — A sophisticated attacker using the same PDF generation tools a bank uses, with accurate metadata, can produce a file that appears structurally legitimate. This attack requires significant technical capability and is rare in consumer lending fraud.
Printed and re-scanned documents — A printed and re-scanned document loses all original metadata. HTPBE? reports origin.type: scanned, which is a useful signal when a document should not be a scan, but the tool cannot determine what was altered before printing.

These limits are why the layered architecture matters. No single tool covers all fraud vectors. The structural forensics layer closes the specific gap that volume lending fraud exploits — the edited-PDF-that-passes-visual-inspection gap — at a cost low enough to run on every document.

Who Should Read This

This article is for Heads of Risk and Fraud Ops at alternative lenders, BNPL platforms, mortgage originators, and fintech companies that process income and financial supporting documents at scale. If your current document review process relies on KYC visual inspection plus human analyst review, and you are seeing fraud losses on bank statements and pay stubs that your KYC provider is not catching, the structural forensics layer is the gap to address.

HTPBE? integrates in under 30 minutes with no enterprise sales process. The Growth plan covers 350 checks per month — enough for most teams processing 200–300 applications monthly. For platforms handling higher volume, see the Pro and Enterprise tiers.

If you are building a broader document fraud prevention workflow for a fintech or lending context, the KYC onboarding and fintech lending use-case pages cover integration patterns and decision thresholds for each document type in detail.