Bank Statement Fraud in Lending: What Gets Altered and How to Catch It

HTPBE Team · 11 min read

This article is a snapshot — content was accurate as of May 2026 (code examples tested against the API as of April 2026). The product evolves actively; specific counts, examples, and detection rules may have changed since publication — see the changelog for the current state.

A borrower submits a three-month bank statement PDF. The layout matches their bank’s template exactly. The running balance climbs steadily through the period, peaking just above the income threshold required for approval. There are no overdrafts. The font is correct. The logo renders cleanly. Your underwriter approves the file.

None of that means the document was unaltered.

According to fraud analytics firm SEON, bank statements are the most commonly falsified document in lending applications — cited in over 59% of fraudulent loan applications. The editing happens after the borrower downloads the legitimate PDF from their bank portal, before they upload it to your application form. The tools required cost nothing and require no technical skill.

What Borrowers Actually Change

The three most common alterations are running balance inflation, inserted deposits, and hidden overdrafts.

Running balance inflation is the simplest. The borrower opens the PDF in Adobe Acrobat, selects the balance figures, and types over them. No arithmetic is recalculated: the inserted numbers are static text objects with no relationship to the surrounding transaction data. The change is invisible to visual review but usually detectable by any system that checks whether the edit session left a trace in the file structure.

Inserted deposits involve adding a transaction line — typically a salary credit or a one-off transfer — to push the average monthly income above a threshold. Again, Adobe Acrobat or an online editor like iLovePDF or Smallpdf handles this in seconds. The inserted line sits alongside real transactions in the visible content layer, but the edit leaves a different fingerprint in the structural layer.

Hidden overdrafts are less common but more technically deliberate. A borrower who has run negative balances removes those rows or replaces negative figures with small positives. This can involve reformatting an entire section, and consumer PDF editors typically re-embed fonts or rewrite object streams in ways that differ structurally from the bank’s original rendering pipeline.

Why Visual Review Fails

A reviewer comparing a statement on screen cannot see the xref chain, the producer field shift between bank portal and applicant upload, or the modification timestamp arriving four days after the creation timestamp. Visual review catches sloppy fraud: wrong font, misaligned columns, a logo that renders at the wrong resolution. It does not catch competent fraud performed with mainstream consumer tools.

The structural evidence of an edit lives in the file’s metadata and revision history, not on the rendered page.

What Structural Forensics Actually Catches

Three structural signals cover the large majority of bank statement fraud attempts.

The xref incremental update trail. When an editor saves a modified PDF incrementally, the changes are appended to the end of the file and a new cross-reference (xref) section is written after them. A bank statement generated by a bank portal and saved without intermediate processing typically has a single xref section. The exceptions matter: some retail banks save incrementally on first export, mail-handling gateways and DMS pipelines can rewrite metadata in transit, and mobile banking apps occasionally re-save the PDF on share. So xref_count > 1 is a strong correlate of post-creation modification, not a deterministic marker. It earns a closer look, not an automatic rejection.
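As a rough illustration of the idea, the sketch below counts revision trailers in the raw bytes of a file. It is a simplified heuristic, not how the API computes xref_count: the function names are hypothetical, and it assumes each saved revision ends with a startxref pointer followed by %%EOF. Linearized ("fast web view") files can contain two such trailers without ever being edited, which is one more reason to treat the count as a prompt for review rather than a verdict.

```python
import re

# Each saved revision of a PDF normally ends with "startxref <offset> %%EOF".
# Counting those trailers approximates the number of write sessions.
STARTXREF_TRAILER = re.compile(rb"startxref\s+\d+\s+%%EOF")


def count_xref_sections(pdf_bytes: bytes) -> int:
    """Approximate the number of revisions by counting startxref/%%EOF trailers."""
    return len(STARTXREF_TRAILER.findall(pdf_bytes))


def needs_closer_look(pdf_bytes: bytes) -> bool:
    """More than one trailer suggests at least one save after generation."""
    return count_xref_sections(pdf_bytes) > 1
```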

Producer mismatch. The creator field names the application that authored the original document. The producer field names the software that wrote the PDF, and most editors overwrite it when they re-save a file. A bank’s web portal typically generates PDFs using server-side engines: iText, Aspose.PDF, PrinceXML, or similar. When the producer field shows Adobe Acrobat 24.2 or iLovePDF while the creator shows Temenos or FIS, the document was re-saved in a tool that is not part of any documented bank statement pipeline. PRODUCER_MISMATCH in the API response is a high-confidence anomaly. The right action is to flag for review, not auto-reject, because the customer might have opened the file in Preview to compress it before emailing.
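A minimal sketch of this comparison, assuming the two metadata strings have already been extracted. The watchlist and the function name are hypothetical and are not the API's actual rule set; a real deployment would maintain a much longer list and per-institution baselines.

```python
from typing import Optional

# Hypothetical watchlist of consumer-editor substrings (illustrative only).
CONSUMER_EDITORS = ("ilovepdf", "smallpdf", "pdf24", "adobe acrobat")


def producer_mismatch(creator: Optional[str], producer: Optional[str]) -> bool:
    """True when the producer names a consumer editor and the creator does not."""
    if not creator or not producer:
        return False  # missing metadata is a separate question, not a mismatch
    producer_is_editor = any(name in producer.lower() for name in CONSUMER_EDITORS)
    creator_is_editor = any(name in creator.lower() for name in CONSUMER_EDITORS)
    return producer_is_editor and not creator_is_editor
```

With the values from the scenario above, producer_mismatch("Temenos T24", "iLovePDF") returns True, while a server-side engine in the producer field does not trigger the check.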

Modification date gap. The PDF document information dictionary can carry both a CreationDate and a ModDate. The specification makes both optional, but most generators write them. On a clean bank export these timestamps are usually equal or within seconds of each other. On an edited document the gap reveals when the editing occurred: a statement dated Monday with a modification timestamp of Thursday means someone opened and saved the file in between. The size of the gap, the day of the week, and the relationship to the application submission date all become usable signals.
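For readers working with raw metadata rather than the API, the gap can be computed from the PDF date strings themselves. The sketch below assumes the common D:YYYYMMDDHHmmSS+HH'mm' form and treats a missing offset as UTC, which is an assumption rather than something the format guarantees. The function names and the example dates are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# PDF date string: D:YYYY[MM[DD[HH[mm[SS]]]]] followed by Z or +HH'mm' / -HH'mm'.
PDF_DATE = re.compile(
    r"^D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:(Z)|([+-])(\d{2})'?(\d{2})?'?)?$"
)


def parse_pdf_date(value: str) -> datetime:
    """Parse a PDF date string into a timezone-aware datetime."""
    m = PDF_DATE.match(value.strip())
    if m is None:
        raise ValueError(f"not a PDF date string: {value!r}")
    year, month, day, hour, minute, second = (
        int(part) if part else default
        for part, default in zip(m.groups()[:6], (0, 1, 1, 0, 0, 0))
    )
    sign, off_h, off_m = m.group(8), m.group(9), m.group(10)
    offset = timedelta(hours=int(off_h or 0), minutes=int(off_m or 0))
    if sign == "-":
        offset = -offset
    # Assumption: no offset (or "Z") is treated as UTC.
    return datetime(year, month, day, hour, minute, second, tzinfo=timezone(offset))


def modification_gap(creation: str, modification: str) -> timedelta:
    """Time elapsed between CreationDate and ModDate."""
    return parse_pdf_date(modification) - parse_pdf_date(creation)


# Hypothetical example: created on a Monday, re-saved on the Thursday.
gap = modification_gap("D:20250707091500+02'00'", "D:20250710184200+02'00'")
```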

What an API Response Looks Like

Here is a representative HTPBE? response on a bank statement where the borrower used iLovePDF to remove two overdraft entries and adjust the closing balance:

{
  "id": "ck_4e2f9a1b-7c3d-4f8e-b5a2-9d1e6c0f8b4a",
  "status": "modified",
  "modification_confidence": "high",
  "modification_markers": ["PRODUCER_MISMATCH", "INCREMENTAL_UPDATES"],
  "creator": "Temenos T24",
  "producer": "iLovePDF",
  "xref_count": 3,
  "has_digital_signature": false,
  "creation_date": 1751760000,
  "modification_date": 1752105600
}

creator: "Temenos T24" is a core banking platform used by mid-tier banks across Europe and Australia. producer: "iLovePDF" is a consumer online PDF editor. iLovePDF, PDF24, Smallpdf, and similar consumer editors are not part of any documented bank statement pipeline — their producer string in a bank statement is a high-confidence anomaly. The xref_count of 3 indicates three write sessions: the original generation and two subsequent saves. The modification timestamp trails the creation timestamp by four days.

The verdict is modified with modification_confidence: "high". That gives an underwriter the signals to decide whether to route for manual review, request a fresh download from the bank portal, or proceed.
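The arithmetic behind that reading can be checked directly from the sample. The sketch below assumes the two date fields are Unix timestamps in seconds (UTC), which the values are consistent with but the sample does not state. The summarize helper and its output keys are hypothetical.

```python
import json
from datetime import datetime, timezone

# The sample response shown above, trimmed to the fields used here.
sample = json.loads("""
{
  "status": "modified",
  "modification_confidence": "high",
  "modification_markers": ["PRODUCER_MISMATCH", "INCREMENTAL_UPDATES"],
  "xref_count": 3,
  "creation_date": 1751760000,
  "modification_date": 1752105600
}
""")


def summarize(report: dict) -> dict:
    """Derive the gap in days and the number of saves after generation."""
    # Assumption: timestamps are epoch seconds in UTC.
    created = datetime.fromtimestamp(report["creation_date"], tz=timezone.utc)
    modified = datetime.fromtimestamp(report["modification_date"], tz=timezone.utc)
    return {
        "gap_days": (modified - created).days,
        "saves_after_generation": report["xref_count"] - 1,
        "markers": report["modification_markers"],
    }


summary = summarize(sample)
```

For the sample, the gap works out to four days and two saves after the original generation, matching the description above.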

The inconclusive Signal in Lending Context

Not every bank statement returns modified or intact. Some legitimate statements — particularly from smaller credit unions or non-standard banking apps — are generated in consumer software or exported via generic PDF print drivers, and return inconclusive. That is not a failure verdict: it means the document was produced in software that does not identify itself as an institutional banking system, so structural integrity cannot be checked against a known baseline.

In a lending context the meaning depends on the claimed institution. Major retail banks — Chase, Barclays, Commonwealth Bank, TD — all generate statements from server-side institutional PDF engines, so inconclusive on a statement claimed to be from one of them is itself a signal worth escalating. The same verdict on a statement from a smaller institution where consumer-style export is normal can be treated as routine.

Where It Fits in Your Underwriting Stack

Plaid and similar open banking connectors pull transaction data directly from the bank’s API. They are the gold standard for income source-of-truth checks when the applicant has an account at a supported institution. They bypass the document layer entirely — and that is also their limit. Plaid supports a fraction of global financial institutions, and BNPL and MCA lenders frequently serve applicants whose banks are not connected, who have international income, or who hold multiple accounts. Those applicants submit PDF statements because open banking cannot reach them.

Identity fraud platforms like Persona and Alloy confirm that the person submitting the application is who they claim to be — face matching, ID document checks, watchlist screening. They do not analyze the structural integrity of submitted financial documents.

The structural PDF forensics layer for alternative lenders fills the gap between document submission and open banking coverage. It does not replace Plaid when Plaid is available. It operates on the submitted PDF regardless of whether the borrower’s institution is connected to any open banking network.

For a full breakdown of how document fraud shows up in income source-of-truth check workflows, see the bank statement fraud detection guide.

Integrating the Check at Application Intake

The check runs at document upload. Before the statement reaches income parsing, before a human underwriter touches it, before a credit model sees any figures from it.

curl -X POST https://api.htpbe.tech/v1/analyze \
  -H "Authorization: Bearer $HTPBE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://your-storage.example.com/statements/applicant-7821.pdf"}'

The response is synchronous for most documents, and typical analysis time is under three seconds. For large statements, poll GET /api/v1/result/{id}. Store the returned check ID (id in the sample response above) against the application record. If the credit decision is later disputed, the forensic report is retrievable as a permanent audit trail showing exactly which structural signals triggered the hold.
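The same request can be made from an intake service. The sketch below mirrors the curl call using only the Python standard library. The endpoint, headers, and body come from that curl example, while the function names and the 10-second timeout are assumptions. Polling is left out because it depends on details not shown here.

```python
import json
import urllib.request

ANALYZE_URL = "https://api.htpbe.tech/v1/analyze"  # endpoint from the curl example


def build_analyze_request(pdf_url: str, api_key: str) -> urllib.request.Request:
    """Build the POST request without sending it, so it can be inspected or logged."""
    body = json.dumps({"url": pdf_url}).encode("utf-8")
    return urllib.request.Request(
        ANALYZE_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def analyze(pdf_url: str, api_key: str, timeout: float = 10.0) -> dict:
    """Send the request and return the parsed JSON response."""
    request = build_analyze_request(pdf_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

In practice the key would be read from the environment (for example os.environ["HTPBE_API_KEY"]) rather than hard-coded, and the returned report stored alongside the application record.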

The full API reference and test scenarios are available at the self-serve API.

False Positives and Calibration

The harder question for lending is not "how many forgeries does this catch" but "how often does it flag a clean document". A high false-positive rate is expensive: delayed approvals, customer friction, and a swelling manual-review queue. Calibration matters more than any individual forensic example.

Real sources of false positives on bank statements include:

  • A retail bank that saves incrementally on its first export, producing xref_count > 1 on an otherwise untouched file.
  • A mail or document gateway that adds its own metadata layer at delivery, shifting the producer field away from the original generator.
  • A customer who downloads the PDF, opens it in Preview or Adobe Reader to compress for email, and re-saves before uploading.
  • A statement generated through a third-party aggregator pipeline — Plaid's downstream PDF render, a Yodlee export — where the producer field is the aggregator, not the bank.
  • DMS or ECM pre-processing on the borrower's accountant's side, common for self-employed applicants who route financial documents through bookkeeping software.

None of these are fraud, but each can produce a single modification marker in isolation. The calibration approach is to combine signals rather than treat any one as a verdict. A modified outcome with one marker on a re-compressed file is a different population from modified with PRODUCER_MISMATCH plus xref_count = 3 plus a four-day ModDate gap. Underwriting policy should weight the full response — status, modification_confidence, modification_markers, the producer string itself, and the claimed institution — rather than firing on marker presence alone. A practical default: manual review on inconclusive for institutions with a known institutional baseline, and on modified only when confidence is high or two or more markers are present.
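The practical default above can be written down as a small routing function. This is a sketch of one possible policy, not a recommended production rule: the route name, the return labels, and the institution_has_baseline flag (which would come from the lender's own lookup of the claimed institution) are all hypothetical.

```python
def route(report: dict, institution_has_baseline: bool) -> str:
    """Map an analysis report to a queue, weighting combined signals."""
    status = report.get("status")
    confidence = report.get("modification_confidence")
    markers = report.get("modification_markers", [])

    if status == "inconclusive":
        # Escalate only when the claimed institution is known to use
        # institutional PDF engines; otherwise treat as routine.
        return "manual_review" if institution_has_baseline else "proceed"

    if status == "modified":
        # Require high confidence or at least two markers before escalating.
        if confidence == "high" or len(markers) >= 2:
            return "manual_review"
        return "proceed"

    return "proceed"
```

A single marker at lower confidence passes through under this default, so lenders who want visibility into that population may prefer to log those cases rather than drop them silently.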

What This Does Not Catch

Structural forensics detects modifications to existing documents. Two scenarios fall outside its scope.

Statements fabricated from scratch in the same software the bank uses. If a borrower somehow builds a document using the exact same PDF generation stack as their bank — same library, same parameters — the structural layer will appear consistent. This attack requires technical knowledge most borrowers do not have, and is far less common than editing an existing export. For this pattern, cross-referencing with open banking data or direct account fraud detection remains the correct control.

Statements generated in consumer software legitimately. Some smaller institutions and fintech neobanks generate statements via generic export tools, mobile print drivers, or third-party aggregators. For these, inconclusive is the expected and correct verdict. The signal is only meaningful when the claimed institution has a known institutional PDF generation profile.

Understanding these limits matters. The forensics layer is not a fraud oracle — it is a structural signal that increases confidence and reduces the human review burden on documents that carry structural evidence of alteration.

For pay stub fraud detection in the same workflow, the patterns are similar — see the fake pay stub detection guide for how income document fraud across document types can be detected with the same API call. For the broader stack view, the KYC vs. document forensics breakdown covers where this layer sits alongside identity checks.

Frequently Asked Questions

How can I tell if a bank statement PDF is fake?

Visual review catches sloppy fakes — wrong fonts, misaligned columns, implausible totals. Competent edits made with Adobe Acrobat or online editors look identical to the original on screen. The reliable signal lives in the file structure: the producer field, the cross-reference (xref) chain, and the gap between CreationDate and ModDate. A statement with producer: "iLovePDF" on a major retail bank’s document is a structural anomaly worth escalating.

What is producer mismatch in PDF forensics?

The producer field names the software that wrote the PDF, and editors usually overwrite it on re-save. The creator field names the application that authored the original document. When a bank emits a statement through its server-side engine (Temenos, Aspose, FIS) and the file later carries producer: "Microsoft Excel" or an online editor signature, those two fields disagree in a way no documented bank distribution pipeline produces. That is producer mismatch.

Can structural forensics catch a fabricated bank statement?

Partially. A statement built from scratch in Word or Excel returns inconclusive, not modified — there is no prior structure to compare against. For a document claiming to be from a top-tier retail bank, inconclusive is itself a signal: real bank portals do not emit PDFs with a consumer-software producer. Fabrication from inside the bank’s own infrastructure (rare, requires platform access) is out of structural scope and needs an open-banking or out-of-band check.

Does Plaid replace bank statement forensics?

No — they cover different applicants. Plaid pulls transaction data directly when the borrower’s institution is connected and the borrower consents. BNPL, MCA, and lenders serving international or unsupported institutions still receive PDF statements, and that is where structural forensics applies. The two layers are complementary.
