PDF Security Blog

SBA-7a Loan Stip Fraud Detection: Post-PPP Lessons

HTPBE Team·01.07.2026·12 min read

This article is a snapshot — content was accurate as of July 2026 (code examples tested against the API as of May 2026). The product evolves actively; specific counts, examples, and detection rules may have changed since publication — see the changelog for the current state.

The PPP wave was, in retrospect, the largest controlled experiment in small-business stipulation-document (stip-doc) fraud the lending industry has ever observed. The DOJ has publicly disclosed thousands of prosecutions; the SBA Office of the Inspector General has publicly estimated PPP and EIDL fraud losses in the tens of billions of dollars across the two programs. Whatever the exact number turns out to be after all enforcement and recovery cycles close, two facts are no longer in dispute: small-business document fraud is large, and the dominant attack pattern was not synthetic identity. It was real people uploading altered or fabricated PDFs (bank statements, tax returns, payroll registers, voided checks) built with mainstream consumer tools.

That has reshaped how fintech business lenders and SBA-7a preferred lenders run stip-doc review for the products that came after: 7(a) term loans, EIDL successors, working-capital lines, conventional small-business loans. Across those lenders, and at platforms and banks such as Funding Circle, Lendio, Bluevine, OnDeck, Fundbox, Square Capital, Stripe Capital, Live Oak, Newtek, and Celtic Bank, post-PPP review playbooks have converged on four recurring fraud patterns. This article walks through each one, the structural signals that flag it, and the honest limits of what file-level forensics can and cannot resolve.

Pattern 1 — Altered Business Bank Statements

The most common pattern, and the one closest to the consumer-lending playbook. A borrower downloads a real PDF statement from Wells Fargo Business, Chase Business, Bluevine, Mercury, or Brex, opens it in Adobe Acrobat or an online editor, and changes the figures that matter: average daily balance, ending balance, deposit count, NSF lines.

Structurally the editor leaves the same fingerprints it leaves on consumer statements:

The producer field shifts from the bank’s server-side engine to a consumer editor — Adobe Acrobat, iLovePDF, PDF24, Smallpdf, Preview. Public marker: HTPBE_ONLINE_EDITOR_ORIGIN or HTPBE_EDITING_TOOL_FINGERPRINT.
A second cross-reference layer appears, because every save-after-edit appends a new xref. Public marker: HTPBE_MULTIPLE_REVISION_LAYERS.
The modification timestamp lands hours or days after the declared creation timestamp. Public marker: HTPBE_DATES_DISAGREE.

A statement from Chase Business that was generated by chase.com server-side and re-saved through Acrobat on Tuesday afternoon will carry all three signals. The verdict comes back modified with high confidence. The consumer-side analogue is covered in bank statement fraud in lending, and the broader workflow context is in PDF fraud detection in loan origination.
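As a rough illustration of the producer and revision-layer signals, the sketch below reads them straight from the raw bytes. It is a simplified heuristic and not HTPBE's detection logic: the function names are hypothetical, linearized PDFs legitimately carry more than one trailer, and producer values stored in compressed object streams, XMP metadata, encrypted files, or strings with escaped parentheses will not be read correctly.

```python
import re


def count_revision_layers(pdf_bytes: bytes) -> int:
    """Count `startxref ... %%EOF` trailers as a rough proxy for saved revisions.

    Each incremental save appends a new cross-reference section and trailer,
    so a count above 1 suggests the file was re-saved after generation.
    """
    return len(re.findall(rb"startxref\s+\d+\s+%%EOF", pdf_bytes))


def producer_strings(pdf_bytes: bytes) -> list[str]:
    """Return literal /Producer values visible in uncompressed dictionaries."""
    matches = re.findall(rb"/Producer\s*\(([^)]*)\)", pdf_bytes)
    return [m.decode("latin-1") for m in matches]


# Synthetic byte strings for illustration only; these are not valid full PDFs.
original = (
    b"%PDF-1.7\n1 0 obj\n<< /Producer (Bank Statement Engine) >>\nendobj\n"
    b"xref\n0 1\ntrailer\n<< >>\nstartxref\n60\n%%EOF\n"
)
edited = original + (
    b"2 0 obj\n<< /Producer (Consumer Editor) >>\nendobj\n"
    b"xref\n0 1\ntrailer\n<< >>\nstartxref\n150\n%%EOF\n"
)

print(count_revision_layers(original), count_revision_layers(edited))
print(producer_strings(edited))
```

A production check would parse the file with a real PDF library rather than regular expressions; the point here is only to show what "a second cross-reference layer" looks like on disk.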

Caveat that matters in practice: smaller business banking apps and credit unions sometimes export through generic print drivers. An inconclusive verdict on a statement claimed to be from one of the major business banks is itself a flag: route it to verification. The same verdict on a small community-bank statement is closer to noise.
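One way to encode that caveat is a lookup on the issuer the borrower claims. This is a minimal sketch under stated assumptions: the issuer set, the queue names, and the function name are all hypothetical, and the list should come from the lender's own portfolio data.

```python
# Hypothetical list of issuers expected to export statements through an
# institutional, server-side pipeline. Illustrative entries only.
INSTITUTIONAL_ISSUERS = {"wells fargo business", "chase business"}


def escalate_inconclusive(claimed_issuer: str) -> str:
    """Choose a queue for an inconclusive bank statement.

    An inconclusive result is unexpected for a major issuer, so it is routed
    to verification; for any other issuer it stays in standard review.
    """
    if claimed_issuer.strip().lower() in INSTITUTIONAL_ISSUERS:
        return "verify_with_issuer"
    return "standard_review"


print(escalate_inconclusive("Chase Business"))
print(escalate_inconclusive("Small Town Credit Union"))
```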

Pattern 2 — Fabricated Business Tax Returns

The most loss-prone document class and the one where honesty about scope matters most.

Business tax returns — Form 1120, 1120-S, and 1065 — were the structurally weakest control during PPP. Thousands of borrowers submitted returns that had never been filed with the IRS, because nothing in the upload-and-review workflow ever crossed back to the IRS to confirm filing. Two distinct attack types showed up in the post-loss reviews:

Type A: edited real returns. The borrower starts from a genuine filed return and uses Acrobat to inflate gross receipts, net income, or owner compensation. This is structurally identical to bank-statement editing and produces the same markers: editor fingerprint, second revision layer, date disagreement. Structural forensics catches this type cleanly.

Type B: clean rebuilds. The borrower (or a paid forger) generates a tax return from scratch with a programmatic PDF library such as PDFKit or ReportLab, or with a one-off Puppeteer template. The output is born synthetic: no editing history, no producer mismatch, no incremental updates. The fields are made up but the file looks pristine. Structural forensics returns inconclusive because the document came out of a consumer-class toolchain, and that is the correct verdict: the byte layer shows no tampering, but it also offers no institutional baseline against which the file can be vouched for.

The IRS Form 4506-C tax-transcript request is the ground-truth verification for U.S. business tax returns — it pulls the IRS’s own record of what was actually filed and reconciles the borrower-supplied figures against it. In higher-control SBA-7a workflows, 4506-C is often treated as a hard gate on every tax return. The role of structural forensics on this document class is to catch Type A cheaply on day one and to sequence 4506-C ordering more efficiently — the modified files go to the front of the queue; the intact files still need 4506-C but the structural record adds context to the file.
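The queue sequencing described here can be sketched as a stable sort on the verdict. The record type and function name are hypothetical, and placing inconclusive ahead of intact is an assumption; the text only states that modified files go first and that every return still needs 4506-C.

```python
from dataclasses import dataclass

# Lower number = earlier in the 4506-C queue.
# Ranking inconclusive ahead of intact is an assumption, not a stated rule.
PRIORITY = {"modified": 0, "inconclusive": 1, "intact": 2}


@dataclass
class TaxReturnCheck:
    loan_id: str
    verdict: str
    check_id: str


def sequence_4506c(checks: list[TaxReturnCheck]) -> list[TaxReturnCheck]:
    """Order 4506-C requests by verdict; no return is dropped from the queue."""
    # sorted() is stable, so arrival order is preserved within each verdict.
    return sorted(checks, key=lambda c: PRIORITY[c.verdict])


queue = sequence_4506c([
    TaxReturnCheck("loan-1", "intact", "chk-a"),
    TaxReturnCheck("loan-2", "modified", "chk-b"),
    TaxReturnCheck("loan-3", "inconclusive", "chk-c"),
    TaxReturnCheck("loan-4", "modified", "chk-d"),
])
print([c.loan_id for c in queue])
```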

Pattern 3 — Forged Voided Checks and Banking Attestations

A small document class with disproportionate downside. The voided check or bank-letter attestation supplied at funding determines which routing and account number the loan proceeds get wired to. A successful swap at this stage moves the money to an account the borrower controls but the lender has never seen mentioned in the application.

Two attack flavours:

Edited voided check. Borrower opens a screenshot or PDF of their own check in Acrobat and overwrites the routing or account digits. Structural signals fire as on any edited document: editor producer, incremental update layer, glyph-level edits if individual digits were replaced (HTPBE_GLYPH_LEVEL_EDIT, HTPBE_CHARACTER_OVERLAY_EDIT).
Rebuilt voided check. Borrower generates the image in any drawing tool, exports to PDF. Born-synthetic, returns inconclusive for the same reason as tax returns above.

The compensating control here is operational, not structural: a callback to the bank using a phone number from an independent source — the bank’s public website, not a number printed on the document. Structural forensics catches the editor-altered version on day one and reduces the queue that needs callback verification. It does not replace the callback for a born-synthetic rebuild.

Pattern 4 — PPP Forgiveness Applications (Retrospective)

This pattern is unusual because it is backward-looking, but it is now a live workstream at multiple SBA-7a lenders. The SBA Office of the Inspector General continues to audit PPP forgiveness decisions, and lenders defending those audits need to demonstrate the documentary basis on which forgiveness was approved.

For lenders that kept the original PDF application files, running structural forensics on those files now produces an audit-trail artifact: a persistent check_id, the verdict at the time of analysis, the markers present, the producer string, the timestamp layers. If a particular forgiveness file later becomes the subject of an OIG question, the lender has a contemporaneous structural record alongside the underwriter’s notes.

This is not a fraud-detection use case in the live-pipeline sense — the loans are already funded and forgiven. It is an audit-defence and discovery use case. Most of the lenders running this work batch-process the historical application files through the API once, store the check_id against the loan record, and surface it on demand when an OIG inquiry lands.
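A one-time backfill of this kind might look like the sketch below. The `analyze` callable stands in for whatever client wraps the API, the response keys `check_id` and `verdict` are assumed from the description in this article, and the one-file-per-loan record layout is hypothetical.

```python
from typing import Callable, Iterable


def backfill_check_ids(
    files: Iterable[tuple[str, str]],
    analyze: Callable[[str], dict],
    loan_records: dict[str, dict],
) -> dict[str, dict]:
    """Run each historical file through `analyze` once and store the result.

    `files` yields (loan_id, file_url) pairs. Loans that already carry a
    check_id are skipped, so re-running the batch does not repeat calls.
    """
    for loan_id, file_url in files:
        record = loan_records.setdefault(loan_id, {})
        if "check_id" in record:
            continue
        result = analyze(file_url)
        record["check_id"] = result["check_id"]
        record["verdict"] = result["verdict"]
    return loan_records


# Stand-in for the real API client, used only to make the sketch runnable.
calls = []

def fake_analyze(file_url: str) -> dict:
    calls.append(file_url)
    return {"check_id": f"chk-{len(calls)}", "verdict": "intact"}


records = {"ppp-002": {"check_id": "chk-existing", "verdict": "modified"}}
backfill_check_ids(
    [("ppp-001", "https://files.example-lender.com/ppp-001.pdf"),
     ("ppp-002", "https://files.example-lender.com/ppp-002.pdf")],
    fake_analyze,
    records,
)
print(records)
```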

Where the Check Fits in an SBA-7a Stip Workflow

SBA-7a and conventional small-business loan files move on a 30–60 day clock. Adding a 1–4 second per-document structural check does not move the critical path. The integration points that have worked in production:

At stip-doc receipt — every uploaded business bank statement, tax return, voided check, and payroll register is sent to the API immediately. The verdict and markers attach to the document record in the LOS. modified files route to a fraud-ops queue before the credit decision; inconclusive files route based on what was claimed to be uploaded; intact files proceed.
Before funding wire — the voided check or bank-letter on file is re-checked at funding. This is the last point at which an account number could have been swapped between underwriting and disbursement.
Before SBA-7a guarantee package finalisation — for 7(a) loans, the package submitted to SBA for the guarantee includes the underlying stip docs. Running the check immediately before package assembly ensures the documents in the guarantee file match the structural record from intake.
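The routing rule at stip-doc receipt reduces to a lookup on the verdict. The queue names and function name below are hypothetical; only the three verdict values come from the API description.

```python
# Hypothetical queue names in the lender's loan origination system.
ROUTES = {
    "modified": "fraud_ops_queue",          # reviewed before the credit decision
    "inconclusive": "escalate_by_claimed_source",
    "intact": "proceed",
}


def route_stip_doc(verdict: str) -> str:
    """Map a verdict to the next step, failing loudly on unknown values."""
    try:
        return ROUTES[verdict]
    except KeyError:
        raise ValueError(f"unexpected verdict: {verdict!r}") from None


print(route_stip_doc("modified"))
```

Raising on an unknown verdict, rather than defaulting to "proceed", keeps a changed or malformed response from silently passing a document through.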

A minimal integration call:

curl -X POST https://api.htpbe.tech/v1/analyze \
  -H "Authorization: Bearer $HTPBE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://files.example-lender.com/stips/1120-2025-borrower-12345.pdf",
    "tool": "sba-7a-stip-review"
  }'

The response carries a verdict (intact, modified, or inconclusive), public marker IDs, and a persistent check_id that becomes the audit-trail anchor.
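For teams integrating from Python rather than the shell, the same call can be sketched with the standard library. The endpoint, headers, and request body are copied from the curl example; the response key names (`verdict`, `markers`, `check_id`) are assumptions based on the description above and should be checked against the API documentation.

```python
import json
import urllib.request

ANALYZE_URL = "https://api.htpbe.tech/v1/analyze"  # from the curl example


def build_analyze_request(file_url: str, api_key: str) -> urllib.request.Request:
    """Build the POST request without sending it."""
    body = json.dumps({"url": file_url, "tool": "sba-7a-stip-review"}).encode("utf-8")
    return urllib.request.Request(
        ANALYZE_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def summarize(response_body: bytes) -> dict:
    """Pull out the fields worth storing; key names are assumed, not confirmed."""
    payload = json.loads(response_body)
    return {
        "verdict": payload.get("verdict"),
        "markers": payload.get("markers", []),
        "check_id": payload.get("check_id"),
    }


# Sending the request (requires network access and a valid key):
# with urllib.request.urlopen(build_analyze_request(url, key), timeout=30) as resp:
#     result = summarize(resp.read())
```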

Calibrating the Queue

Public datasets on small-business document tamper rates are thin, and lenders that have measured internally rarely publish. As planning assumptions — not benchmarks — lenders building a queue capacity model have used these ranges:

Subprime business lending (MCAs, alternative term loans, high-risk EIDL successors): plan for roughly 5–10% of stip docs to return modified.
Prime small-business lending (Live Oak / Celtic / Newtek SBA-7a books, conventional small-business term loans at major banks): plan for roughly 1–2% of stip docs to return modified.
inconclusive rates depend on the document mix — expect higher rates on tax returns (more rebuild attacks, more legitimate consumer-tool exports from accountants) than on bank statements from major business-banking platforms.

Treat these as planning numbers. The real distribution at your shop is a function of channel, broker mix, product, and ticket size. Re-measure after the first quarter of live data.
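The planning ranges above translate into expected queue volume with simple arithmetic. The sketch assumes a hypothetical intake of 4,000 stip docs a month; the function name and the volume are illustrative, and only the percentage ranges come from the text.

```python
def expected_modified_docs(
    monthly_docs: int, low_pct: float, high_pct: float
) -> tuple[float, float]:
    """Return the (low, high) count of docs expected to come back modified."""
    return (monthly_docs * low_pct / 100, monthly_docs * high_pct / 100)


MONTHLY_DOCS = 4_000  # hypothetical intake volume

subprime = expected_modified_docs(MONTHLY_DOCS, 5, 10)  # 5-10% planning range
prime = expected_modified_docs(MONTHLY_DOCS, 1, 2)      # 1-2% planning range
print(subprime, prime)
```

At that volume the subprime range works out to 200 to 400 modified docs a month and the prime range to 40 to 80, which is the number a fraud-ops queue has to absorb before any inconclusive escalations are added.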

What Structural Forensics Cannot Do

Stated plainly so it does not have to be guessed at:

Clean rebuilds return inconclusive, not modified. A tax return generated from scratch in PDFKit, a voided check rebuilt in a drawing tool, a bank statement assembled in a templated forgery service — none of these will be caught by structural-byte analysis. They will be caught by 4506-C (for tax returns), by bank callback (for routing numbers), and by document-content rules at other vendors.
intact does not mean the figures are real. It means the document was not modified after creation. A real bank statement with real fraudulent transactions inside it (kiting, structured deposits) is structurally intact. Behavioural fraud of that kind lives at the transaction layer, not the file layer.
inconclusive is not a verdict against the borrower. Many legitimate small-business documents return inconclusive — accountants exporting from Drake or Lacerte through generic print drivers, bookkeepers re-saving QuickBooks reports, small banks that use consumer-class PDF pipelines. The right action on inconclusive is calibrated escalation based on what was claimed to be uploaded, not auto-decline.

Audit Trail and Loss-Cause Attribution

Every analyze call returns a persistent check_id queryable through GET /api/v1/result/{check_id}. Storing it against the loan record supports two further uses, in addition to live fraud screening and the retrospective PPP-forgiveness work described above:

Loss-cause attribution. When a loan defaults and the post-mortem asks “were the documents real,” the structural record from intake answers part of that question alongside underwriter notes and the 4506-C on file.
Broker performance review. Aggregated by submitting broker, the modified and inconclusive rates surface which channels are sending higher-risk paper.
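The broker aggregation can be sketched from stored verdicts alone. The input shape, a list of (broker_id, verdict) pairs, and the function name are hypothetical; in practice the pairs would come from the loan records that hold each check_id.

```python
from collections import Counter, defaultdict


def broker_verdict_rates(checks: list[tuple[str, str]]) -> dict[str, dict]:
    """Compute modified and inconclusive rates per submitting broker."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for broker_id, verdict in checks:
        counts[broker_id][verdict] += 1

    rates = {}
    for broker_id, by_verdict in counts.items():
        total = sum(by_verdict.values())
        rates[broker_id] = {
            "total": total,
            "modified_rate": by_verdict["modified"] / total,
            "inconclusive_rate": by_verdict["inconclusive"] / total,
        }
    return rates


rates = broker_verdict_rates([
    ("broker-a", "intact"), ("broker-a", "intact"),
    ("broker-a", "modified"), ("broker-a", "inconclusive"),
    ("broker-b", "intact"), ("broker-b", "intact"),
])
print(rates)
```

Reporting the total alongside each rate matters: a broker with two submissions and one modified file shows a 50% rate that means far less than the same rate over two hundred.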

Integration documentation lives at /api; pricing scales with monthly check volume.

Who This Is For

This article is written for the people who actually own this decision:

Head of Credit Risk at a fintech small-business lender deciding whether to add a structural-forensics layer on top of an existing KYB + bank-data + 4506-C stack.
Director of Fraud Operations at an SBA-7a preferred lender building a stip-doc review playbook that has to defend audit positions years after origination.
VP Origination at a regional bank with a small-business book asking what the post-PPP review cycles actually changed about the way loan files should be screened.

The same four attack patterns surface across adjacent SMB lending verticals — merchant cash advance and revenue-based finance underwriting (where bank statements are the primary signal), equipment finance (where forged voided checks at funding are the high-leverage attack), invoice factoring (where altered invoices and aging schedules play the role tax returns play here), and conventional non-SBA term loans at regional banks. The 4506-C control is especially central in SBA-7a tax-return review; the structural-forensics layer is not SBA-specific. The end-to-end view of how the same filter fits across these flows is in the document fraud detection fintech workflow.

FAQ

How does structural forensics differ from a 4506-C tax-transcript pull? 4506-C asks the IRS what the borrower actually filed; structural forensics asks whether the PDF supplied to the lender was edited after it was generated. They answer different questions. 4506-C is the ground-truth control for tax-return content; structural forensics catches the edited-real-return subset cheaply on day one and lets you sequence 4506-C orders more efficiently. Use both for tax returns; structural-only is often sufficient as a first-pass screen for bank statements, where the bank’s own portal is the institutional reference and structural signals reliably identify post-portal editing.

Will this slow down our 30–60 day SBA-7a clock? A typical analyze call returns in 1–4 seconds. At stip-doc receipt the result is back before the document has been routed to a human reviewer. There is no measurable impact on the funding clock.

What happens on a clean PDFKit rebuild of a forged tax return? The verdict is inconclusive. The document was generated by a consumer-class toolchain, so structural integrity cannot be evaluated against an institutional baseline. The right downstream action is 4506-C verification, which is the ground-truth control for whether the return was filed at all.

Is the historical-PPP audit-defence use case actually useful, given the loans are closed? For lenders defending active OIG inquiries, yes — the structural record from the original application files becomes a contemporaneous artifact in the audit response. For lenders with closed and clean books, it is optional. The batch-process cost is one-time and small relative to even a single OIG dispute.