PDF Security Blog

Tenant Screening SaaS: Bank Statement Fraud Detection

HTPBE Team·22.06.2026·14 min read

This article is a snapshot — content was accurate as of June 2026 (code examples tested against the API as of May 2026). The product evolves actively; specific counts, examples, and detection rules may have changed since publication — see the changelog for the current state.

Snappt reportedly raised over $100 million (per public funding announcements) on a thesis that tenant screening platforms were shipping recommend/decline decisions on applicant-uploaded PDFs with limited ability to inspect the structure of the uploaded documents themselves. The size of that raise is one indicator that the market treated the problem as real and unsolved. For screening platforms that didn’t build their own document-fraud layer in time, the gap is still open.

If you run product or trust-and-safety at a tenant screening SaaS — the category includes platforms like TransUnion SmartMove, RentSpree, Findigs, Latchel, Stessa, and Buildium, among many others — the unit economics of a missed-fraud decision are not yours to absorb directly, but they are yours to indemnify against. A landlord who relies on your recommendation, signs the lease, and then loses several months of rent plus the cost of eviction does not call the applicant. They call you, and then their lawyer calls you.

This piece is about where structural PDF forensics fits into a tenant screening pipeline that already has OCR, Plaid Income, employer verification, and credit pulls — and why it is one of the cheapest signals to add per applicant.

The Unit Economics of a Bad Recommend

A typical tenant screening SaaS charges the landlord between $35 and $75 per applicant. Gross margin on that fee is high, but the indemnity exposure is asymmetric. One missed-fraud case wipes out the gross margin on hundreds of clean screens.

The exposure stacks in three directions:

Direct liability claims. Most screening contracts cap liability at fees paid, but state consumer-protection statutes and individual landlord agreements have, in reported cases, produced settlements above the cap.
Churn from a single bad outcome. A property-management company managing 600 units that loses $40k to a fraudulent applicant your platform cleared is unlikely to renew, and is likely to share the experience with peers in their network.
Fair-housing exposure on the other side. Tightening the model to reject more borderline cases pushes false-decline rates up. Disparate-impact scrutiny — from HUD, state AGs, and private plaintiffs — has been a recurring theme in industry guidance over recent years.
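
To make the asymmetry concrete, here is a back-of-envelope sketch. It reuses the $40k loss from the churn example and the $35 to $75 fee range above. The 80% gross margin is an assumed, illustrative figure (the article does not state one), and the calculation supposes the platform ends up bearing the full loss, which contract caps may prevent.

```javascript
// Back-of-envelope sketch. grossMarginRate is an assumed value, not a figure from this article.
function screensToOffsetLoss(lossUsd, feeUsd, grossMarginRate) {
  const marginPerScreen = feeUsd * grossMarginRate;
  // Round up: a partial screen does not exist.
  return Math.ceil(lossUsd / marginPerScreen);
}

const lossUsd = 40000;       // loss from the property-manager example above
const grossMarginRate = 0.8; // assumption for illustration only

for (const feeUsd of [35, 75]) {
  const screens = screensToOffsetLoss(lossUsd, feeUsd, grossMarginRate);
  console.log(`$${feeUsd} fee: ${screens} clean screens to offset one loss`);
}
```

Under these assumptions, a single loss of that size consumes the margin from several hundred to over a thousand clean screens, depending on the fee.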

Platforms that addressed the first two problems early — Snappt is the most visible example — did it by adding a document-fraud layer between applicant upload and decision. They built it as a vertical product because there was little horizontal infrastructure available at the time. That has since changed.

Where Each Existing Layer Catches Fraud (and Where It Misses)

Most tenant screening stacks today layer four or five signals. They overlap, but each leaves a specific gap.

OCR + business rules — pulling line items off a bank statement PDF and validating arithmetic, employer names, deposit cadence. Catches lazy fraud where the totals no longer add up after a balance edit. Misses careful fraud where the fraudster fixed the arithmetic, and misses fabricated statements from generator sites that are arithmetically consistent by construction.

Plaid Income / Plaid Asset Report — cryptographic proof of bank-side reality, by far the strongest signal when the applicant opts in. The problem is the opt-in rate. Applicants who are committing fraud rarely connect their real account; they decline Plaid and revert to PDF upload. Headline connect-rate figures above 60% should be read carefully — some are computed over applicants who got past the income gate rather than the full top-of-funnel population, which inflates the apparent coverage.

The Work Number (Equifax) / employer verification APIs — ground truth on employment and salary for applicants whose employer is in the database. Coverage skews to W-2 employees at mid- and large-cap employers. Self-employed applicants, gig workers, and small-employer staff fall outside it, which is exactly the demographic most prone to inflate income on a PDF.

Visual-AI systems such as Snappt — computer-vision models designed to flag pixel-level inconsistencies, font swaps, and layout anomalies in bank statement images. As a category, these are designed for visually obvious edits, and category-level limitations include documents edited in a vector editor with consistent rasterisation, screenshots laundered through phone-to-PDF apps, and born-synthetic documents generated by tools that never touch a raster surface. Specific vendor capabilities vary and only the vendor can speak to their own coverage; this is a general description of the class.

Credit pull and identity verification — orthogonal to document fraud. Verifies the applicant is who they say they are, not that their income proof is genuine.

The remaining gap is the one structural PDF forensics fills directly: documents that look right pixel-by-pixel, parse cleanly through OCR, and present arithmetic that adds up, but whose internal byte structure shows that the file was generated by a consumer editor and not by the institutional bank portal it claims to be from. The same byte-layer signals are what catch bank statement fraud in lending and expose the KYC PDF blind spot on the lender side.

What Structural PDF Forensics Adds to the Tenant Screening Stack

A bank statement that comes out of Chase, Wells Fargo, Bank of America, or a major neobank’s PDF-export pipeline has a recognisable internal structure. The producer field identifies the bank’s server-side PDF generator. The xref table has a specific shape. The metadata timestamps are internally consistent. There is no editing history.

When an applicant downloads that PDF, opens it in Microsoft Word, Adobe Acrobat, iLovePDF, or one of the dozens of online editors, and saves it again, the file structure changes in ways the visual surface does not. Depending on the tool, the producer field is overwritten, an incremental update is appended and the xref table grows, or the creation_date and modification_date diverge. None of this is visible if you only look at the rendered pages.
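
As a rough illustration of what "reading the structure" means, the sketch below scans raw PDF bytes for a few of the traces just described. It is a simplified heuristic, not HTPBE?’s detection logic, and the function name `inspectPdfStructure` is hypothetical. It only sees uncompressed, literal-string Info dictionaries; it ignores XMP metadata, hex or UTF-16 strings, and object streams; and linearised PDFs can legitimately contain more than one `%%EOF`.

```javascript
// Simplified heuristic sketch (hypothetical helper, not HTPBE?'s engine).
function inspectPdfStructure(pdfBytes) {
  // latin1 maps each byte to one character, so binary streams cannot break decoding.
  const text = Buffer.from(pdfBytes).toString('latin1');
  const count = (pattern) => (text.match(pattern) || []).length;

  // Take the last occurrence: incremental updates append newer dictionaries at the end.
  const lastInfoValue = (key) => {
    const pattern = new RegExp(`/${key}\\s*\\(([^)]*)\\)`, 'g');
    const matches = [...text.matchAll(pattern)];
    return matches.length ? matches[matches.length - 1][1] : null;
  };

  const eofMarkers = count(/%%EOF/g);
  const creationDate = lastInfoValue('CreationDate');
  const modDate = lastInfoValue('ModDate');

  return {
    producer: lastInfoValue('Producer'),
    creationDate,
    modDate,
    eofMarkers,
    startxrefSections: count(/startxref/g),
    // More than one %%EOF often (not always) means a save was appended to the original.
    hasIncrementalUpdate: eofMarkers > 1,
    datesDiverge: Boolean(creationDate && modDate && creationDate !== modDate),
  };
}
```

In Node.js this could be called as `inspectPdfStructure(fs.readFileSync('statement.pdf'))`. A production engine needs a real PDF parser; the point here is only that these signals live in the bytes, not in the rendered pages.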

HTPBE?’s API reads the file’s internal structure and returns one of three verdicts:

intact — the file structure is consistent with an institutional generator and shows no post-creation modification.
modified — the structure shows post-creation modification markers. Named markers like HTPBE_DATES_DISAGREE, HTPBE_RESIDUAL_PRIOR_GENERATOR, and HTPBE_REEXPORTED_THROUGH_OFFICE_SUITE describe what was found.
inconclusive — the file was built with consumer software (Microsoft Word print-to-PDF, Mac Quartz, a phone screenshot wrapper) and structural integrity cannot be established. This is not a failure verdict. For a document that claims to be a bank statement, inconclusive is itself a signal worth weighing: many institutional bank portals do not normally emit Quartz PDFs, so an inconclusive verdict on a claimed-bank-portal document is a reason to ask for an alternative source rather than to accept it at face value.

This last point is where structural forensics complements visual AI rather than competing with it. Visual-AI systems are designed to be sensitive to documents that look manipulated; structural forensics is sensitive to documents whose origin doesn’t match the institution they claim. Used together, the two layers address both the pixel-edit attack and the from-scratch fabrication attack. Used separately, each leaves a gap the other closes.

A Worked Example: One Statement, Three Verdicts

You can run the same bank statement PDF through the live tool at htpbe.tech without an account to see the verdict structure before integrating the API. Try it with three documents: a real statement you downloaded from your own bank’s portal, the same statement after opening and re-saving in Adobe Acrobat, and a known fabricated statement from one of the generator sites. The first returns intact. The second returns modified with a specific marker set. The third returns inconclusive — not because we can’t analyse it, but because the file was never an institutional document to begin with.

For a tenant screening platform, the decision logic on the third case generally collapses to the same as on the second: escalate or request an alternative. A document that claims to be a bank statement but presents structurally as a consumer-software PDF leaves the platform without the institutional fingerprint it would need to trust the file at face value — whether the underlying content is genuine or fabricated, the appropriate action is to ask for a verifiable alternative rather than to accept it.

Integration Pattern

Most tenant screening platforms already have a document-upload pipeline that lands files in S3 or an equivalent bucket and triggers a webhook into the screening workflow. The HTPBE? integration sits at that webhook.

// applicant-document-uploaded webhook handler
import fetch from 'node-fetch';

const HTPBE_API_KEY = process.env.HTPBE_API_KEY;
const HTPBE_ENDPOINT = 'https://api.htpbe.tech/v1/analyze';

async function checkApplicantDocument({ applicantId, documentUrl, documentType }) {
  // Only run structural forensics on income-proof documents.
  // Government ID and lease docs go through their own pipelines.
  if (!['bank_statement', 'pay_stub', 'offer_letter'].includes(documentType)) {
    return { skipped: true };
  }

  const response = await fetch(HTPBE_ENDPOINT, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${HTPBE_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      url: documentUrl,
      tool: 'tenant-screening',
    }),
  });

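  // Fail loudly on non-2xx responses rather than parsing an error body as a verdict.
  if (!response.ok) {
    throw new Error(`HTPBE analysis failed: HTTP ${response.status}`);
  }

  const result = await response.json();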

  return {
    applicantId,
    documentType,
    verdict: result.verdict, // intact | modified | inconclusive
    confidence: result.confidence, // certain | high | none
    markers: result.modification_markers, // array of HTPBE_* codes
    checkId: result.id,
  };
}

The downstream decision layer then consumes the verdict alongside Plaid, OCR, and credit signals. A reasonable starting policy:

intact → clear on this signal; proceed with other checks.
modified with confidence: certain → auto-flag, route to manual review or auto-decline depending on platform risk appetite.
modified with confidence: high → flag, require an additional income-verification step (Plaid connect, employer verification, or a fresh statement from a verified email domain).
inconclusive on a document that claimed to be from an institutional source → same as modified high — flag and request an alternative form of proof.
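
The starting policy above can be sketched as a small pure function. The action names and the `claimsInstitutionalSource` flag are hypothetical and would come from your own pipeline. Two choices here are assumptions rather than anything the policy prescribes: `modified` with `certain` goes to manual review rather than auto-decline, and `inconclusive` on a document that never claimed an institutional source is allowed to proceed.

```javascript
// Hypothetical policy sketch mirroring the starting policy above.
// Action names and claimsInstitutionalSource are illustrative, not part of the API.
function decideOnStructuralSignal({ verdict, confidence, claimsInstitutionalSource }) {
  switch (verdict) {
    case 'intact':
      return { action: 'proceed', reason: 'clear on structural signal' };
    case 'modified':
      if (confidence === 'certain') {
        // Assumption: conservative platforms may choose auto-decline here instead.
        return { action: 'manual_review', reason: 'modification certain' };
      }
      // confidence "high" (or anything weaker) triggers an extra verification step.
      return { action: 'request_alternative_proof', reason: 'modification likely' };
    case 'inconclusive':
      return claimsInstitutionalSource
        ? { action: 'request_alternative_proof', reason: 'origin does not match claimed institution' }
        // Assumption: the policy above does not cover this branch.
        : { action: 'proceed', reason: 'no institutional origin claimed' };
    default:
      // Unknown or missing verdicts go to a human rather than silently passing.
      return { action: 'manual_review', reason: 'unrecognised verdict' };
  }
}
```

Keeping the policy in one function, separate from the API call, makes it straightforward to audit and to adjust as risk appetite changes.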

Average analysis latency is under ten seconds for documents under 10 MB. The endpoint accepts public URLs, so the integration does not need to stream file bytes through your application server.
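
Because the call sits inside a webhook handler, it is worth bounding it with a timeout so a slow analysis cannot stall the screening workflow. The sketch below is one way to do that with the standard `AbortController`. The helper name `analyzeWithTimeout`, the 15-second default, and the returned status shape are all assumptions for illustration, and `fetchImpl` can be `node-fetch` or the global `fetch`.

```javascript
// Hypothetical helper: wraps any fetch-compatible function with a hard timeout.
// The 15 s default is an assumption chosen to sit above the stated average latency.
async function analyzeWithTimeout(fetchImpl, endpoint, init, timeoutMs = 15000) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const response = await fetchImpl(endpoint, { ...init, signal: controller.signal });
    if (!response.ok) {
      return { status: 'error', httpStatus: response.status };
    }
    return { status: 'ok', result: await response.json() };
  } catch (err) {
    // fetch implementations reject with an AbortError when the signal fires.
    if (err.name === 'AbortError') {
      return { status: 'timeout' };
    }
    throw err;
  } finally {
    clearTimeout(timer);
  }
}
```

A `timeout` or `error` status is best treated as "signal unavailable" and retried or queued, not as a verdict on the document.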

False-Positive Calibration: Don’t Auto-Decline on Edge Cases

A subset of legitimate applicants will produce documents that return modified or inconclusive without any fraud involved. The most common cases:

Mobile bank apps that export through phone-OS PDF pipelines. Some neobanks export statements via the phone’s native print-to-PDF path rather than a server-side generator. These return inconclusive because the producer is iOS Quartz or Android print-driver, not the bank’s server. The user did nothing wrong, but the file structurally cannot be distinguished from a Quartz-built fabrication.
Bank statements re-exported by accounting software. Some applicants pull statements into QuickBooks or Wave for personal bookkeeping and then export the QuickBooks-stamped PDF rather than the original. The producer reads as accounting software, not the bank.
Documents that passed through a corporate document-management system. Common for offer letters from large employers that route everything through DocuSign or a DMS that re-emits the PDF. The signing service or DMS replaces the original producer string.

The pattern is the same across all three: structural markers fire, but the underlying content is not fraudulent. Policy implication: do not auto-decline on inconclusive or on modified with confidence: high alone. Route those cases into a fallback path — Plaid connect, a re-upload requested directly from the bank’s portal email, or human review — rather than into the same bucket as a Producer-string-overwritten fabrication.

Platforms that get this calibration wrong tend to push their false-decline rate up and attract fair-housing scrutiny they didn’t need. Using structural forensics as a trigger for escalation rather than as a verdict on its own helps keep false-decline risk lower and keeps the policy defensible.

Volume Economics

Pricing matters at tenant-screening scale because the cost has to fit inside the per-applicant fee the platform charges the landlord. At HTPBE?’s published pricing, per-check cost on the Pro tier comes out to roughly $0.33 per analysis at full utilisation; volume-tiered Enterprise pricing brings that further down for platforms processing five-digit document counts per month.

A platform charging the landlord $45 per applicant, running an average of 2.4 income documents per applicant, would add roughly $0.80 in document-forensics cost per applicant — under 2% of the customer fee — for a signal that covers a category of indemnity exposure most existing layers in the pipeline do not directly address.
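
The arithmetic behind that estimate, as a short sketch you can rerun with your own numbers. The inputs are the figures from the paragraphs above; the function name is hypothetical.

```javascript
// Inputs come from the paragraphs above; swap in your own pricing and volumes.
function forensicsCostPerApplicant({ costPerCheckUsd, docsPerApplicant, feePerApplicantUsd }) {
  const costUsd = costPerCheckUsd * docsPerApplicant;
  return { costUsd, shareOfFee: costUsd / feePerApplicantUsd };
}

const { costUsd, shareOfFee } = forensicsCostPerApplicant({
  costPerCheckUsd: 0.33,  // approximate Pro-tier cost per analysis at full utilisation
  docsPerApplicant: 2.4,  // average income documents per applicant
  feePerApplicantUsd: 45, // fee charged to the landlord
});

console.log(`cost per applicant: $${costUsd.toFixed(2)}`);
console.log(`share of fee: ${(shareOfFee * 100).toFixed(1)}%`);
```

With these inputs the cost lands just under $0.80 per applicant and below 2% of the fee, matching the estimate above.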

Compare that to the build-internal alternative: a corpus of bank statements at meaningful scale, a vision-model training pipeline, and ongoing maintenance as bank templates change. Visual-AI vendors in this space have publicly disclosed raises in the eight- and nine-figure range, which is one indicator of the underlying build cost. Structural PDF forensics is a different problem class — bytes, not pixels — and the underlying engine doesn’t need retraining each time a bank ships a new statement template.

What Structural Forensics Does Not Catch

The limits of structural forensics are the same in tenant screening as in any other vertical.

Born-synthetic forgeries. A document generated end-to-end by PDFKit, ReportLab, Puppeteer, or one of the bank-statement-generator sites that emits its output through a programmatic PDF library can produce a file that is structurally clean. The byte structure is consistent because the file was assembled from scratch by a single tool — just not the tool the document claims. Many of these still show as inconclusive because their producer string doesn’t match any institutional generator, but a sufficiently sophisticated forger who reads the public marker catalog could spoof the producer string. Plaid or employer-verification is the ground-truth answer here, not structural PDF forensics.

Authentic documents from someone else’s account. A real Chase statement from a real Chase account that is not the applicant’s does not trigger any structural marker. Identity verification is the layer that catches this.

Statements pulled from a third party with the applicant’s consent. Documents pulled through Plaid Asset Report and re-emitted as PDF carry Plaid’s producer string. The verdict is intact or inconclusive depending on the export path, and either is consistent with a legitimate Plaid pull.

The frame to give your trust-and-safety team: structural forensics catches the careless and middle-tier fraudster (a large share of practical document-fraud cases), Plaid catches the careful fraudster who consents to a real-account check (a careful fraudster’s refusal to connect is a signal in itself), and identity verification catches the synthetic-identity case. Three layers, three different failure modes.

Who This Article Is For

Product or trust-and-safety leaders at tenant screening platforms processing more than 5,000 applicant documents a month who are looking to either add a document-fraud layer for the first time or augment an existing OCR/visual-AI pipeline with a structural signal. If you are at a platform that processes fewer than 5,000 documents a month, the self-serve Starter or Growth tier covers your volume without a sales conversation.

If you are a single landlord or property manager rather than a SaaS, the companion piece on rental application bank statement fraud covers the same forensic signals from the operator’s seat instead of the platform’s.

FAQ

How is this different from visual-AI tools like Snappt?

Visual-AI tools in this category are generally described as image-based systems that flag pixel-level inconsistencies introduced by editing operations. HTPBE? is a structural-bytes layer that reads the PDF’s internal file structure and detects post-creation modification through the file’s xref table, producer chain, and metadata layers. The two approaches address overlapping but distinct fraud patterns, and stacks that need broad coverage typically run both. Specific vendor capabilities and roadmaps differ — only the vendor can speak authoritatively to their own product.

Can I test it on real documents without a contract?

Yes. The web tool at htpbe.tech accepts any PDF up to 10 MB without signup. The free API tier lets you run integration tests against the production endpoint with a test key before committing. Live keys with paid quota are self-serve from $15/month.

What about pay stubs and offer letters?

The same structural markers apply. Pay stubs from major payroll processors (ADP, Gusto, Paychex, Workday) have a recognisable institutional structure; a pay stub that claims ADP origin but presents structurally as a Word-generated PDF is a strong signal. Offer letters are softer because the legitimate variation in HR document tooling is wider, but post-creation editing markers still fire when an offer letter is opened and re-saved.

How does it handle the legitimate-but-flagged cases?

Structural forensics returns a verdict and named markers, not a decision. The platform’s decision layer is where the policy lives. For cases where the marker is consistent with a legitimate edge case (mobile-app export, accounting-software re-export, corporate DMS), the recommended pattern is to escalate to an alternative income-verification path — Plaid connect or a re-upload from the bank’s direct email — rather than to auto-decline. This helps keep false-decline risk lower and keeps the policy defensible.