Ocrolus alternative

OCR extracts the income figure from a fraudulent bank statement perfectly — it cannot detect that the file was fabricated in Word

OCR-based document analysis reads what is printed on the page. A fabricated bank statement created in Microsoft Word reads correctly: the numbers are internally consistent, the layout looks real, and the extracted data passes cash-flow thresholds. What OCR cannot read is the file's structural layer: the producer field showing Word instead of Chase Online Banking, the single xref table with no incremental history, the gap between creation and modification timestamps. htpbe? reads that layer. Pair it with your existing extraction stack.

~3 sec
per document
35 checks
forensic layers
From $15
per month
1,500+
docs / month on Growth
Scope

htpbe? analyzes the structural layer of the PDF file only — producer, xref, metadata, image streams, signature chain, balance arithmetic. We don't extract data, we don't read text content with OCR, we don't classify transactions or build cash-flow profiles. Ocrolus has those layers and a customer base; htpbe? is positioned for teams that want structural fraud detection as a focused primitive, separate from extraction and analytics.

How it looks

One REST call, one deterministic verdict

Upload the PDF. The API returns INTACT, MODIFIED, or INCONCLUSIVE with named markers — in about three seconds.

What this looks like

How structural fraud survives OCR-based document analysis

Three real fraud mechanics we catch at the structural PDF layer.

01

Bank statement fabricated in Word — OCR reads it correctly

Applicant creates a bank statement layout in Microsoft Word using the bank's logo from the web, types in three months of fictitious deposits totalling $6,400/month, exports to PDF. OCR extracts the amounts correctly. Cash-flow analytics shows regular income. The producer field showing Microsoft Word — not the bank's issuance system — is invisible to every layer that reads the document as text.

02

Pay stub edited to raise gross pay — OCR reports the inflated number

Applicant downloads a real Gusto pay stub showing $3,200/month, opens it in an editor, raises the gross pay to $5,800, saves. OCR extracts $5,800 and cash-flow analysis accepts it. The xref chain shows a second cross-reference table appended after the original Gusto export — structural evidence of the edit that OCR has no mechanism to detect.

03

W-2 with wages changed after IRS e-file — OCR cannot see the timestamp gap

A real W-2 from TurboTax shows $42,000 in Box 1. Applicant opens it months later, changes to $68,000, saves. OCR extracts the new figure. The modification date is 4 months after the creation date on a document that should be a single-session export — a structural signal only forensic analysis reads.

How htpbe? is positioned

Structural-only
no OCR, no transaction classification — focused primitive
~3 sec
per PDF analyzed via API
$15/mo
starter plan, public pricing on /pricing

Why OCR-based document analysis has a structural blind spot

Every OCR platform on the market reads what is in the document. None of them checks whether the document itself is real.

The structural layer — producer signature, xref chain, modification history — exists in the binary file, not in the text.

OCR platforms (AWS Textract, Google Document AI, Azure Form Recognizer, Ocrolus) extract text and numbers from the document as rendered. The OCR layer itself has no mechanism to inspect the binary file structure: the producer field that names the software that created the PDF, the xref chain that records each incremental save, the modification timestamp that shows when the file was last saved. A bank statement fabricated in Word and a genuine Chase online export can produce identical OCR output, but the structural layer is completely different. htpbe? reads that layer as a standalone API call that sits alongside whatever extraction stack you already use.

Results in under 3 seconds
30 to 1,500+ documents/month
From $15/mo
How it works

Five forensic layers, one deterministic verdict

Every PDF we receive passes through the same structural pipeline — no model training, no thresholds to tune.

01

Metadata analysis

Creation and modification timestamps, producer and creator fields, XMP metadata — the first layer exposes basic tampering.

02

File structure

Xref tables, trailer chain, incremental updates. Any edit after export leaves a structural fingerprint here.

03

Digital signatures

Signature chain integrity and post-signature modifications produce deterministic markers. Certainty-level signal.

04

Content integrity

Fonts, objects, embedded content, page assembly. Multi-session edits and inserted objects are visible at this layer.

05

Verdict with markers

Deterministic output: INTACT / MODIFIED / INCONCLUSIVE, with named markers for every finding — suitable for audit trail.

Document types

PDFs we analyze structurally for lenders and mortgage ops

Every type listed below is analyzed at the structural file layer — not the rendered image.

Bank statement PDF
Pay stub PDF
W-2 / 1099 PDF (US)
Tax return PDF
Asset / gift letter PDF
Employment verification letter PDF
Closing disclosure PDF
What htpbe? checks

Detection capabilities

Deterministic structural signals. No probabilistic scores, no model training.

Producer signature analysis

Authentic bank statements come from banking systems, pay stubs from payroll engines, tax forms from accounting/tax software. When the producer field shows a desktop tool (Microsoft Word, Excel, LibreOffice) or a generator-tool fingerprint (Chrome Headless, wkhtmltopdf), htpbe? flags accordingly — no OCR needed to make this call.
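
To make the idea concrete, here is a minimal sketch of producer-field triage. It is not htpbe?'s implementation: the function name, the category labels, and the two tool lists are assumptions chosen for illustration, and real producer strings vary by version and platform.

```python
import re

# Hypothetical, illustrative lists; a production list would be far longer.
DESKTOP_TOOLS = ("microsoft word", "microsoft excel", "libreoffice")
GENERATOR_TOOLS = ("headlesschrome", "chrome headless", "wkhtmltopdf")


def classify_producer(producer: str) -> str:
    """Return 'desktop_tool', 'generator_tool', or 'unclassified'."""
    # Lowercase and drop symbols such as the registered-trademark sign
    # so that "Microsoft® Word" matches "microsoft word".
    normalized = re.sub(r"[^a-z0-9 ]", "", producer.lower())
    if any(name in normalized for name in DESKTOP_TOOLS):
        return "desktop_tool"
    if any(name in normalized for name in GENERATOR_TOOLS):
        return "generator_tool"
    return "unclassified"
```

An "unclassified" result only means the string matched neither list; it says nothing about whether the document is genuine.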

Incremental update detection (xref chain)

Edits saved as incremental updates leave a structural trace in the xref chain: each one appends a new cross-reference section to the file. htpbe? counts cross-reference tables and flags incremental updates, the structural fingerprint of post-issuance editing that OCR-based extraction cannot see.
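
A rough way to see this signal yourself is to count the `startxref` keywords in the raw file, since each save normally writes one. The sketch below is a heuristic under that assumption, not htpbe?'s method; the function names are hypothetical, and linearized ("fast web view") PDFs can legitimately contain more than one marker.

```python
import re


def count_xref_sections(pdf_bytes: bytes) -> int:
    """Count 'startxref <offset>' markers in the raw bytes of a PDF."""
    return len(re.findall(rb"startxref\s+\d+", pdf_bytes))


def has_incremental_updates(pdf_bytes: bytes) -> bool:
    """More than one marker suggests the file was appended to after export."""
    return count_xref_sections(pdf_bytes) > 1
```

Reading a file with `open(path, "rb").read()` and passing the bytes in is enough to try it on your own samples.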

Balance arithmetic verification

Running balance is verified row-by-row across bank statements (previous balance + transaction = new balance). Edited transactions break the chain unless every dependent balance was also adjusted. htpbe? reads the structural data without OCR.
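
The row-by-row rule (previous balance + transaction = new balance) can be sketched as follows. The function name and the input shape are assumptions for illustration; amounts are passed as strings and handled with `Decimal` to avoid floating-point rounding.

```python
from decimal import Decimal


def first_broken_row(opening_balance, rows):
    """Return the index of the first row whose stated balance does not equal
    the previous balance plus the transaction amount, or None if all rows agree.

    rows is a list of (amount, stated_balance) pairs given as strings.
    """
    previous = Decimal(opening_balance)
    for index, (amount, stated) in enumerate(rows):
        if previous + Decimal(amount) != Decimal(stated):
            return index
        previous = Decimal(stated)
    return None
```

If a deposit of 3200.00 is edited to 5800.00 while the stated balance is left untouched, the check reports that row's index.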

Digital signature chain validation

Tax forms, employer letters, and many institutional PDFs carry digital signature chains. htpbe? validates the signature chain and flags invalidated or removed signatures — orthogonal to whatever OCR sees in the text.

Image-stream artefact detection

Lifted-and-pasted logos, signatures, and headers leave compression artefacts that differ from authentic embedded content. htpbe? reads the image-stream metadata directly — exposing paste operations OCR cannot see.

Cross-document fingerprint analysis

When multiple "different" employer letters or bank statements share font subset prefixes, image hashes, or producer signatures across an applicant pool, htpbe? surfaces the shared fingerprints — useful for catching synthetic-identity rings.
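
As a sketch of the grouping step only, assume each applicant's documents have already been reduced to a set of fingerprint strings (font subset prefixes, image hashes, and so on). The function name and the fingerprint format are hypothetical.

```python
from collections import defaultdict


def shared_fingerprints(documents):
    """Map each fingerprint seen for two or more applicants to those applicants.

    documents is a dict of applicant_id -> set of fingerprint strings.
    """
    seen = defaultdict(set)
    for applicant_id, fingerprints in documents.items():
        for fingerprint in fingerprints:
            seen[fingerprint].add(applicant_id)
    return {fp: sorted(ids) for fp, ids in seen.items() if len(ids) > 1}
```

Common producer strings are shared by many legitimate documents, so in practice rarer fingerprints such as font subset prefixes or image hashes would carry more weight.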

Integrate in minutes

An Ocrolus alternative for structural PDF fraud — no OCR layer

Buyers can skip this section. Developers: the integration is two steps.

Step 1 — submit the PDF

curl -X POST https://api.htpbe.tech/v1/analyze \
  -H "Authorization: Bearer $HTPBE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://your-storage/applicant-bank-statement.pdf"}'

Step 2 — read the verdict (no extracted data, just integrity)

{
  "id": "o1c2r3o4-5l6u-7s8a-9z0l-a1b2c3d4e5f6",
  "status": "modified",
  "modification_confidence": "high",
  "modification_markers": [
    "Two cross-reference tables — incremental update",
    "Modification date 7 days after creation date",
    "PDF editor producer detected"
  ],
  "producer": "Adobe Acrobat Pro",
  "creator": "Chase Online Banking",
  "creation_date": 1707091200,
  "modification_date": 1707696000,
  "has_digital_signature": false,
  "xref_count": 2,
  "has_incremental_updates": true
}

The creator field shows the original came from Chase Online Banking, an institutional source. Seven days later it was opened in Adobe Acrobat Pro and re-saved, which added a second xref table. Verdict: modified at high confidence, without ever running OCR on the text. Pair this verdict with whatever extraction layer you already use.
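
On the consuming side, handling the response might look like the sketch below. It assumes the field names shown in the sample response, that the two date fields are Unix timestamps in seconds, and that the other verdicts arrive as lowercase "intact" and "inconclusive". The `needs_review` rule is a hypothetical policy, not part of the API.

```python
import json
from datetime import datetime, timezone


def summarize_verdict(response_text: str) -> dict:
    """Reduce an analysis response to the fields a review queue might need."""
    result = json.loads(response_text)
    created = datetime.fromtimestamp(result["creation_date"], tz=timezone.utc)
    modified = datetime.fromtimestamp(result["modification_date"], tz=timezone.utc)
    status = result["status"].lower()
    return {
        "status": status,
        "gap_days": (modified - created).days,
        # Hypothetical policy: anything other than "intact" goes to a human.
        "needs_review": status != "intact",
        "markers": result.get("modification_markers", []),
    }


sample = (
    '{"status": "modified", "creation_date": 1707091200, '
    '"modification_date": 1707696000, '
    '"modification_markers": ["PDF editor producer detected"]}'
)
print(summarize_verdict(sample)["gap_days"])  # 7
```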

Customer Stories

Teams that stopped document fraud

Compliance, finance, and risk teams use htpbe? to catch manipulated PDFs before they become costly mistakes.

Caught an invoice where the total had been changed by less than a thousand dollars. Without this I would have approved it without a second look.

Sarah M.

AP Manager

United States

We had three applicants in the same week with bank statements that looked completely fine. Two of them were flagged as modified. You simply cannot see this by reading the document — it is in the file structure.

Lars V.

Risk Analyst, Online Lending

Netherlands

Salary slips were coming with altered figures. We identified two problematic files before the placement was finalised.

Priya K.

HR Operations Lead

India

Since we started checking documents this way, we stopped two applications early in the process that would have been very difficult to reverse later.

Julien R.

Fraud Analyst, Fintech

France

Some applicants were sending PDFs that looked authentic but had been edited in ways not visible to the eye. We now ask for verified originals when something is flagged. Already saved us from a few bad decisions.

Marta S.

Compliance Coordinator

Spain

One invoice was caught because there was a mismatch between the document dates and structure. That particular case would have cost us significantly.

Tariq A.

Finance Manager

United Arab Emirates

FAQ

Frequently asked questions

Two reasons. First, OCR is a commodity layer with mature alternatives (AWS Textract, Google Document AI, Azure Form Recognizer, Veryfi for receipts). Building another OCR engine adds nothing. Second, structural forensics is a separate discipline — the signals are in the file structure, not the text. We focus on the part most lender stacks are missing, and let teams keep their existing extraction layer.
No. Ocrolus bundles OCR, transaction classification, cash-flow analytics, and document forensics into one platform with a managed UI. htpbe? is the structural-forensics layer specifically. If you need the bundled scope, Ocrolus has it. If you need the structural layer as a primitive your team integrates, htpbe? fits.
Plans start at $15/mo (30 requests) and go to $499/mo (1,500 requests) with public pricing on /pricing. Enterprise unlimited is a contract conversation. Ocrolus pricing is on their site or via their sales team — transparent self-serve pricing is one of the deliberate differences in our positioning.
Yes. Sign up for the free tier — test API keys are included on every plan, including free. Run our test PDF fixtures through your integration without consuming live quota. No card on free tier.

Secure your workflow

Create your account — API key on signup, free test environment on every plan.
From $15/mo. No sales call. Cancel any time.