Ocrolus alternative

OCR extracts the income figure from a fraudulent bank statement perfectly — it cannot detect that the file was fabricated in Word

Built for fraud ops at lending, insurance & compliance teams

OCR-based document analysis reads what is printed on the page. A fabricated bank statement created in Microsoft Word reads correctly — the numbers are right, the layout looks real, the extracted data passes cash-flow thresholds. What OCR cannot read is the file's structural layer: the producer field showing Word instead of Chase Online Banking, the single xref table with no incremental history, the modification timestamp gap. HTPBE? reads that layer. Pair it with your existing extraction stack.

~3 sec
per document
59 checks
forensic layers
From $15
per month
1,500+
docs / month on Pro
Scope

HTPBE? analyzes the structural layer of the PDF file only — producer, xref, metadata, image streams, signature chain, balance arithmetic. We don't extract data, we don't read text content with OCR, we don't classify transactions or build cash-flow profiles. Ocrolus has those layers and a customer base; HTPBE? is positioned for teams that want structural fraud detection as a focused primitive, separate from extraction and analytics.

What this looks like

How structural fraud survives OCR-based document analysis

Three real fraud mechanics we catch at the structural PDF layer.

01

Bank statement fabricated in Word — OCR reads it correctly

Applicant creates a bank statement layout in Microsoft Word using the bank's logo from the web, types in three months of fictitious deposits totalling $6,400/month, exports to PDF. OCR extracts the amounts correctly. Cash-flow analytics shows regular income. The producer field showing Microsoft Word — not the bank's issuance system — is invisible to every layer that reads the document as text.

02

Pay stub edited to raise gross pay — OCR reports the inflated number

Applicant downloads a real Gusto pay stub showing $3,200/month, opens it in an editor, raises the gross pay to $5,800, saves. OCR extracts $5,800 and cash-flow analysis accepts it. The xref chain shows a second cross-reference table appended after the original Gusto export — structural evidence of the edit that OCR has no mechanism to detect.

03

W-2 with wages changed after IRS e-file — OCR cannot see the timestamp gap

A real W-2 from TurboTax shows $42,000 in Box 1. Applicant opens it months later, changes the figure to $68,000, saves. OCR extracts the new figure. The modification date is 4 months after the creation date on a document that should be a single-session export, a structural signal that only forensic analysis reads.
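The timestamp gap in the third case can be sketched in a few lines of Python. This is an illustration under stated assumptions, not HTPBE?'s implementation: it assumes the /CreationDate and /ModDate strings have already been read from the PDF's document information dictionary, it treats a missing time zone as UTC, and the five-minute tolerance is an arbitrary placeholder.

```python
import re
from datetime import datetime, timedelta, timezone

# PDF date strings look like D:YYYYMMDDHHmmSS+HH'mm'; every field after the
# year is optional.
_PDF_DATE_RE = re.compile(
    r"D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"([Zz+\-])?(\d{2})?'?(\d{2})?'?"
)


def parse_pdf_date(value: str) -> datetime:
    """Parse a PDF date string into a timezone-aware datetime."""
    match = _PDF_DATE_RE.match(value.strip())
    if match is None:
        raise ValueError(f"not a PDF date string: {value!r}")
    year = int(match.group(1))
    month = int(match.group(2) or 1)
    day = int(match.group(3) or 1)
    hour = int(match.group(4) or 0)
    minute = int(match.group(5) or 0)
    second = int(match.group(6) or 0)
    sign = match.group(7)
    offset = timedelta(0)  # assumption: no zone information means UTC
    if sign in ("+", "-"):
        offset = timedelta(
            hours=int(match.group(8) or 0), minutes=int(match.group(9) or 0)
        )
        if sign == "-":
            offset = -offset
    return datetime(year, month, day, hour, minute, second, tzinfo=timezone(offset))


def modification_gap(creation: str, modification: str) -> timedelta:
    """Time elapsed between /CreationDate and /ModDate."""
    return parse_pdf_date(modification) - parse_pdf_date(creation)


def exceeds_gap(
    creation: str,
    modification: str,
    tolerance: timedelta = timedelta(minutes=5),  # hypothetical threshold
) -> bool:
    """True when the file was modified noticeably later than it was created."""
    return modification_gap(creation, modification) > tolerance
```

A gap alone does not prove tampering, since some issuing systems stamp documents in more than one pass. It is most useful alongside the producer and xref signals.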

How HTPBE? is positioned

Structural-only
no OCR, no transaction classification — focused primitive
~3 sec
per PDF analyzed via API
$15/mo
starter plan, public pricing on /pricing

Why OCR-based document analysis has a structural blind spot

Every OCR platform on the market reads what is in the document. None of them checks whether the document itself is real.

The structural layer — producer signature, xref chain, modification history — exists in the binary file, not in the text.

OCR platforms (AWS Textract, Google Document AI, Azure Form Recognizer, Ocrolus) extract text and numbers from the document as rendered. They have no mechanism to inspect the binary file structure: the producer field that names the software that created the PDF, the xref chain that records each incremental save, the modification timestamp that shows when the file was last saved. A bank statement fabricated in Word and a genuine Chase online export can produce identical OCR output, but their structural layers are completely different. HTPBE? reads that layer as a standalone API call that sits alongside whatever extraction stack you already use.

Results in under 3 seconds
30 to 1,500+ documents/month
From $15/mo

Document types

PDFs we analyze structurally for lenders and mortgage ops

Every type listed below is analyzed at the structural file layer — not the rendered image.

Bank statement PDF
Pay stub PDF
W-2 / 1099 PDF (US)
Tax return PDF
Asset / gift letter PDF
Employment verification letter PDF
Closing disclosure PDF

What HTPBE? checks

Detection capabilities

Deterministic structural signals. No probabilistic scores, no model training.

Producer signature analysis

Authentic bank statements come from banking systems, pay stubs from payroll engines, tax forms from accounting/tax software. When the producer field shows a desktop tool (Microsoft Word, Excel, LibreOffice) or a generator-tool fingerprint (Chrome Headless, wkhtmltopdf), HTPBE? flags the mismatch. No OCR is needed to make this call.
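As a rough illustration of the idea, the sketch below pulls the /Producer value out of raw PDF bytes and compares it with a watchlist. It is a minimal sketch, not HTPBE?'s implementation: the watchlist simply repeats the tools named above, the function names are hypothetical, and the regex only handles uncompressed literal strings.

```python
import re
from typing import Optional

# Illustrative watchlist built from the tools named above; not an official list.
SUSPECT_PRODUCERS = (
    "microsoft word",
    "microsoft excel",
    "libreoffice",
    "chrome headless",
    "wkhtmltopdf",
)

# Matches "/Producer (literal string)", allowing backslash escapes and one
# level of nested parentheses. Hex strings, UTF-16 values, and Info
# dictionaries stored in compressed object streams are out of scope here.
_PRODUCER_RE = re.compile(rb"/Producer\s*\(((?:\\.|[^\\()]|\([^()]*\))*)\)")


def read_producer(pdf_bytes: bytes) -> Optional[str]:
    """Return the last /Producer literal string in the file, or None."""
    matches = _PRODUCER_RE.findall(pdf_bytes)
    if not matches:
        return None
    return matches[-1].decode("latin-1")


def is_suspect_producer(producer: Optional[str]) -> bool:
    """True when the producer names a tool on the watchlist."""
    if producer is None:
        return False
    # Drop symbols such as the registered-trademark sign before matching.
    normalized = re.sub(r"[^a-z0-9 ]", "", producer.lower())
    return any(name in normalized for name in SUSPECT_PRODUCERS)
```

A production check would more likely compare the producer against the value expected for the claimed issuer, since a watchlist only catches tools someone thought to list.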

Incremental update detection (xref chain)

An edit saved as an incremental update appends a new cross-reference section to the file and leaves the original bytes in place. HTPBE? counts cross-reference tables and flags incremental updates, the structural fingerprint of post-issuance editing that OCR-based extraction cannot see.
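A simplified version of this count fits in a few lines. The sketch below is a heuristic under stated assumptions, not HTPBE?'s method: it counts `startxref ... %%EOF` trailers in the raw bytes, allows linearized files one extra trailer, and would be fooled by the same byte pattern appearing inside a stream.

```python
import re

# Each file section (the original body and every incremental update) ends
# with "startxref <offset> %%EOF".
_TRAILER_RE = re.compile(rb"startxref\s+\d+\s+%%EOF")


def count_xref_sections(pdf_bytes: bytes) -> int:
    """Count the cross-reference trailers present in the file."""
    return len(_TRAILER_RE.findall(pdf_bytes))


def has_incremental_update(pdf_bytes: bytes) -> bool:
    """True when the file holds more trailers than a single export would."""
    # Linearized ("fast web view") files carry two trailers even when they
    # were never edited, so the baseline is raised for them.
    linearized = b"/Linearized" in pdf_bytes[:2048]
    baseline = 2 if linearized else 1
    return count_xref_sections(pdf_bytes) > baseline
```

An incremental update is not always fraud: applying a digital signature or filling a form field also appends one. Treat the result as a signal to weigh rather than a verdict.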

Balance arithmetic check

The running balance is validated row by row across bank statements (previous balance + transaction = new balance). An edited transaction breaks the chain unless every dependent balance was also adjusted. HTPBE? reads the structural data without OCR.
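The arithmetic itself is simple to sketch. The example below assumes the statement rows have already been obtained as (signed amount, printed balance) pairs; how the rows are read from the file is outside its scope, and the function name is hypothetical.

```python
from decimal import Decimal
from typing import List, Sequence, Tuple

# (signed transaction amount, running balance printed on the statement)
Row = Tuple[Decimal, Decimal]


def find_balance_breaks(opening: Decimal, rows: Sequence[Row]) -> List[int]:
    """Return the indexes of rows where previous + amount != printed balance."""
    breaks: List[int] = []
    previous = opening
    for index, (amount, printed) in enumerate(rows):
        if previous + amount != printed:
            breaks.append(index)
        # Continue from the printed balance so a single edited row is
        # reported once instead of cascading through the rest of the page.
        previous = printed
    return breaks


rows = [
    (Decimal("1500.00"), Decimal("2500.00")),
    (Decimal("3200.00"), Decimal("2300.00")),  # amount edited, balance left alone
    (Decimal("-50.25"), Decimal("2249.75")),
]
suspicious_rows = find_balance_breaks(Decimal("1000.00"), rows)
```

Decimal is used rather than float so that cent values compare exactly.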

Digital signature chain validation

Tax forms, employer letters, and many institutional PDFs carry digital signature chains. HTPBE? validates the signature chain and flags invalidated or removed signatures — orthogonal to whatever OCR sees in the text.
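One narrow piece of this can be illustrated without any cryptography. A PDF signature declares the byte ranges it covers in a /ByteRange array, so bytes that sit beyond the widest covered range were added after the last signature was applied. The sketch below checks only that; it does not verify the signature or the certificate chain, it is not HTPBE?'s implementation, and the function name is hypothetical.

```python
import re
from typing import Optional

# /ByteRange [start1 length1 start2 length2]: the two spans of the file that
# the signature digest covers (everything except the signature value itself).
_BYTERANGE_RE = re.compile(
    rb"/ByteRange\s*\[\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*\]"
)


def bytes_after_last_signature(pdf_bytes: bytes) -> Optional[int]:
    """Bytes lying beyond the widest signed range, or None if unsigned."""
    covered_ends = [
        int(match.group(3)) + int(match.group(4))
        for match in _BYTERANGE_RE.finditer(pdf_bytes)
    ]
    if not covered_ends:
        return None
    # With several signatures, only the most recent one is expected to
    # reach the end of the file, so the widest range is the one compared.
    return len(pdf_bytes) - max(covered_ends)
```

A result of 0 means the most recent signature reaches the end of the file. A positive result means content was appended after signing, which deserves a closer look.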

Image-stream artefact detection

Lifted-and-pasted logos, signatures, and headers leave compression artefacts that differ from authentic embedded content. HTPBE? reads the image-stream metadata directly — exposing paste operations OCR cannot see.

Cross-document fingerprint analysis

When multiple "different" employer letters or bank statements share font subset prefixes, image hashes, or producer signatures across an applicant pool, HTPBE? surfaces the shared fingerprints — useful for catching synthetic-identity rings.
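The grouping step can be sketched as an inverted index from fingerprint to documents. This is an illustration only: the fingerprint labels and function names are hypothetical, and the font helper sees only font names stored uncompressed in the file. It relies on the PDF convention that a subset font name begins with a six-letter tag and a plus sign, as in ABCDEF+Arial.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set

# Subset fonts are named with a six-uppercase-letter tag, e.g. /ABCDEF+Arial.
_SUBSET_TAG_RE = re.compile(rb"/([A-Z]{6})\+")


def font_subset_tags(pdf_bytes: bytes) -> Set[str]:
    """Collect the subset tags visible in uncompressed parts of the file."""
    return {tag.decode("ascii") for tag in _SUBSET_TAG_RE.findall(pdf_bytes)}


def shared_fingerprints(
    documents: Dict[str, Iterable[str]], min_docs: int = 2
) -> Dict[str, List[str]]:
    """Map each fingerprint to the documents sharing it, if at least min_docs do."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, fingerprints in documents.items():
        for fingerprint in fingerprints:
            seen[fingerprint].add(doc_id)
    return {
        fingerprint: sorted(doc_ids)
        for fingerprint, doc_ids in seen.items()
        if len(doc_ids) >= min_docs
    }


pool = {
    "letter_a.pdf": ["font:ABCDEF", "img:1111"],
    "letter_b.pdf": ["font:ABCDEF", "img:2222"],
    "letter_c.pdf": ["img:3333"],
}
clusters = shared_fingerprints(pool)
```

In this hypothetical pool, two supposedly unrelated letters share a font subset tag, which is the kind of overlap worth reviewing by hand.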

Share with engineering

Wire this into your intake pipeline in under a day

Two API calls — one POST to submit the PDF, one GET to retrieve the verdict. Forward this page to your engineering team; the full API reference, quotas, and copy-paste examples in cURL, JavaScript, Python, PHP, Go, and Ruby are one click away.

Pricing

Self-serve plans, no sales call

All plans include the same forensic checks. Pick the quota that matches your monthly document volume.

manual

Starter

$15/mo

30 checks/mo

Manual spot-checks and integration testing

most common

Growth

$149/mo

350 checks/mo

Active document processing pipelines

high volume

Pro

$499/mo

1,500 checks/mo

High-volume automation and API integrations

Enterprise (unlimited, on-premise available): see full pricing

API key on signup. Free test environment on every plan. No card required.

Customer Stories

Teams that stopped document fraud

Compliance, finance, and risk teams use HTPBE? to catch manipulated PDFs before they become costly mistakes.

Caught an invoice where the total had been changed by less than a thousand dollars. Without this I would have approved it without a second look.

Sarah M.

AP Manager

United States

We had three applicants in the same week with bank statements that looked completely fine. Two of them were flagged as modified. You simply cannot see this by reading the document — it is in the file structure.

Lars V.

Risk Analyst, Online Lending

Netherlands

Salary slips were coming with altered figures. We identified two problematic files before the placement was finalised.

Priya K.

HR Operations Lead

India

Since we started checking documents this way, we stopped two applications early in the process that would have been very difficult to reverse later.

Julien R.

Fraud Analyst, Fintech

France

Some applicants were sending PDFs that looked authentic but had been edited in ways not visible to the eye. We now ask for checked originals when something is flagged. Already saved us from a few bad decisions.

Marta S.

Compliance Coordinator

Spain

One invoice was caught because there was a mismatch between the document dates and structure. That particular case would have cost us significantly.

Tariq A.

Finance Manager

United Arab Emirates

FAQ

Frequently asked questions

Why deliberately skip OCR?

Two reasons. First, OCR is a commodity layer with mature alternatives (AWS Textract, Google Document AI, Azure Form Recognizer, Veryfi for receipts). Building another OCR engine adds nothing. Second, structural forensics is a separate discipline — the signals are in the file structure, not the text. We focus on the part most lender stacks are missing, and let teams keep their existing extraction layer.

Is HTPBE? a direct feature replacement for Ocrolus?

No. Ocrolus bundles OCR, transaction classification, cash-flow analytics, and document forensics into one platform with a managed UI. HTPBE? is the structural-forensics layer specifically. If you need the bundled scope, Ocrolus has it. If you need the structural layer as a primitive your team integrates, HTPBE? fits.

How do you compare on price?

Plans start at $15/mo (30 requests) and go to $499/mo (1,500 requests) with public pricing on /pricing. Enterprise unlimited is a contract conversation. Ocrolus pricing is on their site or via their sales team — transparent self-serve pricing is one of the deliberate differences in our positioning.

Can we evaluate before committing?

Yes. Sign up for the free tier: test API keys are included on every plan, including free. Run our test PDF fixtures through your integration without consuming live quota. No card is required on the free tier.

Secure your workflow

Create your account — API key on signup, free test environment on every plan.
From $15/mo. No sales call. Cancel any time.