Comparison

Why PDF Metadata Tools Miss Most Document Fraud

Built for fraud ops at lending, insurance & compliance teams

ExifTool, PDF metadata viewers, and generic document inspection tools show you what metadata says. HTPBE? cross-validates metadata against file structure, font patterns, and digital signatures — because metadata is exactly what fraudsters manipulate first.

~3 sec
per document
59 checks
across 7 forensic layers
From $15
per month
1,500
docs / month on Pro

The core problem

Metadata is the first thing fraudsters clean

When a fraudster modifies a bank statement, the first thing they do is clean the metadata. Any tool that only reads metadata fields will show a clean result after this step. The real evidence is in the structural layers: cross-reference tables, object streams, font subsets, incremental update history.

These layers cannot be erased without completely regenerating the file — which itself leaves a detectable trace. Metadata viewers like ExifTool show you raw field values; they don’t cross-validate those values against the binary structure underneath.

What metadata tools cannot see

  • Incremental update revisions in the cross-reference table
  • Font subset divergence across pages from different sources
  • Digital signature invalidation after post-signing edits
  • Generator fingerprint mismatches in the object structure
  • Whether metadata values were altered after creation

What this looks like

Metadata tools vs HTPBE?, side by side

Five differences that decide whether an edited PDF gets caught at the structural layer.

01

What they check: metadata fields vs metadata + 6 structural layers

ExifTool and similar viewers parse the producer, dates, author, and XMP metadata. HTPBE? parses the same metadata and cross-references it against six additional structural layers — cross-reference chain, object streams, fonts, signatures, image streams, and incremental update history.

02

Fooled by: clearing metadata fields vs structural traces remain

Metadata wipe takes seconds in any consumer PDF editor and defeats metadata-only tools. HTPBE? keeps detecting because the structural traces of editing — xref revisions, font fingerprints, signature mismatches — remain even after metadata is cleared.

03

Detects edited metadata: shows values vs cross-validates them

Metadata viewers display whatever the field says, even if it’s been altered. HTPBE? cross-validates metadata against the internal binary structure — if the declared producer doesn’t match the generator fingerprint embedded in the object structure, that contradiction surfaces.

04

Result format: raw field dump vs structured verdict

ExifTool returns a raw dump for a human to interpret. HTPBE? returns a structured verdict (INTACT / MODIFIED / INCONCLUSIVE) plus named markers, designed to drive automated routing decisions in fraud pipelines.

05

Integration: CLI tools vs REST API

Metadata viewers are typically CLI utilities built for manual inspection; automating them means wrapping shell calls and parsing raw output yourself. HTPBE? is a REST API designed to drop into lending, compliance, or AP workflows: same input contract, deterministic output, no shell scripting required.

59
Forensic checks per document
~3 sec
Median analysis time, end to end
From $15
Self-serve per month, no sales call

When to use each

Different jobs — pick the right tool

Both read PDF files. Only one reads what fraudsters can’t erase.

Metadata tools (ExifTool, etc.)

Quick manual spot-check

  • Useful when you already suspect something specific
  • One-off inspection by someone who knows the format
  • Raw field dump — you interpret the values yourself
  • CLI workflow; automation means scripting around raw output

Reasonable starting point for a single-document investigation.

HTPBE?

Automated pipeline at scale

  • 59 forensic checks across metadata and 6 structural layers
  • Tamper-resistant — survives metadata wiping
  • Structured verdict in under 3 seconds
  • REST API drops into lending, compliance, AP workflows

Built for hundreds to thousands of documents per month.

Results in under 3 seconds
30 to 1,500+ documents/month
From $15/mo

What HTPBE? checks

Detection capabilities

Deterministic structural signals. No probabilistic scores, no model training.

Incremental update traces

When a PDF is reopened and edited, many editors append the changes as a new revision layer instead of rewriting the file. This trail lives in the cross-reference table structure, not in any metadata field, so metadata tools cannot see it.
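To make the idea of appended revisions concrete, here is a minimal sketch in Python. It is an illustration only, not how HTPBE? works internally. It assumes each saved revision ends with a `%%EOF` marker, which is how incremental saves are normally written, and the function name `revision_snapshots` is made up for this example. Linearized ("fast web view") files can contain two markers without ever having been edited, so treat the count as a hint rather than a verdict.

```python
import re
from typing import List

# Each incremental save normally ends with its own %%EOF marker.
EOF_MARKER = re.compile(rb"%%EOF[ \t]*\r?\n?")

def revision_snapshots(pdf_bytes: bytes) -> List[bytes]:
    """Return the file as it stood after each save: one prefix per %%EOF marker."""
    return [pdf_bytes[:m.end()] for m in EOF_MARKER.finditer(pdf_bytes)]

# Synthetic example data: not a complete PDF, just enough structure for the scan.
original = b"%PDF-1.7\n1 0 obj\n(Balance: 100)\nendobj\nstartxref\n9\n%%EOF\n"
edited = original + b"1 0 obj\n(Balance: 900)\nendobj\nstartxref\n60\n%%EOF\n"

snapshots = revision_snapshots(edited)
print(len(snapshots))            # 2
print(snapshots[0] == original)  # True
```

The first snapshot is the document as it existed before the appended edit, which is why the earlier revision stays recoverable even when every metadata field has been blanked.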

Font subset divergence

Pages assembled from different source PDFs carry distinct font subset prefixes: the six-letter tag placed before the font name, as in ABCDEF+Helvetica. These prefixes are assigned at PDF generation time and reveal when content originated in a different document. None of this appears in a metadata view.
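A rough sketch of what reading subset prefixes can look like, in Python. It is an illustration under stated assumptions, not the product's implementation. It scans raw bytes for `/BaseFont` entries, so fonts declared inside compressed object streams are not seen, and the helper names are hypothetical. Some generators legitimately emit several subsets of the same font, so a hit is a reason to look closer, not proof of tampering.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# A subset font name is a six-letter uppercase tag, a plus sign, then the font name.
SUBSET_FONT = re.compile(rb"/BaseFont\s*/([A-Z]{6})\+([^\s/<>\[\]()]+)")

def subset_tags_by_font(pdf_bytes: bytes) -> Dict[str, Set[str]]:
    """Map each base font name to the set of subset tags seen for it."""
    tags: Dict[str, Set[str]] = defaultdict(set)
    for tag, name in SUBSET_FONT.findall(pdf_bytes):
        tags[name.decode("latin-1")].add(tag.decode("ascii"))
    return dict(tags)

def fonts_with_multiple_subsets(pdf_bytes: bytes) -> Dict[str, List[str]]:
    """Return only the fonts that appear under more than one subset tag."""
    return {
        name: sorted(found)
        for name, found in subset_tags_by_font(pdf_bytes).items()
        if len(found) > 1
    }

# Synthetic font dictionaries standing in for a real file.
sample = (
    b"<< /Type /Font /BaseFont /ABCDEF+Helvetica >>\n"
    b"<< /Type /Font /BaseFont /QWERTY+Helvetica >>\n"
    b"<< /Type /Font /BaseFont /ABCDEF+Courier >>\n"
)
print(fonts_with_multiple_subsets(sample))  # {'Helvetica': ['ABCDEF', 'QWERTY']}
```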

Signature invalidation

A digital signature cryptographically covers the file content at the moment of signing. If the content changes afterward, the signature no longer validates — but the metadata may still show a signature field. Only structural analysis reveals the mismatch.
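One structural consequence can be sketched in a few lines of Python. A PDF signature records a `/ByteRange` of four numbers describing the bytes it covers; if bytes exist beyond the end of that range, something was appended after signing. This sketch checks coverage only. It does not verify the signature cryptographically, it is not presented as the product's method, and the function name is hypothetical. Later signatures or timestamps also append bytes legitimately, so the result needs interpretation.

```python
import re
from typing import List

BYTE_RANGE = re.compile(rb"/ByteRange\s*\[\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*\]")

def bytes_after_signature(pdf_bytes: bytes) -> List[int]:
    """For each /ByteRange found, count the bytes that follow the signed range."""
    trailing = []
    for match in BYTE_RANGE.finditer(pdf_bytes):
        _start1, _len1, start2, len2 = (int(g) for g in match.groups())
        trailing.append(len(pdf_bytes) - (start2 + len2))
    return trailing

# Build a synthetic "signed" file whose ByteRange ends exactly at end of file.
head = b"%PDF-1.7\n<< /Type /Sig /Contents <00> /ByteRange [0 40 60 "
tail = b"] >>\n%%EOF\n"
total = len(head) + 10 + len(tail)  # the last number is padded to 10 characters
signed = head + str(total - 60).rjust(10).encode() + tail

extra = b"appended revision\n"
print(bytes_after_signature(signed))          # [0]
print(bytes_after_signature(signed + extra))  # one entry equal to len(extra)
```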

Generator fingerprint mismatch

The PDF binary contains a producer fingerprint embedded in its object structure, independent of the declared metadata. When these contradict each other — a known generator signature paired with mismatched metadata — it indicates the metadata was altered after creation.

Image stream tampering

Replaced or pasted images leave compression artefacts and stream-level traces that differ from authentic embedded content. Metadata tools never read image streams; HTPBE? inspects them as part of the structural pass.

Cross-reference chain integrity

A PDF that was generated once and never edited typically has a single xref section. Edited PDFs accumulate appended revisions, each pointing back to the one before it. The chain length and topology are direct evidence of editing history: structural data, not metadata.
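As a hedged illustration of "chain length", the Python sketch below follows the `/Prev` pointers that link each xref section to its predecessor. It assumes classic xref tables with a plain-text trailer; files that use xref streams, hybrid references, or linearization need a real PDF parser. The names `xref_chain` and `append_revision` are invented for this example, and `append_revision` exists only to build test data.

```python
import re
from typing import List, Optional

def xref_chain(pdf_bytes: bytes) -> List[int]:
    """Follow /Prev pointers from the newest xref section back to the oldest."""
    footers = re.findall(rb"startxref\s+(\d+)", pdf_bytes)
    if not footers:
        return []
    chain: List[int] = []
    offset: Optional[int] = int(footers[-1])
    # Stop on a missing /Prev, a repeated offset (loop), or an offset past the end.
    while offset is not None and offset not in chain and offset < len(pdf_bytes):
        chain.append(offset)
        # The trailer dictionary sits between this xref section and its startxref.
        end = pdf_bytes.find(b"startxref", offset)
        section = pdf_bytes[offset:end if end != -1 else len(pdf_bytes)]
        prev = re.search(rb"/Prev\s+(\d+)", section)
        offset = int(prev.group(1)) if prev else None
    return chain

def append_revision(data: bytes, prev: Optional[int]) -> bytes:
    """Test helper: append one classic xref section and footer to `data`."""
    xref_at = len(data)
    prev_entry = b" /Prev " + str(prev).encode() if prev is not None else b""
    trailer = b"<< /Size 1" + prev_entry + b" >>"
    return (data + b"xref\n0 1\n0000000000 65535 f \ntrailer\n" + trailer
            + b"\nstartxref\n" + str(xref_at).encode() + b"\n%%EOF\n")

clean = append_revision(b"%PDF-1.7\n", None)
first_xref = xref_chain(clean)[0]
edited = append_revision(clean + b"1 0 obj\n<< >>\nendobj\n", first_xref)

print(len(xref_chain(clean)))   # 1
print(len(xref_chain(edited)))  # 2
```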

Share with engineering

Wire this into your intake pipeline in under a day

Two API calls — one POST to submit the PDF, one GET to retrieve the verdict. Forward this page to your engineering team; the full API reference, quotas, and copy-paste examples in cURL, JavaScript, Python, PHP, Go, and Ruby are one click away.
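For a sense of scale, the two-call flow could look roughly like the Python sketch below. Everything specific in it is an assumption made for illustration: the base URL, the `/checks` paths, the bearer-token header, and the `id` and `verdict` field names are placeholders, not taken from the HTPBE? API reference, which remains the source of truth. The HTTP layer is passed in as a function so the flow can be exercised without network access.

```python
import json
import urllib.request
from typing import Any, Callable, Dict

# ASSUMPTION: placeholder URL, paths, auth scheme, and field names throughout.
BASE_URL = "https://api.example.com/v1"

Transport = Callable[[str, str, Dict[str, str], bytes], bytes]

def urllib_transport(method: str, url: str, headers: Dict[str, str], body: bytes) -> bytes:
    """Real HTTP transport using only the standard library."""
    request = urllib.request.Request(url, data=body or None, headers=headers, method=method)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()

def check_pdf(pdf_bytes: bytes, api_key: str, send: Transport) -> Dict[str, Any]:
    """Submit a PDF with one POST, then fetch its verdict with one GET."""
    auth = {"Authorization": "Bearer " + api_key}
    post_headers = dict(auth, **{"Content-Type": "application/pdf"})
    submitted = json.loads(send("POST", BASE_URL + "/checks", post_headers, pdf_bytes))
    return json.loads(send("GET", BASE_URL + "/checks/" + submitted["id"], auth, b""))

# Offline demonstration with a fake transport that records the calls it receives.
calls = []

def fake_send(method: str, url: str, headers: Dict[str, str], body: bytes) -> bytes:
    calls.append((method, url))
    return b'{"id": "abc123"}' if method == "POST" else b'{"verdict": "INTACT"}'

result = check_pdf(b"%PDF-1.7\n", "test-key", fake_send)
print(result["verdict"])  # INTACT
```

In a real integration, `urllib_transport` (or any HTTP client your team already uses) would replace `fake_send`.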

Pricing

Self-serve plans, no sales call

All plans include the same forensic checks. Pick the quota that matches your monthly document volume.

manual

Starter

$15/mo

30 checks/mo

Manual spot-checks and integration testing

most common

Growth

$149/mo

350 checks/mo

Active document processing pipelines

high volume

Pro

$499/mo

1,500 checks/mo

High-volume automation and API integrations

Enterprise (unlimited, on-premise available): see full pricing.

API key on signup. Free test environment on every plan. No card required.

Customer Stories

Teams that stopped document fraud

Compliance, finance, and risk teams use HTPBE? to catch manipulated PDFs before they become costly mistakes.

Caught an invoice where the total had been changed by less than a thousand dollars. Without this I would have approved it without a second look.

Sarah M.

AP Manager

United States

We had three applicants in the same week with bank statements that looked completely fine. Two of them were flagged as modified. You simply cannot see this by reading the document — it is in the file structure.

Lars V.

Risk Analyst, Online Lending

Netherlands

Salary slips were coming with altered figures. We identified two problematic files before the placement was finalised.

Priya K.

HR Operations Lead

India

Since we started checking documents this way, we stopped two applications early in the process that would have been very difficult to reverse later.

Julien R.

Fraud Analyst, Fintech

France

Some applicants were sending PDFs that looked authentic but had been edited in ways not visible to the eye. We now ask for checked originals when something is flagged. Already saved us from a few bad decisions.

Marta S.

Compliance Coordinator

Spain

One invoice was caught because there was a mismatch between the document dates and structure. That particular case would have cost us significantly.

Tariq A.

Finance Manager

United Arab Emirates

FAQ

Frequently asked questions

Why isn’t reading metadata enough?

Metadata is the easiest layer to clean. Any consumer PDF editor can blank out the producer, author, and date fields in seconds. After a metadata wipe, ExifTool and similar viewers report a clean file. The actual evidence of editing — cross-reference table revisions, font subset divergence, signature invalidation — lives in the binary structure and survives metadata cleaning.

Does HTPBE? replace ExifTool?

For one-off manual inspection by someone who already knows what they’re looking for, ExifTool is fine. For automated detection at pipeline scale — where you need a tamper-resistant, structured verdict you can route on — HTPBE? is the right tool. They solve different problems.

Can HTPBE? detect edits even after metadata is wiped?

Yes. That’s the point of structural analysis. HTPBE? runs 59 checks across 7 layers including the cross-reference chain, object streams, fonts, digital signatures, image streams, and incremental update history. Metadata is one input among many — wiping it removes one signal but leaves the others intact.

What format are HTPBE? results in?

A structured JSON response with a verdict (INTACT, MODIFIED, or INCONCLUSIVE), confidence level, named modification markers, and the underlying metadata fields. Designed to drive automated routing decisions, not for a human to read raw field dumps.
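As an illustration of routing on the verdict, here is a short Python sketch. The three verdict values come from this page; the JSON field name `verdict` and the queue names are assumptions chosen for the example, so check them against the API reference before relying on them.

```python
import json

# ASSUMPTION: queue names are hypothetical; verdict values are the three listed above.
ROUTES = {
    "INTACT": "auto_approve",
    "MODIFIED": "fraud_review",
    "INCONCLUSIVE": "manual_review",
}

def route(response_json: str) -> str:
    """Pick a queue from the verdict; anything unexpected goes to a human."""
    result = json.loads(response_json)
    return ROUTES.get(result.get("verdict"), "manual_review")

print(route('{"verdict": "MODIFIED"}'))  # fraud_review
print(route('{"verdict": "INTACT"}'))    # auto_approve
print(route('{}'))                       # manual_review
```

Sending unknown or missing verdicts to manual review keeps the pipeline safe if the response shape differs from what the integration expects.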

Secure your workflow

Create your account — API key on signup, free test environment on every plan.
From $15/mo. No sales call. Cancel any time.