55 Forensic Checks

How HTPBE? Detects Tampered Documents

55 forensic checks run on every API call — catching structural manipulation that visual review and KYC platforms miss. Under 3 seconds per document, no original file required.

Why structural analysis

Visual tools check appearance. HTPBE? checks what the software wrote.

A growing class of document fraud detection tools operates on what a PDF looks like — scanning for pixel-level inconsistencies, lighting artifacts, or unusual noise patterns. For physical documents that have been photographed, this has genuine value. For native digital PDFs — the kind a bank, payroll platform, or accounting system generates directly — it addresses the wrong question.

Sophisticated document fraud does not alter the image of a number. It replaces the underlying data while leaving the visual presentation intact. The resulting document passes every pixel-level check cleanly, because visually nothing changed.

HTPBE? examines the internal revision history, the object structure, the font assembly records, and the signature coverage maps: layers that rendering engines never expose to the eye. A document can look exactly right and still be structurally compromised, and that structural compromise is what HTPBE? finds.

  • 55 forensic checks per document
  • < 3 sec median analysis time
  • 8 independent detection layers
  • No original file required for comparison

Detection coverage

What the 55 checks cover

Every document goes through all 55 checks, and each check runs independently. Multiple findings can fire at once, and each is reported separately with its own confidence level.

Metadata & Timestamps (8 checks)

  • HTPBE_DATES_DISAGREE Modification date postdating the declared creation date
  • HTPBE_TIMESTAMP_LAYERS_DISAGREE Two internal timestamp records that contradict each other
  • HTPBE_IDENTITY_LAYERS_DISAGREE Author or title fields that disagree between internal records
  • HTPBE_IMPOSSIBLE_TIMESTAMP Timestamp values that are physically impossible
  • HTPBE_INVALID_TIMEZONE Timezone offset that doesn’t correspond to any real location
  • HTPBE_ANNOTATION_PREDATES_CREATION Annotation timestamps that predate the document’s own creation date
  • HTPBE_UNLOCK_TOOL_RESIDUE Document title showing residue from a known editing or unlock tool
  • HTPBE_XMP_DATES_DESYNCED Timestamps in the embedded metadata layer that disagree with each other — indicates the metadata was rewritten independently of the document content
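
Several of these metadata checks come down to parsing and comparing the dates a PDF stores as strings of the form D:YYYYMMDDHHmmSSOHH'mm' (ISO 32000-1, section 7.9.4). The sketch below is a toy illustration, not HTPBE?'s implementation. It assumes the creation and modification dates have already been read from the Info dictionary, treats a date with no offset as UTC, and uses a simplified timezone rule (offsets from -12:00 to +14:00, minutes of 00, 30 or 45). The returned labels only echo the marker names above.

```python
import re
from datetime import datetime, timedelta, timezone

# D:YYYYMMDDHHmmSSOHH'mm'; every field after the year is optional.
PDF_DATE = re.compile(
    r"^(?:D:)?(?P<Y>\d{4})(?P<M>\d{2})?(?P<D>\d{2})?"
    r"(?P<h>\d{2})?(?P<m>\d{2})?(?P<s>\d{2})?"
    r"(?:(?P<sign>[+\-Z])(?:(?P<oh>\d{2})'?(?:(?P<om>\d{2})'?)?)?)?$"
)


class ImpossibleTimestamp(ValueError):
    """Malformed date string, or a calendar moment that cannot exist."""


class InvalidTimezone(ValueError):
    """UTC offset outside the simplified real-world range."""


def parse_pdf_date(raw: str) -> datetime:
    match = PDF_DATE.match(raw)
    if match is None:
        raise ImpossibleTimestamp(raw)
    g = match.groupdict()
    sign = g["sign"]
    om = int(g["om"] or 0)
    offset_minutes = int(g["oh"] or 0) * 60 + om
    # Simplified rule: -12:00..+14:00, minutes :00/:30/:45, and Z means exactly UTC.
    limit = 12 * 60 if sign == "-" else 14 * 60
    if offset_minutes > limit or om not in (0, 30, 45) or (sign == "Z" and offset_minutes):
        raise InvalidTimezone(raw)
    try:
        local = datetime(
            int(g["Y"]), int(g["M"] or 1), int(g["D"] or 1),
            int(g["h"] or 0), int(g["m"] or 0), int(g["s"] or 0),
        )
    except ValueError as exc:  # month 13, day 32, hour 24, year 0000, ...
        raise ImpossibleTimestamp(raw) from exc
    offset = timedelta(minutes=-offset_minutes if sign == "-" else offset_minutes)
    # Assumption: a date with no offset at all is treated as UTC.
    return local.replace(tzinfo=timezone(offset))


def check_info_dates(creation: str, modified: str) -> list[str]:
    """Toy versions of three metadata checks on the Info dictionary dates."""
    findings: list[str] = []
    parsed: list[datetime] = []
    for raw in (creation, modified):
        try:
            parsed.append(parse_pdf_date(raw))
        except InvalidTimezone:
            findings.append("INVALID_TIMEZONE")
        except ImpossibleTimestamp:
            findings.append("IMPOSSIBLE_TIMESTAMP")
    if len(parsed) == 2 and parsed[1] > parsed[0]:
        # Modification date strictly later than the declared creation date.
        findings.append("DATES_DISAGREE")
    return findings
```

Comparing timezone-aware datetimes means two dates written with different offsets are still compared as the same instant on the UTC timeline.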

Digital Signatures (2 checks)

  • HTPBE_POST_SIGNATURE_EDIT Content changed after the document was digitally signed
  • HTPBE_SIGNATURE_REMOVED Evidence that a digital signature was stripped after signing

Incremental Update History (3 checks)

  • HTPBE_MULTIPLE_REVISION_LAYERS Multiple revision layers — the document was modified after creation
  • HTPBE_METADATA_ONLY_REVISION Edit history limited to metadata fields, characteristic of automated timestamp tampering
  • HTPBE_FLATTENED_EDIT_HISTORY Editing history collapsed into a single revision to mask multiple prior modification rounds inside what claims to be a freshly generated file
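
The revision-history checks rest on how PDF saving works: an incremental save does not rewrite the file but appends new objects, a new cross-reference section, a trailer, and another %%EOF marker to the end. A minimal sketch, assuming an uncorrupted file, that counts those markers and recovers each earlier revision as a byte prefix. This is a rough heuristic rather than HTPBE?'s method; a linearized file, for instance, can carry an extra %%EOF without ever having been edited.

```python
EOF_MARKER = b"%%EOF"


def revision_ends(pdf: bytes) -> list[int]:
    """Byte offsets just past each %%EOF marker, in file order."""
    ends, start = [], 0
    while (idx := pdf.find(EOF_MARKER, start)) != -1:
        start = idx + len(EOF_MARKER)
        ends.append(start)
    return ends


def revisions(pdf: bytes) -> list[bytes]:
    """Each prefix ending at a %%EOF is the file as it stood after that save."""
    return [pdf[:end] for end in revision_ends(pdf)]


def has_incremental_updates(pdf: bytes) -> bool:
    """More than one %%EOF suggests at least one save appended to the original."""
    return len(revision_ends(pdf)) > 1
```

Diffing two consecutive prefixes shows which objects a given save added or replaced, since an incremental update appends its changed objects after the previous %%EOF.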

Generator Fingerprinting (13 checks)

  • HTPBE_TOOL_VS_STRUCTURE_MISMATCH Declared authoring software doesn’t match actual file construction
  • HTPBE_FONT_VS_TOOL_MISMATCH Font handling inconsistent with the declared authoring tool
  • HTPBE_RESIDUAL_PRIOR_GENERATOR Residual generator identity left behind by a tool that tried to claim a different origin
  • HTPBE_EDITING_TOOL_FINGERPRINT Fingerprints of a known PDF editing tool
  • HTPBE_ONLINE_EDITOR_ORIGIN Document carries the identity of an online PDF editing or conversion service
  • HTPBE_SELECTIVE_IDENTITY_EDIT Generator identity selectively edited rather than fully replaced
  • HTPBE_IDENTITY_OVERWRITTEN Entire generator identity overwritten to hide origin
  • HTPBE_UNKNOWN_IDENTITY_CLAIM Identity claims that match no legitimate authoring tool
  • HTPBE_SCRIPTING_ON_REEMITTED Active scripting added on top of a previously re-emitted document
  • HTPBE_LAYERED_IDENTITY_DISAGREEMENT Different generator identities across different parts of the file
  • HTPBE_IMPOSSIBLE_TOOL_PIPELINE Tool combinations that don’t occur in any legitimate publishing pipeline
  • HTPBE_IDENTITY_BLANKED Identity fields deliberately blanked by post-processing
  • HTPBE_DESIGN_TEMPLATE_ASSEMBLY Document constructed from a design-tool template

Document Assembly (7 checks)

  • HTPBE_PAGES_FROM_MULTIPLE_SOURCES Pages assembled from independently rendered sources
  • HTPBE_RESIDUAL_DOCUMENT_STRUCTURE Residual identity from a different document embedded inside this one
  • HTPBE_MIXED_TOOLING_CLASSES Pages produced by inconsistent classes of tooling within the same document
  • HTPBE_SCAN_IN_DIGITAL_DOC Scanned image inserted programmatically into a digital document
  • HTPBE_REEXPORTED_THROUGH_OFFICE_SUITE Pages re-exported through an office-suite editor after creation
  • HTPBE_MIXED_PAGE_DIMENSIONS Pages of physically different dimensions within the same document
  • HTPBE_TEMPLATE_PATTERN_BREAK Template coverage that breaks pattern between pages

Content Stream Analysis (13 checks)

  • HTPBE_CONTENT_STREAM_EDITING_MARKERS Editing markers inside the content stream
  • HTPBE_DRAWING_OPS_INCONSISTENT Drawing operations inconsistent with the document’s claimed authoring tool
  • HTPBE_TEXT_AS_VECTOR_OUTLINES Text converted to vector shapes to defeat downstream extraction
  • HTPBE_PRINTED_OUT_OF_PDF_READER Document rebuilt by a consumer print-driver utility, erasing original authoring history
  • HTPBE_TEXT_OVERLAY_ON_SCAN Text layer floating above scanned images — values overlaid on a scan
  • HTPBE_INCOMPLETE_REDACTION Redaction markings that don’t actually conceal the underlying content
  • HTPBE_INVISIBLE_DUPLICATE_TEXT Invisible duplicate text shadowing visible values on the page
  • HTPBE_CHARACTER_OVERLAY_EDIT Targeted character-level overlays added by a desktop PDF editor
  • HTPBE_GLYPH_LEVEL_EDIT Targeted glyph-level edits where individual characters were replaced after authoring
  • HTPBE_MIXED_FONT_EMBEDDING A single typeface embedded in two incompatible ways on the same page
  • HTPBE_PARTIAL_FONT_REPLACEMENT Editor re-save patterns where original font embedding was partially replaced
  • HTPBE_BURNED_IN_ANNOTATIONS Annotations from a markup tool burned into the page after original creation
  • HTPBE_WIDGET_APPEARANCE_MISMATCH Form field value that disagrees with the value visible inside the widget

Image Forensics (5 checks)

  • HTPBE_IMAGE_EDITED_AND_RESAVED Embedded image showing signs of having been edited and re-saved
  • HTPBE_COLLAPSED_TO_RASTER Document content collapsed to image form to discard the editing history
  • HTPBE_PHYSICAL_SCAN_GEOMETRY Image geometry indicating the document is a scan of physical paper
  • HTPBE_FAKE_SCANNER_ORIGIN Documents that claim scanner origin without matching genuine scanner output
  • HTPBE_RENDERED_PSEUDO_SCAN Documents presented as a captured image whose pixel content shows machine-rendered uniformity inconsistent with optical capture

Structural Integrity (4 checks)

  • HTPBE_NAVIGATION_TABLE_MISMATCH Internal navigation tables that don’t match the rest of the file
  • HTPBE_DECLARED_SIZE_MISMATCH Declared file size that doesn’t match actual content
  • HTPBE_IDENTIFIER_MISMATCH Document identifier inconsistent across the file’s internal records
  • HTPBE_TRAILING_BYTES_AFTER_EOF Extra content appended past the document’s normal end marker
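
The trailing-bytes check targets data hidden where a PDF reader never looks. The PDF specification expects the last line of the file to contain only the %%EOF marker, so anything beyond end-of-line padding was appended after the file was finished. A minimal sketch of that comparison, not HTPBE?'s implementation; treating the six PDF whitespace characters (NUL, tab, LF, FF, CR, space) as harmless padding is an assumption about tolerance, not a documented rule.

```python
# PDF whitespace characters (ISO 32000-1, Table 1): NUL, HT, LF, FF, CR, SP.
PDF_WHITESPACE = b"\x00\t\n\x0c\r "


def bytes_after_eof(pdf: bytes) -> bytes:
    """Return whatever follows the last %%EOF marker, minus whitespace padding."""
    idx = pdf.rfind(b"%%EOF")
    if idx == -1:
        raise ValueError("no %%EOF marker found")
    return pdf[idx + len(b"%%EOF"):].strip(PDF_WHITESPACE)


def has_trailing_bytes(pdf: bytes) -> bool:
    """True when real content, not just padding, sits past the final %%EOF."""
    return len(bytes_after_eof(pdf)) > 0
```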

How it works

Three steps, under 3 seconds

1. Send a PDF URL

POST a publicly accessible URL to /api/v1/analyze. There is no file upload and no file size limit on the API. You get back a check ID immediately.
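
A minimal standard-library sketch of step 1. The base URL, the Bearer authorization header, the JSON field name url, and the id field in the reply are all assumptions for illustration; the API reference defines the real request and response shapes.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical placeholder host


def build_analyze_request(pdf_url: str, api_key: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the POST /api/v1/analyze request."""
    body = json.dumps({"url": pdf_url}).encode("utf-8")  # assumed field name
    return urllib.request.Request(
        f"{base_url}/api/v1/analyze",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
    )


def submit(pdf_url: str, api_key: str) -> str:
    """Send the request and return the check ID (assumed to be an "id" field)."""
    with urllib.request.urlopen(build_analyze_request(pdf_url, api_key), timeout=10) as resp:
        return json.load(resp)["id"]
```

Keeping request construction separate from sending makes it easy to log or inspect exactly what goes over the wire before any network call happens.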

2. Analysis across 8 forensic dimensions

All 55 checks run in parallel across metadata and timestamps, digital signatures, incremental update history, generator fingerprinting, document assembly, content streams, image forensics, and structural integrity.

3. Verdict + named markers

Call GET /api/v1/result/{id} for a structured verdict (intact, modified, or inconclusive), with every triggered finding named individually and rated for confidence.
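
Because step 1 returns only an ID, a client typically polls the result endpoint until a verdict is available. This sketch assumes, without confirmation from the API reference, that any response whose status is not one of the three documented verdicts means the analysis is still running. The base URL and Bearer header are hypothetical placeholders.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict

BASE_URL = "https://api.example.com"  # hypothetical placeholder host
VERDICTS = {"intact", "modified", "inconclusive"}


def fetch_result(check_id: str, api_key: str, base_url: str = BASE_URL) -> Dict[str, Any]:
    """GET /api/v1/result/{id} once and decode the JSON body."""
    req = urllib.request.Request(
        f"{base_url}/api/v1/result/{check_id}",
        headers={"Authorization": f"Bearer {api_key}"},  # assumed auth scheme
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def wait_for_verdict(
    fetch: Callable[[], Dict[str, Any]],
    interval: float = 0.5,
    timeout: float = 10.0,
) -> Dict[str, Any]:
    """Poll until status is one of the three documented verdicts.

    Assumption: any other status (or none) means analysis is still running.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if result.get("status") in VERDICTS:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("no verdict before the timeout")
        time.sleep(interval)
```

In use, the fetcher is passed in as a closure, for example wait_for_verdict(lambda: fetch_result(check_id, api_key)), which also lets tests swap in a stub instead of a live endpoint.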

Example API response

What you get back

A bank statement with a removed digital signature and mismatched timestamps returns this. Each marker is a named, actionable forensic finding — not a score.

GET /api/v1/result/{id}
{
  "id": "9f3a2c1d-8b47-4e6f-a012-3d5e7f890123",
  "status": "modified",
  "modification_confidence": "certain",
  "modification_markers": [
    "HTPBE_SIGNATURE_REMOVED",
    "HTPBE_DATES_DISAGREE"
  ],
  "origin": {
    "type": "institutional",
    "software": null
  },
  "creator": "Adobe Acrobat Pro DC",
  "has_incremental_updates": true,
  "update_chain_length": 3,
  "xref_count": 4,
  "page_count": 12
}
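
On the client side, this response maps onto a small typed record. A sketch that reads only fields shown in the example above; treating modification_confidence and modification_markers as possibly absent (for instance on an intact verdict) is an assumption, not documented behavior.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AnalysisResult:
    id: str
    status: str                        # "intact", "modified", or "inconclusive"
    confidence: Optional[str] = None   # "certain" or "high" when status is "modified"
    markers: List[str] = field(default_factory=list)
    origin_type: Optional[str] = None  # e.g. "institutional"
    page_count: Optional[int] = None


def parse_result(payload: str) -> AnalysisResult:
    """Decode a result body, tolerating fields that may be missing."""
    data = json.loads(payload)
    return AnalysisResult(
        id=data["id"],
        status=data["status"],
        confidence=data.get("modification_confidence"),
        markers=list(data.get("modification_markers") or []),
        origin_type=(data.get("origin") or {}).get("type"),
        page_count=data.get("page_count"),
    )


def summarize(result: AnalysisResult) -> str:
    """Single-line summary: verdict, confidence, then every named marker."""
    if not result.markers:
        return f"{result.id}: {result.status}"
    return f"{result.id}: {result.status} ({result.confidence}) " + ", ".join(result.markers)
```

Keeping the markers as a list of names, rather than collapsing them into a score, preserves each individual finding for audit logs and case notes.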

The full schema and descriptions of every marker are in the API reference.

Verdicts

Three possible outcomes

intact: No modification detected

No structural evidence of post-creation changes. The file structure matches what the generating software would produce without intervention.

Intact means no evidence of post-creation modification was found in the file structure. It does not guarantee the content is truthful: a document fabricated from scratch with false data returns intact.

inconclusive: Cannot confirm institutional origin

The document was created in consumer-grade software (Microsoft Word, LibreOffice, a free online converter) that does not leave the institutional markers genuinely issued documents always contain.

For fraud teams: real banks and payroll engines generate documents programmatically. An inconclusive verdict on a bank statement or payslip means the file was assembled outside an institutional system.

modified: Forensic evidence of tampering

One or more structural markers confirm post-creation modification. Confidence is certain when the evidence includes a digital signature finding or a date contradiction, and high for all other markers.
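
How a fraud team acts on each verdict is a policy decision made outside the API. A hypothetical routing function, with made-up action names (accept, escalate, reject) and a made-up set of document types, that applies the reasoning in this section: certain-confidence modifications are rejected, inconclusive results on bank statements or payslips go to an analyst, and intact files are accepted on the understanding that intact says nothing about whether the data is true.

```python
from typing import Any, Mapping

# Hypothetical document types where an institutional origin is expected.
INSTITUTIONAL_DOC_TYPES = {"bank_statement", "payslip"}


def route(result: Mapping[str, Any], doc_type: str) -> str:
    """Map an API result to a hypothetical review action."""
    status = result.get("status")
    if status == "modified":
        # "certain" is reported for signature and date-contradiction findings.
        if result.get("modification_confidence") == "certain":
            return "reject"
        return "escalate"
    if status == "inconclusive":
        # Consumer-software origin matters only where a bank or payroll
        # system should have generated the file.
        return "escalate" if doc_type in INSTITUTIONAL_DOC_TYPES else "accept"
    if status == "intact":
        # Unmodified structure only; content fabricated from scratch is also intact.
        return "accept"
    raise ValueError(f"unexpected status: {status!r}")
```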

Secure your workflow

Create your account — API key on signup, free test environment on every plan.
From $15/mo. No sales call. Cancel any time.