How HTPBE? Detects Tampered Documents
55 forensic checks run on every API call — catching structural manipulation that visual review and KYC platforms miss. Under 3 seconds per document, no original file required.
Why structural analysis
Visual tools check appearance. HTPBE? checks what the software wrote.
A growing class of document fraud detection tools operates on what a PDF looks like — scanning for pixel-level inconsistencies, lighting artifacts, or unusual noise patterns. For physical documents that have been photographed, this has genuine value. For native digital PDFs — the kind a bank, payroll platform, or accounting system generates directly — it addresses the wrong question.
Sophisticated document fraud does not alter the image of a number. It replaces the underlying data while leaving the visual presentation intact. The resulting document passes every pixel-level check cleanly, because visually nothing changed.
HTPBE? examines the internal revision history, the object structure, the font assembly records, and the signature coverage maps: layers that rendering engines never expose to the eye. A document can look exactly right and still be structurally compromised. Those are the documents we find.
Detection coverage
What the 55 checks cover
Every document runs all checks independently. Multiple findings can fire simultaneously — each reported separately with its own confidence level.
Metadata & Timestamps
HTPBE_DATES_DISAGREE: Modification date postdating the declared creation date
HTPBE_TIMESTAMP_LAYERS_DISAGREE: Two internal timestamp records that contradict each other
HTPBE_IDENTITY_LAYERS_DISAGREE: Author or title fields that disagree between internal records
HTPBE_IMPOSSIBLE_TIMESTAMP: Timestamp values that are physically impossible
HTPBE_INVALID_TIMEZONE: Timezone offset that doesn’t correspond to any real location
HTPBE_ANNOTATION_PREDATES_CREATION: Annotation timestamps that predate the document’s own creation date
HTPBE_UNLOCK_TOOL_RESIDUE: Document title showing residue from a known editing or unlock tool
HTPBE_XMP_DATES_DESYNCED: Timestamps in the embedded metadata layer that disagree with each other, indicating the metadata was rewritten independently of the document content
Digital Signatures
HTPBE_POST_SIGNATURE_EDIT: Content changed after the document was digitally signed
HTPBE_SIGNATURE_REMOVED: Evidence that a digital signature was stripped after signing
Incremental Update History
HTPBE_MULTIPLE_REVISION_LAYERS: Multiple revision layers, meaning the document was modified after creation
HTPBE_METADATA_ONLY_REVISION: Edit history limited to metadata fields, characteristic of automated timestamp tampering
HTPBE_FLATTENED_EDIT_HISTORY: Editing history collapsed into a single revision to mask multiple prior modification rounds inside what claims to be a freshly generated file
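To give a feel for the raw signal behind this category, here is a minimal sketch in Python. It is an illustration only and not HTPBE?'s implementation. It relies on one property of the PDF format: an incremental save leaves the earlier bytes in place and appends a new cross-reference section and trailer that end with their own `%%EOF` marker. The function names and the sample bytes are hypothetical, and the count is only a hint, since some files written in a single pass (linearized ones, for example) can contain more than one marker.

```python
def count_eof_markers(pdf_bytes: bytes) -> int:
    """Count %%EOF markers; each incremental save normally appends one more."""
    return pdf_bytes.count(b"%%EOF")


def looks_incrementally_updated(pdf_bytes: bytes) -> bool:
    """Crude hint only: more than one %%EOF suggests appended revisions."""
    return count_eof_markers(pdf_bytes) > 1


# Synthetic stand-ins for real files (not valid PDFs, just enough to show the idea).
original = b"%PDF-1.7\n1 0 obj\n<<>>\nendobj\ntrailer\n<<>>\nstartxref\n9\n%%EOF\n"
appended = original + b"1 0 obj\n<<>>\nendobj\ntrailer\n<<>>\nstartxref\n70\n%%EOF\n"

print(count_eof_markers(original), looks_incrementally_updated(original))
print(count_eof_markers(appended), looks_incrementally_updated(appended))
```

A real check has to parse the cross-reference chain rather than count byte patterns, because the marker can also appear inside stream data.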
Generator Fingerprinting
HTPBE_TOOL_VS_STRUCTURE_MISMATCH: Declared authoring software doesn’t match actual file construction
HTPBE_FONT_VS_TOOL_MISMATCH: Font handling inconsistent with the declared authoring tool
HTPBE_RESIDUAL_PRIOR_GENERATOR: Residual generator identity left behind by a tool that tried to claim a different origin
HTPBE_EDITING_TOOL_FINGERPRINT: Fingerprints of a known PDF editing tool
HTPBE_ONLINE_EDITOR_ORIGIN: Document carries the identity of an online PDF editing or conversion service
HTPBE_SELECTIVE_IDENTITY_EDIT: Generator identity selectively edited rather than fully replaced
HTPBE_IDENTITY_OVERWRITTEN: Entire generator identity overwritten to hide origin
HTPBE_UNKNOWN_IDENTITY_CLAIM: Identity claims that match no legitimate authoring tool
HTPBE_SCRIPTING_ON_REEMITTED: Active scripting added on top of a previously re-emitted document
HTPBE_LAYERED_IDENTITY_DISAGREEMENT: Different generator identities across different parts of the file
HTPBE_IMPOSSIBLE_TOOL_PIPELINE: Tool combinations that don’t occur in any legitimate publishing pipeline
HTPBE_IDENTITY_BLANKED: Identity fields deliberately blanked by post-processing
HTPBE_DESIGN_TEMPLATE_ASSEMBLY: Document constructed from a design-tool template
Document Assembly
HTPBE_PAGES_FROM_MULTIPLE_SOURCES: Pages assembled from independently rendered sources
HTPBE_RESIDUAL_DOCUMENT_STRUCTURE: Residual identity from a different document embedded inside this one
HTPBE_MIXED_TOOLING_CLASSES: Pages produced by inconsistent classes of tooling within the same document
HTPBE_SCAN_IN_DIGITAL_DOC: Scanned image inserted programmatically into a digital document
HTPBE_REEXPORTED_THROUGH_OFFICE_SUITE: Pages re-exported through an office-suite editor after creation
HTPBE_MIXED_PAGE_DIMENSIONS: Pages of physically different dimensions within the same document
HTPBE_TEMPLATE_PATTERN_BREAK: Template coverage that breaks pattern between pages
Content Stream Analysis
HTPBE_CONTENT_STREAM_EDITING_MARKERS: Editing markers inside the content stream
HTPBE_DRAWING_OPS_INCONSISTENT: Drawing operations inconsistent with the document’s claimed authoring tool
HTPBE_TEXT_AS_VECTOR_OUTLINES: Text converted to vector shapes to defeat downstream extraction
HTPBE_PRINTED_OUT_OF_PDF_READER: Document rebuilt by a consumer print-driver utility, erasing original authoring history
HTPBE_TEXT_OVERLAY_ON_SCAN: Text layer floating above scanned images, with values overlaid on a scan
HTPBE_INCOMPLETE_REDACTION: Redaction markings that don’t actually conceal the underlying content
HTPBE_INVISIBLE_DUPLICATE_TEXT: Invisible duplicate text shadowing visible values on the page
HTPBE_CHARACTER_OVERLAY_EDIT: Targeted character-level overlays added by a desktop PDF editor
HTPBE_GLYPH_LEVEL_EDIT: Targeted glyph-level edits where individual characters were replaced after authoring
HTPBE_MIXED_FONT_EMBEDDING: A single typeface embedded in two incompatible ways on the same page
HTPBE_PARTIAL_FONT_REPLACEMENT: Editor re-save patterns where original font embedding was partially replaced
HTPBE_BURNED_IN_ANNOTATIONS: Annotations from a markup tool burned into the page after original creation
HTPBE_WIDGET_APPEARANCE_MISMATCH: Form field value that disagrees with the value visible inside the widget
Image Forensics
HTPBE_IMAGE_EDITED_AND_RESAVED: Embedded image showing signs of having been edited and re-saved
HTPBE_COLLAPSED_TO_RASTER: Document content collapsed to image form to discard the editing history
HTPBE_PHYSICAL_SCAN_GEOMETRY: Image geometry indicating the document is a scan of physical paper
HTPBE_FAKE_SCANNER_ORIGIN: Documents that claim scanner origin without matching genuine scanner output
HTPBE_RENDERED_PSEUDO_SCAN: Documents presented as a captured image whose pixel content shows machine-rendered uniformity inconsistent with optical capture
Structural Integrity
HTPBE_NAVIGATION_TABLE_MISMATCH: Internal navigation tables that don’t match the rest of the file
HTPBE_DECLARED_SIZE_MISMATCH: Declared file size that doesn’t match actual content
HTPBE_IDENTIFIER_MISMATCH: Document identifier inconsistent across the file’s internal records
HTPBE_TRAILING_BYTES_AFTER_EOF: Extra content appended past the document’s normal end marker
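The last item in this group is the easiest to picture. The sketch below, in Python, shows the general idea under simplifying assumptions: it finds the final `%%EOF` marker and returns whatever non-whitespace bytes follow it. The function name and sample data are hypothetical, and this is not a description of how HTPBE? performs the check.

```python
def trailing_bytes_after_eof(pdf_bytes: bytes) -> bytes:
    """Return non-whitespace bytes found after the last %%EOF marker."""
    marker = b"%%EOF"
    position = pdf_bytes.rfind(marker)
    if position == -1:
        raise ValueError("no %%EOF marker found")
    tail = pdf_bytes[position + len(marker):]
    # A line ending after the marker is normal; anything else is worth a look.
    return tail.strip(b"\r\n\t ")


# Synthetic stand-ins for real files (not valid PDFs).
clean = b"%PDF-1.7\ntrailer\n<<>>\nstartxref\n9\n%%EOF\n"
padded = clean + b"unexpected payload\n"

print(trailing_bytes_after_eof(clean))
print(trailing_bytes_after_eof(padded))
```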
How it works
Three steps, under 3 seconds
Send a PDF URL
POST a publicly accessible URL to /api/v1/analyze. No file upload, no size limit for the API. You get back a check ID immediately.
8 forensic dimensions analyzed
All 55 checks run in parallel across metadata and timestamps, digital signatures, incremental update history, generator fingerprinting, document assembly, content streams, image forensics, and structural integrity.
Verdict + named markers
GET /api/v1/result/{id} for a structured verdict — intact, modified, or inconclusive — with every triggered finding named individually and confidence rated.
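The three steps above can be sketched as a small Python client. Only the two paths, `/api/v1/analyze` and `/api/v1/result/{id}`, and the three verdict names come from this page. Everything else is an assumption made for illustration: the base URL is a parameter you supply, the `Authorization: Bearer` header, the `url` field in the request body, the `id` field in the submit response, and the idea that the result may need polling. Check the API reference for the real contract.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

# A transport takes (method, url, headers, body) and returns the decoded JSON reply.
Transport = Callable[[str, str, dict, Optional[bytes]], dict]

FINAL_STATUSES = {"intact", "modified", "inconclusive"}


def http_transport(method: str, url: str, headers: dict, body: Optional[bytes]) -> dict:
    """Default transport built on the standard library."""
    request = urllib.request.Request(url, data=body, headers=headers, method=method)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def analyze_pdf(
    base_url: str,
    api_key: str,
    pdf_url: str,
    transport: Transport = http_transport,
    poll_interval: float = 0.5,
    max_polls: int = 10,
) -> dict:
    """Submit a PDF URL, then fetch the verdict for the returned check ID."""
    headers = {
        "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        "Content-Type": "application/json",
    }
    body = json.dumps({"url": pdf_url}).encode("utf-8")  # assumed field name
    submitted = transport("POST", f"{base_url}/api/v1/analyze", headers, body)
    check_id = submitted["id"]  # assumed response field

    for _ in range(max_polls):
        result = transport("GET", f"{base_url}/api/v1/result/{check_id}", headers, None)
        if result.get("status") in FINAL_STATUSES:
            return result
        time.sleep(poll_interval)  # assumed: the result may not be ready at once
    raise TimeoutError(f"no verdict for check {check_id} after {max_polls} polls")
```

Passing the transport as a parameter keeps the network call replaceable, so the flow can be exercised in tests without touching the live service.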
Example API response
What you get back
A bank statement with a removed digital signature and mismatched timestamps returns this. Each marker is a named, actionable forensic finding — not a score.
{
  "id": "9f3a2c1d-8b47-4e6f-a012-3d5e7f890123",
  "status": "modified",
  "modification_confidence": "certain",
  "modification_markers": [
    "HTPBE_SIGNATURE_REMOVED",
    "HTPBE_DATES_DISAGREE"
  ],
  "origin": {
    "type": "institutional",
    "software": null
  },
  "creator": "Adobe Acrobat Pro DC",
  "has_incremental_updates": true,
  "update_chain_length": 3,
  "xref_count": 4,
  "page_count": 12
}
Full schema and all marker descriptions are in the API reference.
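A response shaped like the example can be read into a small typed structure before it reaches the rest of your code. The sketch below is in Python and uses only field names that appear in the example. The `Verdict` class and `parse_verdict` function are hypothetical names, and treating `modification_confidence` and `modification_markers` as optional is an assumption, since this page only shows a modified document.

```python
import json
from dataclasses import dataclass
from typing import Optional, Tuple

KNOWN_STATUSES = {"intact", "modified", "inconclusive"}


@dataclass(frozen=True)
class Verdict:
    check_id: str
    status: str
    confidence: Optional[str]
    markers: Tuple[str, ...]


def parse_verdict(raw: str) -> Verdict:
    """Parse a result payload and reject any status this page does not list."""
    data = json.loads(raw)
    status = data["status"]
    if status not in KNOWN_STATUSES:
        raise ValueError(f"unexpected status: {status!r}")
    return Verdict(
        check_id=data["id"],
        status=status,
        confidence=data.get("modification_confidence"),
        markers=tuple(data.get("modification_markers", [])),
    )


# Trimmed copy of the example response shown on this page.
SAMPLE = """
{
  "id": "9f3a2c1d-8b47-4e6f-a012-3d5e7f890123",
  "status": "modified",
  "modification_confidence": "certain",
  "modification_markers": ["HTPBE_SIGNATURE_REMOVED", "HTPBE_DATES_DISAGREE"]
}
"""

verdict = parse_verdict(SAMPLE)
for marker in verdict.markers:
    print(f"{verdict.status} ({verdict.confidence}): {marker}")
```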
Verdicts
Three possible outcomes
intact: No modification detected
No structural evidence of post-creation changes. The file structure matches what the generating software would produce without intervention.
Intact means no structural evidence of modification was found. It does not guarantee the content is truthful: a document fabricated from scratch with false data returns intact.
inconclusive: Cannot confirm institutional origin
The document was created in consumer-grade software (Microsoft Word, LibreOffice, a free online converter) that does not leave the institutional markers that genuinely issued documents always contain.
For fraud teams: real banks and payroll engines generate documents programmatically. Inconclusive on a bank statement or payslip means the file was assembled outside an institutional system.
modified: Forensic evidence of tampering
One or more structural markers confirm post-creation modification. Confidence is either certain (signatures or date contradictions) or high for all other markers.
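How a team acts on each verdict is a policy decision, but the mapping is simple to express. The Python sketch below is one possible routing, not a recommendation from HTPBE?. The action names (`continue_checks`, `manual_review`, `escalate`, `reject`) and the choice to separate certain from high confidence are hypothetical.

```python
from typing import Optional


def route_document(status: str, confidence: Optional[str] = None) -> str:
    """Map a verdict to a hypothetical next step in a review workflow."""
    if status == "intact":
        # No structural tampering found; the content itself still needs verifying.
        return "continue_checks"
    if status == "inconclusive":
        # Origin could not be confirmed as institutional.
        return "manual_review"
    if status == "modified":
        # "certain" covers signature or date contradictions; other markers rate "high".
        return "reject" if confidence == "certain" else "escalate"
    raise ValueError(f"unexpected status: {status!r}")


print(route_document("modified", "certain"))
print(route_document("modified", "high"))
print(route_document("inconclusive"))
print(route_document("intact"))
```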
Secure your workflow
Create your account — API key on signup, free test environment on every plan.
From $15/mo. No sales call. Cancel any time.