PDF Authenticity & Tamper Detection API
Detect tampered bank statements, invoices, and financial documents before they enter your pipeline. One API call, structural verdict in under 3 seconds.
No credit card needed. Test environment is free on every plan, unlimited calls. Live calls from $15/mo when you're ready.
The detection gap
What this catches that KYC doesn’t
An applicant submits a bank statement. Your KYC platform confirms the account exists and the identity matches. But the balance has been edited — $2,400 inflated to $24,000. The edit is invisible to visual review and passes every template check.
HTPBE? checks the PDF file structure, not the content. The modification shows up in the xref table, the incremental update chain, and the producer field — regardless of what the document looks like on screen.
One API call returns a structured verdict and named markers. Designed to slot into existing intake pipelines alongside Plaid, Persona, and Alloy — not instead of them.
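To make the structural signals above concrete, here is a minimal Python sketch that counts `%%EOF` and `startxref` markers in raw PDF bytes. Each incremental save appends a new cross-reference section and trailer, so more than one `%%EOF` hints at an update chain. This is an illustration only, not a description of how HTPBE? works internally. The function names are hypothetical, the sample bytes are toy strings rather than valid PDFs, and the heuristic can misfire on linearized PDFs or on files whose binary streams happen to contain these byte sequences.

```python
def count_revision_markers(pdf_bytes: bytes) -> dict:
    """Count end-of-file and startxref markers in raw PDF bytes."""
    return {
        "eof_markers": pdf_bytes.count(b"%%EOF"),
        "startxref_markers": pdf_bytes.count(b"startxref"),
    }


def looks_incrementally_updated(pdf_bytes: bytes) -> bool:
    """Rough heuristic: more than one %%EOF suggests appended revisions."""
    return pdf_bytes.count(b"%%EOF") > 1


# Toy byte strings for illustration only; these are not valid PDFs.
ORIGINAL = (
    b"%PDF-1.7\n"
    b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    b"xref\n0 2\ntrailer\n<< /Size 2 >>\n"
    b"startxref\n45\n%%EOF\n"
)
# An edit saved as an incremental update appends a second trailer section.
EDITED = ORIGINAL + (
    b"2 0 obj\n<< /Balance 24000 >>\nendobj\n"
    b"xref\n2 1\ntrailer\n<< /Size 3 /Prev 45 >>\n"
    b"startxref\n140\n%%EOF\n"
)

if __name__ == "__main__":
    print(count_revision_markers(ORIGINAL))
    print(count_revision_markers(EDITED))
```

A single count like this is far weaker than a full structural analysis; it only shows why an edit can leave traces that never appear on screen.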
Example response — modified verdict
{
"status": "modified",
"modification_confidence": "high",
"modification_markers": [
"HTPBE_MULTIPLE_REVISION_LAYERS",
"HTPBE_DATES_DISAGREE"
],
"has_incremental_updates": true,
"update_chain_length": 3,
"xref_count": 4,
"creator": "Microsoft Excel",
"producer": "Adobe PDF Library 15.0"
}
Full response includes 20+ fields. See complete schema on GitHub →
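As a rough sketch of how a consumer might read this payload, the Python below collapses the example response into one log-friendly line. The field names come from the example above; the function name `summarize_verdict` and the output format are assumptions made for illustration.

```python
import json

# Payload copied from the example response above.
SAMPLE = """
{
  "status": "modified",
  "modification_confidence": "high",
  "modification_markers": [
    "HTPBE_MULTIPLE_REVISION_LAYERS",
    "HTPBE_DATES_DISAGREE"
  ],
  "has_incremental_updates": true,
  "update_chain_length": 3,
  "xref_count": 4,
  "creator": "Microsoft Excel",
  "producer": "Adobe PDF Library 15.0"
}
"""


def summarize_verdict(payload: dict) -> str:
    """Collapse a result payload into a single line (hypothetical helper)."""
    parts = [f"status={payload.get('status', 'unknown')}"]
    confidence = payload.get("modification_confidence")
    if confidence:
        parts.append(f"confidence={confidence}")
    # Tolerate a missing or empty marker list.
    markers = payload.get("modification_markers") or []
    if markers:
        parts.append("markers=" + ",".join(markers))
    return " ".join(parts)


if __name__ == "__main__":
    print(summarize_verdict(json.loads(SAMPLE)))
```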
How to integrate
From sign-up to verdict in three steps
Submit a PDF URL
POST a publicly accessible URL to /v1/analyze. S3 presigned, GCS, Azure, Dropbox, or your own CDN: any HTTPS URL that points to a PDF of up to 10 MB works. You get back a check ID in the response.
We run 59 forensic layers
Metadata, structure, digital signatures, generator fingerprinting, document assembly, content streams, image forensics, and structural integrity — all run in parallel against the binary, not the rendering.
Retrieve the verdict
GET /v1/result/{id} for a structured verdict — intact, modified, or inconclusive — with each marker named and confidence rated.
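The submit-then-retrieve flow above can be sketched in Python with only the standard library. The endpoints, auth header, and request body follow this page; the names `check_pdf`, `http_transport`, and `Transport`, the injectable transport design, and the 60-second client timeout are assumptions made for this sketch, not part of the API.

```python
import json
import urllib.request
from typing import Callable, Optional

BASE_URL = "https://api.htpbe.tech/v1"

# A transport takes (method, url, headers, body) and returns parsed JSON.
Transport = Callable[[str, str, dict, Optional[dict]], dict]


def http_transport(method: str, url: str, headers: dict, body: Optional[dict]) -> dict:
    """Default transport on urllib; urlopen raises HTTPError on 4xx/5xx."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(url, data=data, headers=headers, method=method)
    # Assumption: 60 s client timeout, chosen for this sketch.
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))


def check_pdf(pdf_url: str, api_key: str, transport: Transport = http_transport) -> dict:
    """Submit a PDF URL, then fetch the verdict for the returned check ID."""
    auth = {"Authorization": f"Bearer {api_key}"}
    submitted = transport(
        "POST",
        f"{BASE_URL}/analyze",
        {**auth, "Content-Type": "application/json"},
        {"url": pdf_url},
    )
    return transport("GET", f"{BASE_URL}/result/{submitted['id']}", auth, None)
```

Passing the transport as a parameter keeps the flow testable without network access: a stub that returns a fixed `id` and verdict is enough to exercise calling code.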
Documentation
Read it, or import it
Three endpoints with field-by-field docs on the left. Postman, Scalar, and OpenAPI bundles on the right when you’d rather skip the prose and start calling.
Endpoint reference
Field-by-field docs on github.com/htpbe/docs
Import & try
Plug into your existing API tooling
Quick reference
Three endpoints, one auth header
Base URL https://api.htpbe.tech/v1. All requests authenticated with Authorization: Bearer YOUR_API_KEY.
| Method | Endpoint | Description |
|---|---|---|
| POST | /v1/analyze | Analyze a PDF from URL for modifications |
| GET | /v1/result/{id} | Retrieve a previously completed check |
| GET | /v1/checks | List all checks with filters and pagination |
Monthly quota depends on your plan — see pricing →
Quick start
First call in 30 seconds
Replace YOUR_API_KEY with the key from your dashboard. Test keys (htpbe_test_...) work the same way and return deterministic synthetic results.
# Basic usage - analyze any publicly accessible PDF
curl -X POST https://api.htpbe.tech/v1/analyze \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com/document.pdf"}'
Endpoint
POST /v1/analyze — submit a PDF
POST https://api.htpbe.tech/v1/analyze
Request headers
Authorization: Bearer YOUR_API_KEY
Content-Type: application/json
Your API key is issued automatically when you sign up. Both htpbe_live_... (production) and htpbe_test_... (testing) keys are accepted.
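Because both key types are accepted by the same endpoints, a small guard can stop a test key from reaching production code paths. The sketch below relies only on the two prefixes named above; the helper names `key_mode` and `auth_headers` are hypothetical.

```python
def key_mode(api_key: str) -> str:
    """Return 'live' or 'test' based on the documented key prefixes."""
    if api_key.startswith("htpbe_live_"):
        return "live"
    if api_key.startswith("htpbe_test_"):
        return "test"
    raise ValueError("API key does not start with htpbe_live_ or htpbe_test_")


def auth_headers(api_key: str, require_live: bool = False) -> dict:
    """Build request headers, optionally refusing test keys."""
    if require_live and key_mode(api_key) != "live":
        raise ValueError("a live key is required here")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```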
Request body
{
"url": "https://example.com/documents/contract.pdf",
"original_filename": "contract.pdf"
}
url (required): public URL to your PDF. Must be reachable via HTTP/HTTPS.
original_filename (optional): original filename. Useful when the URL contains a generated or hashed filename (e.g. from R2 or S3) — stored and returned in results instead of what we extract from the URL.
Supported sources: AWS S3 (presigned URLs), Google Cloud Storage, Azure Blob, Dropbox shared links, your own CDN, or any publicly accessible URL.
Limits: 10 MB max file size, 30-second download timeout, 20-second analysis timeout. The URL must be reachable without authentication.
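Some of these limits can be checked before a request is sent. The following is a minimal client-side sketch, assuming the file size is already known (for example from your own storage metadata). The helper name `preflight_errors` is hypothetical, and reading "10 MB" as 10 × 1024 × 1024 bytes is an assumption; the server remains the authority on what it accepts.

```python
from typing import List, Optional
from urllib.parse import urlparse

# Assumption: "10 MB" interpreted as 10 MiB for this sketch.
MAX_FILE_BYTES = 10 * 1024 * 1024


def preflight_errors(pdf_url: str, size_bytes: Optional[int] = None) -> List[str]:
    """Return a list of problems found before calling the API (empty if none)."""
    problems = []
    parsed = urlparse(pdf_url)
    if parsed.scheme not in ("http", "https"):
        problems.append("URL must use http or https")
    if not parsed.netloc:
        problems.append("URL has no host")
    if size_bytes is not None and size_bytes > MAX_FILE_BYTES:
        problems.append("file exceeds the 10 MB limit")
    return problems
```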
Response (201 Created)
{
"id": "3f9c8b7a-2e1d-4c5f-9b8e-7a6d5c4b3a21"
}
Analysis runs synchronously. The response contains only the check id; call GET /v1/result/{id} immediately after to retrieve the full analysis.
With test keys the ID is a deterministic UUID v4 like 00000000-0000-4000-8000-000000000001 — passes UUID format validation but is obviously synthetic.
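A short sketch of how a client might validate check IDs and flag synthetic ones, using Python's standard `uuid` module. The prefix used to spot test IDs is an assumption inferred from the single example above, not a documented rule, and both helper names are hypothetical.

```python
import uuid

# Assumption: synthetic test IDs share this prefix, inferred from one example.
SYNTHETIC_PREFIX = "00000000-0000-4000-8000-"


def is_valid_check_id(value: str) -> bool:
    """True if the string parses as a version 4 UUID."""
    try:
        return uuid.UUID(value).version == 4
    except ValueError:
        return False


def looks_synthetic(value: str) -> bool:
    """True if the ID is a valid UUID v4 that matches the assumed test prefix."""
    return is_valid_check_id(value) and value.lower().startswith(SYNTHETIC_PREFIX)
```

A guard like this can keep test-mode results out of production tables without a separate environment flag.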
Two-step usage
# Step 1: Submit for analysis
curl -s -X POST https://api.htpbe.tech/v1/analyze \
-H "Authorization: Bearer htpbe_live_..." \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com/contract.pdf"}'
# → { "id": "3f9c8b7a-2e1d-4c5f-9b8e-7a6d5c4b3a21" }
# Step 2: Retrieve full result
curl https://api.htpbe.tech/v1/result/3f9c8b7a-2e1d-4c5f-9b8e-7a6d5c4b3a21 \
-H "Authorization: Bearer htpbe_live_..."
# → { "status": "modified", "origin": { ... }, ... }
Endpoint
GET /v1/result/{id} — retrieve a check
GET https://api.htpbe.tech/v1/result/{id}
Retrieve a previously completed analysis by its check ID. Returns the full analysis including metadata, structure, signatures, and findings. Only checks that belong to your API client are returned.
Path parameter
id (required): check ID returned from POST /v1/analyze (full UUID v4).
Response (200 OK)
{
"id": "506a6b1b-1360-48a2-b389-abb346f85d04",
"filename": "contract.pdf",
"check_date": 1736542583,
"file_size": 245632,
"algorithm_version": "2.18.2",
"current_algorithm_version": "2.18.2",
"status": "modified",
"origin": { "type": "institutional", "software": null },
"creation_date": 1704110400,
"modification_date": 1707840000,
"creator": "Adobe Acrobat Pro DC",
"producer": "Adobe PDF Library 15.0",
"modification_confidence": "certain",
"date_sequence_valid": true,
"metadata_completeness_score": 90,
"xref_count": 4,
"has_incremental_updates": true,
"update_chain_length": 3,
"pdf_version": "1.7",
"has_digital_signature": false,
"signature_count": 0,
"signature_removed": true,
"modifications_after_signature": false,
"page_count": 12,
"object_count": 487,
"has_javascript": false,
"has_embedded_files": false,
"modification_markers": [
"HTPBE_SIGNATURE_REMOVED",
"HTPBE_DATES_DISAGREE"
]
}
All date fields (check_date, creation_date, modification_date) are Unix timestamps in seconds.
modification_markers: every modification signal detected, ordered strongest-first.
algorithm_version: reflects the algorithm in use at the time of analysis. The current version may differ.
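The field notes above translate into a few small helpers. This is a sketch in Python with hypothetical function names. It uses only fields shown in the example response: the three date fields, `modification_markers`, `algorithm_version`, and `current_algorithm_version`.

```python
from datetime import datetime, timezone
from typing import Optional

DATE_FIELDS = ("check_date", "creation_date", "modification_date")


def readable_dates(result: dict) -> dict:
    """Convert Unix-second timestamps to ISO 8601 strings in UTC."""
    converted = {}
    for field in DATE_FIELDS:
        value = result.get(field)
        if value is not None:
            converted[field] = datetime.fromtimestamp(value, tz=timezone.utc).isoformat()
    return converted


def strongest_marker(result: dict) -> Optional[str]:
    """Markers are ordered strongest-first, so the first entry is the strongest."""
    markers = result.get("modification_markers") or []
    return markers[0] if markers else None


def algorithm_changed_since_check(result: dict) -> bool:
    """True when the check ran on a version other than the current one."""
    return result.get("algorithm_version") != result.get("current_algorithm_version")
```

When `algorithm_changed_since_check` returns true, resubmitting the document is one way to obtain a verdict from the current algorithm.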
Error responses
// 404 Not Found - Check doesn't exist or belongs to another client
{
"error": "Check not found or access denied",
"code": "not_found"
}
// 401 Unauthorized - Invalid API key
{
"error": "Invalid API key. Please check your credentials.",
"code": "invalid_api_key"
}
Endpoint
GET /v1/checks — list with filters
GET https://api.htpbe.tech/v1/checks
Paginated list of all your checks with flexible filtering. Use it to build dashboards, export data, or run custom analytics on your own results.
Query parameters (all optional)
limit (1–500, default 100): results per page.
offset (default 0): number of results to skip, used for pagination.
tool: filter by tool name (matches Creator OR Producer).
creator: filter by Creator only.
producer: filter by Producer only.
status (intact/modified/inconclusive): filter by verdict.
from_date / to_date (Unix timestamp): filter by check date.
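The parameters above can be assembled and validated client-side before a request goes out. A minimal Python sketch follows. The parameter names, the limit range, and the status values come from the list above; the function name `checks_url` and the choice to raise `ValueError` on bad input are assumptions.

```python
from urllib.parse import urlencode

ALLOWED_PARAMS = {
    "limit", "offset", "tool", "creator", "producer",
    "status", "from_date", "to_date",
}
ALLOWED_STATUSES = {"intact", "modified", "inconclusive"}


def checks_url(base_url: str = "https://api.htpbe.tech/v1", **filters) -> str:
    """Build a /checks URL, rejecting unknown or out-of-range filters."""
    unknown = set(filters) - ALLOWED_PARAMS
    if unknown:
        raise ValueError(f"unknown filters: {sorted(unknown)}")
    limit = filters.get("limit")
    if limit is not None and not 1 <= limit <= 500:
        raise ValueError("limit must be between 1 and 500")
    status = filters.get("status")
    if status is not None and status not in ALLOWED_STATUSES:
        raise ValueError("status must be intact, modified, or inconclusive")
    # Drop unset filters; urlencode escapes spaces and special characters.
    query = urlencode({k: v for k, v in filters.items() if v is not None})
    return f"{base_url}/checks?{query}" if query else f"{base_url}/checks"
```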
Response (200 OK)
{
"data": [
{
"id": "a3f5c9d2-1360-48a2-b389-abb346f85d04",
"filename": "invoice-2024-01.pdf",
"check_date": 1738368000,
"status": "modified",
"metadata_completeness_score": 85,
"creator": "Microsoft Word for Microsoft 365",
"producer": "Adobe PDF Library 15.0",
"file_size": 524288,
"page_count": 5,
"pdf_version": "1.7",
"creation_date": 1735689600,
"modification_date": 1738281600,
"has_javascript": false,
"has_digital_signature": true,
"has_embedded_files": false,
"has_incremental_updates": true,
"update_chain_length": 3,
"object_count": 234
}
],
"total": 1250,
"limit": 100,
"offset": 0,
"has_more": true
}
Use cases: export raw data, build custom analytics, discover all tools in your traffic, filter only modified PDFs.
Pagination: use has_more to know when to stop.
Example: /v1/checks?status=modified&limit=200
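The pagination loop can be sketched as a Python generator that keeps requesting pages until `has_more` is false. The page fetcher is passed in as a callable so the loop stays independent of any HTTP library. The name `iter_checks`, the fetcher signature, and the assumption that `offset` counts rows rather than pages are illustrative.

```python
from typing import Callable, Iterator


def iter_checks(fetch_page: Callable[[int, int], dict], limit: int = 100) -> Iterator[dict]:
    """Yield every check row, calling fetch_page(limit, offset) per page."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        rows = page.get("data", [])
        yield from rows
        # Stop on the last page; also stop on an empty page to avoid looping forever.
        if not page.get("has_more") or not rows:
            break
        # Assumption: offset is measured in rows, not pages.
        offset += len(rows)


# Stand-in fetcher for demonstration; a real one would call GET /v1/checks.
_ROWS = [{"id": str(i), "status": "modified"} for i in range(5)]


def fake_fetch_page(limit: int, offset: int) -> dict:
    chunk = _ROWS[offset:offset + limit]
    return {
        "data": chunk,
        "total": len(_ROWS),
        "limit": limit,
        "offset": offset,
        "has_more": offset + limit < len(_ROWS),
    }


if __name__ == "__main__":
    print([row["id"] for row in iter_checks(fake_fetch_page, limit=2)])
```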
Errors
Error response codes
All errors include an error string and a machine-readable code. Some include a details string with extra context. Requests beyond your monthly quota are charged at overage rates — there is no 429 cutoff.
| Code | Description |
|---|---|
| 400 | Bad Request: invalid URL, malformed body, or download failed |
| 401 | Unauthorized: missing or invalid API key |
| 402 | Payment Required: no active subscription |
| 403 | Forbidden: deactivated key, or test key used with non-test URL |
| 404 | Not Found: check ID not found or belongs to a different API key |
| 413 | Payload Too Large: file exceeds 10 MB |
| 422 | Unprocessable Entity: invalid or corrupted PDF |
| 500 | Internal Server Error: processing failed |
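One way to act on these codes is a small lookup that maps each status to a next step. The status codes and the `error`, `code`, and `details` fields come from this section. The action labels and the policy behind them (for example, treating only 500 as retryable) are assumptions for this sketch, not guidance from the API.

```python
# Assumed handling policy; adjust to your own pipeline.
ACTIONS = {
    400: "fix_request",
    401: "fix_credentials",
    402: "fix_subscription",
    403: "fix_credentials",
    404: "not_found",
    413: "fix_request",
    422: "reject_document",
    500: "retry",
}


def next_action(status_code: int) -> str:
    """Map an HTTP status to a coarse next step."""
    if 200 <= status_code < 300:
        return "ok"
    return ACTIONS.get(status_code, "escalate")


def describe_error(status_code: int, body: dict) -> str:
    """Format an error body using its error, code, and optional details fields."""
    message = f"HTTP {status_code} [{body.get('code', 'unknown')}]: {body.get('error', 'no message')}"
    details = body.get("details")
    return f"{message} ({details})" if details else message
```

Note that a 400 can also mean the download failed, which may be transient; a stricter policy could inspect `code` before deciding whether to retry.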
Integration examples
Drop-in code in your stack
Curl, JavaScript, Python, Go, PHP, and Ruby — copy, paste, plug in your key.
# curl is preinstalled on macOS and most Linux distributions
# Step 1: Submit PDF for analysis
curl -X POST https://api.htpbe.tech/v1/analyze \
-H "Authorization: Bearer htpbe_live_..." \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com/document.pdf"}'
# Returns: {"id":"3f9c8b7a-2e1d-4c5f-9b8e-7a6d5c4b3a21"}
# Step 2: Retrieve full results
ID="3f9c8b7a-2e1d-4c5f-9b8e-7a6d5c4b3a21"
curl -s "https://api.htpbe.tech/v1/result/$ID" \
-H "Authorization: Bearer htpbe_live_..." \
| jq '.status'
Working with results
Reading, filtering, and rolling up checks
Examples of using the GET endpoints to pull a single result, find every modified PDF in your traffic, and roll a dashboard out of /checks alone.
Get check result by ID
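// Retrieve a specific check result
const API_KEY = process.env.HTPBE_API_KEY;
const checkId = '506a6b1b-1360-48a2-b389-abb346f85d04';
const response = await fetch(
  `https://api.htpbe.tech/v1/result/${checkId}`,
  {
    headers: {
      'Authorization': `Bearer ${API_KEY}`
    }
  }
);
// Surface 401/404 responses instead of parsing them as a result
if (!response.ok) {
  throw new Error(`Request failed with HTTP ${response.status}`);
}
const result = await response.json();
console.log(`File: ${result.filename}`);
console.log(`Status: ${result.status}`);
// Guard against a missing markers array
console.log(`Markers: ${(result.modification_markers ?? []).join(', ')}`);
List all modified PDFs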
import os
import requests
# Get all modified PDFs for manual review
API_KEY = os.environ['HTPBE_API_KEY']
response = requests.get(
    'https://api.htpbe.tech/v1/checks',
    params={
        'status': 'modified',
        'limit': 100
    },
    headers={'Authorization': f'Bearer {API_KEY}'},
    timeout=30
)
# Raise on 4xx/5xx instead of parsing an error body as data
response.raise_for_status()
data = response.json()
print(f"Found {data['total']} modified PDFs")
print(f"\nShowing first {min(5, len(data['data']))} results:")
for check in data['data'][:5]:
    print(f"\n{check['filename']}")
    print(f"  Tool: {check['creator']} → {check['producer']}")
    print(f"  Review: https://htpbe.tech/result/{check['id']}")
Build a dashboard
// Build a dashboard from /checks alone; no extra endpoints needed
type CheckRow = { status: string; filename: string; producer: string | null };
const API_KEY = process.env.HTPBE_API_KEY ?? '';
async function fetchDashboardData(apiKey: string) {
  const headers = { Authorization: `Bearer ${apiKey}` };
  // Fetch the first page of checks (keep paging with offset while has_more is true)
  const checksRes = await fetch(
    'https://api.htpbe.tech/v1/checks?limit=500',
    { headers }
  );
  const { data: checks, total }: { data: CheckRow[]; total: number } =
    await checksRes.json();
  // Calculate metrics from the rows that were fetched
  const modified = checks.filter((c) => c.status === 'modified').length;
  const toolStats = new Map<string, { count: number; modified: number }>();
  checks.forEach((check) => {
    const tool = check.producer || 'Unknown';
    const current = toolStats.get(tool) || { count: 0, modified: 0 };
    toolStats.set(tool, {
      count: current.count + 1,
      modified: current.modified + (check.status === 'modified' ? 1 : 0)
    });
  });
  return {
    overview: {
      total,
      modified,
      // Divide by fetched rows, not total: `modified` only counts this page
      modificationRate: checks.length > 0 ? ((modified / checks.length) * 100).toFixed(1) : '0.0'
    },
    recentModified: checks
      .filter((c) => c.status === 'modified')
      .slice(0, 5)
      .map((c) => ({ filename: c.filename, tool: c.producer })),
    toolBreakdown: Array.from(toolStats.entries())
      .map(([name, data]) => ({
        name,
        count: data.count,
        modificationRate: ((data.modified / data.count) * 100).toFixed(1)
      }))
      .sort((a, b) => b.count - a.count)
  };
}
const dashboardData = await fetchDashboardData(API_KEY);
console.log('Dashboard Data:', JSON.stringify(dashboardData, null, 2));
Real-world patterns
Three production integrations teams ship in a sprint
bank statement check
Block tampered statements before underwriting
One call before the underwriting decision. Modified verdict = decline and surface markers; inconclusive = manual review queue.
// Check bank statement before approving a loan application
async function checkBankStatement(statementUrl) {
  // Step 1: Submit for analysis
  const { id } = await fetch('https://api.htpbe.tech/v1/analyze', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer ' + process.env.HTPBE_API_KEY,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ url: statementUrl })
  }).then(r => r.json());
  // Step 2: Get full result
  const result = await fetch(`https://api.htpbe.tech/v1/result/${id}`, {
    headers: { 'Authorization': 'Bearer ' + process.env.HTPBE_API_KEY }
  }).then(r => r.json());
  if (result.status === 'modified') {
    return {
      approved: false,
      reason: 'Bank statement has been modified; do not process application',
      markers: result.modification_markers
    };
  }
  if (result.status === 'intact') {
    return { approved: true, reason: 'Bank statement is structurally intact' };
  }
  // Fail closed: inconclusive verdicts and API error bodies (no status field)
  // both go to manual review instead of falling through to approval
  return {
    approved: false,
    reason: 'Cannot determine document integrity; manual review required',
    origin: result.origin
  };
}
bulk processing
Concurrent scan across hundreds of documents
Submit and retrieve in parallel with aiohttp. Aggregate intact / modified / inconclusive counts for a daily intake report.
import asyncio
import os
import aiohttp

async def analyze_bulk(urls: list[str], api_key: str):
    """Analyze multiple PDFs concurrently"""
    headers = {'Authorization': f'Bearer {api_key}'}
    async with aiohttp.ClientSession() as session:
        # Step 1: Submit all PDFs for analysis
        submit_tasks = [
            session.post(
                'https://api.htpbe.tech/v1/analyze',
                headers={**headers, 'Content-Type': 'application/json'},
                json={'url': url}
            )
            for url in urls
        ]
        submit_responses = await asyncio.gather(*submit_tasks)
        ids = [(await r.json())['id'] for r in submit_responses]
        # Step 2: Retrieve all results
        result_tasks = [
            session.get(
                f'https://api.htpbe.tech/v1/result/{check_id}',
                headers=headers
            )
            for check_id in ids
        ]
        result_responses = await asyncio.gather(*result_tasks)
        # Read the bodies while the session is still open
        results = [await r.json() for r in result_responses]
    modified_count = sum(1 for r in results if r['status'] == 'modified')
    inconclusive_count = sum(1 for r in results if r['status'] == 'inconclusive')
    return {
        'total': len(results),
        'modified': modified_count,
        'inconclusive': inconclusive_count,
        'intact': len(results) - modified_count - inconclusive_count,
        'details': results
    }

async def main():
    # Process 100 documents in parallel
    urls = [f'https://storage.example.com/doc_{i}.pdf' for i in range(100)]
    summary = await analyze_bulk(urls, os.environ['HTPBE_API_KEY'])
    print(f"Scanned {summary['total']} docs: {summary['modified']} modified, {summary['inconclusive']} inconclusive, {summary['intact']} intact")

asyncio.run(main())
document management
Auto-check every upload, alert on modified
Run a check inside the upload handler. Persist the verdict on the document row; notify the security team only on modified.
// Automatic tamper check on upload
// uploadToS3, db, and notifySecurityTeam stand in for your own helpers
const HTPBE_API_KEY = process.env.HTPBE_API_KEY;
async function handleDocumentUpload(file: File) {
  // 1. Upload to your storage
  const fileUrl = await uploadToS3(file);
  // 2. Submit for analysis
  const { id } = await fetch('https://api.htpbe.tech/v1/analyze', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${HTPBE_API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ url: fileUrl })
  }).then(r => r.json());
  // 3. Retrieve full result
  const result = await fetch(`https://api.htpbe.tech/v1/result/${id}`, {
    headers: { 'Authorization': `Bearer ${HTPBE_API_KEY}` }
  }).then(r => r.json());
  // 4. Store in database with detection status
  await db.documents.create({
    filename: file.name,
    url: fileUrl,
    intact: result.status === 'intact',
    uploaded_at: new Date()
  });
  // 5. Alert if modified
  if (result.status === 'modified') {
    await notifySecurityTeam({
      document: file.name,
      findings: result.modification_markers
    });
  }
  return result;
}
LLM-friendly documentation
For AI assistants integrating with HTPBE?, the API is mirrored in a machine-readable format optimized for language models.
Ready to integrate?
API key issued on signup. Test keys free on every plan.
Live calls from $15/mo — no sales call, cancel any time.