REST API

PDF Authenticity & Tamper Detection API

Detect tampered bank statements, invoices, and financial documents before they enter your pipeline.

42 tamper detection checks
Results in under 10 seconds
No original document needed

Quick Start Example

bash
# Basic usage - analyze any publicly accessible PDF
curl -X POST https://api.htpbe.tech/v1/analyze \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/document.pdf"}'
Get your API key

Key issued on signup. Test environment free on every plan. Live calls from $15/mo.

The Detection Gap

What this catches that KYC doesn’t

KYC platforms confirm that a document looks real. HTPBE? detects whether the specific PDF file was modified after it was generated.

An applicant submits a bank statement. Your KYC platform confirms the account exists and the identity matches. But the balance has been edited — $2,400 inflated to $24,000. The edit is invisible to visual review and passes all template checks.

HTPBE? checks the PDF file structure, not the content. The modification shows up in the xref table, the incremental update chain, and the producer field — regardless of what the document looks like on screen.

  • One API call — result in under 10 seconds
  • No original document needed — standalone structural analysis
  • Works alongside Plaid, Persona, and Alloy — not instead of them
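
To make the structural signals above more concrete, here is a rough Python sketch that counts `%%EOF` markers and `startxref` keywords in raw PDF bytes. It relies on a general property of the PDF format: every incremental save appends a new cross-reference section and trailer to the end of the file. This is not how the HTPBE? service works internally. The function name `count_revisions` and the sample bytes are illustrative assumptions, and a real analyzer must also handle linearized files (which legitimately carry two markers), cross-reference streams, and many other cases.

```python
def count_revisions(pdf_bytes: bytes) -> dict:
    """Crude heuristic: count trailer markers left behind by each save."""
    eof_count = pdf_bytes.count(b"%%EOF")
    startxref_count = pdf_bytes.count(b"startxref")
    return {
        "eof_markers": eof_count,
        "startxref_keywords": startxref_count,
        # A file written once normally ends with a single %%EOF marker.
        "possible_incremental_updates": max(eof_count - 1, 0),
    }


# Synthetic stand-ins for a PDF saved once and the same file after an
# appended (incremental) revision. Neither is a valid, renderable PDF.
original = (
    b"%PDF-1.7\n1 0 obj\n<<>>\nendobj\n"
    b"xref\n0 2\ntrailer\n<<>>\nstartxref\n30\n%%EOF\n"
)
appended = original + (
    b"1 0 obj\n<<>>\nendobj\n"
    b"xref\n0 2\ntrailer\n<<>>\nstartxref\n120\n%%EOF\n"
)

print(count_revisions(original))
print(count_revisions(appended))
```

The edited balance in the example would look identical on screen in both versions; only the appended section gives the second save away.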

Example Response — Modified Verdict

json
{
  "status": "modified",
  "modification_confidence": "high",
  "modification_markers": [
    "Multiple xref tables detected",
    "Incremental update chain length: 3",
    "Creation and modification dates differ"
  ],
  "has_incremental_updates": true,
  "xref_count": 3,
  "creator": "Microsoft Excel",
  "producer": "Adobe PDF Library 15.0"
}

The full response includes 20+ fields. See complete schema on GitHub →

API Documentation

A simple REST API with comprehensive PDF analysis capabilities

Complete API Reference on GitHub

For detailed field-by-field documentation including all possible values, error codes, and comprehensive examples, visit our GitHub documentation.

Quick Reference

Method   Endpoint               Description
POST     /api/v1/analyze        Analyze PDF from URL for modifications
GET      /api/v1/result/{id}    Retrieve previously completed check by ID
GET      /api/v1/checks         List all checks with filtering and pagination

Base URL: https://api.htpbe.tech/v1
Authentication: All endpoints require Authorization: Bearer YOUR_API_KEY
Monthly Quota: Depends on your plan — see Pricing for details.

Analyze PDF Document

POST https://api.htpbe.tech/v1/analyze

Request Headers

http
Authorization: Bearer YOUR_API_KEY
Content-Type: application/json

Your API key is issued automatically when you sign up. Both htpbe_live_... (production) and htpbe_test_... (testing) keys are accepted.
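
Because the key prefix tells production and test keys apart, a client can check at startup which environment it is configured for. A minimal Python sketch follows. Only the two prefixes come from the text above; the helper name `key_environment` and the sample key are hypothetical, and nothing is assumed about the rest of the key format.

```python
def key_environment(api_key: str) -> str:
    """Return 'live' or 'test' based on the documented key prefixes."""
    if api_key.startswith("htpbe_live_"):
        return "live"
    if api_key.startswith("htpbe_test_"):
        return "test"
    raise ValueError("Unrecognized API key prefix")


# Hypothetical key value, used only to exercise the helper.
print(key_environment("htpbe_test_example"))
```

A deployment script could call this once and refuse to start a production service that has been handed a test key.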

Request Body

json
{
  "url": "https://example.com/documents/contract.pdf",
  "original_filename": "contract.pdf"
}

url (required): Public URL to your PDF file. Must be accessible via HTTP/HTTPS.

original_filename (optional): Original filename of the document. Useful when the URL contains a generated or hashed filename (e.g. from Vercel Blob or S3). When provided, this name is stored and returned in results instead of the filename extracted from the URL.

Supported sources: AWS S3 (presigned URLs), Google Cloud Storage, Azure Blob, Dropbox shared links, your own CDN, or any publicly accessible URL.

Limitations: Max 10 MB file size, 30-second download timeout, 20-second analysis timeout. URL must be publicly accessible without authentication.
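
Some of the constraints above can be checked on the client before a request is sent. The sketch below builds the request body and rejects URLs that are not HTTP/HTTPS and files known to be over the size limit. The helper name `build_analyze_payload` and the `size_bytes` argument are illustrative, and reading "10 MB" as 10 × 1024 × 1024 bytes is an assumption; the API itself remains the authority on what it accepts.

```python
from typing import Optional
from urllib.parse import urlparse

# Assumption: "10 MB" is read as 10 MiB. The exact byte limit is not stated.
MAX_FILE_BYTES = 10 * 1024 * 1024


def build_analyze_payload(
    url: str,
    original_filename: Optional[str] = None,
    size_bytes: Optional[int] = None,
) -> dict:
    """Validate inputs locally and return the JSON body for POST /analyze."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("url must be an absolute http(s) URL")
    if size_bytes is not None and size_bytes > MAX_FILE_BYTES:
        raise ValueError("file exceeds the 10 MB limit")

    payload = {"url": url}
    if original_filename:
        # Optional: overrides the filename extracted from the URL.
        payload["original_filename"] = original_filename
    return payload


print(build_analyze_payload(
    "https://example.com/documents/contract.pdf", "contract.pdf"
))
```

A local check cannot tell whether the URL is publicly reachable, so download failures still have to be handled from the API response.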

Response (201 Created)

json
{
  "id": "3f9c8b7a-2e1d-4c5f-9b8e-7a6d5c4b3a21"
}

Analysis is performed synchronously. The response contains only the check id — call GET /api/v1/result/{id} immediately after to retrieve the full analysis.

With test keys, the ID is a deterministic UUID v4 such as 00000000-0000-4000-8000-000000000001. It passes UUID format validation but is obviously synthetic.
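
The claim that the synthetic test ID still validates as a UUID v4 can be confirmed with Python's standard `uuid` module. This is a small sketch; the helper name `is_uuid_v4` is illustrative, and the two IDs are the ones shown in this section.

```python
import uuid


def is_uuid_v4(value: str) -> bool:
    """True if value parses as an RFC 4122 UUID whose version field is 4."""
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        return False
    return parsed.version == 4


# Both the live-style ID and the synthetic test ID pass the same check.
for check_id in (
    "3f9c8b7a-2e1d-4c5f-9b8e-7a6d5c4b3a21",
    "00000000-0000-4000-8000-000000000001",
):
    print(check_id, is_uuid_v4(check_id))
```

Code that validates IDs this way should therefore behave identically with test and live keys.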

Two-Step Usage

bash
# Step 1: Submit for analysis
curl -s -X POST https://api.htpbe.tech/v1/analyze \
  -H "Authorization: Bearer htpbe_live_..." \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/contract.pdf"}'
# → { "id": "3f9c8b7a-2e1d-4c5f-9b8e-7a6d5c4b3a21" }

# Step 2: Retrieve full result
curl https://api.htpbe.tech/v1/result/3f9c8b7a-2e1d-4c5f-9b8e-7a6d5c4b3a21 \
  -H "Authorization: Bearer htpbe_live_..."
# → { "status": "modified", "origin": { ... }, ... }

Retrieve Check Result

GET https://api.htpbe.tech/v1/result/{id}

Description

Retrieve a previously completed PDF analysis by its unique check ID. Returns the full analysis data including metadata, structure, signatures, and findings. Only returns checks that belong to your API client.

Request Headers

http
Authorization: Bearer YOUR_API_KEY

Path Parameters

id (required): Check ID returned from POST /api/v1/analyze (full UUID v4)

Response (200 OK)

json
{
  "id": "506a6b1b-1360-48a2-b389-abb346f85d04",
  "filename": "contract.pdf",
  "check_date": 1736542583,
  "file_size": 245632,
  "algorithm_version": "2.2.1",
  "current_algorithm_version": "2.2.1",
  "status": "modified",
  "origin": { "type": "institutional", "software": null },
  "creation_date": 1704110400,
  "modification_date": 1707840000,
  "creator": "Adobe Acrobat Pro DC",
  "producer": "Adobe PDF Library 15.0",
  "modification_confidence": "certain",
  "date_sequence_valid": true,
  "metadata_completeness_score": 90,
  "xref_count": 2,
  "has_incremental_updates": true,
  "update_chain_length": 3,
  "pdf_version": "1.7",
  "has_digital_signature": false,
  "signature_count": 0,
  "signature_removed": true,
  "modifications_after_signature": false,
  "page_count": 12,
  "object_count": 487,
  "has_javascript": false,
  "has_embedded_files": false,
  "modification_markers": [
    "Digital signature was removed",
    "Different creation and modification dates"
  ]
}

All date fields (check_date, creation_date, modification_date) are Unix timestamps (seconds since epoch).

modification_markers: All modification signals detected, ordered strongest-first

algorithm_version: Version numbers reflect the algorithm in use at the time of analysis. The current version may differ.
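
The field notes above translate into a small post-processing step. The sketch below converts the Unix timestamps to timezone-aware UTC datetimes, picks the strongest marker (the first entry), and flags results whose `algorithm_version` differs from `current_algorithm_version`. The helper name `summarize_result` and the shape of its return value are illustrative, and skipping missing date fields is a defensive assumption rather than documented behavior.

```python
from datetime import datetime, timezone

DATE_FIELDS = ("check_date", "creation_date", "modification_date")


def summarize_result(result: dict) -> dict:
    """Derive readable dates and a few convenience flags from a result."""
    dates = {
        name: datetime.fromtimestamp(result[name], tz=timezone.utc)
        for name in DATE_FIELDS
        if result.get(name) is not None  # assumption: a field may be absent
    }
    markers = result.get("modification_markers") or []
    return {
        "dates": dates,
        # Markers are ordered strongest-first, so index 0 is the strongest.
        "strongest_marker": markers[0] if markers else None,
        "algorithm_version_differs": (
            result.get("algorithm_version")
            != result.get("current_algorithm_version")
        ),
    }


# Values copied from the example response above.
sample = {
    "check_date": 1736542583,
    "creation_date": 1704110400,
    "modification_date": 1707840000,
    "algorithm_version": "2.2.1",
    "current_algorithm_version": "2.2.1",
    "modification_markers": [
        "Digital signature was removed",
        "Different creation and modification dates",
    ],
}
summary = summarize_result(sample)
print(summary["dates"]["creation_date"].isoformat())
print(summary["strongest_marker"])
```

A result with `algorithm_version_differs` set could be resubmitted for analysis if the newer algorithm matters for your use case.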

Error Responses

json
// 404 Not Found - Check doesn't exist or belongs to another client
{
  "error": "Check not found or access denied",
  "code": "not_found"
}

// 401 Unauthorized - Invalid API key
{
  "error": "Invalid API key. Please check your credentials.",
  "code": "invalid_api_key"
}

List All Checks

GET https://api.htpbe.tech/v1/checks

Description

Retrieve a paginated list of your PDF check results, with optional filters. The endpoint exposes the raw check data, so you can use it to build dashboards, run exports, or perform your own analysis.

Request Headers

http
Authorization: Bearer YOUR_API_KEY

Query Parameters (All Optional)

limit (1-500, default: 100): Number of results per page

offset (default: 0): Number of results to skip for pagination

tool: Filter by tool name (matches Creator OR Producer)

creator: Filter by Creator tool only

producer: Filter by Producer tool only

status (intact/modified/inconclusive): Filter by verdict

from_date / to_date (Unix timestamp): Filter by check date (when analysis was performed)
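
The parameters above can be assembled and validated before the request is made. This is a minimal sketch: the helper name `build_checks_query` is hypothetical, the limits and status values are taken from the list above, and the rule that unknown filter names are rejected is a choice made for this example rather than documented API behavior.

```python
from urllib.parse import urlencode

ALLOWED_FILTERS = ("tool", "creator", "producer", "status", "from_date", "to_date")
VALID_STATUSES = ("intact", "modified", "inconclusive")


def build_checks_query(limit: int = 100, offset: int = 0, **filters) -> str:
    """Return a URL-encoded query string for GET /checks."""
    if not 1 <= limit <= 500:
        raise ValueError("limit must be between 1 and 500")
    if offset < 0:
        raise ValueError("offset must not be negative")
    unknown = set(filters) - set(ALLOWED_FILTERS)
    if unknown:
        raise ValueError(f"unknown filters: {sorted(unknown)}")
    status = filters.get("status")
    if status is not None and status not in VALID_STATUSES:
        raise ValueError(f"status must be one of {VALID_STATUSES}")

    params = {"limit": limit, "offset": offset}
    # Keep a stable parameter order and skip filters that were not supplied.
    for name in ALLOWED_FILTERS:
        if filters.get(name) is not None:
            params[name] = filters[name]
    return urlencode(params)


print(build_checks_query(limit=200, status="modified"))
print(build_checks_query(tool="Adobe Acrobat", from_date=1735689600))
```

`urlencode` also takes care of escaping, so tool names containing spaces are safe to pass through.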

Response (200 OK)

json
{
  "data": [
    {
      "id": "a3f5c9d2-1360-48a2-b389-abb346f85d04",
      "filename": "invoice-2024-01.pdf",
      "check_date": 1738368000,
      "status": "modified",
      "metadata_completeness_score": 85,
      "creator": "Microsoft Word for Microsoft 365",
      "producer": "Adobe PDF Library 15.0",
      "file_size": 524288,
      "page_count": 5,
      "pdf_version": "1.7",
      "creation_date": 1735689600,
      "modification_date": 1738281600,
      "has_javascript": false,
      "has_digital_signature": true,
      "has_embedded_files": false,
      "has_incremental_updates": true,
      "update_chain_length": 3,
      "object_count": 234
    }
  ],
  "total": 1250,
  "limit": 100,
  "offset": 0,
  "has_more": true
}

Use cases: Export all data, build custom analytics, discover all tools, filter modified PDFs

Pagination: Use has_more to know when to stop

Example: /api/v1/checks?status=modified&limit=200
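
The pagination rule above can be wrapped in a generator that keeps requesting pages until `has_more` is false. In this sketch the HTTP call is abstracted behind a `fetch_page(limit, offset)` callable so the loop can be shown without network access. Both `iter_checks` and the in-memory `fake_fetch_page` are illustrative names, and stopping on an empty page is a defensive assumption to avoid looping forever.

```python
from typing import Callable, Iterator


def iter_checks(
    fetch_page: Callable[[int, int], dict], limit: int = 100
) -> Iterator[dict]:
    """Yield every check, following has_more across pages."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        data = page["data"]
        yield from data
        if not page.get("has_more") or not data:
            break
        offset += len(data)


# In-memory stand-in for GET /checks that mimics the response envelope.
FAKE_CHECKS = [{"id": str(i), "status": "modified"} for i in range(5)]


def fake_fetch_page(limit: int, offset: int) -> dict:
    chunk = FAKE_CHECKS[offset:offset + limit]
    return {
        "data": chunk,
        "total": len(FAKE_CHECKS),
        "limit": limit,
        "offset": offset,
        "has_more": offset + len(chunk) < len(FAKE_CHECKS),
    }


print([check["id"] for check in iter_checks(fake_fetch_page, limit=2)])
```

In real use, `fetch_page` would issue the authenticated GET request and return the parsed JSON body.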

Error Responses

All errors include an error string and a machine-readable code. Some errors also include a details string with additional context. Requests beyond your monthly quota are charged at overage rates — there is no 429 cutoff.

Code   Status                   Description
400    Bad Request              Invalid URL, malformed body, download failed
401    Unauthorized             Missing or invalid API key
402    Payment Required         No active subscription
403    Forbidden                Deactivated key, or test key used with non-test URL
404    Not Found                Check ID not found or belongs to a different API key
413    Payload Too Large        File exceeds 10 MB
422    Unprocessable Entity     Invalid or corrupted PDF
500    Internal Server Error    Processing failed
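
One way to consume these errors is to normalize the status code and JSON body into a single value that calling code can branch on. The sketch below reads the documented `error`, `code`, and optional `details` fields. The names `ApiErrorInfo` and `parse_error` are hypothetical, and treating only 500 as retryable is an assumption made for this example, not a documented retry policy.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: only server-side failures are worth retrying automatically.
RETRYABLE_STATUSES = frozenset({500})


@dataclass(frozen=True)
class ApiErrorInfo:
    status: int
    code: str
    message: str
    details: Optional[str]
    retryable: bool


def parse_error(status: int, body: dict) -> ApiErrorInfo:
    """Combine an HTTP status and a JSON error body into one record."""
    return ApiErrorInfo(
        status=status,
        code=body.get("code", "unknown"),
        message=body.get("error", ""),
        details=body.get("details"),  # present only on some errors
        retryable=status in RETRYABLE_STATUSES,
    )


info = parse_error(
    404, {"error": "Check not found or access denied", "code": "not_found"}
)
print(info)
```

Branching on `code` rather than on the message text keeps client logic stable if the wording of `error` changes.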

Integration Examples

Get started quickly with these code examples.

bash
# curl is preinstalled on macOS and most Linux distributions

# Step 1: Submit PDF for analysis
curl -X POST https://api.htpbe.tech/v1/analyze \
  -H "Authorization: Bearer htpbe_live_..." \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/document.pdf"}'
# Returns: {"id":"3f9c8b7a-2e1d-4c5f-9b8e-7a6d5c4b3a21"}

# Step 2: Retrieve full results
ID="3f9c8b7a-2e1d-4c5f-9b8e-7a6d5c4b3a21"
curl -s "https://api.htpbe.tech/v1/result/$ID" \
  -H "Authorization: Bearer htpbe_live_..." \
  | jq '.status'

Retrieving Results and Check History

Examples of using the GET endpoints to retrieve check results and check history

Get Check Result by ID

typescript
// Retrieve a specific check result
const checkId = '506a6b1b-1360-48a2-b389-abb346f85d04';

const response = await fetch(
  `https://api.htpbe.tech/v1/result/${checkId}`,
  {
    headers: {
      'Authorization': `Bearer ${API_KEY}`
    }
  }
);

const result = await response.json();

console.log(`File: ${result.filename}`);
console.log(`Status: ${result.status}`);
console.log(`Markers: ${result.modification_markers.join(', ')}`);

Analyze Specific Tool Usage

python
import os

import requests

# Assumes the key is exported as HTPBE_API_KEY, as in the other examples
API_KEY = os.environ['HTPBE_API_KEY']

# Get all modified PDFs for manual review
response = requests.get(
    'https://api.htpbe.tech/v1/checks',
    params={
        'status': 'modified',
        'limit': 100
    },
    headers={'Authorization': f'Bearer {API_KEY}'},
    timeout=30
)
response.raise_for_status()  # Stop here on 4xx/5xx instead of parsing an error body

data = response.json()
preview = data['data'][:5]

print(f"Found {data['total']} modified PDFs")
print(f"\nShowing first {len(preview)} results:")

for check in preview:
    print(f"\n{check['filename']}")
    print(f"  Tool: {check['creator']} / {check['producer']}")
    print(f"  Review: https://htpbe.tech/result/{check['id']}")

Building a Dashboard

typescript
// Build a dashboard from /checks — no extra endpoints needed
// Only the fields this dashboard reads; the API returns more per check
interface CheckSummary {
  filename: string;
  status: 'intact' | 'modified' | 'inconclusive';
  producer: string | null;
}

async function fetchDashboardData(apiKey: string) {
  const headers = { Authorization: `Bearer ${apiKey}` };

  // Fetch every page: one request returns at most 500 checks
  const checks: CheckSummary[] = [];
  let hasMore = true;
  while (hasMore) {
    const res = await fetch(
      `https://api.htpbe.tech/v1/checks?limit=500&offset=${checks.length}`,
      { headers }
    );
    if (!res.ok) throw new Error(`Listing checks failed: HTTP ${res.status}`);
    const page = await res.json();
    checks.push(...page.data);
    hasMore = page.has_more && page.data.length > 0;
  }
  const total = checks.length;

  // Calculate metrics from raw data
  const modified = checks.filter((c) => c.status === 'modified').length;
  const toolStats = new Map<string, { count: number; modified: number }>();

  checks.forEach((check) => {
    const tool = check.producer || 'Unknown';
    const current = toolStats.get(tool) || { count: 0, modified: 0 };
    toolStats.set(tool, {
      count: current.count + 1,
      modified: current.modified + (check.status === 'modified' ? 1 : 0)
    });
  });

  return {
    overview: {
      total,
      modified,
      modificationRate: total > 0 ? ((modified / total) * 100).toFixed(1) : '0.0'
    },
    recentModified: checks
      .filter((c) => c.status === 'modified')
      .slice(0, 5)
      .map((c) => ({ filename: c.filename, tool: c.producer })),
    toolBreakdown: Array.from(toolStats.entries())
      .map(([name, data]) => ({
        name,
        count: data.count,
        modificationRate: ((data.modified / data.count) * 100).toFixed(1)
      }))
      .sort((a, b) => b.count - a.count)
  };
}

const dashboardData = await fetchDashboardData(API_KEY);
console.log('Dashboard Data:', JSON.stringify(dashboardData, null, 2));

Real-World Use Cases

1. Bank Statement Fraud Detection (Lending)

javascript
// Check bank statement before approving a loan application
async function checkBankStatement(statementUrl) {
  // Step 1: Submit for analysis
  const { id } = await fetch('https://api.htpbe.tech/v1/analyze', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer ' + process.env.HTPBE_API_KEY,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ url: statementUrl })
  }).then(r => r.json());

  // Step 2: Get full result
  const result = await fetch(`https://api.htpbe.tech/v1/result/${id}`, {
    headers: { 'Authorization': 'Bearer ' + process.env.HTPBE_API_KEY }
  }).then(r => r.json());

  if (result.status === 'modified') {
    return {
      approved: false,
      reason: 'Bank statement has been modified — do not process application',
      markers: result.modification_markers
    };
  }

  if (result.status === 'inconclusive') {
    return {
      approved: false,
      reason: 'Cannot determine document integrity — manual review required',
      origin: result.origin
    };
  }

  return { approved: true, reason: 'Bank statement is structurally intact' };
}

2. Bulk Document Tamper Detection

python
import asyncio
import os

import aiohttp

API_BASE = 'https://api.htpbe.tech/v1'


async def submit(session, url):
    """Submit one PDF and return its check ID."""
    # json= serializes the body and sets Content-Type: application/json
    async with session.post(f'{API_BASE}/analyze', json={'url': url}) as resp:
        resp.raise_for_status()
        return (await resp.json())['id']


async def fetch_result(session, check_id):
    """Retrieve the full analysis for one check ID."""
    async with session.get(f'{API_BASE}/result/{check_id}') as resp:
        resp.raise_for_status()
        return await resp.json()


async def analyze_bulk(urls: list[str], api_key: str):
    """Analyze multiple PDFs concurrently"""
    headers = {'Authorization': f'Bearer {api_key}'}

    async with aiohttp.ClientSession(headers=headers) as session:
        # Step 1: Submit all PDFs for analysis
        ids = await asyncio.gather(*(submit(session, url) for url in urls))

        # Step 2: Retrieve all results
        results = await asyncio.gather(
            *(fetch_result(session, check_id) for check_id in ids)
        )

    modified_count = sum(1 for r in results if r['status'] == 'modified')
    inconclusive_count = sum(1 for r in results if r['status'] == 'inconclusive')

    return {
        'total': len(results),
        'modified': modified_count,
        'inconclusive': inconclusive_count,
        'intact': len(results) - modified_count - inconclusive_count,
        'details': results
    }


async def main():
    # Process 100 documents in parallel
    urls = [f'https://storage.example.com/doc_{i}.pdf' for i in range(100)]
    summary = await analyze_bulk(urls, os.environ['HTPBE_API_KEY'])
    print(f"Scanned {summary['total']} docs: {summary['modified']} modified, {summary['inconclusive']} inconclusive, {summary['intact']} intact")


asyncio.run(main())

3. Document Management System Integration

typescript
// Automatic tamper check on upload
async function handleDocumentUpload(file: File) {
  // 1. Upload to your storage
  const fileUrl = await uploadToS3(file);

  // 2. Submit for analysis
  const { id } = await fetch('https://api.htpbe.tech/v1/analyze', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${HTPBE_API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ url: fileUrl })
  }).then(r => r.json());

  // 3. Retrieve full result
  const result = await fetch(`https://api.htpbe.tech/v1/result/${id}`, {
    headers: { 'Authorization': `Bearer ${HTPBE_API_KEY}` }
  }).then(r => r.json());

  // 4. Store in database with detection status
  await db.documents.create({
    filename: file.name,
    url: fileUrl,
    intact: result.status === 'intact',
    uploaded_at: new Date()
  });

  // 5. Alert if modified
  if (result.status === 'modified') {
    await notifySecurityTeam({
      document: file.name,
      findings: result.modification_markers
    });
  }

  return result;
}

Enterprise: On-Premise Deployment

Deploy the HTPBE? analyzer in your own infrastructure — documents never leave your network.

Designed for banks, healthcare providers, government agencies, and legal firms with strict data privacy requirements (GDPR, HIPAA, PCI DSS, SOX compliance).

100% Data Privacy

Documents analyzed entirely within your infrastructure. No files, metadata, or results ever transmitted to external servers.

Compliance Ready

GDPR, HIPAA, PCI DSS, SOX compliant by design. Your legal and security team approves the deployment. We provide the software.

Easy Deployment

Single Docker container or Kubernetes deployment. Production-ready in under 30 minutes. No file size limits — configure resources as needed.

Custom Development

Need specific integrations, custom webhook logic, or modifications to match your business processes? We build it for you.

Pricing: Custom pricing based on your requirements. Includes a dedicated account manager, priority support (1-hour response time guaranteed), and regular updates and security patches.

Technical details: View full on-premise deployment documentation →

LLM-Friendly Documentation

For AI assistants and LLM integration, our API documentation is available in a machine-readable format optimized for language models.

View llms.txt

Ready to integrate?

API key issued on signup. Test keys free on every plan.
Live calls from $15/mo — no sales call, cancel any time.