
GDPR Document Fraud Detection API: EU Fintech Compliance Guide

HTPBE Team·13.05.2026·11 min read

This article is a snapshot — content was accurate as of May 2026 (code examples tested against the API as of April 2026). The product evolves actively; specific counts, examples, and detection rules may have changed since publication — see the changelog for the current state.

A fintech compliance lead at a Dutch lending platform asks this question in a DPIA workshop: “When we send a customer’s bank statement to the fraud detection API, who is the controller, who is the processor, and what personal data is being transferred?”

That is the right question. The answer depends on what the API does with the document, and most vendors of document fraud detection APIs do not explain this clearly. This article explains how structural PDF analysis differs from reading a document's content, what that means for GDPR compliance in practice, and what your DPIA should cover when you integrate a document fraud detection API in the EU.

What the Compliance Question Is

When a customer uploads a bank statement, payslip, or contract to your platform, that document contains personal data: name, address, account number, transaction history, salary, employer details. Under the GDPR, your organisation is the controller of that data. You decide the purposes and means of processing.

When you send that document to a third-party API for fraud detection, the question GDPR requires you to answer is: what personal data is that third party processing, for what purpose, and under what legal basis?

The answers differ significantly depending on what the API does. An API that extracts text, reads transaction lines, or stores the document is processing the personal data in the document's content. The compliance obligations are correspondingly broad. You need a data processing agreement covering that content-level processing. You need a documented lawful basis under Article 6, typically legitimate interests, which Recital 47 expressly recognises for fraud prevention. You may also need a transfer impact assessment if the API processes the data outside the EU/EEA.

An API that analyzes only the structural layer of the PDF (the metadata, binary structure, and edit history) without reading or storing document content performs a much narrower processing operation. Understanding this distinction is the starting point for any DPIA on document fraud detection tooling.

How Structural Analysis Works Without Reading the Document

A PDF file has two distinct layers. The first is the content layer: the text, images, and visual elements that a person reads when they open the document. The second is the structural layer: the metadata, cross-reference tables, producer and creator fields, timestamp records, and binary object structure that the generating software wrote into the file.

Structural forensic analysis operates exclusively on the second layer. It does not read text. It does not parse account numbers, names, or transaction histories. It reads the file’s own internal records: what software created this file, when, whether it was subsequently edited, by what software, and whether a digital signature was modified or removed after signing.

The resulting analysis verdict contains no personal data. An API response looks like this:

{
  "id": "c7e1f204-a3d9-41bc-b882-9c3d5f8a1e27",
  "status": "modified",
  "creator": "Xero Payroll",
  "producer": "iLovePDF",
  "creation_date": 1740700800,
  "modification_date": 1741132800,
  "xref_count": 2,
  "has_incremental_updates": true,
  "has_digital_signature": false,
  "modification_markers": [
    "Known PDF editing tool detected",
    "Different creation and modification dates",
    "Creator and producer mismatch"
  ]
}

There is no name in this response. No account number. No transaction data. No salary figure. The response describes the software origin and edit history of the file itself — not the content the file contains.

This is data minimisation applied at the technical layer, not only as a policy commitment.
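If you persist the verdict alongside the customer record, you can apply the same minimisation on your side by storing only the structural fields you expect. The sketch below assumes that the field names in the example response above are the full set you want to keep. The allowlist, the hypothetical "debug_text" field in the usage example, and the choice to drop unknown fields rather than reject them are assumptions for illustration, not documented API behaviour.

```python
import json

# Allowlist built from the example response above (an assumption: check it
# against the current API schema before relying on it).
STRUCTURAL_FIELDS = frozenset({
    "id", "status", "creator", "producer", "creation_date",
    "modification_date", "xref_count", "has_incremental_updates",
    "has_digital_signature", "modification_markers",
})


def minimised_record(raw_response: str) -> tuple[dict, set[str]]:
    """Return (fields safe to persist, names of fields that were dropped).

    Unexpected fields are dropped rather than stored, so a change in the
    response schema cannot silently widen what your system retains.
    """
    payload = json.loads(raw_response)
    kept = {k: v for k, v in payload.items() if k in STRUCTURAL_FIELDS}
    dropped = set(payload) - STRUCTURAL_FIELDS
    return kept, dropped


if __name__ == "__main__":
    # "debug_text" is a hypothetical unexpected field.
    sample = '{"id": "c7e1f204-a3d9-41bc-b882-9c3d5f8a1e27", "status": "modified", "debug_text": "..."}'
    record, dropped = minimised_record(sample)
    print(record)   # {'id': 'c7e1f204-a3d9-41bc-b882-9c3d5f8a1e27', 'status': 'modified'}
    print(dropped)  # {'debug_text'}
```

Returning the dropped names, rather than discarding them silently, lets you log that the schema has changed without logging the values themselves.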

Data Minimisation in Practice: The GDPR Argument for Structural-Only APIs

Article 5(1)(c) of the GDPR requires that personal data be “adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed.”

In the context of document fraud detection, the relevant purpose is checking that a document has not been tampered with. That purpose does not require processing the document’s content. It requires processing the document’s structure.

When HTPBE? receives a document URL, it downloads the PDF, extracts structural signals from the binary layer, generates the verdict and named modification markers, and discards the document. The PDF is never stored. The content layer is never parsed or retained. What is stored is the analysis result: the verdict (intact, modified, or inconclusive), the structural signals that produced it, and the check ID that links the result to your records.
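Here is a deliberately simplified illustration of one structural signal from the binary layer. It is not HTPBE?'s detection logic. When a PDF is saved incrementally, the editing software appends a new cross-reference section that ends with "startxref" and "%%EOF". Counting those markers therefore hints at post-creation edits without decoding any page content. The function name and output keys are hypothetical.

```python
import re

# Matches the "startxref <offset>" keyword that closes each cross-reference
# section. Simplified: a real parser would also follow /Prev links and
# validate each offset against the file.
_STARTXREF = re.compile(rb"startxref\s+(\d+)")


def structural_signals(pdf_bytes: bytes) -> dict:
    """Derive a few structure-only signals from raw PDF bytes.

    Only file-level markers are inspected; page content streams (where the
    visible text lives) are never decoded.
    """
    offsets = [int(m.group(1)) for m in _STARTXREF.finditer(pdf_bytes)]
    return {
        "pdf_header": pdf_bytes.startswith(b"%PDF-"),
        "xref_count": len(offsets),
        "eof_markers": pdf_bytes.count(b"%%EOF"),
        "has_incremental_updates": len(offsets) > 1,
    }


if __name__ == "__main__":
    original = b"%PDF-1.7\n...\nstartxref\n116\n%%EOF\n"
    edited = original + b"...\nstartxref\n2048\n%%EOF\n"
    print(structural_signals(original)["has_incremental_updates"])  # False
    print(structural_signals(edited)["has_incremental_updates"])    # True
```

A single marker count is a weak signal on its own. Linearized ("fast web view") files can contain two cross-reference sections, each closed by these markers, without ever having been edited. This is one reason a production detector combines many structural signals.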

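In practice: you hold the document in your own storage and pass HTPBE? a URL. HTPBE? fetches the file, analyzes its structure, returns a verdict, and discards the file. The file does pass through HTPBE?'s infrastructure for the duration of the analysis, which is why the processor relationship and DPA described below still apply. However, the personal data it contains (the customer's name, address, account number, salary) is not parsed, extracted, or retained, and your storage remains the only system of record for it.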
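Because HTPBE? fetches the file from a URL you supply, it helps to make that URL short-lived and scoped to a single document. Most object stores support this natively (for example, S3 presigned URLs). If yours does not, a minimal HMAC-signed, expiring link can be sketched as follows. The query parameter names, the placeholder secret, and the 10-minute lifetime are all illustrative assumptions.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical secret: load it from a secrets manager in a real deployment.
SECRET = b"replace-with-a-long-random-secret"
TTL_SECONDS = 600  # illustrative 10-minute lifetime


def _signature(path: str, expires: int) -> str:
    # Sign the document path together with its expiry time.
    message = f"{path}:{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def signed_url(base: str, path: str, now: Optional[float] = None) -> str:
    """Return an expiring link to one document, to pass to the analyze call."""
    issued = time.time() if now is None else now
    expires = int(issued + TTL_SECONDS)
    query = urlencode({"expires": expires, "sig": _signature(path, expires)})
    return f"{base}{path}?{query}"


def verify(url: str, now: Optional[float] = None) -> bool:
    """Run on your storage side before serving the file."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, IndexError, ValueError):
        return False  # missing or malformed parameters
    current = time.time() if now is None else now
    if current > expires:
        return False  # link has expired
    return hmac.compare_digest(sig, _signature(parts.path, expires))


if __name__ == "__main__":
    url = signed_url("https://your-storage.example.com",
                     "/documents/bank-statement-customer-123.pdf", now=1_000_000)
    print(verify(url, now=1_000_100))  # True
    print(verify(url, now=1_000_700))  # False (expired)
```

The constant-time comparison (hmac.compare_digest) avoids leaking signature information through response timing.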

Controller, Processor, and the DPA

Under GDPR Article 28, when a controller uses a processor to process personal data on its behalf, a data processing agreement (DPA) is required. The processor must process personal data only on documented instructions from the controller, and must implement appropriate technical and organisational measures.

In the structural analysis flow, HTPBE? acts as a data processor for a narrow and well-defined processing operation: analyzing the structural layer of a PDF file on your instruction, for the purpose of document integrity fraud detection. Your organisation remains the controller. The DPA terms that apply to this engagement are standard for a processor relationship of this kind.

The scope of processing under that DPA is deliberately narrow. HTPBE? does not use submitted documents for any purpose beyond the immediate analysis. It does not train models on submitted documents. It does not re-use analysis results for its own product development. HTPBE? processes the minimum necessary to produce the verdict, on instruction, for the stated purpose.

For EU-based companies carrying out due diligence before onboarding a third-party processor, the DPA is available on the API page. DPA review is a standard step before API integration in any GDPR-compliant stack.

What INCONCLUSIVE Means in a GDPR Context

The three verdicts — intact, modified, inconclusive — carry different operational meanings for EU compliance workflows.

intact means no post-creation modifications were detected. The structural layer is consistent with the document’s claimed origin.

modified means post-creation changes were detected: the file has structural evidence of edit operations performed after initial generation.

inconclusive is the verdict that requires explanation in compliance contexts. It does not mean the analysis failed. It means the document was produced by consumer software — Microsoft Word, Google Docs, LibreOffice, a browser-based PDF tool — rather than an institutional document generation system. Consumer software does not write the structural patterns that allow HTPBE? to distinguish initial creation from a later edit. As a result, HTPBE? cannot determine whether the document was modified after creation.

For EU fintech teams, the operational significance of inconclusive depends on the document type. A bank statement from ING, ABN AMRO, Rabobank, or BNP Paribas is generated by institutional banking infrastructure. If your platform receives a bank statement that returns inconclusive with a consumer-software origin, the document was not generated by a banking system. That is a material signal regardless of whether specific modifications can be proven.

For user-generated documents — a letter of explanation, a self-employed income declaration, a cover letter — inconclusive is an expected result and not a fraud signal.
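The routing implied by the three verdicts fits in a small function. This is a sketch only. The document-type labels, the set of types treated as institutionally generated, and the route names are all hypothetical. Map them to your own intake workflow and risk appetite.

```python
from enum import Enum


class Route(str, Enum):
    ACCEPT = "accept"
    MANUAL_REVIEW = "manual_review"
    REQUEST_ORIGINAL = "request_original"


# Hypothetical set: document types expected to come from institutional
# systems (banks, payroll providers), where a consumer-software origin is
# itself a signal.
INSTITUTIONAL_TYPES = frozenset({"bank_statement", "payslip"})


def route(status: str, document_type: str) -> Route:
    """Map an HTPBE? verdict plus your own document type to an intake action."""
    if status == "intact":
        return Route.ACCEPT
    if status == "modified":
        # Structural evidence of post-creation edits.
        return Route.MANUAL_REVIEW
    if status == "inconclusive":
        if document_type in INSTITUTIONAL_TYPES:
            # The file was not produced directly by an institutional system.
            return Route.REQUEST_ORIGINAL
        # Expected for user-generated documents such as cover letters.
        return Route.ACCEPT
    raise ValueError(f"unexpected status: {status!r}")


if __name__ == "__main__":
    print(route("inconclusive", "bank_statement").value)  # request_original
    print(route("inconclusive", "cover_letter").value)    # accept
```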

Retention: What Is Stored and for How Long

HTPBE? stores the analysis result: the structural verdict, the named modification markers, and the check ID. It does not store the document. The PDF is fetched for analysis, analyzed, and not retained.

Analysis results are accessible via the API using the check ID (GET /api/v1/result/{id}). Your organisation controls how long you retain the check ID in your own records. If your document retention policy requires that document integrity checks be stored for the duration of the customer relationship, you retain the check ID alongside the application record and retrieve the full result when needed.

This means the audit trail — evidence that a document integrity check was performed, when, and what result it produced — is under your control, not embedded in a third-party system you cannot access or export.
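Retrieving a stored result by check ID is a plain authenticated GET. In this sketch the host is a parameter because the article gives only the path (GET /api/v1/result/{id}). The bearer-token header mirrors the curl example in the integration section below. Treating check IDs as UUIDs, so they can be validated before they reach the request path, is an assumption based on the example response. The function builds the request without sending it.

```python
import urllib.request
import uuid


def build_result_request(base_url: str, check_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for a stored analysis result.

    base_url is a hypothetical parameter: supply the host documented for
    your account. Raises ValueError if check_id is not a UUID.
    """
    canonical_id = str(uuid.UUID(check_id))  # assumption: check IDs are UUIDs
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/result/{canonical_id}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


if __name__ == "__main__":
    req = build_result_request(
        "https://api.example.invalid",  # placeholder host
        "c7e1f204-a3d9-41bc-b882-9c3d5f8a1e27",
        "YOUR_API_KEY",
    )
    print(req.full_url)  # https://api.example.invalid/api/v1/result/c7e1f204-a3d9-41bc-b882-9c3d5f8a1e27
    # To send it: urllib.request.urlopen(req, timeout=10)
```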

EU Data Residency and On-Premise Deployment

Some organisations have data residency requirements that mandate EU-only processing. This is common for regulated entities applying the EBA Guidelines on outsourcing arrangements, or specific national supervisory requirements in Germany, France, and the Netherlands. For these organisations, a cloud API may require a transfer impact assessment if processing occurs outside the EU/EEA.

HTPBE?’s cloud API is currently deployed on EU infrastructure. For organisations where regulatory requirements, internal policy, or DPA obligations require that document analysis never leave their own infrastructure, the Enterprise plan supports on-premise deployment within your own environment.

On-premise deployment means the analysis engine runs within your infrastructure. No document URL is sent to a third party. No verdict is transmitted externally. The processing is entirely within your control and your data residency perimeter. For EU financial entities subject to the third-party ICT risk requirements of DORA (Digital Operational Resilience Act), on-premise deployment removes the dependency on a vendor-operated service, which can considerably narrow the third-party risk assessment. The software licence and any vendor support arrangement may still need to be documented as an ICT third-party arrangement.

Contact the team to discuss Enterprise on-premise options.

Practical GDPR Checklist for Your DPIA

When completing a Data Protection Impact Assessment for a document fraud detection API integration, address the following points.

Processing purpose and necessity. Can you articulate why structural integrity fraud detection of submitted documents is necessary for your processing purpose (e.g., credit risk assessment, KYC, fraud prevention)? Document the fraud risk your organisation is mitigating and why a structural check is proportionate to that risk.

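Categories of personal data involved. What personal data do the documents you submit for fraud detection contain? Bank statements contain financial account data. This is not a special category under Article 9, but the EDPB-endorsed DPIA guidelines treat financial data as data of a highly personal nature. Transaction histories can also indirectly reveal special-category data, such as payments to a trade union or a medical provider. Payslips contain employment and salary data. Clarify whether any of this data is processed by the fraud detection API, or whether only the structural layer is processed.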

Data minimisation assessment. Does the API process content, or only structure? Request technical documentation from the vendor confirming what data is extracted, processed, and retained. For HTPBE?, the answer is: structural signals only. The file is fetched transiently for analysis, and its content is neither parsed nor retained.

Processor due diligence. Review the DPA. Confirm subprocessors are listed. Confirm the processor’s security certifications and incident notification procedures. For EU-regulated entities, confirm the processor’s data residency and transfer mechanisms.

Retention and deletion. What does the API store? For how long? Can you request deletion? Document the retention period for analysis results and confirm alignment with your own retention policies.

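Transfer impact assessment. If the API processes data outside the EU/EEA, identify the transfer mechanism under Chapter V. This is either an adequacy decision (Article 45) or an Article 46 safeguard such as Standard Contractual Clauses or Binding Corporate Rules. Where you rely on an Article 46 safeguard, conduct a transfer impact assessment as described in the EDPB's Recommendations 01/2020. This step is not required for on-premise deployments, or for cloud processing that stays entirely within the EU/EEA with no third-country subprocessors.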

Risk rating. Given the above, what is the residual risk of integrating this processor? For structural-only analysis APIs with no content processing or storage, the residual risk is typically low. Document this conclusion explicitly.
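The transfer step in the checklist reduces to a decision rule. If no Chapter V transfer takes place, no TIA is needed. If an Article 45 adequacy decision covers the destination, no TIA is needed. If you rely on an Article 46 tool such as SCCs or Binding Corporate Rules, a TIA is needed. The minimal sketch below uses hypothetical category names and deliberately leaves Article 49 derogations out of scope.

```python
from enum import Enum


class Location(Enum):
    ON_PREMISE = "on_premise"
    EEA = "eea"  # EU/EEA only, no third-country subprocessors
    THIRD_COUNTRY = "third_country"


class Mechanism(Enum):
    NONE = "none"
    ADEQUACY = "adequacy"  # Article 45 adequacy decision
    SCC = "scc"            # Article 46(2)(c) Standard Contractual Clauses
    BCR = "bcr"            # Article 46(2)(b) / 47 Binding Corporate Rules


def tia_required(location: Location, mechanism: Mechanism) -> bool:
    """Return True when the checklist calls for a transfer impact assessment."""
    if location in (Location.ON_PREMISE, Location.EEA):
        return False  # no Chapter V transfer takes place
    if mechanism is Mechanism.ADEQUACY:
        return False  # destination covered by an adequacy decision
    if mechanism in (Mechanism.SCC, Mechanism.BCR):
        return True   # Article 46 tool: assess the destination's laws and practices
    # Article 49 derogations and missing mechanisms are out of scope here.
    raise ValueError("third-country processing without an identified transfer mechanism")


if __name__ == "__main__":
    print(tia_required(Location.THIRD_COUNTRY, Mechanism.SCC))  # True
    print(tia_required(Location.ON_PREMISE, Mechanism.NONE))    # False
```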

Integration for Compliance Teams

The HTPBE? integration centres on a single POST endpoint. The typical integration point in an EU fintech workflow is document intake: after the customer uploads a document to your platform, and before it is stored in your CRM or passed to underwriting.

curl -X POST https://api.htpbe.tech/v1/analyze \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://your-storage.example.com/documents/bank-statement-customer-123.pdf"}'

The response includes the check ID (the id field in the example response earlier in this article). Store this alongside your document record. The check ID serves as the audit-trail reference: it links the stored analysis result (retrievable via GET /api/v1/result/{id}) to the specific document submission in your records.
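The same call can be made from Python, as in this standard-library sketch. The endpoint, headers, and body are taken from the curl example above. Reading the check ID from the id field follows the example response earlier in this article. The audit-record shape (application_id, check_id, status, checked_at) is hypothetical. Sending is left commented out so the example runs without credentials.

```python
import json
import urllib.request
from datetime import datetime, timezone

ANALYZE_URL = "https://api.htpbe.tech/v1/analyze"  # from the curl example


def build_analyze_request(document_url: str, api_key: str) -> urllib.request.Request:
    """Build the POST request that asks HTPBE? to analyze one document URL."""
    body = json.dumps({"url": document_url}).encode("utf-8")
    return urllib.request.Request(
        ANALYZE_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def audit_record(application_id: str, response_body: bytes) -> dict:
    """Hypothetical audit-trail row linking your application to the check ID."""
    result = json.loads(response_body)
    return {
        "application_id": application_id,
        "check_id": result["id"],
        "status": result["status"],
        "checked_at": datetime.now(timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    req = build_analyze_request(
        "https://your-storage.example.com/documents/bank-statement-customer-123.pdf",
        "YOUR_API_KEY",
    )
    # with urllib.request.urlopen(req, timeout=30) as resp:
    #     row = audit_record("app-123", resp.read())
    print(req.get_method(), req.full_url)  # POST https://api.htpbe.tech/v1/analyze
```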

For EU compliance teams evaluating this for a KYC onboarding workflow, the KYC onboarding solutions page covers the full integration pattern, including how to route on modified and inconclusive verdicts in a document intake pipeline.

Who This Is For

This article is for EU fintech compliance leads and DPOs at companies building or auditing document intake pipelines that send customer documents to a third-party document fraud detection API or similar tooling.

If your organisation is completing a DPIA for a new document fraud detection tool, the questions above apply regardless of the vendor. If you are evaluating HTPBE? specifically, the structural analysis architecture means the DPIA scope is narrower than for APIs that read document content. You are assessing a processor that receives each file transiently and extracts only structural metadata, not one that parses or retains personal data from the document's content layer.

Sign up for an API key to review the full API response schema and begin your technical due diligence. Test keys return deterministic results using mock documents and do not process any real customer data — suitable for security review and integration testing before production deployment.