PDF Security Blog

Fake Invoice Detection: What PDF Metadata Reveals

HTPBE Team · 8 min read

Code examples verified against the API as of April 2026. If the API has changed since then, check the changelog.

Invoice fraud costs businesses an estimated $300 billion annually. Most fraudulent invoices succeed not because they look convincing, but because the organizations receiving them rely entirely on visual review. The same detection logic applies to other documents that drive payments. Altered contract PDFs change payment terms or bank details after signing, and insurance claims fraud inflates repair estimates and medical bills before submission. Both use the same consumer editing tools and leave the same structural traces.

A finance team member opens a PDF, confirms the vendor name, checks the amount against a purchase order, and approves payment. What that review misses is invisible to the eye but readable in milliseconds by forensic analysis: who modified the file, with what software, and when.

This guide covers which metadata signals reveal invoice fraud, how attackers modify invoices, and how to automate detection at scale.

Why visual review fails

Consider the most common invoice fraud pattern: business email compromise with modified bank details. An attacker intercepts a legitimate vendor invoice — either by compromising email or by social engineering — and replaces the bank account number before forwarding it to accounts payable. The invoice looks identical to every previous invoice from that vendor. The amount is correct. The logo is correct. The line items match the purchase order.

The only thing that changed is two lines of text: the account number and the sort code.

A human reviewer has no way to know the document was modified. The PDF looks the same as it always has. But the file is not the same — and the file preserves evidence that the human reviewer never sees.

What happens to a PDF when it is edited

Many PDF editors save changes as incremental updates. The original content is preserved and the new content is appended after it, so the file grows. Each such save also appends a new cross-reference (xref) table and trailer at the end of the file, pointing to the updated objects and back to the previous xref table.

This means an invoice saved incrementally after one edit has at least two xref tables; after three edits, at least four. The original content is still in the file alongside every subsequent edit, and forensic analysis reads this structure. Editors that rewrite the whole file on save collapse this history into a single xref table, which is one reason the metadata signals below matter as well.
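You can get a rough revision count yourself. The sketch below is a heuristic of our own, not how the HTPBE API works internally: it counts the `startxref` pointers in the raw bytes, since each incremental save appends one, both for classic xref tables and for the cross-reference streams used since PDF 1.5. It assumes the whole file fits in memory. Linearized ("fast web view") files legitimately carry two pointers from the start, and a robust parser would follow each trailer's `/Prev` chain instead of pattern-matching.

```python
import re

# Each incremental save appends a trailer ending in "startxref <offset>".
# Counting these pointers approximates the number of saved revisions.
_STARTXREF = re.compile(rb"startxref\s+\d+")


def count_xref_sections(pdf_bytes: bytes) -> int:
    """Return how many startxref pointers appear in the raw file bytes."""
    return len(_STARTXREF.findall(pdf_bytes))


def looks_edited(pdf_bytes: bytes, expected_sections: int = 1) -> bool:
    """Heuristic: more xref sections than a single-session file should have.

    expected_sections is an assumption: use 2 for linearized files.
    """
    return count_xref_sections(pdf_bytes) > expected_sections
```

To use it on a file, read the bytes first: `with open("invoice.pdf", "rb") as f: print(count_xref_sections(f.read()))`.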

Beyond structure, editing usually leaves two other traces:

The modification timestamp. A PDF stores a ModDate entry in its document Info dictionary, and many editors update it automatically when saving. An invoice issued three months ago with a modification date of yesterday has been touched recently.

The Producer field. This identifies the software that last processed the file. An invoice generated by an enterprise accounting system showing iLovePDF or PDF24 as producer has been through a consumer PDF editor — something that should not appear on an original vendor invoice.
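Both values are plain strings in the Info dictionary. ModDate uses the PDF date format `D:YYYYMMDDHHmmSSOHH'mm'`, where every component after the year is optional. Below is a minimal parser. It assumes the raw string has already been extracted by whatever PDF library you use. Treating dates without a UTC offset as UTC is an assumption of this sketch, not something the format guarantees.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# D:YYYYMMDDHHmmSS followed by an optional offset such as Z, +01'00' or -08.
_PDF_DATE = re.compile(
    r"(?:D:)?(?P<Y>\d{4})(?P<mo>\d{2})?(?P<d>\d{2})?"
    r"(?P<H>\d{2})?(?P<Mi>\d{2})?(?P<S>\d{2})?"
    r"(?P<tz>Z|[+-]\d{2}(?:'?\d{2}'?)?)?"
)


def parse_pdf_date(value: str) -> Optional[datetime]:
    """Convert a PDF date string (ModDate, CreationDate) to an aware datetime.

    Returns None if the value does not parse. A missing offset is treated
    as UTC, which is an assumption.
    """
    m = _PDF_DATE.match(value.strip())
    if not m:
        return None
    tz = m["tz"]
    offset = timedelta(0)
    if tz and tz[0] in "+-":
        digits = tz[1:].replace("'", "")  # "+01'00'" -> "0100"
        offset = timedelta(hours=int(digits[:2]), minutes=int(digits[2:4] or 0))
        if tz[0] == "-":
            offset = -offset
    try:
        return datetime(
            int(m["Y"]), int(m["mo"] or 1), int(m["d"] or 1),
            int(m["H"] or 0), int(m["Mi"] or 0), int(m["S"] or 0),
            tzinfo=timezone(offset),
        )
    except ValueError:  # out-of-range fields, e.g. month 13 or a 25-hour offset
        return None
```

For example, `parse_pdf_date("D:20260115093000+01'00'")` gives 08:30 UTC on 15 January 2026, which you can compare directly with other timezone-aware timestamps.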

The four signals that catch most invoice fraud

1. Producer field mismatch

Legitimate invoices from vendors using professional billing systems — SAP, Oracle, QuickBooks, Xero, Sage — have consistent Producer fields reflecting their software stack. An invoice claiming to come from an enterprise vendor but showing a consumer editor as Producer is immediately suspect.

This is the signal that catches the largest category of invoice fraud: attackers who download a legitimate invoice, open it in a free PDF editor to change the bank details, and re-save it. The editor writes its own name into the Producer field.
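A Producer check can be sketched in a few lines. The tool names below are only the examples used in this article, not an authoritative list. `vendor_history` is a hypothetical per-vendor baseline: the set of Producer strings seen on that vendor's previously verified invoices.

```python
# Illustrative names taken from this article's examples; extend from your data.
CONSUMER_TOOLS = ("ilovepdf", "pdf24", "canva")


def producer_signals(producer: str, vendor_history: set) -> list:
    """Return producer-related warning signals for one invoice.

    vendor_history is a hypothetical baseline of Producer strings seen on
    this vendor's earlier, verified invoices (empty for a new vendor).
    """
    signals = []
    normalized = (producer or "").strip().lower()
    if any(tool in normalized for tool in CONSUMER_TOOLS):
        signals.append("consumer_editor")
    known = {p.strip().lower() for p in vendor_history}
    if known and normalized not in known:
        signals.append("producer_changed")
    return signals
```

Exact matching against the baseline is deliberately strict. A billing-system version upgrade will also raise `producer_changed`, so expect to review that case once and then add the new value to the baseline.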

2. Modification date after receipt

If you record when a document was received and compare it against the document’s embedded modification date, a modification date that post-dates receipt is impossible under normal circumstances. The document was modified after you received it — either before forwarding within your organization, or by someone who intercepted and re-sent it.
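Once both timestamps are timezone-aware, the comparison is simple. ModDate is written from the clock of the machine that saved the file, so it can be wrong or deliberately set. Treat the result as a signal rather than proof. The ten-minute `clock_skew` allowance below is an assumed value, not a figure from the HTPBE API.

```python
from datetime import datetime, timedelta


def modified_after_receipt(
    mod_date: datetime,
    received_at: datetime,
    clock_skew: timedelta = timedelta(minutes=10),
) -> bool:
    """True if the embedded ModDate is later than your own receipt timestamp.

    clock_skew is an assumed allowance for the sender's clock running
    slightly ahead of yours.
    """
    if mod_date.tzinfo is None or received_at.tzinfo is None:
        raise ValueError("compare timezone-aware datetimes only")
    return mod_date > received_at + clock_skew
```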

3. Multiple xref tables in a single-session document

A document generated once by an accounting system should have one xref table representing a single authoring session. Multiple xref tables in what should be a freshly generated document indicate editing after generation.

This signal requires context: some legitimate documents do have multiple xref tables (signed documents that had a signature field added, for example). Combined with other signals, it becomes highly indicative.

4. Metadata completeness and consistency

Professional invoicing software generates documents with complete, consistent metadata: creator, producer, creation date, modification date all populated and coherent. Fabricated or heavily modified invoices frequently have incomplete metadata — missing dates, empty creator fields, or timestamps that are clearly defaults rather than real values.

A metadata completeness score below a threshold is not proof of fraud, but it raises the probability of a non-institutional document origin.
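Below is one way to turn completeness into a number. Several parts of it are hypothetical choices made for illustration, not the scoring any particular product uses:

* the field list
* treating dates in 1970 or earlier as unset defaults
* the penalty for a ModDate earlier than CreationDate
* the threshold in the usage note

Dates are expected as `datetime` objects, and both should be timezone-aware or both naive.

```python
from datetime import datetime

# Hypothetical set of fields a professional billing system would populate.
FIELDS = ("creator", "producer", "creation_date", "mod_date")


def _usable(value) -> bool:
    """A field counts if it is a non-empty string or a non-default date."""
    if isinstance(value, str):
        return bool(value.strip())
    if isinstance(value, datetime):
        return value.year > 1970  # 1970 or earlier usually means an unset default
    return False


def completeness_score(meta: dict) -> float:
    """Fraction of expected fields present, minus a penalty for incoherence."""
    score = sum(_usable(meta.get(field)) for field in FIELDS) / len(FIELDS)
    created, modified = meta.get("creation_date"), meta.get("mod_date")
    both_dates = all(isinstance(v, datetime) and _usable(v) for v in (created, modified))
    if both_dates and modified < created:
        score -= 0.25  # modified before it was created: not coherent
    return max(score, 0.0)
```

With a hypothetical threshold of 0.75, `completeness_score(meta) < 0.75` would send the invoice to review rather than reject it outright.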

What attackers do — and what gives them away

Bank detail replacement (most common). Attacker opens the invoice in any PDF editor, modifies the account number, saves. Result: Producer field changes to the editor’s name. Modification date updates. Xref count increases if the editor saves incrementally.

Amount modification. Attacker inflates the invoice total. Same signals as bank detail replacement, plus often a date anomaly if the original invoice had been on file for a while before the modification.

Duplicate invoice with modified details. Attacker copies an old invoice and modifies multiple fields. The file retains the original’s creation date but shows a recent modification date — and often a different Producer than the original.
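For the duplicate-invoice pattern, the revealing value is the gap between CreationDate and ModDate. The sketch below assumes both dates are already parsed into `datetime` objects of the same kind (both naive or both timezone-aware). The one-day `max_gap` is an assumed tolerance for billing systems that save a document more than once on the day it is issued.

```python
from datetime import datetime, timedelta


def edited_long_after_creation(
    creation_date: datetime,
    mod_date: datetime,
    max_gap: timedelta = timedelta(days=1),
) -> bool:
    """Flag files whose ModDate falls well after their CreationDate.

    A billing system normally creates and saves an invoice in one session,
    so the two timestamps sit close together.
    """
    return mod_date - creation_date > max_gap
```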

Fabricated invoice from scratch. Attacker builds a new invoice in design or office software (Canva, Microsoft Word, Google Docs) and exports to PDF. The Creator field reveals the software used. The document lacks the structural characteristics of professionally generated invoices. This is the pattern the inconclusive verdict typically catches: the origin is consumer software, which is incompatible with a claimed enterprise vendor.

Automated detection in an AP workflow

The pattern for AP automation:

  1. Vendor submits invoice (email attachment, supplier portal upload, EDI)
  2. Before any human touches it, run forensic analysis via the HTPBE API
  3. Route based on verdict: intact → normal processing, modified → flag for investigation, inconclusive → secondary review if the vendor claims to use an enterprise billing system
Steps 2 and 3 in Python, using the httpx client:

```python
import os

import httpx

API_KEY = os.environ["HTPBE_API_KEY"]
BASE_URL = "https://api.htpbe.tech/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}


def check_invoice(invoice_url: str, vendor_name: str) -> dict:
    """
    Run forensic analysis on an invoice PDF.
    Returns a dict with action and details for the AP system.
    """
    # Submit the PDF by URL; the response carries the check ID.
    submit = httpx.post(
        f"{BASE_URL}/analyze",
        headers=HEADERS,
        json={"url": invoice_url},
        timeout=30,
    )
    submit.raise_for_status()
    check_id = submit.json()["id"]

    # Fetch the verdict for that check.
    result = httpx.get(
        f"{BASE_URL}/result/{check_id}",
        headers=HEADERS,
        timeout=30,
    )
    result.raise_for_status()
    data = result.json()

    status = data["status"]
    markers = data.get("modification_markers") or []
    producer = data.get("producer") or "unknown"
    base = {"vendor": vendor_name, "check_id": check_id}

    if status == "intact":
        return {"action": "process", **base}

    if status == "modified":
        return {
            "action": "hold",
            **base,
            "reason": f"Forensic markers detected: {', '.join(map(str, markers)) or 'none listed'}",
            "producer": producer,
        }

    if status == "inconclusive":
        # Consumer-software origin: route to a human reviewer.
        return {
            "action": "review",
            **base,
            "reason": f"Invoice generated by consumer software: {producer}",
        }

    # Any other status is unexpected: never let it fall through to payment.
    return {"action": "review", **base, "reason": f"Unexpected status: {status}"}
```

The check_id returned by the API is a permanent reference — link it to your invoice record so your finance team can view the full forensic report when investigating a flagged payment.
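Storing the reference can be as simple as one table. The sketch below uses SQLite from the Python standard library and persists the dict returned by `check_invoice()` above. The table and column names are hypothetical, so map them onto your own AP records.

```python
import sqlite3

# Hypothetical schema: one row per invoice, keyed by your own invoice ID.
SCHEMA = """
CREATE TABLE IF NOT EXISTS invoice_checks (
    invoice_id TEXT PRIMARY KEY,
    vendor     TEXT NOT NULL,
    check_id   TEXT NOT NULL,
    action     TEXT NOT NULL,
    reason     TEXT
)
"""


def record_check(conn: sqlite3.Connection, invoice_id: str, outcome: dict) -> None:
    """Persist a check_invoice() result next to the invoice it belongs to."""
    conn.execute(SCHEMA)
    conn.execute(
        "INSERT OR REPLACE INTO invoice_checks VALUES (?, ?, ?, ?, ?)",
        (invoice_id, outcome["vendor"], outcome["check_id"],
         outcome["action"], outcome.get("reason")),
    )
    conn.commit()
```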

What to do when an invoice is flagged

A modified verdict does not automatically mean fraud. It means the file was edited. Legitimate reasons exist: a vendor corrected a typo, your team redacted sensitive information before filing, or the file was regenerated by a PDF printer or rebuilt by an email security gateway in transit.

The forensic report tells you which specific signals triggered the verdict. PRODUCER_MISMATCH with iLovePDF as producer and a recent modification date is a strong fraud signal. A single extra xref table with no modification date change and a consistent Producer may warrant calling the vendor to confirm the document is unchanged.
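The reasoning in the previous paragraph can be written as a small triage rule. Only `PRODUCER_MISMATCH` is a marker name taken from this article. `recent_mod_date` and `extra_xref_sections` are hypothetical inputs you would derive from the full report, and the three labels are illustrative, not API verdicts.

```python
from typing import Iterable

CONSUMER_TOOLS = ("ilovepdf", "pdf24", "canva")  # illustrative, from this article


def triage(markers: Iterable[str], producer: str,
           recent_mod_date: bool, extra_xref_sections: int) -> str:
    """Rank a 'modified' verdict as 'strong', 'weak' or 'unclear'."""
    marker_set = set(markers)
    consumer = any(tool in (producer or "").lower() for tool in CONSUMER_TOOLS)
    if "PRODUCER_MISMATCH" in marker_set and consumer and recent_mod_date:
        return "strong"   # consumer editor, recently touched: treat as likely fraud
    if (extra_xref_sections == 1 and not recent_mod_date
            and "PRODUCER_MISMATCH" not in marker_set):
        return "weak"     # call the vendor to confirm the file is unchanged
    return "unclear"      # read the full report before deciding
```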

The response workflow:

  1. Put the payment on hold immediately
  2. Retrieve the full forensic report (the check_id links to it in your dashboard)
  3. Contact the vendor through a verified channel — not by replying to the email that carried the invoice
  4. Confirm account details verbally or through your vendor portal
  5. If fraud is confirmed: report to your bank immediately (many jurisdictions have 24-hour windows for payment recall), and to relevant law enforcement

False positive rate

Modern forensic PDF analysis on invoice documents has a low false positive rate for the specific case of bank detail modification — the most common fraud pattern. Producer field analysis is highly specific: a vendor that generates invoices through a professional billing system will have a consistent Producer across all invoices. Deviation is meaningful.

False positives concentrate in edge cases: vendors who genuinely use consumer PDF tools (sole traders who make their invoices in Canva), documents that were legitimately converted between formats, or signed documents where the signing software appended its own xref entry.

For these cases, the inconclusive verdict is the appropriate output rather than modified. A sole trader using Google Docs to generate invoices will return inconclusive, not modified — flagged for review rather than automatic rejection.

Scaling the check

The check runs asynchronously and takes under two seconds per document. The Growth plan's 350 checks per month works out to about 16 invoice verifications per working day (assuming 22 working days a month), or about 12 per calendar day, which suits a mid-size AP team. The Pro plan's 1,500 checks per month covers about 50 checks per calendar day, or roughly 68 per working day.

Compared with the potential loss, the cost of checking is small. The average business email compromise invoice fraud loss is $50,000–$200,000 per incident, while a month of Pro API access costs $499. A single prevented incident therefore covers roughly 100 to 400 months of Pro access.
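The arithmetic behind the two paragraphs above is easy to reproduce. The sketch below assumes 22 working days and 30 calendar days per month. It uses the plan quotas, price, and loss range quoted in this article.

```python
import math


def checks_per_day(monthly_checks: int, days_per_month: int) -> float:
    """Average daily capacity of a monthly check quota."""
    return monthly_checks / days_per_month


def break_even_months(loss: float, monthly_cost: float) -> int:
    """Whole months of subscription paid for by one prevented loss."""
    return math.floor(loss / monthly_cost)


for plan, quota in (("Growth", 350), ("Pro", 1500)):
    print(f"{plan}: {checks_per_day(quota, 30):.1f}/calendar day, "
          f"{checks_per_day(quota, 22):.1f}/working day")
# Growth: 11.7/calendar day, 15.9/working day
# Pro: 50.0/calendar day, 68.2/working day

print(break_even_months(50_000, 499), break_even_months(200_000, 499))
# 100 400
```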


https://htpbe.tech/blog/fake-invoice-detection-pdf-metadata

Automate PDF Verification in Your Workflow

REST API with transparent pricing from $15/mo. Self-serve — no sales call required.
Free web tool available for manual checks. Test keys on all plans.
