HTPBE?

Structural PDF tamper detection API. Catches edits your KYC stack misses.


© 2024–2026 TMI Iurii Rogulia · VAT ID: FI29845875 · Made in Finland 🇫🇮


Algorithm v2.23.1

Tool profile

Apache PDFBox

Apache PDFBox appears on both legitimate first-generation output and downstream re-save flows — context (the other tool on the same document) is what flips the signal.

Forensic verdict

Mixed signal

Based on 3 appearances across the HTPBE? corpus.

  • Modification rate: 33% (15 pp below the corpus baseline of 48%)
  • Total appearances: 3 (0.10% of corpus)
  • Role split: 0% Creator / 100% Producer (Creator vs Producer share of appearances)

Corpus profile

How Apache PDFBox shows up in HTPBE? corpus

Apache PDFBox is a Java PDF library used in many enterprise pipelines for both generation and post-processing (merging, signing, text extraction with re-emit).

PDFBox is legitimate inside enterprise pipelines. The contextual signal is when PDFBox is the latest Producer on a document whose Creator points to an unrelated institutional source.
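The contextual rule above can be sketched as a small check on the two metadata fields. This is a minimal illustration, not HTPBE?'s detection logic: the function name `pdfbox_resave_signal`, the substring match on "pdfbox", and the choice to treat any other non-empty Creator as "unrelated" are all assumptions made for the example.

```python
from typing import Optional


def pdfbox_resave_signal(creator: Optional[str], producer: Optional[str]) -> bool:
    """Return True when PDFBox is the Producer but the Creator names another tool.

    Hypothetical heuristic: the "pdfbox" substring match and the
    "any other non-empty Creator" rule are assumptions for illustration.
    """
    creator_norm = (creator or "").strip().lower()
    producer_norm = (producer or "").strip().lower()

    if "pdfbox" not in producer_norm:
        return False  # PDFBox did not write the final bytes
    if not creator_norm or "pdfbox" in creator_norm:
        return False  # first-generation PDFBox output, or no origin to compare against
    return True  # the document started elsewhere and PDFBox re-emitted it
```

A `True` result is a prompt for closer review rather than a verdict, since PDFBox is also a legitimate post-processing step in many pipelines.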


Role in the workflow

How Apache PDFBox shows up in metadata

A PDF's document information dictionary can carry a Creator (the application that produced the original document) and a Producer (the engine that wrote the PDF). Both fields are optional. The same tool can appear in either slot, with very different modification profiles.
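To make the two fields concrete, here is a deliberately naive sketch that pulls Creator and Producer out of raw PDF bytes using only the Python standard library. It assumes the values are stored as unencrypted literal strings outside any compressed object stream, and it takes the last occurrence in the file as the most recent value. Both are simplifying assumptions, and `read_info_fields` is a hypothetical helper. A real PDF parser is the right tool for production use.

```python
import re
from typing import Dict, Optional


def read_info_fields(pdf_bytes: bytes) -> Dict[str, Optional[str]]:
    """Naively extract /Creator and /Producer literal strings from PDF bytes.

    Assumptions: values are literal strings "(...)" without nested unescaped
    parentheses, and the file is neither encrypted nor using compressed
    object streams for the information dictionary.
    """
    fields: Dict[str, Optional[str]] = {}
    for key in (b"Creator", b"Producer"):
        # Match "/Key (value)", allowing backslash-escaped characters in the value.
        pattern = rb"/" + key + rb"\s*\(((?:\\.|[^\\()])*)\)"
        matches = re.findall(pattern, pdf_bytes, flags=re.DOTALL)
        # Later occurrences usually belong to later saves, so keep the last one.
        fields[key.decode("ascii")] = matches[-1].decode("latin-1") if matches else None
    return fields
```

Calling it on a file read with `open(path, "rb").read()` returns a dictionary with the two keys, each set to `None` when the field is absent or not stored in a form this sketch can read.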

As Creator (0% of appearances)
  • Usage: 0
  • Modification rate: 0%

As Producer (100% of appearances)
  • Usage: 3
  • Modification rate: 33%
  • Avg file size: 225 KB

How to read this

The Creator slot typically reflects where a document started life. The Producer slot reflects whatever wrote the bytes — and is the field that gets overwritten when a PDF is opened, edited, and saved by a downstream tool.

A higher modification rate as Producer than as Creator usually means the tool is acting as a re-saver on documents that originated elsewhere. A higher rate as Creator points to fragile workflows around the original authoring app.
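The comparison described here is simple arithmetic, sketched below. The `records` list is hypothetical data chosen only to be consistent with the reported figures (3 Producer appearances at a 33% modification rate, against a 48% corpus baseline). The function names are illustrative, and returning `None` for a role with no appearances is a design choice of this sketch, made to keep "no data" distinct from a true 0%.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical (role, was_modified) observations matching the reported totals.
records = [("producer", True), ("producer", False), ("producer", False)]
CORPUS_BASELINE_PCT = 48.0  # corpus-wide modification rate reported on this page


def modification_rate(rows: Iterable[Tuple[str, bool]], role: str) -> Optional[float]:
    """Percentage of documents flagged as modified for one role, or None if unseen."""
    flags = [modified for r, modified in rows if r == role]
    if not flags:
        return None  # no appearances in this role, so no rate can be computed
    return 100.0 * sum(flags) / len(flags)


def delta_vs_baseline_pp(rate_pct: float, baseline_pct: float) -> int:
    """Difference between two percentages, in whole percentage points."""
    return round(rate_pct - baseline_pct)


producer_rate = modification_rate(records, "producer")
creator_rate = modification_rate(records, "creator")
producer_delta = delta_vs_baseline_pp(producer_rate, CORPUS_BASELINE_PCT)
```

With only three appearances, a single document moves the rate by about 33 percentage points, which is why the profile reads as a mixed signal rather than a strong one.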

Distributions

What ships alongside Apache PDFBox

The PDF versions Apache PDFBox writes when acting as Producer, and the other tools that appear in the same documents.

PDF versions written

All observed output is PDF 1.6 (100% of files where Apache PDFBox is the Producer).

  • PDF 1.6: 100.0%

Related profiles

Tools you’ll see next to Apache PDFBox

Other tools that frequently share metadata with Apache PDFBox in the same documents. Each card links to its own forensic profile.

Dropbox (as Creator)
  • Co-occurrence: 100%
  • Appearances: 13
  • Modification rate: 62%

Long tail

Notable observations

Smaller cuts of the Apache PDFBox corpus — useful context, but treat each row as a single data point rather than a strong signal.

  • Pages parsed: 26
  • Oldest observed: 11 Feb 2026 (3 months ago)
