Apache PDFBox appears both in legitimate first-generation output and in downstream re-save flows; context (the other tool on the same document) is what flips the signal.
Forensic verdict
Based on 3 appearances across the HTPBE? corpus.
Corpus profile
Apache PDFBox is a Java PDF library used in many enterprise pipelines for both generation and post-processing (merging, signing, text extraction with re-emit).
PDFBox is legitimate inside enterprise pipelines. The signal is contextual: it matters when PDFBox is the latest Producer on a document whose Creator points to an unrelated institutional source.
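A minimal sketch of that contextual check, in Python. The function name `pdfbox_context_flag` and the `known_pipeline_creators` allowlist are hypothetical, and "unrelated" is approximated as a Creator that neither mentions PDFBox nor matches the allowlist. This is a simplification for illustration, not the corpus's actual scoring.

```python
from typing import Iterable, Optional


def pdfbox_context_flag(
    creator: Optional[str],
    producer: Optional[str],
    known_pipeline_creators: Iterable[str] = (),
) -> bool:
    """Return True when PDFBox is the latest Producer but the Creator points elsewhere.

    Hypothetical heuristic: "unrelated" means the Creator neither mentions PDFBox
    nor matches any entry in known_pipeline_creators (an assumed allowlist).
    """
    if not producer or "pdfbox" not in producer.lower():
        return False  # PDFBox did not write the latest revision: no PDFBox signal
    if not creator:
        return False  # no Creator to contrast the Producer against
    c = creator.lower()
    if "pdfbox" in c:
        return False  # PDFBox in both slots: looks like first-generation output
    if any(k.lower() in c for k in known_pipeline_creators):
        return False  # Creator is a known in-house pipeline tool
    return True  # PDFBox re-saved a document that originated elsewhere
```

For example, `pdfbox_context_flag("Microsoft Word", "Apache PDFBox")` returns `True`, while adding `known_pipeline_creators=["Microsoft Word"]` returns `False`. The allowlist stands in for the pipeline context the profile describes; without it, every PDFBox-based generator that writes its own name into Creator would be flagged.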
Role in the workflow
Every PDF carries a Creator (the application that produced the original document) and a Producer (the engine that wrote the PDF). The same tool can appear in either slot, with very different modification profiles.
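Both values are stored in the document information dictionary as `/Creator` and `/Producer` entries. The Python sketch below is a naive byte-level scan. It assumes the dictionary is stored uncompressed with literal-string values, so it misses entries inside compressed object streams, hex strings, and the XMP metadata stream. A real tool would use a PDF parser; PDFBox itself exposes these fields through `PDDocumentInformation.getCreator()` and `getProducer()`. Keeping the last match only approximates the latest revision of an incrementally updated file, because the authoritative pointer is the trailer's `/Info` reference.

```python
import re
from typing import Dict

# Literal-string values only: /Creator (...) or /Producer (...).
# Handles backslash escapes, but not unescaped nested parentheses or <hex> strings.
_INFO_ENTRY = re.compile(rb"/(Creator|Producer)\s*\(((?:[^()\\]|\\.)*)\)", re.DOTALL)


def _unescape(raw: bytes) -> str:
    # Undo the simple escapes \( \) \\ ; octal and other escapes are left as-is.
    out = re.sub(rb"\\([()\\])", rb"\1", raw)
    return out.decode("latin-1")


def creator_and_producer(pdf_bytes: bytes) -> Dict[str, str]:
    """Naively extract Creator and Producer strings from raw PDF bytes."""
    found: Dict[str, str] = {}
    for m in _INFO_ENTRY.finditer(pdf_bytes):
        # Later matches overwrite earlier ones: an incremental update usually
        # appends the newer Info dictionary after the original one.
        found[m.group(1).decode("ascii")] = _unescape(m.group(2))
    return found
```

Usage: `creator_and_producer(open("sample.pdf", "rb").read())`, where the filename is a placeholder. A file whose latest `Producer` names PDFBox while `Creator` names another application is the mixed profile this page is about.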
Distributions
The PDF versions Apache PDFBox writes when acting as Producer, and the other tools that appear in the same documents.
All output in the corpus is PDF 1.6 (100% of files where Apache PDFBox is the Producer).
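A figure like this can be reproduced, roughly, by reading the `%PDF-x.y` header of each file where PDFBox is the Producer and tallying the results. The Python sketch below assumes that method, which is not necessarily how the corpus computes it. The header is also only a first approximation: from PDF 1.4 on, a `/Version` entry in the document catalog can declare a later version than the header.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Optional

_HEADER = re.compile(rb"%PDF-(\d\.\d)")


def header_version(pdf_bytes: bytes) -> Optional[str]:
    # The header belongs on the first line, but many readers tolerate leading
    # bytes; scanning the first 1024 bytes is an assumed tolerance.
    m = _HEADER.search(pdf_bytes[:1024])
    return m.group(1).decode("ascii") if m else None


def version_shares(files: Iterable[bytes]) -> Dict[str, float]:
    """Percentage of files per header version; files without a header are skipped."""
    counts = Counter(v for v in map(header_version, files) if v is not None)
    total = sum(counts.values())
    return {v: 100.0 * n / total for v, n in counts.items()} if total else {}
```

With three PDF 1.6 files, `version_shares` returns `{"1.6": 100.0}`, which matches the shape of the figure above. With a sample this small, the share describes these files rather than PDFBox output in general.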
Related profiles
Other tools that frequently share metadata with Apache PDFBox in the same documents.
Long tail
Smaller cuts of the Apache PDFBox corpus — useful context, but treat each row as a single data point rather than a strong signal.