PyPDF appears on both legitimate first-generation output and downstream re-save flows — context (the other tool on the same document) is what flips the signal.
Forensic verdict
Based on 84 appearances across the HTPBE? corpus.
Corpus profile
PyPDF is a Python PDF library (pypdf / PyPDF2 family) widely used in scripting and back-office automation to merge, split, encrypt, and extract pages from existing PDFs.
PyPDF rarely creates documents from scratch — it almost always re-emits an existing PDF after a programmatic operation. When PyPDF is the latest Producer on a document whose Creator was institutional, the producer/creator mismatch indicates programmatic post-processing.
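The producer/creator mismatch described above can be sketched as a small check on the two metadata strings. This is a minimal illustration, not the corpus's actual rule: the function name `classify_pypdf_role`, the three labels, and the substring test for "pypdf" are all assumptions made for the example.

```python
from typing import Optional


def _is_pypdf(value: Optional[str]) -> bool:
    # Assumption: any string containing "pypdf" (case-insensitive) counts as PyPDF.
    return value is not None and "pypdf" in value.lower()


def classify_pypdf_role(creator: Optional[str], producer: Optional[str]) -> str:
    """Return a coarse, hypothetical label for a Creator/Producer pair."""
    if not _is_pypdf(producer):
        # PyPDF did not write the latest version of this file.
        return "not-pypdf-producer"
    if creator is None or not creator.strip() or _is_pypdf(creator):
        # No upstream tool is recorded, so there is no mismatch to report.
        return "pypdf-only"
    # A different Creator with PyPDF as Producer: programmatic post-processing.
    return "post-processed"
```

For example, `classify_pypdf_role("Quadient", "PyPDF2")` yields `"post-processed"`, while a file with no Creator and `"pypdf"` as Producer yields `"pypdf-only"`. The label alone says nothing about intent; it only records that a second tool touched the file.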
Role in the workflow
Most PDFs carry two metadata fields: a Creator (the application that authored the original document) and a Producer (the engine that wrote the PDF file). Both fields are optional, so either can be missing. The same tool can appear in either slot, with very different modification profiles.
Name fingerprints
Different version strings and spellings observed for PyPDF in the wild. All are merged into the same canonical profile.
Why variants matter
The same tool publishes itself under 3 different metadata strings — version bumps, locale tags, build IDs. We canonicalize them so the corpus reflects one identity, not noise.
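One way to picture the canonicalization step is a pattern match that folds spelling and version variants into a single name. The sketch below is an assumption about how such merging could work, not the method used for this corpus; the regular expression, the function name `canonicalize`, and the sample strings are illustrative only.

```python
import re
from collections import Counter

# Assumption: variants look like "pypdf", "PyPDF2", "pyPdf", optionally with a
# separator. The trailing \b keeps unrelated names such as "pypdfium2" out.
PYPDF_RE = re.compile(r"\bpy[-_ ]?pdf\d*\b", re.IGNORECASE)


def canonicalize(raw: str) -> str:
    """Map a raw metadata string to a canonical tool name (hypothetical rule)."""
    cleaned = " ".join(raw.split())  # collapse stray whitespace
    if PYPDF_RE.search(cleaned):
        return "PyPDF"
    return cleaned


# Illustrative input, not corpus data.
samples = ["PyPDF2", "pypdf", "  pyPdf  1.13 ", "Acrobat Distiller 9.0"]
counts = Counter(canonicalize(s) for s in samples)
```

With these samples, the three PyPDF spellings collapse into one `"PyPDF"` bucket, and the unrelated string keeps its own entry.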
Distributions
The PDF versions PyPDF writes when acting as Producer, and the other tools that appear in the same documents.
Most output is PDF 1.4 (96% of files where PyPDF is the Producer).
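The PDF version of a file is usually declared in its header line, which looks like `%PDF-1.4`. A minimal sketch of reading it follows, assuming the header sits within the first 1024 bytes. The function name `header_version` is hypothetical, and the sketch ignores the `/Version` entry in the document catalog, which can override the header.

```python
import re
from typing import Optional

# Matches the version digits in a header such as b"%PDF-1.4".
HEADER_RE = re.compile(rb"%PDF-(\d\.\d)")


def header_version(data: bytes) -> Optional[str]:
    """Return the header version string, or None if no header is found."""
    match = HEADER_RE.search(data[:1024])  # assumption: header is near the start
    return match.group(1).decode("ascii") if match else None
```

Called on the leading bytes of a file, for example `header_version(open(path, "rb").read(1024))`, this returns a string such as `"1.4"` that can be tallied across a set of documents.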
Quadient sits upstream in 50% of cases — read this row as “what kinds of documents end up routed through PyPDF.”
Related profiles
Other tools that frequently share metadata with PyPDF in the same documents. Each card links to its own forensic profile.
Long tail
Smaller cuts of the PyPDF corpus — useful context, but treat each row as a single data point rather than a strong signal.
Files containing JavaScript code