Tool profile

PyPDF

PyPDF appears in both legitimate first-generation output and downstream re-save flows; context (the other tool named in the same document's metadata) is what flips the signal.


Forensic verdict

Mixed signal

Based on this tool’s share of the HTPBE? corpus.

Modification rate

5% (43 pp below baseline)

Corpus baseline: 48%

Corpus share

0.38%

Share of all analyzed appearances


Role split

0% Creator / 100% Producer

Creator vs Producer share of appearances

Corpus profile

How PyPDF shows up in the HTPBE? corpus

PyPDF is a Python PDF library (pypdf / PyPDF2 family) widely used in scripting and back-office automation to merge, split, encrypt, and extract pages from existing PDFs.

PyPDF rarely creates documents from scratch — it almost always re-emits an existing PDF after a programmatic operation. When PyPDF is the latest Producer on a document whose Creator was institutional, the producer/creator mismatch indicates programmatic post-processing.
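A minimal sketch of the mismatch check described above, assuming the Creator and Producer strings have already been read from the document's metadata. The function names and the prefix rule are hypothetical illustrations, not HTPBE?'s actual scoring; the sketch only detects the mismatch and does not judge whether the Creator is institutional.

```python
import re
from typing import Optional

# Hypothetical rule: treat "pypdf", "PyPDF2", "PyPDF3", ... as the pypdf family.
# The \b stops unrelated names such as "pypdfium2" from matching.
_PYPDF_FAMILY = re.compile(r"\s*pypdf\d*\b", re.IGNORECASE)


def is_pypdf(name: Optional[str]) -> bool:
    """Return True if a metadata string names a pypdf-family library."""
    return bool(name) and _PYPDF_FAMILY.match(name) is not None


def looks_post_processed(creator: Optional[str], producer: Optional[str]) -> bool:
    """Flag files that a pypdf-family library re-wrote after another tool created them.

    Requires a pypdf-family Producer plus a non-empty Creator naming a
    different tool; with no Creator there is nothing to compare against.
    """
    if not is_pypdf(producer) or not creator:
        return False
    return not is_pypdf(creator)


print(looks_post_processed("Quadient Inspire", "PyPDF2"))  # True
print(looks_post_processed(None, "PyPDF2"))                # False
```

The Creator strings here are placeholders; in practice the interesting part is the second input to a verdict, namely what kind of tool the Creator is.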

Role in the workflow

How PyPDF shows up in metadata

A PDF's document information dictionary can record a Creator (the application that created the original document) and a Producer (the software that wrote the PDF file). Both entries are optional, and the same tool can appear in either slot, with very different modification profiles.
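To see where the two slots live, here is a minimal sketch that pulls literal /Creator and /Producer entries out of raw PDF bytes. It assumes an uncompressed Info dictionary with plain literal strings (no hex strings, no escaped parentheses, no object streams); the function name is hypothetical, and anything beyond quick triage needs a real PDF parser.

```python
import re

# Matches literal-string entries such as "/Producer (PyPDF2)".
# Assumes no escaped parentheses and no hex strings like <FEFF...>.
_ENTRY = re.compile(rb"/(Creator|Producer)\s*\(([^)]*)\)")


def read_creator_producer(pdf_bytes: bytes) -> dict:
    """Return the last Creator/Producer literal strings found in the file."""
    found = {}
    for key, value in _ENTRY.findall(pdf_bytes):
        # Keep the last occurrence: an incremental update usually appends
        # its newer Info dictionary after the original one.
        found[key.decode("ascii")] = value.decode("latin-1")
    return found


sample = (
    b"%PDF-1.4\n1 0 obj\n<< /Creator (wkhtmltopdf 0.12.6) /Producer (Qt 5.15.2) >>\nendobj\n"
    b"9 0 obj\n<< /Creator (wkhtmltopdf 0.12.6) /Producer (PyPDF2) >>\nendobj\n"
)
print(read_creator_producer(sample))
```

With pypdf itself installed, `PdfReader(path).metadata` exposes the same two fields as `.creator` and `.producer` and handles the compressed and encoded cases this sketch skips.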


As Creator

Share of appearances: 0%
Modification rate: n/a (no appearances in this role)

As Producer

Share of appearances: 100%
Modification rate: 5%
Avg file size: 805 KB
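The role split and the "43 pp below baseline" figure are simple ratios over appearances. A minimal sketch, assuming a hypothetical per-appearance record with a role and a modified flag; the toy data is invented only to reproduce the page's 5% vs 48% arithmetic and is not drawn from the corpus.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Appearance:
    role: str       # "creator" or "producer" (hypothetical record layout)
    modified: bool  # whether the document was judged modified


def role_split(apps: List[Appearance]) -> Tuple[float, float]:
    """Percent of appearances in the Creator slot and in the Producer slot."""
    if not apps:
        raise ValueError("no appearances")
    creators = sum(1 for a in apps if a.role == "creator")
    producers = sum(1 for a in apps if a.role == "producer")
    return 100 * creators / len(apps), 100 * producers / len(apps)


def modification_delta(apps: List[Appearance], baseline_pct: float) -> Tuple[float, float]:
    """Modification rate in percent and its gap to the baseline in percentage points."""
    if not apps:
        raise ValueError("no appearances")
    rate = 100 * sum(a.modified for a in apps) / len(apps)
    return rate, rate - baseline_pct


# Toy data: 20 Producer appearances, one of them in a modified document.
toy = [Appearance("producer", modified=(i == 0)) for i in range(20)]
print(role_split(toy))                # (0.0, 100.0)
print(modification_delta(toy, 48.0))  # (5.0, -43.0)
```

An empty list raises rather than returning 0%, since a rate with no appearances is undefined.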

Name fingerprints

Also goes by

Different version strings and spellings observed for PyPDF in the wild. All are merged into the same canonical profile.

PyPDF2: 87.1%

pypdf: 11.8%

PyPDF3: 1.2%

Why variants matter

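The same tool publishes itself under 3 different metadata strings. In general these come from version bumps, locale tags, or build IDs; for PyPDF they are the package names of the pypdf family (PyPDF2, pypdf, PyPDF3). We canonicalize them so the corpus reflects one identity, not noise.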
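Canonicalization can be as simple as an alias rule plus a counter. A minimal sketch, assuming a hypothetical regex for the pypdf family (it deliberately leaves out the separate pypdfium2 project); the input list is toy data, not the corpus.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Hypothetical alias rule: pypdf, PyPDF2, PyPDF3, ... all collapse to "PyPDF".
_PYPDF_FAMILY = re.compile(r"\s*pypdf\d*\b", re.IGNORECASE)


def canonical_tool(raw: str) -> str:
    """Map a raw Producer string to its canonical profile name."""
    return "PyPDF" if _PYPDF_FAMILY.match(raw) else raw.strip()


def variant_shares(raw_names: Iterable[str]) -> Dict[str, float]:
    """Percent share of each raw spelling among strings that map to PyPDF."""
    family = Counter(n.strip() for n in raw_names if canonical_tool(n) == "PyPDF")
    total = sum(family.values())
    if total == 0:
        return {}
    return {name: round(100 * count / total, 1) for name, count in family.most_common()}


toy = ["PyPDF2"] * 87 + ["pypdf"] * 12 + ["PyPDF3"] + ["pypdfium2"]
print(variant_shares(toy))  # {'PyPDF2': 87.0, 'pypdf': 12.0, 'PyPDF3': 1.0}
```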

Most common

PyPDF2

87.1% of appearances

Variant spread

3 distinct strings

Long-tail share: 12.9%

Observed range

28.01.2026 → 16.06.2026

Distributions

What ships alongside PyPDF

The PDF versions PyPDF writes when acting as Producer, and the other tools that appear in the same documents.

PDF versions written

Most output is PDF 1.4 (95% of files where PyPDF is the Producer).

PDF 1.4: 95.0%

PDF 1.7: 5.0%
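The version is normally read from the file header, but since PDF 1.4 the document catalog may carry a /Version entry that takes precedence when it names a later version. A minimal sketch under those rules; the byte-level regexes are simplifying assumptions (they ignore compressed object streams and do not check that /Version sits in the catalog), and the function name is hypothetical.

```python
import re
from typing import Optional, Tuple

_HEADER = re.compile(rb"%PDF-(\d+\.\d+)")
_CATALOG_VERSION = re.compile(rb"/Version\s*/(\d+\.\d+)")


def _as_tuple(version: str) -> Tuple[int, int]:
    major, minor = version.split(".")
    return int(major), int(minor)


def pdf_version(pdf_bytes: bytes) -> Optional[str]:
    """Return the effective PDF version, or None if no header is found."""
    header = _HEADER.search(pdf_bytes[:1024])  # header is expected near the start
    if header is None:
        return None
    version = header.group(1).decode("ascii")
    # A catalog /Version entry wins only if it names a later version.
    override = _CATALOG_VERSION.search(pdf_bytes)
    if override is not None:
        candidate = override.group(1).decode("ascii")
        if _as_tuple(candidate) > _as_tuple(version):
            version = candidate
    return version


print(pdf_version(b"%PDF-1.4\n1 0 obj << /Type /Catalog /Version /1.7 >> endobj"))  # 1.7
```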

Common Creators when PyPDF is the Producer

Quadient sits upstream in 50% of cases and wkhtmltopdf in the other 50%. Read this list as “what kinds of documents end up routed through PyPDF.”

Quadient: 50.0%

wkhtmltopdf: 50.0%
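This list is a conditional distribution: among documents whose Producer is PyPDF, how often each Creator appears. A minimal sketch, assuming hypothetical per-document records with creator and producer fields; the toy documents are invented to mirror a 50/50 split.

```python
import re
from collections import Counter
from typing import Dict, List

# Hypothetical pypdf-family test (pypdf, PyPDF2, PyPDF3, ...).
_PYPDF_FAMILY = re.compile(r"\s*pypdf\d*\b", re.IGNORECASE)


def upstream_creators(docs: List[Dict[str, str]]) -> Dict[str, float]:
    """Percent share of each Creator among documents with a pypdf-family Producer."""
    creators = Counter(
        d["creator"]
        for d in docs
        if d.get("creator") and _PYPDF_FAMILY.match(d.get("producer") or "")
    )
    total = sum(creators.values())
    if total == 0:
        return {}
    return {name: round(100 * n / total, 1) for name, n in creators.most_common()}


docs = [
    {"creator": "Quadient", "producer": "PyPDF2"},
    {"creator": "wkhtmltopdf", "producer": "pypdf"},
    {"creator": "Microsoft Word", "producer": "Microsoft Word"},  # not pypdf: ignored
]
print(upstream_creators(docs))  # {'Quadient': 50.0, 'wkhtmltopdf': 50.0}
```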

Related profiles

Tools you’ll see next to PyPDF

Other tools that frequently share metadata with PyPDF in the same documents.