Forensic verdict
Scanner Xerox appears on both legitimate first-generation output and downstream re-save flows; context (the other tool on the same document) is what flips the signal.
Based on 267 appearances across the HTPBE? corpus.
Corpus profile
Scanner Xerox is one of the PDF-handling tools surfaced in the HTPBE? corpus. Scanner Xerox splits its occurrences between Creator (51%) and Producer (49%) roles, meaning it sometimes originates documents and sometimes re-emits them after another tool created them.
In the HTPBE? corpus the contextual signal we look for is a producer/creator mismatch: when Scanner Xerox appears as the latest Producer on a document whose Creator was an institutional source (e.g. Adobe PDF Library, Microsoft Word, a banking back-end), the document was rebuilt or re-saved after its original creation. That mismatch is the marker — never the tool itself.
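The mismatch check described above can be sketched in a few lines. This is a minimal illustration, not the HTPBE? implementation: the function name `is_resave_mismatch` is hypothetical, the inputs are assumed to be already-canonicalized tool names, and the institutional list contains only the two concrete examples named in the text.

```python
from typing import Optional

# Assumption: names arrive already canonicalized.
TOOL = "Scanner Xerox"

# Assumption: a tiny illustrative list built from the examples in the text;
# a real list of institutional sources would be much longer.
INSTITUTIONAL_CREATORS = {"Adobe PDF Library", "Microsoft Word"}


def is_resave_mismatch(creator: Optional[str], producer: Optional[str]) -> bool:
    """Flag documents where TOOL is the Producer but the Creator is institutional."""
    if producer != TOOL:
        return False  # the tool did not write this file, so nothing to flag here
    return creator in INSTITUTIONAL_CREATORS


# Re-saved after institutional creation: the contextual signal fires.
print(is_resave_mismatch("Microsoft Word", "Scanner Xerox"))
# The tool in both slots: ordinary first-generation output, no signal.
print(is_resave_mismatch("Scanner Xerox", "Scanner Xerox"))
```

The check never fires on the tool's name alone; it requires the pairing with an institutional Creator, which mirrors the point that the mismatch is the marker rather than the tool.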
On documents where Scanner Xerox acts as Creator, 4% carry modification markers; on documents where it acts as Producer, 1% do. These are observed rates inside the HTPBE? corpus and should be read as base rates, not as accusations against Scanner Xerox or its users.
Role in the workflow
PDF metadata can record both a Creator (the application that produced the original document) and a Producer (the engine that wrote the PDF). The same tool can appear in either slot, with very different modification profiles.
Name fingerprints
These are the different version strings and spellings observed for Scanner Xerox in the wild. All are merged into the same canonical profile.
Why variants matter
The same tool reports itself under 4 different metadata strings that differ by version bumps, locale tags, and build IDs. We canonicalize them so the corpus reflects one identity, not noise.
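One way such canonicalization could work is to strip the parts that vary (build IDs, locale tags, version numbers) and look up what remains. The sketch below is an assumption about the general technique, not the corpus's actual rules; the four variant strings are invented for illustration and are not the real strings observed for Scanner Xerox.

```python
import re
from collections import Counter

# Assumption: a lookup from normalized key to canonical display name.
CANONICAL = {"scanner xerox": "Scanner Xerox"}


def canonical_key(raw: str) -> str:
    """Reduce a raw metadata string to a comparison key."""
    s = raw.casefold()
    s = re.sub(r"\(.*?\)", " ", s)                    # parenthesized build IDs
    s = re.sub(r"\b[a-z]{2}[-_][a-z]{2}\b", " ", s)   # locale tags such as en-us
    s = re.sub(r"\bv?\d+(\.\d+)*\b", " ", s)          # version numbers such as 2.1 or v3
    return " ".join(s.split())                        # collapse leftover whitespace


def canonical_name(raw: str) -> str:
    """Map a raw string to its canonical profile, or leave it unchanged."""
    return CANONICAL.get(canonical_key(raw), raw.strip())


# Hypothetical variants, made up for this example.
variants = [
    "Scanner Xerox",
    "Scanner Xerox 2.1",
    "SCANNER XEROX (build 4471)",
    "Scanner Xerox en-US v3",
]
counts = Counter(canonical_name(v) for v in variants)
print(counts)
```

Stripping before lookup keeps the canonical table small, at the cost of possibly merging tools whose names differ only by a number; a production rule set would need exceptions for those.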
Distributions
The PDF versions Scanner Xerox writes when acting as Producer, and the other tools that appear in the same documents.
Most output is PDF 1.4 (96% of files where Scanner Xerox is the Producer).
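For readers who want to check the version of a single file, the header line at the start of a PDF (for example `%PDF-1.4`) is the simplest place to look. The sketch below reads only that header; the helper name `header_version` is hypothetical, and how the HTPBE? corpus itself determines the version is not stated here. Note that a document catalog can carry a `/Version` entry that overrides the header, which this sketch ignores.

```python
import re
from typing import Optional

# The header comment looks like "%PDF-1.4" and sits at or near the start of the file.
_HEADER = re.compile(rb"%PDF-(\d\.\d)")


def header_version(data: bytes) -> Optional[str]:
    """Return the version from the PDF header, or None if no header is found."""
    match = _HEADER.search(data[:1024])  # only inspect the beginning of the file
    return match.group(1).decode("ascii") if match else None


# Usage with in-memory bytes; for a real file, pass open(path, "rb").read(1024).
print(header_version(b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"))
```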
Related profiles
Other tools that frequently share metadata with Scanner Xerox in the same documents. Each card links to its own forensic profile.
Long tail
Smaller cuts of the Scanner Xerox corpus — useful context, but treat each row as a single data point rather than a strong signal.