Forensic verdict
Microsoft Word appears on both legitimate first-generation output and downstream re-save flows; context (the other tool on the same document) is what flips the signal.
Based on 2,138 appearances across the HTPBE? corpus.
Corpus profile
Microsoft Word is the dominant authoring application for office documents. Its native "Save as PDF" path identifies as Microsoft Word in both Creator and Producer, or pairs with Microsoft Print to PDF as the producer on Windows.
Word-as-Creator is one of the most common origins in the HTPBE? corpus and overwhelmingly indicates a single-author workflow. It only becomes interesting when the Producer on the same document is an unrelated re-saver — that producer/creator mismatch is what we flag, not Word itself.
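The flagging rule described above can be sketched roughly as follows. This is a minimal illustration, not the HTPBE? implementation: the function name, the labels it returns, and the two-entry allowlist (taken from the native export paths named in this profile) are assumptions, and a real allowlist would likely be larger.

```python
# Assumption: the only "native" producers are the two named in this profile.
WORD_NATIVE_PRODUCERS = {"microsoft word", "microsoft print to pdf"}


def normalize(value):
    """Lowercase and collapse whitespace; treat None as an empty string."""
    return " ".join((value or "").lower().split())


def classify(creator, producer):
    """Return a coarse label for one document's Creator/Producer pair.

    Both inputs are assumed to be already canonicalized tool names.
    """
    if normalize(creator) != "microsoft word":
        return "not_word_creator"
    p = normalize(producer)
    if not p:
        return "producer_missing"
    if p in WORD_NATIVE_PRODUCERS:
        # Word's own export path: not flagged.
        return "native_export"
    # Word authored the document, but another tool wrote the PDF.
    return "creator_producer_mismatch"
```

Note that Word itself is never the flagged condition here; only the pairing with an unlisted Producer is.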
Role in the workflow
PDF metadata can name a Creator (the application that authored the original document) and a Producer (the engine that wrote the PDF); both fields are optional and may be absent. The same tool can appear in either slot, with very different modification profiles.
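To make the two slots concrete, here is a deliberately crude sketch that pulls Creator and Producer out of raw PDF bytes with a regular expression. It assumes the Info dictionary is uncompressed and uses simple literal strings; hex strings, UTF-16 encoded values, nested parentheses, object streams, and XMP metadata are not handled, so a proper PDF parser is the better choice for real work. The function name is hypothetical.

```python
import re

# Matches "/Creator (...)" or "/Producer (...)" with a literal string value.
# Escaped characters (backslash + any byte) are allowed inside the string.
_LITERAL = re.compile(rb"/(Creator|Producer)\s*\(((?:\\.|[^\\()])*)\)", re.S)


def read_creator_producer(pdf_bytes):
    """Return {'Creator': ..., 'Producer': ...}; missing fields stay None.

    If a key occurs more than once, the last occurrence wins.
    """
    found = {"Creator": None, "Producer": None}
    for key, raw in _LITERAL.findall(pdf_bytes):
        found[key.decode("ascii")] = raw.decode("latin-1")
    return found
```

For example, calling it on a file read with `open(path, "rb").read()` returns a dictionary whose two values can then be compared against each other.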
Name fingerprints
Different version strings and spellings observed for Microsoft Word in the wild. All are merged into the same canonical profile.
Why variants matter
The same tool publishes itself under 19 different metadata strings — version bumps, locale tags, build IDs. We canonicalize them so the corpus reflects one identity, not noise.
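A canonicalization step of this kind might look like the sketch below. The matching rule and the sample strings in the usage note are illustrative assumptions; the actual 19 variants and the rules used to merge them are not listed in this profile.

```python
import re

# Assumption: any string naming "Microsoft Word" or "Microsoft Office Word"
# belongs to the same canonical profile, whatever version text follows.
_WORD_PATTERN = re.compile(r"\bmicrosoft\s+(office\s+)?word\b", re.IGNORECASE)


def canonicalize(raw):
    """Map a raw metadata string to a canonical tool name."""
    # Drop trademark symbols and collapse runs of whitespace.
    text = raw.replace("\u00ae", "").replace("\u2122", "")
    text = " ".join(text.split())
    if _WORD_PATTERN.search(text):
        return "Microsoft Word"
    # Unknown tools pass through with only the cleanup applied.
    return text
```

With this rule, inputs such as `"Microsoft® Word 2016"` and `"Microsoft Office Word 2007"` collapse to `"Microsoft Word"`, while `"Microsoft Print to PDF"` is left as its own identity.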
Distributions
The PDF versions Microsoft Word writes when acting as Producer, and the other tools that appear in the same documents.
Most output is PDF 1.7 (73% of files where Microsoft Word is the Producer).
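A version distribution like this one could be tallied along the following lines. This sketch reads the version from the `%PDF-x.y` file header only; a document catalog may override it with a `/Version` entry, which is ignored here. The function names are hypothetical, and any input list passed to `version_shares` in your own tests is made-up data, not the corpus.

```python
import re
from collections import Counter

_HEADER = re.compile(rb"%PDF-(\d\.\d)")


def pdf_version(pdf_bytes):
    """Return the header version such as '1.7', or None if no header is found."""
    match = _HEADER.search(pdf_bytes[:1024])
    return match.group(1).decode("ascii") if match else None


def version_shares(versions):
    """Return each version's share as a whole-number percentage."""
    counts = Counter(v for v in versions if v is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {version: round(100 * n / total) for version, n in counts.items()}
```

Files with no recognizable header are left out of the denominator, which is one possible design choice; counting them as an "unknown" bucket would be equally defensible.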
Related profiles
Other tools that frequently share metadata with Microsoft Word in the same documents. Each card links to its own forensic profile.
Long tail
Smaller cuts of the Microsoft Word corpus — useful context, but treat each row as a single data point rather than a strong signal.
PDFs carrying at least one digital signature