Healthcare Document Fraud: PDF Verification for Medical Credentials

Code examples verified against the API as of May 2026. If the API has changed since then, check the changelog.
In 2023, a nurse practitioner in Texas treated patients for seven months before the hospital discovered that the medical license PDF submitted during credentialing had been altered. The document was a real license — issued to a different person, with the name and license number changed in a consumer PDF editor. The file looked authentic on screen. It passed visual review by two credentialing specialists. The modification was discovered only after a routine primary source verification flagged a mismatch with the state licensing board’s records.
The hospital faced regulatory sanctions, malpractice exposure, and the cost of unwinding seven months of patient care decisions. The total financial impact exceeded $2 million before litigation costs.
This is not an edge case. The DOJ recovered over $5.7 billion in healthcare-related False Claims Act settlements in fiscal year 2025 alone — a record. A significant and growing share of healthcare fraud involves altered or fabricated PDF documents: medical licenses, lab results, prescriptions, insurance claim support documents, and continuing education certificates.
The detection gap is not that these alterations are invisible. It is that most healthcare organizations rely on visual review of documents that were designed to fool visual review. Understanding why PDF authenticity matters at the organizational level is the first step toward closing that gap.
Four Categories of Healthcare PDF Fraud
Healthcare document fraud spans the full operational lifecycle — from hiring and credentialing through clinical operations to claims and billing. The healthcare industry page covers common use cases; here we examine the specific document types and fraud patterns in detail.
Medical Licenses and Credentials
State medical boards, nursing boards, and specialty certification bodies issue credentials as PDFs generated by institutional document management systems. These files carry specific structural fingerprints: a Producer field identifying the government or institutional software, a Creator field tied to a system service account, and a single-revision xref table consistent with a document generated once and never reopened.
The fraud pattern: an applicant obtains a legitimate license PDF (their own expired license, or a colleague’s active license found online), opens it in a consumer editor, changes the name and license number, and submits it during credentialing. The pattern is similar to what we see in HR diploma fraud, but with higher stakes. The layout, seal, and formatting are authentic because the template is authentic. Only the identifying information changed — and with it, the file’s structural metadata.
Cisive reports that healthcare organizations routinely encounter forged credentials during the hiring process, and that the global rate of document forgery across industries stands at approximately 2.3%. In healthcare, the consequences of that 2.3% are measured in patient safety, not dollars.
Prescriptions
Prescription fraud is one of the most common forms of healthcare document manipulation. Legitimate prescriptions are generated by electronic prescribing systems (Surescripts, DrFirst, or EHR-integrated modules like Epic’s e-prescribing) and carry metadata consistent with institutional software.
A forged or altered prescription typically shows one of two patterns: either it was created from scratch in a consumer application (Word, Pages, Canva) with no institutional metadata at all, or it was a legitimate prescription that was opened in an editor to change the medication name, dosage, quantity, or refill count. Both patterns leave structural traces in the PDF that visual inspection cannot detect.
Lab Results and Diagnostic Reports
Laboratory information systems (LIS) — platforms like Sunquest, Cerner PathNet, or Epic Beaker — generate lab result PDFs through automated pipelines. These documents are produced programmatically, carry consistent Producer and Creator fields tied to the LIS platform, and have single-revision file structures.
Altered lab results appear in multiple contexts: patients modifying results before sharing them with insurance companies or employers, individuals changing diagnostic values to qualify for disability benefits, and in legal proceedings where lab results are submitted as evidence. The alteration pattern is consistent — the content changes while the layout stays identical, and the file’s binary structure records the intervention.
Insurance Claim Support Documents
Healthcare insurers process millions of supporting documents per year: treatment summaries, itemized bills, referral letters, and proof-of-service records. A 2026 Verisk study found that 99% of insurers have encountered manipulated or AI-altered documentation in claims submissions, with an estimated 25–30% of claims now involving digitally altered documents.
The inflation pattern is predictable: a $3,400 treatment summary becomes $8,400. Three physical therapy sessions become twelve. The document itself is legitimate and comes from a real provider; only specific dollar amounts or service counts have been changed. For a deeper analysis of how this pattern works in insurance specifically, see our article on insurance claims fraud and altered PDFs.
Why Visual Review Fails in Healthcare
Healthcare credentialing teams, compliance officers, and claims adjusters are trained professionals. They know what legitimate documents look like. The problem is that modern document fraud produces files that look exactly like legitimate documents because they start as legitimate documents.
A medical license PDF altered in a consumer editor retains the original layout, seal, signature image, and formatting. The state board’s logo is real because it was copied from a real license. The font is correct because it was preserved from the original file. The only changes are to text fields — a name, a license number, a date — and those changes are visually indistinguishable from the original.
This is why primary source verification (PSV) exists: organizations contact the issuing institution directly to confirm that the credential is valid. PSV is thorough but slow, typically taking 5–15 business days per credential, and it is expensive when applied to every document in a credentialing pipeline. Most healthcare organizations reserve full PSV for final candidates, not initial submissions.
The result is a gap between document submission and verification — a window during which forged credentials may be accepted provisionally. In fast-moving hiring environments, particularly during staffing shortages, that window can stretch into weeks or months.
What Forensic PDF Analysis Actually Detects
Forensic metadata analysis works at the file structure level, examining fields and binary patterns that are invisible when the document is opened in a PDF viewer. For healthcare documents, five signals are particularly diagnostic.
Creator and Producer Mismatch
A medical license generated by a state board’s credentialing system will have a Producer field identifying that system — something like “Oracle Document Cloud Service” or “PeopleSoft Enterprise.” If someone opens that file in LibreOffice, changes the name, and re-saves it, the Producer field changes to “LibreOffice” while the document still claims to be from a state licensing board.
No state medical board issues licenses through LibreOffice. That mismatch between the document’s claimed origin and its actual producer is a primary detection signal. For a complete reference of what each metadata field contains and how it is set, see the PDF metadata field reference.
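The mismatch check described above can be sketched in a few lines of Python. This is a minimal illustration, not HTPBE's implementation: the hint lists and function names are hypothetical, and a real deployment would maintain a per-issuer list of expected producer strings.

```python
# Hypothetical hint lists for illustration only; not an HTPBE data set.
INSTITUTIONAL_HINTS = ("oracle", "peoplesoft")
CONSUMER_HINTS = ("libreoffice", "microsoft word", "canva", "google docs")


def tool_category(value: str) -> str:
    """Classify a Creator/Producer string as institutional, consumer, or unknown."""
    lowered = value.lower()
    if any(hint in lowered for hint in CONSUMER_HINTS):
        return "consumer"
    if any(hint in lowered for hint in INSTITUTIONAL_HINTS):
        return "institutional"
    return "unknown"


def creator_producer_mismatch(creator: str, producer: str) -> bool:
    """True when both fields are recognised and fall into different categories."""
    categories = {tool_category(creator), tool_category(producer)}
    return "unknown" not in categories and len(categories) == 2
```

Unrecognised strings are deliberately treated as "unknown" rather than suspicious, so the check only fires when both fields are identifiable and disagree.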
Date Discrepancies
A medical license issued in 2022 should have a CreationDate in 2022. If the PDF’s metadata shows it was created last week, or if the ModDate diverges from the CreationDate by days or weeks, the file has been reopened and re-saved after its original creation. A credential document that was generated once and delivered to the licensee has no reason to carry a modification timestamp different from its creation timestamp.
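To make the date comparison concrete, here is a rough sketch that parses the date strings stored in a PDF's Info dictionary (the `D:YYYYMMDDHHmmSS` form with an optional timezone suffix) and flags a gap between them. The function names and the one-minute tolerance are assumptions chosen for illustration, and a missing timezone is treated as UTC.

```python
import re
from datetime import datetime, timedelta, timezone

# D:YYYYMMDDHHmmSS followed by an optional Z, +HH'mm' or -HH'mm' suffix.
_PDF_DATE = re.compile(
    r"^D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([Zz+\-])(?:(\d{2})'?(\d{2})?'?)?)?$"
)


def parse_pdf_date(raw: str) -> datetime:
    """Parse a PDF date string into a timezone-aware datetime."""
    match = _PDF_DATE.match(raw.strip())
    if match is None:
        raise ValueError(f"not a PDF date string: {raw!r}")
    year, month, day, hour, minute, second, sign, tz_h, tz_m = match.groups()
    offset = timedelta(hours=int(tz_h or 0), minutes=int(tz_m or 0))
    if sign == "-":
        offset = -offset
    # Assumption: a date without timezone information is treated as UTC.
    return datetime(
        int(year), int(month or 1), int(day or 1),
        int(hour or 0), int(minute or 0), int(second or 0),
        tzinfo=timezone(offset),
    )


def dates_diverge(creation: str, modification: str,
                  tolerance: timedelta = timedelta(minutes=1)) -> bool:
    """True when ModDate differs from CreationDate by more than the tolerance."""
    gap = parse_pdf_date(modification) - parse_pdf_date(creation)
    return abs(gap) > tolerance
```

Comparing timezone-aware values matters here: two timestamps that look different as strings can denote the same instant once their offsets are applied.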
Incremental Updates and XRef Structure
The PDF format allows a file to be edited without overwriting its original content. An editor that saves incrementally appends new objects and a new cross-reference (xref) section while preserving the original byte stream. A document generated once typically has a single xref section. A document that has been opened and incrementally re-saved has two or more. For a detailed explanation of how incremental updates work at the binary level, see our technical deep dive.
A medical license with three xref revisions has been re-saved at least twice since its original creation. That is inconsistent with a document generated by an automated credentialing system and delivered without further editing.
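A rough way to see this in a raw file is to count the `startxref ... %%EOF` trailers, since each incremental save appends one. The sketch below is a heuristic under that assumption, with a hypothetical function name; linearized ("fast web view") PDFs can legitimately contain two such trailers without ever having been edited, so a count above one is a prompt to look closer rather than proof.

```python
import re

# Each saved revision ends with: startxref <offset> %%EOF
_STARTXREF = re.compile(rb"startxref\s+\d+\s+%%EOF")


def count_revisions(pdf_bytes: bytes) -> int:
    """Count trailer markers in the raw file as a proxy for saved revisions."""
    return len(_STARTXREF.findall(pdf_bytes))
```

Usage is a matter of reading the file in binary mode, for example `count_revisions(open(path, "rb").read())`.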
Digital Signature Status
Some state boards and healthcare institutions digitally sign credential documents. If a signed document is subsequently modified, the signature verification status changes to “modified after signing.” This is one of the strongest fraud indicators available — it means someone altered the document after the issuing authority certified it.
HTPBE’s analysis detects not only whether a signature is present but whether the document was modified after signing, and whether a signature was removed entirely (another definitive fraud indicator).
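One structural clue behind "modified after signing" is the signature's `/ByteRange` entry, which lists the byte spans the signature covers. If the file continues past the end of the last covered span, something was appended after signing. The following is a simplified sketch with a hypothetical function name. It does not validate the cryptographic signature, it is not a description of how HTPBE performs the check, and some legitimate operations (such as adding a document timestamp) also append data.

```python
import re
from typing import Optional

# /ByteRange [start1 length1 start2 length2]
_BYTE_RANGE = re.compile(
    rb"/ByteRange\s*\[\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*\]"
)


def bytes_after_signature(pdf_bytes: bytes) -> Optional[int]:
    """Return how many bytes follow the furthest signed range, or None if unsigned."""
    covered_ends = [
        int(m.group(3)) + int(m.group(4))
        for m in _BYTE_RANGE.finditer(pdf_bytes)
    ]
    if not covered_ends:
        return None
    return len(pdf_bytes) - max(covered_ends)
```

A result of zero means the last signature covers the file to its final byte; a large positive value means later content exists that the signature never saw.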
Consumer Software Origin
When a document that should originate from institutional software instead shows a consumer application as its creator — Canva, Microsoft Word, Google Docs, Pages — the HTPBE verdict is inconclusive. This does not mean the analysis failed. It means the document was created with software that no institutional credentialing system uses, which is itself a meaningful signal.
A prescription “from” a hospital that was actually created in Google Docs warrants immediate follow-up, regardless of how professional it looks on screen.
A Concrete Example: Altered Medical License
Here is what the HTPBE API returns for a medical license PDF that has been modified after its original issuance:
{
  "id": "d4e8f321-7a2b-4c91-b5d3-ef9876543210",
  "status": "modified",
  "creator": "Oracle Document Cloud",
  "producer": "LibreOffice 7.6",
  "creation_date": 1654099200,
  "modification_date": 1714521600,
  "file_size": 287340,
  "xref_count": 2,
  "pdf_version": "1.7",
  "origin": { "type": "consumer", "software": "LibreOffice" },
  "has_incremental_updates": true,
  "has_digital_signature": false,
  "signature_removed": false,
  "modifications_after_signature": false,
  "has_javascript": false,
  "has_embedded_files": false,
  "modification_markers": [
    "Creator and Producer from different tool categories",
    "Different creation and modification dates",
    "Multiple cross-reference tables (incremental updates)"
  ]
}
Three independent signals converge: the Creator field says Oracle (institutional software) but the Producer field says LibreOffice (consumer software). The creation date is in 2022 but the modification date is in 2024. The xref table has two revisions, confirming the file was re-saved.
Each signal alone is worth investigating. Three signals pointing in the same direction — all consistent with someone opening an institutional document in a consumer editor and re-saving it — is a clear finding.
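For readers wiring this into a review tool, the sketch below turns a subset of the response shown above into a short summary, converting the epoch timestamps into calendar dates. The field names are copied from the sample response; the `summarize` helper and the shape of its output are hypothetical.

```python
import json
from datetime import datetime, timezone

# Subset of the sample response shown above.
SAMPLE = json.loads("""
{
  "status": "modified",
  "creator": "Oracle Document Cloud",
  "producer": "LibreOffice 7.6",
  "creation_date": 1654099200,
  "modification_date": 1714521600,
  "xref_count": 2,
  "modification_markers": [
    "Creator and Producer from different tool categories",
    "Different creation and modification dates",
    "Multiple cross-reference tables (incremental updates)"
  ]
}
""")


def summarize(result: dict) -> dict:
    """Condense an analysis result into reviewer-friendly fields."""
    created = datetime.fromtimestamp(result["creation_date"], tz=timezone.utc)
    modified = datetime.fromtimestamp(result["modification_date"], tz=timezone.utc)
    return {
        "status": result["status"],
        "created": created.date().isoformat(),
        "modified": modified.date().isoformat(),
        "days_between": (modified - created).days,
        "signal_count": len(result["modification_markers"]),
    }


summary = summarize(SAMPLE)
```

For the sample, the timestamps resolve to 2022-06-01 and 2024-05-01 in UTC, which matches the 2022 and 2024 dates discussed above.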
Where This Approach Has Limits
PDF metadata analysis is effective for detecting the most common forms of healthcare document fraud, but it cannot catch everything. Healthcare compliance teams should understand these boundaries.
Sophisticated forgeries built from scratch. A forger who understands PDF structure could theoretically create a document using the same software and metadata patterns as the real issuing institution. If they set the correct Producer string, use plausible timestamps, and avoid incremental updates, the file would pass structural analysis. This requires specific technical knowledge of how a given state board or hospital system generates its PDFs — knowledge that is not widely available but is not impossible to obtain.
Scanned documents. When a credential is scanned from a printed copy rather than submitted as the original digital PDF, the resulting file carries the scanner’s metadata, not the original issuer’s. A scanned document will typically return inconclusive because the scanner software (not institutional credentialing software) is the creator. This is not a false result — it accurately reflects that the digital file’s provenance cannot be verified through metadata — but it means the analysis cannot distinguish a scan of a legitimate document from a scan of a forged printout.
Content accuracy of intact files. A medical license that was legitimately generated by a state board but issued to a different person — and submitted without any PDF-level modification — will appear intact. The file structure is genuine. Metadata analysis does not read or verify the text content of the document; it analyzes the file’s structural properties. Content verification remains the domain of primary source verification.
Encrypted or password-protected PDFs. Some healthcare documents are delivered with access restrictions. If the file cannot be parsed, analysis cannot proceed. This is a documented limitation, not a silent failure — the API returns a clear error rather than a misleading result.
These limits are real. Forensic PDF analysis is most effective as a fast, automated first-pass filter that catches the majority of document fraud — the fraud committed with readily available consumer tools by people who do not understand PDF structure. For the remaining edge cases, primary source verification remains necessary.
Integrating PDF Verification into Healthcare Workflows
The practical integration points in healthcare operations map to specific document intake moments.
Credentialing and Hiring
When a provider or staff member submits credentials during onboarding, each PDF can be verified before it enters the credentialing file. A modified or inconclusive verdict triggers immediate PSV for that specific document, rather than waiting for the batch verification cycle. This reduces the window during which forged credentials are provisionally accepted.
Claims Processing
For healthcare payers, every supporting document attached to a claim — treatment summaries, itemized bills, referral letters — can be checked at intake. Documents returning modified are routed to the special investigations unit before the claim advances. At scale, this catches altered documents that individual adjusters reviewing hundreds of claims per week would not identify visually.
Pharmacy and Prescription Verification
Pharmacies receiving electronic or faxed prescriptions in PDF format can verify the file’s structural integrity before dispensing. A prescription whose metadata shows it was created in Microsoft Word rather than an e-prescribing system is not necessarily fraudulent, but it warrants a callback to the prescribing provider.
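The three intake points above can be expressed as one small routing function. This is a sketch under stated assumptions: the workflow labels and action names are hypothetical, the `status` and `origin.type` fields follow the sample response shown earlier, and sending modified prescriptions to a callback is an assumption that extends the consumer-origin rule described above.

```python
KNOWN_STATUSES = {"intact", "modified", "inconclusive"}


def route(workflow: str, result: dict) -> str:
    """Map an analysis result to a next step for a given intake workflow."""
    status = result.get("status")
    if status not in KNOWN_STATUSES:
        # Missing or unexpected verdicts go to a person rather than passing silently.
        return "manual_review"

    origin_type = (result.get("origin") or {}).get("type")

    if workflow == "credentialing" and status in ("modified", "inconclusive"):
        return "immediate_psv"
    if workflow == "claims" and status == "modified":
        return "special_investigations_unit"
    if workflow == "pharmacy" and (status == "modified" or origin_type == "consumer"):
        return "prescriber_callback"
    return "continue"
```

Keeping the rules in one place makes it easier to audit which verdicts trigger which escalation, and to tighten a single workflow without touching the others.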
API Integration
The verification call is a single HTTP request:
curl -X POST https://api.htpbe.tech/v1/analyze \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://your-system.example.com/documents/medical-license-12345.pdf"
  }'
The response returns in 2–5 seconds with a verdict (intact, modified, or inconclusive), the specific modification markers detected, and the full metadata profile. Your system parses the status field and routes the document accordingly.
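For teams working in Python, the same request can be built with the standard library alone. The endpoint, headers, and `url` payload field are taken from the curl example above; the function names and the 30-second timeout are assumptions, and error handling is omitted for brevity.

```python
import json
import urllib.request

API_URL = "https://api.htpbe.tech/v1/analyze"


def build_analyze_request(api_key: str, document_url: str) -> urllib.request.Request:
    """Build the POST request without sending it."""
    payload = json.dumps({"url": document_url}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def analyze(api_key: str, document_url: str, timeout: float = 30.0) -> dict:
    """Send the request and return the parsed JSON response."""
    request = build_analyze_request(api_key, document_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Separating request construction from sending keeps the network call out of unit tests: the built request can be inspected without contacting the API.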
For teams building integrations, test API keys are available on all plans — including free — and return deterministic mock responses without consuming production quota. See the API documentation and pricing for plan details and to generate your key.
Who Should Be Using This
Three roles in healthcare organizations benefit most from automated PDF verification:
Credentialing coordinators who process provider applications and need to flag suspicious credentials before they enter the verification pipeline. Catching a forged license at submission — rather than discovering it weeks later when PSV returns a mismatch — prevents provisional acceptance of unqualified providers.
Claims operations managers who oversee document intake for insurance claims and need a scalable way to identify altered supporting documents. Manual review of every claim attachment is not operationally feasible. Automated structural analysis at intake is.
Compliance officers responsible for regulatory audit readiness. Demonstrating that your organization runs automated document integrity checks on submitted credentials and claims documentation strengthens your compliance posture. In an enforcement environment where the DOJ recovered over $5.7 billion in healthcare-related False Claims Act settlements in a single fiscal year, the cost of not checking is harder to justify than the cost of checking.
The HTPBE Starter plan handles 30 verifications per month at $15 — sufficient for a small clinic’s credentialing workflow. The Growth plan at $149/month covers 350 checks, scaling to claims operations or multi-facility credentialing. For health systems processing thousands of documents monthly, enterprise plans with custom volumes are available.