A radiologist at a mid-tier hospital system types a patient’s full medical history into ChatGPT to help draft a differential diagnosis. The prompt contains the patient’s name, date of birth, medical record number, a list of current medications, and a detailed description of imaging findings. This happens thousands of times per day across the United States. Under HIPAA, every one of these interactions is a potential violation carrying fines of up to $1.9 million per violation category per year. Under the current enforcement framework, almost none of them will ever be investigated.
HIPAA – the Health Insurance Portability and Accountability Act – was signed into law in 1996. The Privacy Rule was finalized in 2000. The Security Rule followed in 2003. These regulations were designed for an era of paper records, fax transmissions, and early electronic health record systems. They were built to govern data at rest and data in transit between known entities with defined relationships. They were not built for a world where a physician can transmit eighteen categories of protected health information to a server in San Francisco by typing a paragraph into a chatbot.
The structural mismatch between HIPAA and modern AI systems is not a matter of fine-tuning regulations. It is a foundational incompatibility.
The 18 Identifiers and the Prompt Problem
HIPAA’s Privacy Rule defines protected health information (PHI) through 18 specific identifiers: names, geographic data smaller than a state, dates (except year) related to an individual, phone numbers, fax numbers, email addresses, Social Security numbers, medical record numbers, health plan beneficiary numbers, account numbers, certificate/license numbers, vehicle identifiers, device identifiers, web URLs, IP addresses, biometric identifiers, full-face photographs, and any other unique identifying number or code.
The de-identification standard under HIPAA provides two paths: Expert Determination (a qualified statistician certifies that re-identification risk is very small) and Safe Harbor (all 18 identifiers are removed). Both paths were designed for structured datasets – not for free-text clinical narratives being typed into AI prompts.
Here is the problem: clinical context is inherently identifying. A prompt that reads “65-year-old male with stage IIIB non-small cell lung cancer, EGFR mutation positive, previously treated with osimertinib, presenting with new brain metastases at a rural hospital in [state]” may contain zero of the 18 explicit identifiers and still be sufficient to identify the patient. In small populations, rare conditions function as quasi-identifiers. A 2024 study published in JAMA Network Open demonstrated that 87% of patients with rare diseases could be re-identified from clinical descriptions alone, even after Safe Harbor de-identification.
This is the gap that PII stripping technologies attempt to address – but the challenge with clinical language is that the medical content itself carries identifying potential that no simple pattern-matching system can neutralize.
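To make the gap concrete, here is a minimal sketch of pattern-based identifier stripping in Python. The regex patterns and the sample prompt are illustrative assumptions; a production redaction layer would use a trained clinical NER model rather than regular expressions, but the failure mode is the same: the explicit identifiers disappear while the quasi-identifying clinical narrative survives untouched.

```python
import re

# A deliberately naive identifier stripper covering a few Safe Harbor categories.
# Names, addresses, and most other identifiers need a trained clinical NER model;
# this sketch only illustrates what pattern matching can and cannot remove.
PATTERNS = {
    "MRN":   re.compile(r"\bMRN[:\s#]*\d{6,10}\b", re.IGNORECASE),
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\(?\d{3}\)?[-.\s]\d{3}[-.\s]\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "DATE":  re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
}

def strip_explicit_identifiers(text: str) -> str:
    """Redact whatever the patterns can see, and nothing else."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

prompt = (
    "65-year-old male, MRN 04219876, DOB 03/12/1960, with stage IIIB NSCLC, "
    "EGFR mutation positive, previously treated with osimertinib, presenting "
    "with new brain metastases at a rural hospital."
)
print(strip_explicit_identifiers(prompt))
# The MRN and date are redacted, but the clinical narrative that narrows the
# patient to a single individual in a small population is untouched.
```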
Business Associate Agreements: The Legal Fiction
HIPAA requires covered entities (hospitals, insurers, providers) to execute Business Associate Agreements (BAAs) with any third party that handles PHI on their behalf. A BAA imposes specific obligations on the business associate: they must safeguard PHI, report breaches, limit uses and disclosures, and allow the covered entity to terminate the agreement if violations occur.
The question facing every health system in 2026 is straightforward: can you get a BAA from your AI provider?
OpenAI offers BAAs to API customers and to organizations on ChatGPT Enterprise. The BAA covers data processed through those tiers; it does not cover data entered into the consumer ChatGPT product, ChatGPT Plus, or ChatGPT Team. According to OpenAI’s published documentation, the BAA limits data use to providing the service and excludes training. But the scope of “providing the service” remains ambiguous – abuse monitoring, safety filtering, and system debugging all occur server-side and involve some level of data processing.
Google offers BAAs for Vertex AI and the Google Cloud Healthcare API, but the terms for Gemini access through consumer products remain unclear. Google’s Health AI division has pursued HIPAA-compliant pathways, but the distinction between Google Cloud (BAA-eligible) and Google consumer products (not BAA-eligible) mirrors the split in OpenAI data practices between consumer and API tiers.
Anthropic offers BAAs for API customers and has positioned Claude for Healthcare as a HIPAA-eligible deployment. Their privacy architecture separates conversation data from training pipelines, but the enforceability of this separation under HIPAA scrutiny remains untested.
Microsoft covers Azure OpenAI Service under its existing HIPAA BAA for Azure. This has made Azure OpenAI the default path for health systems that want GPT-4 class capabilities within a HIPAA-compliant wrapper. Microsoft reported in 2025 that over 600 healthcare organizations were using Azure OpenAI under BAA coverage.
The problem is not the existence of BAAs. The problem is that BAAs were designed for relationships where the covered entity knows exactly what data is being shared, with whom, and for what purpose. In the AI context, the data flowing through the system is unstructured, unpredictable, and generated in real-time by clinicians whose prompt hygiene varies wildly. A BAA with OpenAI means nothing if the radiologist down the hall is using the consumer free tier.
The Shadow AI Problem in Hospitals
The most significant HIPAA risk from AI does not come from authorized deployments. It comes from unauthorized use – what the industry calls “shadow AI.”
A 2025 survey by the American Medical Informatics Association found that 67% of physicians reported using consumer AI tools (ChatGPT, Gemini, Claude, Perplexity) for clinical tasks at least once per week. Of those, only 23% said their institution had provided guidance on AI use with patient data. Just 11% said they had been trained on de-identification practices before using AI tools.
The pattern is consistent across healthcare organizations. A clinician faces a complex case. They need a literature review, a differential diagnosis framework, or help drafting a patient communication. The institutional EHR system (Epic, Cerner/Oracle Health) may have embedded AI tools, but they are slower, less capable, or restricted to specific use cases. The clinician opens a browser tab and types the clinical scenario into ChatGPT. PHI leaves the institution’s control in under thirty seconds.
This is not a technology failure. It is a usability failure. And HIPAA’s enforcement framework – which relies on breach reports, complaints, and periodic audits – is structurally incapable of detecting it at scale. The Department of Health and Human Services Office for Civil Rights (OCR) received over 37,000 HIPAA complaints in 2024. It resolved fewer than 800 through investigation. Shadow AI use by individual clinicians is, for practical purposes, invisible to regulators.
The architecture described by zero-persistence systems offers one pathway out of this bind: if the AI processing layer retains nothing, the regulatory surface area contracts dramatically. But this requires the processing infrastructure itself to be redesigned, not just the policies governing its use.
Epic, Oracle Health, and the Institutional Response
The major EHR vendors have responded to the AI demand by building LLM capabilities directly into their platforms.
Epic Systems launched a suite of AI features integrated into its EHR, leveraging both Microsoft Azure OpenAI and its own in-house models. Epic’s approach routes all AI interactions through the EHR’s existing HIPAA infrastructure, meaning data never leaves the covered entity’s environment (or its BAA-covered cloud partner). By early 2026, Epic reported that over 300 health systems had activated at least one AI feature, with ambient clinical documentation – AI-powered note-taking from patient encounters – being the most widely adopted.
Oracle Health (formerly Cerner) has taken a similar path, embedding AI capabilities into its Oracle Clinical Digital Assistant and routing data through Oracle Cloud Infrastructure, which carries HIPAA BAA coverage. Oracle’s approach emphasizes structured data extraction over free-text generation, which reduces but does not eliminate PHI exposure.
Ambient clinical intelligence – the category that includes products from Nuance (Microsoft), Abridge, and Nabla – represents the highest-volume PHI data flow into AI systems. These tools record and transcribe entire patient encounters, process the audio through speech-to-text models, and then summarize the encounter into clinical notes. The volume of PHI processed per interaction is orders of magnitude larger than a typical text prompt. A 15-minute patient visit generates approximately 4,000-6,000 words of transcript, all of it saturated with PHI.
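The data-flow implications are easier to see as a pipeline. The sketch below is schematic; the function bodies are placeholders rather than any vendor's API, and the point is simply that every intermediate artifact is PHI and must stay inside BAA-covered infrastructure.

```python
# Schematic ambient-documentation pipeline. Function bodies are placeholders,
# not any vendor's API. Every intermediate artifact is PHI: the raw audio
# (voices are biometric identifiers), the verbatim transcript, and the draft note.

def speech_to_text(audio: bytes) -> str:
    # Placeholder for a clinical speech-to-text model.
    return "verbatim transcript of the full patient encounter ..."

def summarize_encounter(transcript: str) -> str:
    # Placeholder for the LLM step that drafts the clinical note.
    return "draft SOAP note distilled from the transcript ..."

def run_ambient_pipeline(audio: bytes) -> dict:
    transcript = speech_to_text(audio)            # PHI: names, meds, history
    draft_note = summarize_encounter(transcript)  # PHI: distilled but identifying
    # All three artifacts must be stored, transmitted, and retained (or purged)
    # inside the covered entity's BAA-covered infrastructure.
    return {"audio": audio, "transcript": transcript, "draft_note": draft_note}
```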
The institutional EHR path addresses the BAA problem but creates new ones. Clinicians report that EHR-embedded AI tools are less capable than frontier models available through consumer interfaces. This capability gap drives continued shadow AI use. The Samsung ChatGPT incident demonstrated this dynamic in the corporate sector; the healthcare parallel is playing out with even higher regulatory stakes.
De-Identification at Scale: Technical Limitations
HIPAA’s Safe Harbor method requires removal of all 18 identifiers. For structured data – database fields, HL7 messages, FHIR resources – this is technically feasible though operationally complex. For unstructured clinical text, it remains an unsolved problem.
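Structured formats make this tractable because the identifiers live in known fields. A minimal sketch over a FHIR R4 Patient resource follows; the field paths are real FHIR Patient elements, but the resource content and the `_birthYear` convention are invented for illustration.

```python
import copy

# Safe Harbor-style field removal for a structured FHIR R4 Patient resource.
# With structured data the identifiers sit in enumerable fields, which is the
# situation HIPAA's de-identification rules were designed for.
IDENTIFYING_FIELDS = [
    "identifier", "name", "telecom", "address",
    "birthDate", "photo", "contact",
]

def safe_harbor_strip(patient: dict) -> dict:
    """Drop identifying fields; keep only the year of birth, per Safe Harbor."""
    redacted = copy.deepcopy(patient)
    birth_year = redacted.get("birthDate", "")[:4]  # year alone is permitted
    for field in IDENTIFYING_FIELDS:
        redacted.pop(field, None)
    if birth_year:
        redacted["_birthYear"] = birth_year  # illustrative convention, not standard FHIR
    return redacted

# Invented example resource
patient = {
    "resourceType": "Patient",
    "identifier": [{"system": "urn:mrn", "value": "04219876"}],
    "name": [{"family": "Rivera", "given": ["Luis"]}],
    "birthDate": "1960-03-12",
    "gender": "male",
    "address": [{"city": "Smalltown", "state": "MT"}],
}
print(safe_harbor_strip(patient))
# -> {'resourceType': 'Patient', 'gender': 'male', '_birthYear': '1960'}
```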
Named Entity Recognition (NER) models trained on clinical text can identify and redact explicit identifiers with high accuracy. The 2014 i2b2/UTHealth de-identification challenge benchmark showed that top-performing systems achieved F1 scores above 0.95 for common identifier types (names, dates, locations). But these systems struggle with three categories:
Implicit identifiers: Clinical descriptions that identify patients through the uniqueness of their medical presentation rather than through explicit personal data. No NER model can determine that a specific combination of diagnosis, treatment history, and demographic features narrows the patient population to a single individual.
Contextual identifiers: Information that is identifying only in combination. A hospital name plus a department plus a date plus a rare procedure may uniquely identify both patient and provider. The identifying potential depends on external knowledge about hospital volumes and staffing patterns – information the de-identification system does not possess.
Temporal identifiers: HIPAA requires removal of dates more specific than year, but clinical reasoning often depends on temporal relationships. “Three weeks after starting treatment” is HIPAA-compliant. “October 15, 2025, three weeks after starting treatment on September 24” is not. Stripping dates while preserving clinically relevant temporal relationships requires sophisticated date-shifting that maintains intervals, and even this approach has been shown to fail when combined with public records of hospital admissions.
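A common mitigation is a per-patient random date shift that preserves intervals. A minimal sketch of that scheme follows; as noted above, it mitigates rather than solves the problem, since shifted records can still be re-linked against external admission data.

```python
import hashlib
from datetime import date, timedelta

# Interval-preserving date shifting: every date for a given patient moves by the
# same pseudorandom offset, so "three weeks after starting treatment" survives,
# while the calendar dates no longer match external records directly.

def patient_offset_days(patient_key: str, secret_salt: str, max_days: int = 365) -> int:
    """Deterministic per-patient offset derived from an organization-held salt."""
    digest = hashlib.sha256(f"{secret_salt}:{patient_key}".encode()).digest()
    return (int.from_bytes(digest[:4], "big") % (2 * max_days)) - max_days

def shift_date(d: date, patient_key: str, secret_salt: str) -> date:
    return d + timedelta(days=patient_offset_days(patient_key, secret_salt))

# Example: treatment start and follow-up keep their 21-day interval after shifting.
start = date(2025, 9, 24)
followup = date(2025, 10, 15)
shifted_start = shift_date(start, "patient-123", "org-secret")
shifted_followup = shift_date(followup, "patient-123", "org-secret")
assert (shifted_followup - shifted_start).days == (followup - start).days == 21
```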
The Stealth Cloud approach to this problem operates at a different layer: rather than attempting perfect de-identification (which may be impossible for clinical text), it ensures that whatever data reaches the AI provider is processed without persistence, without logging, and without the possibility of retrospective access. The data exists in memory for the duration of inference and is then cryptographically shredded.
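One way to picture the zero-persistence property is an ephemeral-key pattern: PHI is sealed under a key that exists only in memory for the lifetime of a single inference call, and discarding the key renders any lingering ciphertext unreadable. The sketch below, using the `cryptography` package's AES-GCM primitives, is a conceptual illustration of that pattern, not a description of Stealth Cloud's internals; a language runtime like Python also cannot by itself guarantee that plaintext copies are scrubbed from memory.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def ephemeral_inference(prompt: str, run_model) -> str:
    """Seal the prompt under a per-request key, decrypt just-in-time for
    inference, and let key, plaintext, and ciphertext go out of scope.
    In a real system the sealed blob is what would cross queues or disk;
    the key would never leave the inference worker."""
    key = AESGCM.generate_key(bit_length=256)     # exists only for this request
    nonce = os.urandom(12)
    sealed = AESGCM(key).encrypt(nonce, prompt.encode(), None)

    plaintext = AESGCM(key).decrypt(nonce, sealed, None).decode()
    response = run_model(plaintext)               # run_model is a stand-in

    del key, plaintext, sealed                    # nothing persisted, nothing logged
    return response

# Usage with a stand-in model:
print(ephemeral_inference(
    "summarize: 65-year-old male with NSCLC ...",
    run_model=lambda p: f"[model output for {len(p)} chars]",
))
```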
The Regulatory Gap: What OCR Has Not Addressed
The HHS Office for Civil Rights has issued remarkably little guidance on AI-specific HIPAA compliance. The most relevant document – a December 2022 bulletin on online tracking technologies – addressed the use of analytics and tracking pixels on hospital websites but did not specifically address AI chatbots or LLM services.
The absence of guidance creates a vacuum that health systems are filling with their own interpretations. Some have issued blanket prohibitions on AI tool use with patient data. Others have created “AI sandboxes” with institutional licenses to Enterprise-tier AI products under BAA coverage. A growing number have simply done nothing, relying on existing HIPAA training (which predates the AI era) and hoping that clinician behavior conforms to policies that don’t mention AI by name.
Congress has shown interest but limited action. The bipartisan AI LEAD in Health Act, introduced in 2025, would direct HHS to develop AI-specific HIPAA guidance. The bill has not passed. The FDA has cleared over 1,000 AI-enabled medical devices as of early 2026 but has jurisdiction only over devices, not over general-purpose AI tools used in clinical settings.
The European approach, governed by the GDPR framework and the EU AI Act (which classifies medical AI as “high-risk”), is substantially more prescriptive. But even GDPR’s stricter regime was not designed for the specific challenge of unstructured clinical text flowing into general-purpose AI systems.
What Compliance Actually Requires in 2026
For healthcare organizations operating under HIPAA today, the minimum defensible position for AI use involves:
Technical Controls
- Institutional AI platforms with BAA coverage (Azure OpenAI, Epic-embedded AI, Google Cloud Vertex AI) as the approved pathway for any interaction involving PHI
- Network-level blocking of consumer AI services on institutional networks and devices
- Client-side PII stripping layers for any AI interaction that cannot be routed through BAA-covered infrastructure
- Audit logging of AI tool access (who used which tool, when, but not the content of the interaction – logging the content would create a new PHI repository)
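The audit-logging item is easy to get wrong: capturing prompt text for audit purposes quietly creates a new PHI repository. A minimal sketch of metadata-only logging follows; the field names and the salted prompt hash (kept for duplicate and incident correlation) are design assumptions, and whether even a salted hash is acceptable is a policy call for the covered entity.

```python
import hashlib
import json
import time

def log_ai_access(user_id: str, tool: str, prompt: str, salt: str, sink) -> None:
    """Record who used which AI tool and when, never the prompt content."""
    event = {
        "ts": time.time(),
        "user": user_id,
        "tool": tool,
        "prompt_chars": len(prompt),  # size only, not content
        "prompt_hash": hashlib.sha256((salt + prompt).encode()).hexdigest(),
    }
    sink.write(json.dumps(event) + "\n")

# Usage: append events to a sink managed by the institution's SIEM pipeline.
with open("ai_access.log", "a") as sink:
    log_ai_access("dr.chen", "azure-openai",
                  "draft discharge summary for ...", salt="org-secret", sink=sink)
```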
Administrative Controls
- AI-specific amendments to HIPAA training programs, updated annually
- Clear policies distinguishing authorized AI tools from prohibited consumer tools
- Incident response procedures specific to AI-related PHI disclosures
- Vendor assessment frameworks that evaluate AI providers on data handling, retention, and training practices – the kind of assessment covered in detail by the AI compliance checklist
Organizational Strategy
- Designation of an AI governance committee with HIPAA compliance representation
- Regular shadow AI audits (anonymized surveys, network traffic analysis; a log-analysis sketch follows this list)
- Clinician feedback loops to ensure institutional AI tools are sufficiently capable to reduce the temptation toward consumer alternatives
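A first pass at the network-traffic side of a shadow AI audit can be as simple as counting proxy-log requests to consumer AI domains, aggregated by department rather than by individual. A minimal sketch, assuming a CSV proxy export with `department` and `host` columns (the column names and domain list are assumptions, not a standard format):

```python
import csv
from collections import Counter

# Domains associated with consumer AI front-ends. Maintained as policy, not code.
CONSUMER_AI_HOSTS = {
    "chat.openai.com", "chatgpt.com", "gemini.google.com",
    "claude.ai", "www.perplexity.ai",
}

def shadow_ai_counts(proxy_log_csv: str) -> Counter:
    """Count proxy requests to consumer AI hosts, aggregated per department."""
    counts = Counter()
    with open(proxy_log_csv, newline="") as f:
        for row in csv.DictReader(f):
            if row["host"] in CONSUMER_AI_HOSTS:
                counts[row["department"]] += 1  # aggregate, not per-clinician
    return counts

# Usage: feed the top departments into the AI governance committee's quarterly review.
# print(shadow_ai_counts("proxy_export.csv").most_common(10))
```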
The organizations that will navigate this transition most effectively are those that recognize the fundamental insight: HIPAA’s framework assumes that the covered entity knows where PHI is going and can contractually control what happens to it. AI disrupts both assumptions. Data goes where clinicians type it. And once it reaches a general-purpose AI system, the contractual controls governing its treatment are only as strong as the provider’s architecture and the regulator’s capacity to verify compliance. As the AI provider privacy scoreboard makes clear, that capacity varies enormously across providers.
The real question is not whether HIPAA will be updated for AI. It will. The question is how many PHI exposures will occur in the regulatory gap between the law we have and the law we need.
The Stealth Cloud Perspective
HIPAA’s structural incompatibility with AI is not a policy failure – it is an architecture failure. When the processing layer itself guarantees zero persistence and zero knowledge, the regulatory burden shifts from controlling data after it leaves your hands to ensuring it never persists in anyone else’s. Healthcare organizations should not have to choose between clinical AI capability and regulatory compliance; the infrastructure should make that choice unnecessary.