Does signing a BAA with OpenAI or Anthropic make PHI in prompts compliant?

Not on its own. A signed BAA permits disclosure of PHI to the business associate under the terms of the agreement, but the recording obligation persists. Under 45 CFR § 164.528, a patient may request an accounting of disclosures of their PHI, and that accounting covers disclosures made to or by business associates, subject to the exceptions listed in § 164.528(a)(1), such as disclosures for treatment, payment, and health care operations. The obligation does not depend on whether a BAA is in place. The BAA changes the legal basis for the disclosure; it does not eliminate the audit trail.

Can we just train workforce members to redact PHI manually before pasting into ChatGPT?

Workforce training reduces the risk of disclosure but does not eliminate it. Cloud Radix research found that 77% of employees using unauthorized AI admit to pasting sensitive business data into the model anyway. Healthcare-specific data shows 57% of professionals process PHI through unauthorized AI. Training as the sole control is unlikely to meet the HIPAA Security Rule's standard of reasonable and appropriate safeguards. OCR consistently treats training-only controls as inadequate when technical safeguards are available.

What about the Safe Harbor de-identification method?

Safe Harbor at 45 CFR § 164.514(b)(2) requires removal of all 18 listed identifiers and that the covered entity have no actual knowledge that the remaining information could be used, alone or in combination with other information, to identify an individual. Safe Harbor applies to the data set, not to a single prompt in isolation. A prompt stripped of the 18 identifiers may still allow re-identification through the combination of remaining attributes, particularly if it contains a rare diagnosis or a specific date pattern. Expert determination under 45 CFR § 164.514(b)(1), performed by a person with appropriate statistical and scientific expertise, is the alternative method for cases Safe Harbor does not cover.
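The re-identification risk from combined attributes can be made concrete with a group-size check. The sketch below is a minimal Python illustration in the spirit of k-anonymity, assuming hypothetical field names (`diagnosis`, `age_band`, `state`) and made-up records. It is a screening heuristic only, not a substitute for expert determination.

```python
from collections import Counter

def smallest_group(records, quasi_identifiers):
    """Size of the smallest group of records that share the same
    combination of quasi-identifier values (0 for an empty data set)."""
    if not records:
        return 0
    groups = Counter(
        tuple(rec.get(field) for field in quasi_identifiers) for rec in records
    )
    return min(groups.values())

def is_k_anonymous(records, quasi_identifiers, k=5):
    """True when every quasi-identifier combination occurs at least k times."""
    return smallest_group(records, quasi_identifiers) >= k

# Hypothetical records with the 18 identifiers already removed.
records = [
    {"diagnosis": "type 2 diabetes", "age_band": "60-69", "state": "OH"},
    {"diagnosis": "type 2 diabetes", "age_band": "60-69", "state": "OH"},
    {"diagnosis": "type 2 diabetes", "age_band": "60-69", "state": "OH"},
    {"diagnosis": "erdheim-chester disease", "age_band": "60-69", "state": "OH"},
]
fields = ["diagnosis", "age_band", "state"]

# The rare diagnosis forms a group of one, so that record stands out
# even though no listed identifier is present.
print(smallest_group(records, fields))
```

A group of size one means a single patient matches that attribute combination in the data set, which is the situation the rare-diagnosis example describes.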

Does using free-tier AI services count as a HIPAA violation?

Workforce use of free-tier AI services to process PHI is a disclosure to a non-business-associate. The covered entity has no BAA with the consumer-tier endpoint. The disclosure violates 45 CFR § 164.502 unless one of the limited exceptions applies. OCR has not yet announced a major enforcement action specifically targeting consumer AI use, but the violation pattern is well-established. The standard expectation is that the covered entity prevents or records the disclosure through technical safeguards.

How does PHI redaction interact with the EU AI Act for healthcare deployers?

Healthcare deployments serving EU patients fall under the GDPR and national health data law, which play the role HIPAA plays in the United States, and under the EU AI Act. If the AI system is high-risk under Annex III, Article 12 record-keeping applies in addition to HIPAA recording. A single per-request audit record that contains identity, classification, decision outcome, policy version, and timestamps can serve both regimes. The architecture that addresses HIPAA also addresses Article 12.

HIPAA PHI Redaction in AI Prompts: What Inline Enforcement Requires

The HIPAA Privacy Rule restricts disclosure of Protected Health Information (PHI) to outside service providers unless they have signed a Business Associate Agreement (BAA) with the covered entity. AI inference endpoints operated by OpenAI, Anthropic, Google, and Mistral are external services. A prompt containing a patient's name, diagnosis, or treatment plan that travels to those endpoints without a signed BAA is a disclosure of PHI to a non-business-associate. Cloud Radix research found that 57% of healthcare professionals use unauthorized AI to process PHI without a BAA in place.

I want to walk through what HIPAA actually requires at the prompt layer, where redaction has to happen for the architecture to be defensible, and what the evidence record looks like when HHS asks for it.

What HIPAA requires at the prompt layer

The HIPAA Privacy Rule at 45 CFR § 164.502 limits use and disclosure of PHI to permitted purposes. Disclosure to a third party requires either patient authorization, a permitted-purpose exception, or a Business Associate Agreement with that party. The Security Rule at 45 CFR § 164.312 requires technical safeguards including access control, audit controls, integrity, person or entity authentication, and transmission security.

A prompt is a disclosure when it leaves the covered entity's environment carrying PHI. The 18 HIPAA identifiers listed at 45 CFR § 164.514(b)(2) include names, geographic subdivisions smaller than a state, dates more specific than a year, telephone numbers, fax numbers, email addresses, Social Security numbers, medical record numbers, health plan beneficiary numbers, account numbers, certificate or license numbers, vehicle identifiers, device identifiers, web URLs, IP addresses, biometric identifiers, full-face photographs, and any other unique identifying number or code.
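Some of these identifiers have regular surface forms that pattern matching can catch. The sketch below is a minimal Python illustration, assuming simplified US-style formats and a hypothetical `MRN: <digits>` convention for medical record numbers. Names, addresses, and free-text identifiers need more than regular expressions, so this covers only a slice of the list.

```python
import re

# Illustrative patterns only. Real deployments need broader formats,
# and the MRN pattern is a hypothetical local convention.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "IP_ADDRESS": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
    "MRN": re.compile(r"\bMRN[:\s]*\d{6,10}\b"),
}

def redact(prompt):
    """Replace each match with a [CATEGORY] placeholder and return the
    redacted text together with the set of categories that were found."""
    found = set()
    for category, pattern in PATTERNS.items():
        prompt, count = pattern.subn(f"[{category}]", prompt)
        if count:
            found.add(category)
    return prompt, found

sample = ("Pt MRN: 00123456 seen 03/14/2024, call 555-867-5309 "
          "or jane.doe@example.org, SSN 123-45-6789.")
redacted, categories = redact(sample)
print(redacted)
# Pt [MRN] seen [DATE], call [PHONE] or [EMAIL], SSN [SSN].
```

Returning the detected categories alongside the redacted text lets a later step decide whether redaction alone is enough or the request should be blocked.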

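Removing the 18 identifiers de-identifies the data under the Safe Harbor method at 45 CFR § 164.514(b)(2), provided the covered entity has no actual knowledge that the remaining information could identify an individual. Once de-identified, the data is no longer PHI and the disclosure restrictions do not apply.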

Why application-layer redaction fails

Most healthcare deployments that have looked at the AI prompt problem rely on developers to strip PHI before calling the model. The redaction logic lives in the application that submits the prompt. Three failure modes appear in production.

First, the redaction logic depends on the developer correctly identifying every PHI field. A new feature that includes a free-text clinical note bypasses the structured-field redactor because the note is not enumerated. Second, the redaction logic depends on the developer not introducing a code path that submits raw prompts for debugging or evaluation. A debug toggle in staging accidentally promoted to production sends raw PHI to the model. Third, the redaction logic depends on the developer correctly handling response data, which sometimes contains PHI the model inferred from context.

Each failure mode is structural. The application that wants to send the prompt is the same component responsible for redacting the prompt. There is no independent check.

Where redaction has to happen

The architecture that survives HIPAA audit places redaction at a separate enforcement layer between the application and the LLM endpoint. The layer inspects every prompt regardless of which application produced it, applies a deterministic PHI classifier, and either redacts the identified fields, blocks the request, or allows it through depending on policy.

The classifier covers three things: the 18 HIPAA identifiers; common clinical free-text patterns, including dates of service, prescribed medications, dosages, and provider names; and configurable extensions for the covered entity's local data dictionary. The decision happens before the prompt reaches the model endpoint.
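The redact, block, or allow decision can be sketched as a small pure function over the classifier's output. The Python below is an illustration under stated assumptions: `Action`, `RoutePolicy`, `decide`, and the fields `baa_in_place` and `block_categories` are hypothetical names, and the rule that a BAA-covered route may pass PHI through is one possible policy, not a recommendation.

```python
from dataclasses import dataclass
from enum import Enum

class Action(Enum):
    ALLOW = "allow"
    REDACT = "redact"
    BLOCK = "block"

@dataclass(frozen=True)
class RoutePolicy:
    version: str                                # recorded with every decision
    baa_in_place: bool                          # does a BAA cover this route?
    block_categories: frozenset = frozenset()   # categories that are never sent

def decide(detected, policy):
    """Map the set of detected PHI categories to an action for one route."""
    if not detected:
        return Action.ALLOW      # nothing classified as PHI
    if detected & policy.block_categories:
        return Action.BLOCK      # a category this route must never carry
    if policy.baa_in_place:
        return Action.ALLOW      # assumed policy: PHI permitted under the BAA
    return Action.REDACT         # strip identifiers before forwarding

uncovered = RoutePolicy(version="2025-01", baa_in_place=False,
                        block_categories=frozenset({"SSN"}))
print(decide({"DATE"}, uncovered).value)          # redact
print(decide({"SSN", "DATE"}, uncovered).value)   # block
```

Keeping the decision deterministic and free of side effects means the same classification and policy version always yield the same action, which makes individual decisions reproducible after the fact.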

The enforcement layer records what it did. The record contains the identity that submitted the prompt, the original classification, the action taken (redact, block, allow), the policy version that governed the decision, and a tamper-evident hash of the prompt and response. The record is committed before the response returns to the application.
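One common way to make such records tamper-evident is to chain them, so that each record's hash covers the hash of the record before it. The sketch below is a minimal Python illustration using SHA-256 from the standard library. The field names and the in-memory `log` list are assumptions for the example; a real deployment would use durable, access-controlled storage.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def _canonical(body: dict) -> bytes:
    # Stable serialization so the same record always hashes the same way.
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")

def append_record(log, *, timestamp, identity, classification, action,
                  policy_version, prompt, response):
    """Append one audit record whose hash covers the previous record's hash."""
    body = {
        "timestamp": timestamp,
        "identity": identity,
        "classification": sorted(classification),
        "action": action,
        "policy_version": policy_version,
        "prompt_sha256": sha256_hex(prompt.encode("utf-8")),
        "response_sha256": sha256_hex(response.encode("utf-8")),
        "prev_hash": log[-1]["record_hash"] if log else GENESIS,
    }
    record = dict(body, record_hash=sha256_hex(_canonical(body)))
    log.append(record)
    return record

def verify_chain(log):
    """Recompute every hash; any edited, removed, or reordered record fails."""
    prev_hash = GENESIS
    for record in log:
        body = {k: v for k, v in record.items() if k != "record_hash"}
        if body["prev_hash"] != prev_hash:
            return False
        if sha256_hex(_canonical(body)) != record["record_hash"]:
            return False
        prev_hash = record["record_hash"]
    return True

log = []
append_record(log, timestamp="2025-01-15T10:00:00Z", identity="nurse-042",
              classification={"DATE"}, action="redact", policy_version="2025-01",
              prompt="Pt seen [DATE] for follow-up", response="Summary text")
print(verify_chain(log))
```

A plain SHA-256 of a short prompt can be guessed by hashing candidate inputs, so a keyed hash such as HMAC is worth considering where the digests themselves are sensitive.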

What the HIPAA audit evidence looks like

HHS Office for Civil Rights audits under the HIPAA Audit Program request documentation of technical safeguards in operation. For AI usage, the relevant request would read along the lines of "produce the records of PHI disclosures to AI services, the safeguards that applied to each disclosure, and the policy that authorized each disclosure."

A defensible response contains, per AI request: the timestamp, the workforce member or role identity, the patient identifier if disclosure was authorized, the PHI classification of the prompt, the redaction or block decision, the policy version, and a hash that allows post-hoc verification. The response covers months or years of activity depending on the audit scope.

The HHS audit program found in the 2016-17 round that 86% of audited covered entities failed to provide complete documentation of risk analysis and 67% failed to provide complete documentation of risk management procedures. The pattern continues into AI deployments. The typical finding is a covered entity that maintains written policies but cannot produce operational evidence that they were followed.

The Business Associate Agreement gap

A Business Associate Agreement contractually binds the AI vendor to safeguard PHI and to use it only as the agreement permits. It does not transfer the covered entity's own HIPAA obligations. OpenAI, Anthropic, Microsoft Azure OpenAI, AWS Bedrock, and Google Vertex AI offer BAAs to enterprise customers. Each BAA covers a specific service tier, a specific data handling commitment, and a specific set of permitted uses.

The BAA does not absolve the covered entity of the disclosure-recording obligation. Even with a BAA in place, the covered entity must still record what PHI was disclosed, to which business associate, on whose behalf, and under what authorization. The BAA changes the disclosure from prohibited to permitted-with-recording. It does not eliminate the recording requirement.

Free-tier and consumer-tier AI endpoints, including ChatGPT free, Claude.ai free, and Gemini consumer, are typically excluded from BAA coverage. Workforce use of those endpoints to process PHI is a disclosure to a non-business-associate regardless of any other agreement the organization has with the vendor.
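An enforcement layer can encode this distinction as a default-deny lookup: only destinations explicitly registered as BAA-covered are treated as such. The Python sketch below is an illustration only. The hostnames are placeholders under the reserved `.example` domain, and `BAA_COVERED_HOSTS` and `is_baa_covered` are hypothetical names.

```python
from urllib.parse import urlsplit

# Hypothetical registry of endpoints for which a BAA is on file.
# Anything not listed here is treated as a non-business-associate.
BAA_COVERED_HOSTS = {"llm-enterprise.vendor.example"}

def is_baa_covered(url):
    """Default-deny check on the destination host of an outbound request."""
    host = urlsplit(url).hostname or ""
    return host in BAA_COVERED_HOSTS

print(is_baa_covered("https://llm-enterprise.vendor.example/v1/chat"))  # True
print(is_baa_covered("https://chat.vendor.example/"))                   # False
```

Matching on the exact host rather than the vendor's domain reflects the point above: an agreement covering one service tier says nothing about the same vendor's consumer endpoint.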

DeepInspect

This is the architecture DeepInspect was built to provide. DeepInspect sits at the AI request boundary as a stateless proxy between authenticated workforce members or agents and any LLM endpoint. PHI classification runs against every prompt. Redaction, blocking, or allow decisions are made based on per-route policies the covered entity controls. The decision happens before the prompt reaches the model.

Every decision produces a signed audit record containing identity, role, patient identifier where authorized, PHI classification, decision outcome, policy version, and timestamp. The record is committed before the response returns to the application. The record format satisfies the HHS audit evidence specification and the Article 19 retention requirement for healthcare deployments that also fall under the EU AI Act.

If you are running AI in a HIPAA-regulated environment and your PHI redaction depends on application code, that defense is fragile. Book a demo today.