A practical, honest breakdown of what artificial intelligence can and cannot do inside a medical practice — and why off-the-shelf tools are not the answer.
The AI tools that most people interact with — ChatGPT, Google Gemini, Microsoft Copilot — are consumer products. They are not designed for regulated industries, and using them with patient data is a HIPAA violation.
This is the most important thing to understand before exploring AI for a medical practice: there is no off-the-shelf AI product you can subscribe to and start feeding patient information into. The free and consumer-tier versions of every major AI tool explicitly state in their terms of service that they are not HIPAA-compliant, do not sign Business Associate Agreements, and may use your inputs to train their models.
This creates a real problem for healthcare, because AI genuinely has enormous potential to reduce administrative burden, improve documentation speed, and free up clinical staff for patient care. But unlocking that potential requires a purpose-built system — one that is architecturally designed from the ground up to keep patient data inside compliant boundaries.
Custom AI for healthcare does not mean building a language model from scratch. That would cost millions and take years. Instead, it means building a secure application layer on top of existing compliant AI infrastructure: the cloud provider supplies the intelligence, and the custom layer supplies the compliance.
This is fundamentally different from signing up for a SaaS product. It requires engineering, security review, and ongoing maintenance. But it is also the only way to use AI meaningfully in a regulated medical environment without exposing the practice to legal and financial risk.
AI compliance in healthcare is not about the model itself — it is about the infrastructure surrounding the model. The same GPT-4 model that powers the free ChatGPT chatbot can be used compliantly through Microsoft Azure with a BAA. The model is identical. The infrastructure and legal agreements are what make the difference.
Beyond the obvious HIPAA violations, consumer AI tools are inadequate for healthcare for several practical reasons: they will not sign a BAA, they may retain and train on your inputs, they provide no audit logging or role-based access controls, and they cannot be integrated with clinical systems and workflows.
Before any technology discussion, the single most important concept to understand is the Business Associate Agreement — the legal instrument that makes third-party AI processing possible under HIPAA.
Under HIPAA, a Business Associate is any entity that creates, receives, maintains, or transmits Protected Health Information on behalf of a covered entity (your practice). When you send patient data to an AI model hosted by Microsoft, Amazon, or Google, that cloud provider becomes your business associate.
A BAA is a legal contract between your practice and the vendor that specifies how the vendor may use and disclose PHI, the safeguards it must maintain, its breach notification obligations, the flow-down of those obligations to subcontractors, and the return or destruction of PHI when the relationship ends.
If an AI vendor will not sign a BAA, you cannot send them PHI. Period. It does not matter how good their security is, how popular their product is, or what their marketing says about privacy. Without a signed BAA, sending patient data to that vendor is a HIPAA violation with potential penalties ranging from $100 to $50,000 per incident, up to $1.5 million annually per violation category.
The good news is that every major cloud AI provider now offers BAAs for their enterprise and API tiers. The landscape has matured significantly in the past two years. The challenge is not finding a compliant provider — it is building the application correctly on top of that provider.
These are the AI model providers that currently offer BAAs and have infrastructure designed for regulated workloads. The models themselves are state-of-the-art — the same technology powering consumer tools, but accessed through compliant channels.
The most mature option for healthcare AI. Azure's cloud platform has been HIPAA-compliant for years and is already used by major health systems. OpenAI's models are accessed through Azure's infrastructure, meaning data never touches OpenAI's consumer servers. Microsoft provides extensive compliance documentation, SOC 2 Type II certification, and HITRUST CSF certification.
Azure also offers Azure AI Services (formerly Cognitive Services) for speech-to-text, text analytics, and document processing — all under the same BAA umbrella.
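To make the distinction between consumer ChatGPT and the BAA-covered channel concrete, here is a minimal sketch of how a request to an Azure OpenAI deployment is constructed. The resource name, deployment name, and key handling are placeholders, not values from this document; substitute your own BAA-covered Azure resource.

```python
# Sketch: reaching a GPT-4 deployment through Azure OpenAI rather than the
# consumer ChatGPT service. Resource and deployment names are hypothetical.

def azure_openai_request(resource: str, deployment: str, api_version: str,
                         api_key: str, messages: list) -> tuple[str, dict, dict]:
    """Build the URL, headers, and body for an Azure OpenAI chat completion."""
    url = (f"https://{resource}.openai.azure.com/openai/deployments/"
           f"{deployment}/chat/completions?api-version={api_version}")
    headers = {"api-key": api_key, "Content-Type": "application/json"}
    body = {"messages": messages, "temperature": 0.2}
    return url, headers, body

url, headers, body = azure_openai_request(
    resource="my-practice-eastus",    # hypothetical Azure resource name
    deployment="gpt-4-clinical",      # hypothetical deployment name
    api_version="2024-02-01",
    api_key="<loaded from a secrets vault, never hard-coded>",
    messages=[{"role": "user",
               "content": "Draft a referral summary template."}],
)
# The request would be sent with e.g. requests.post(url, headers=headers,
# json=body); the data stays inside Azure's BAA-covered boundary and never
# touches OpenAI's consumer servers.
```

The point is architectural: the model endpoint lives inside your Azure tenant's compliance boundary, which is what the BAA actually covers.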
AWS has the longest track record in healthcare cloud compliance. Amazon Bedrock provides access to multiple AI model families — including Anthropic's Claude, Meta's Llama, and Amazon's own Titan — all within AWS's HIPAA-eligible infrastructure. The BAA covers all Bedrock services.
A significant advantage of Bedrock is model choice: if one model performs better for clinical documentation while another excels at administrative tasks, you can use both under the same BAA and compliance framework.
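That model-choice advantage can be expressed as a simple routing table in the application layer. The model IDs below follow Bedrock's naming convention but are illustrative; check which models are enabled in your own account and region.

```python
# Sketch of task-based model routing under a single AWS Bedrock BAA.
# Model IDs are illustrative examples of Bedrock's ID format.

MODEL_ROUTES = {
    "clinical_documentation": "anthropic.claude-3-sonnet-20240229-v1:0",
    "admin_drafting": "meta.llama3-70b-instruct-v1:0",
}

def pick_model(task: str) -> str:
    """Return the Bedrock model ID for a task, defaulting to the clinical model."""
    return MODEL_ROUTES.get(task, MODEL_ROUTES["clinical_documentation"])

# The selected ID would then be passed to the bedrock-runtime client, e.g.
# boto3.client("bedrock-runtime").converse(modelId=pick_model(task), ...)
```

Because every route resolves inside the same AWS account, switching or mixing models never changes the compliance posture.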
Google Cloud is HIPAA-compliant and signs BAAs for its Cloud Platform services, including Vertex AI. Notable for healthcare specifically: Google developed Med-PaLM, a language model trained for medical question-answering that has demonstrated expert-level performance on medical licensing exam questions.
Google also offers Healthcare Natural Language API and other specialized health data tools within the compliant Vertex AI ecosystem.
Anthropic offers BAAs for their direct API access. Claude models are known for strong instruction-following, nuanced text generation, and a large context window (up to 200K tokens), making them well-suited for processing lengthy medical records, insurance documents, and clinical protocols in a single request.
Anthropic explicitly states that API data is not used for model training when a BAA is in place.
There is a fifth option that avoids third-party BAAs entirely: running open-source models on infrastructure you control.
Models: Llama 3 (Meta), Mistral, Mixtral, Phi-3 (Microsoft), Qwen
Because the data never leaves your controlled environment, there is no third-party business associate involved. The practice (or its IT partner) is responsible for all security controls, encryption, access management, and audit logging.
The tradeoff is significant: self-hosted models require substantial GPU hardware (or cloud GPU instances), ongoing maintenance, and the models are generally less capable than the frontier commercial models listed above. However, for practices with extremely sensitive specialties or those who want zero external data exposure, this is a viable path — particularly for simpler tasks like document categorization, template generation, and internal search.
Understanding the technical architecture helps clarify why custom development is necessary and where the compliance requirements live in the system.
Every component in this flow is a compliance decision. The custom application layer is where the engineering work lives — it is the "wrapper" around the AI model that enforces every HIPAA requirement: user authentication and role-based access control, encryption in transit and at rest, audit logging of every request, de-identification where appropriate, and human review before any output reaches the record or the patient.
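One of those wrapper responsibilities, audit logging, is simple to illustrate. This is a toy sketch: the function names and in-memory log are hypothetical, and a production system would write to an append-only, tamper-evident store rather than a Python list.

```python
import time
from functools import wraps

AUDIT_LOG = []  # production: append-only, tamper-evident storage

def audited(action: str):
    """Record who invoked which AI action and when -- a core HIPAA audit control."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(user_id: str, *args, **kwargs):
            entry = {"user": user_id, "action": action, "ts": time.time()}
            try:
                result = fn(user_id, *args, **kwargs)
                entry["status"] = "ok"
                return result
            except Exception:
                entry["status"] = "error"
                raise
            finally:
                AUDIT_LOG.append(entry)  # logged whether the call succeeds or fails
        return wrapper
    return decorator

@audited("draft_referral_letter")
def draft_referral_letter(user_id: str, details: str) -> str:
    # placeholder for the real model call through the BAA-covered endpoint
    return f"DRAFT referral letter based on: {details}"

draft_referral_letter("dr_smith", "knee MRI follow-up")
```

Wrapping every model-facing function this way means the audit trail exists by construction, not by developer discipline.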
These are areas where AI delivers clear value, compliance is straightforward to maintain, and the risk profile is low. These should be the starting point for any medical practice exploring AI.
AI can draft prior authorization letters, appeal letters for insurance denials, referral summaries, and other administrative correspondence. The physician or staff member provides the key clinical details, and the AI generates properly formatted, complete documentation.
Why it's low risk: Staff reviews every output before it is sent. The AI is a drafting assistant, not a decision-maker. With de-identification techniques, the AI may never even see the patient's name.
Estimated time savings: 30-60 minutes per provider per day on paperwork.
After a patient encounter, AI can help structure SOAP notes from dictated or typed input, clean up free-text notes into standardized formats, and ensure documentation completeness. The clinician always reviews and signs off before the note becomes part of the medical record.
Why it's low risk: The physician is the final author of the note. AI is accelerating the documentation process, not replacing clinical judgment. This is analogous to using a medical scribe — but faster, more consistent, and available around the clock.
Estimated time savings: 1-2 hours per provider per day.
AI can be trained on your practice's own protocols, formulary, drug interaction guidelines, referral procedures, and staff handbooks. Staff can ask natural-language questions ("What is our protocol for patients presenting with chest pain and a history of atrial fibrillation?") and get immediate, sourced answers instead of searching through binders or SharePoint.
Why it's low risk: The knowledge base contains institutional knowledge, not individual patient data. No PHI is involved in the query or the response. This is essentially a smarter, conversational search engine for your own policies.
AI can review clinical documentation against CPT and ICD-10 codes to identify potential under-coding, documentation gaps that could lead to claim denials, or inconsistencies between diagnosis and procedure codes. This does not replace certified coders — it augments them by catching things that manual review might miss.
Why it's low risk: The AI is reviewing documentation that already exists in the system. With proper access controls, it can operate within the same data boundaries as your existing billing staff. All suggestions are reviewed by a certified coder before submission.
Revenue impact: Practices commonly find 5-15% of claims are under-coded.
AI can generate appointment reminder language, post-visit care instruction templates, pre-procedure preparation guides, and general health education content. These are templates — a human reviews and approves before anything reaches a patient.
Why it's low risk: Template generation can be done entirely without PHI. The AI creates the structure and language; staff personalizes it for specific patients through the existing patient communication system (portal, secure messaging).
AI can create interactive training modules, quiz staff on compliance procedures, and serve as a 24/7 resource for questions about practice operations, EHR workflows, or clinical protocols. New staff can ramp up faster by conversing with an AI that knows your practice's specific procedures.
Why it's low risk: Training content is institutional, not patient-specific. No PHI exposure. The AI draws from your policy documents, training manuals, and operational procedures.
These applications are achievable under HIPAA but require specific architectural decisions, additional security layers, and clear policies about human oversight. The compliance cost is higher, and the engineering must be precise.
AI-powered ambient listening can transcribe and summarize patient encounters in real time. Products like Nuance DAX and Abridge have proven the model works. A custom build can achieve similar results with more control over data handling and lower per-encounter costs at scale.
Audio recordings of patient encounters are PHI. The audio must be captured, transmitted, processed, and (potentially) stored entirely within HIPAA-compliant infrastructure. This means encrypted capture and transmission, a BAA-covered speech-to-text service, documented patient consent to recording where state law requires it, and a defined retention policy for both audio and transcripts.
Process audio in real-time streaming mode and do not store raw audio recordings long-term. Once the transcription is generated and the clinician approves the note, the raw audio can be discarded. This minimizes the PHI footprint while still delivering the clinical value. Additionally, use specialized medical speech-to-text models (AWS Transcribe Medical is specifically tuned for clinical terminology) to improve accuracy.
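The process-then-discard pattern can be sketched as a small state machine: raw audio is held only until the clinician approves the note, then purged. The class and the stub `transcribe` callable are hypothetical stand-ins; in practice transcription would go through a BAA-covered service such as AWS Transcribe Medical.

```python
# Sketch of the process-then-discard audio pattern. `transcribe` is a
# stand-in for a BAA-covered medical speech-to-text service.

class EncounterAudioStore:
    def __init__(self):
        self._audio: dict[str, bytes] = {}   # raw PHI, held only temporarily
        self.notes: dict[str, str] = {}      # draft notes awaiting sign-off

    def ingest(self, encounter_id: str, audio: bytes, transcribe) -> str:
        """Hold the audio just long enough to produce a draft note."""
        self._audio[encounter_id] = audio
        self.notes[encounter_id] = transcribe(audio)
        return self.notes[encounter_id]

    def approve(self, encounter_id: str) -> None:
        """On clinician sign-off, discard the raw recording to shrink the PHI footprint."""
        self._audio.pop(encounter_id, None)

    def has_audio(self, encounter_id: str) -> bool:
        return encounter_id in self._audio

store = EncounterAudioStore()
store.ingest("enc-001", b"...pcm bytes...", transcribe=lambda a: "Draft SOAP note")
store.approve("enc-001")
```

The transcript survives as part of the signed note; the recording itself does not, which is the whole point of the pattern.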
An AI chatbot on your practice's website or patient portal could handle appointment scheduling, answer frequently asked questions, provide pre-visit instructions, and triage basic inquiries. However, the moment a patient types their symptoms, medications, or any identifying information into that chatbot, the conversation becomes PHI.
You cannot predict or control what a patient will type. Even a chatbot designed for scheduling might receive messages like "I need to reschedule my appointment because my cancer treatment side effects are worse" — and that message is now PHI that must be protected.
Approach A — Non-PHI chatbot: Design the chatbot to handle only general, non-clinical interactions. Office hours, directions, accepted insurance plans, general preparation guides. Include clear disclaimers that the chatbot cannot provide medical advice and should not be used to share health information. This approach requires no BAA for the chatbot itself, but requires careful prompt engineering to redirect clinical questions appropriately.
Approach B — Full PHI-capable chatbot: Build the chatbot on HIPAA-compliant infrastructure with full encryption, access controls, audit logging, and patient authentication. This is a significantly larger build, but enables capabilities like symptom pre-screening, medication refill requests, and secure communication. This approach requires the chatbot platform to be covered under a BAA and integrated with your patient authentication system.
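For Approach A, the redirect behavior can be enforced in code rather than left to prompt engineering alone. This is a toy guardrail: the keyword list is illustrative only and would need clinical and legal review before production use, since pattern matching will produce both false positives and false negatives.

```python
import re

# Toy guardrail for a non-PHI chatbot: messages that look clinical are
# redirected instead of being forwarded to the model. Keyword list is
# illustrative, not exhaustive.

CLINICAL_PATTERNS = re.compile(
    r"\b(symptom|diagnos|medicat|prescri|pain|treatment|cancer|refill)\w*\b",
    re.IGNORECASE,
)

REDIRECT = ("I can't discuss health information here. Please call the office "
            "or use the secure patient portal for anything medical.")

def guard(message: str) -> tuple[bool, str]:
    """Return (allowed, response-or-redirect) for an incoming chat message."""
    if CLINICAL_PATTERNS.search(message):
        return False, REDIRECT
    return True, message  # safe to forward to the scheduling/FAQ model

allowed, _ = guard("What are your office hours?")
blocked, msg = guard("My cancer treatment side effects are worse")
```

Even with a guardrail like this, the example message from earlier in this section shows why Approach A needs clear disclaimers: patients will volunteer clinical detail no matter how the interface is framed.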
AI can read through years of chart notes, lab results, imaging reports, and specialist consultations to generate a concise clinical summary for a provider before a patient encounter. For complex patients with extensive histories, this can save 15-30 minutes of chart review per visit.
This requires the AI to ingest the most sensitive data in your practice — complete patient records. The data must be transmitted to the model within compliant infrastructure, and the model's context window must be large enough to process lengthy records (this is where Claude's 200K token context window is particularly valuable).
Implement a de-identification pipeline: before sending records to the AI model, automatically strip direct identifiers (name, date of birth, SSN, MRN) and replace them with tokens. The AI processes the clinical content without knowing which patient it belongs to. The application layer re-associates the AI's output with the correct patient record. This provides defense-in-depth — even if the BAA-covered model experienced a theoretical breach, the data would be de-identified.
The highest-value AI applications in a medical practice require reading from and writing back to your Electronic Health Record system. Pulling patient history, inserting drafted notes, updating problem lists, and triggering workflow actions all require API-level integration with your EHR.
EHR APIs handle PHI at every step. The integration must use secure, authenticated API connections (typically FHIR R4 or proprietary APIs). Your EHR vendor must support and approve the integration. Most modern EHRs (Epic, athenahealth, eClinicalWorks, Oracle Health/Cerner) have APIs available, but the approval process, testing, and credentialing can take weeks to months.
Start with read-only integrations before attempting write-back. An AI system that can pull data from the EHR to generate summaries or suggestions — but requires a human to manually enter approved content back into the EHR — eliminates the risk of AI-generated errors being automatically committed to the medical record. Write-back integrations can be added later once confidence in the system is established.
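A read-only integration along these lines might look like the sketch below: the application builds a standard FHIR R4 search URL, then extracts human-readable condition names from the returned Bundle for an AI-generated summary. The endpoint URL is a placeholder, and real requests would carry OAuth credentials issued by the EHR vendor.

```python
# Sketch of a read-only FHIR R4 pull: fetch a patient's active Conditions
# and extract display text for summarization. Endpoint is hypothetical.

FHIR_BASE = "https://ehr.example.com/fhir/r4"   # placeholder EHR endpoint

def condition_search_url(patient_id: str) -> str:
    """Standard FHIR R4 search: GET [base]/Condition?patient=[id]"""
    return f"{FHIR_BASE}/Condition?patient={patient_id}&clinical-status=active"

def active_conditions(bundle: dict) -> list[str]:
    """Pull human-readable condition names from a FHIR search Bundle."""
    out = []
    for entry in bundle.get("entry", []):
        code = entry.get("resource", {}).get("code", {})
        text = code.get("text") or next(
            (c.get("display") for c in code.get("coding", []) if c.get("display")),
            None)
        if text:
            out.append(text)
    return out

# A trimmed example of the Bundle shape a Condition search returns:
sample_bundle = {
    "resourceType": "Bundle",
    "entry": [
        {"resource": {"resourceType": "Condition",
                      "code": {"text": "Atrial fibrillation"}}},
        {"resource": {"resourceType": "Condition",
                      "code": {"coding": [{"display": "Type 2 diabetes mellitus"}]}}},
    ],
}
```

Nothing here writes back to the EHR: approved content is re-entered by a human, which is exactly the risk boundary the read-only phase is meant to preserve.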
These areas carry significant legal, regulatory, or clinical risk. Some are outright inadvisable. Others may be possible in the future but require regulatory frameworks that do not yet fully exist.
AI should never independently diagnose conditions, prescribe medications, recommend treatment plans, or make clinical decisions without physician oversight and final approval.
This is not just a HIPAA issue — it enters FDA regulatory territory. Software that is intended to diagnose or treat medical conditions may be classified as a Software as a Medical Device (SaMD) under FDA regulations, requiring clearance or approval before it can be used clinically. The liability implications are also profound: if an AI makes an incorrect recommendation that leads to patient harm, the question of who is legally responsible — the physician, the practice, the AI developer, the model provider — is largely untested in court.
The boundary: AI can surface information, highlight patterns, and generate draft recommendations for a physician to review. It cannot be the decision-maker. A human clinician must always be the final authority on clinical decisions, and that human review must be documented.
Fine-tuning an AI model on your patient data can improve performance for your specific clinical context. However, this creates significant risk: fine-tuned weights can memorize and reproduce fragments of patient records, the embedded PHI cannot be selectively deleted later, and it is effectively impossible to audit what the model has absorbed.
The safer alternative: Use retrieval-augmented generation (RAG) instead of fine-tuning. RAG keeps your data in a secure database and feeds relevant snippets to the model at query time. The model never "learns" from your data — it references it. This is easier to audit, easier to delete (you can remove records from the database), and avoids the model memorization problem entirely.
Psychotherapy notes and substance abuse treatment records carry additional federal protections beyond standard HIPAA requirements. Psychotherapy notes have specific exclusions under the HIPAA Privacy Rule, and substance abuse records are protected under 42 CFR Part 2, which imposes stricter consent requirements than HIPAA.
AI processing of these record categories requires patient-specific written consent that explicitly covers AI processing — not just a general consent to treatment. The consent must be granular, revocable, and documented. Many practices find that the consent burden alone makes AI processing of these categories impractical.
Recommendation: Exclude these record categories from AI processing entirely in the initial implementation. They can be evaluated separately with appropriate legal counsel and updated consent processes.
If an AI system generates a message containing a patient's diagnosis, test results, or treatment details and sends it through an unsecured channel (standard email, SMS, or unencrypted messaging), that is a HIPAA violation regardless of how the message was generated.
The boundary: AI can draft communications, but those communications must be delivered through secure, HIPAA-compliant channels — a patient portal with authentication, encrypted email services, or secure messaging platforms. The AI should never have the ability to autonomously send PHI-containing messages to patients without human review and compliant delivery channels.
This is the most common and most preventable violation: staff members using ChatGPT, Google Gemini, or other consumer AI tools to "help" with documentation, letters, or coding by pasting in patient information. Each instance is a potential HIPAA breach.
The solution: Provide staff with a compliant AI alternative (the custom-built system) that is easier and more useful than consumer tools, and implement clear policies prohibiting the use of non-approved AI tools for any work involving patient information. The best prevention is making the compliant option the most convenient option.
When use cases require AI processing of PHI, these architectural patterns add layers of protection beyond the BAA itself.
The most powerful compliance tool available. Before sending data to the AI model, an automated pipeline strips all 18 HIPAA identifiers (name, address, dates, phone numbers, email, SSN, MRN, etc.) and replaces them with tokens. The AI processes the clinical content without knowing which patient it belongs to. After processing, the application layer re-maps the tokens to the original identifiers.
This approach provides defense-in-depth: even if the BAA-covered model experienced a breach, the exposed data would be de-identified and clinically meaningless without the token mapping, which is stored separately in your controlled infrastructure.
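A minimal version of this tokenization pipeline is sketched below. The two regex patterns are toy matchers for illustration; a production pipeline would cover all 18 HIPAA identifiers with far more robust detection (typically a dedicated de-identification engine rather than hand-written regexes).

```python
import re

# Toy de-identification pipeline: identifiers are swapped for opaque tokens
# before text reaches the model; the token map stays in your infrastructure
# and restores them afterward. Only two identifier types shown for brevity.

PATTERNS = {
    "NAME": re.compile(r"\b(?:Jane|John)\s+[A-Z][a-z]+\b"),  # toy name matcher
    "MRN": re.compile(r"\bMRN[-\s]?\d{6,}\b"),
}

def deidentify(text: str) -> tuple[str, dict[str, str]]:
    """Replace identifiers with tokens; return scrubbed text and the token map."""
    token_map, counter = {}, 0
    for label, pattern in PATTERNS.items():
        for match in pattern.findall(text):
            token = f"[{label}_{counter}]"
            token_map[token] = match
            text = text.replace(match, token)
            counter += 1
    return text, token_map

def reidentify(text: str, token_map: dict[str, str]) -> str:
    """Re-map tokens to the original identifiers in the application layer."""
    for token, original in token_map.items():
        text = text.replace(token, original)
    return text

clean, mapping = deidentify("Jane Doe (MRN-0012345) reports improved pain control.")
# clean == "[NAME_0] ([MRN_1]) reports improved pain control."
restored = reidentify(clean, mapping)
```

The model only ever sees the scrubbed text; the token map never leaves your controlled infrastructure, which is what makes this defense-in-depth rather than a single point of failure.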
Rather than training the AI model on your data (which embeds PHI in the model's weights), RAG keeps your data in a secure, encrypted vector database. When a user asks a question, the system retrieves only the relevant data snippets and provides them to the model as context for that single request. The model generates a response using that context but does not retain it.
Benefits of RAG for healthcare: PHI stays in an encrypted database under your control rather than in model weights, individual records can be deleted at any time, and every retrieval can be logged and audited.
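The retrieval loop itself is simple to sketch. Real systems use embedding models and an encrypted vector database; the word-overlap scoring below is a stand-in that keeps the example self-contained, and the knowledge-base snippets are invented for illustration.

```python
import math
from collections import Counter

# Toy RAG loop: the knowledge base stays in your own store; only the
# best-matching snippets travel to the model with each query.

KNOWLEDGE_BASE = [
    "Chest pain protocol: obtain ECG within 10 minutes of presentation.",
    "Referral procedure: referrals to cardiology require prior authorization.",
    "Afib patients on anticoagulants need INR review before procedures.",
]

def score(query: str, doc: str) -> float:
    """Word-overlap relevance score, length-normalized (embedding stand-in)."""
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    overlap = sum((q & d).values())
    return overlap / math.sqrt(len(doc.split()) or 1)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k most relevant snippets for this single request."""
    return sorted(KNOWLEDGE_BASE, key=lambda doc: score(query, doc), reverse=True)[:k]

def build_prompt(query: str) -> str:
    """Feed only the retrieved context to the model; nothing is retained."""
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("What is our chest pain protocol?")
```

Deleting a record from `KNOWLEDGE_BASE` removes it from every future answer immediately, which is exactly the auditability and deletability advantage RAG holds over fine-tuning.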
For the most sensitive use cases, the AI processing can be isolated in a dedicated virtual private cloud (VPC) that has no internet egress. Data enters through a single authenticated endpoint, is processed, and the results are returned through that same endpoint. No data can leave the environment through any other path. This is the same architecture used by government agencies for classified workloads, adapted for healthcare compliance.
The AI application enforces the HIPAA "minimum necessary" standard by design. A billing staff member querying the system sees only financial and coding data. A nurse sees clinical data relevant to their care coordination role. A physician has broader access within the scope of their patient panel. The AI model receives only the data that the requesting user is authorized to see — not the entire record.
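Enforcing minimum necessary in code can be as direct as a role-to-fields mapping applied before any data reaches the model. The roles and field names below are illustrative, not a prescribed schema.

```python
# Sketch of minimum-necessary filtering: each role maps to the record fields
# it may see, and the AI request is built only from that filtered view.

ROLE_FIELDS = {
    "billing": {"cpt_codes", "icd10_codes", "claim_status"},
    "nurse": {"medications", "allergies", "care_plan"},
    "physician": {"medications", "allergies", "care_plan",
                  "cpt_codes", "icd10_codes", "notes"},
}

def minimum_necessary(record: dict, role: str) -> dict:
    """Return only the fields the requesting role is authorized to see."""
    allowed = ROLE_FIELDS.get(role, set())   # unknown roles see nothing
    return {k: v for k, v in record.items() if k in allowed}

record = {"medications": ["apixaban"], "cpt_codes": ["99214"],
          "notes": "...", "claim_status": "pending"}
billing_view = minimum_necessary(record, "billing")
```

Because the filter runs in the application layer before the model call, the model can never leak data the requesting user was not entitled to see in the first place.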
Transparency about cost is important. A compliant AI system for healthcare is not a weekend project. Here is an honest assessment of the investment involved.
A chatbot or automation tool for a non-regulated business might take 2-4 weeks to build. The same functional tool for a medical practice can take 6-12 weeks because of the compliance infrastructure that must surround it: BAA execution, authentication and access controls, encryption, audit logging, de-identification pipelines where needed, and security review before launch.
Despite the higher upfront cost, the economics of custom healthcare AI are compelling because medical practices have uniquely high labor costs for administrative tasks, so the time savings and recovered revenue described above compound across every provider, every day.
The compliance infrastructure built for the first use case serves as the foundation for every subsequent use case. The BAA, the encryption layer, the audit system, the access controls — once built, they don't need to be rebuilt. Each additional AI capability added to the platform has a lower marginal cost than the first. This is another reason why starting with a well-architected custom build is preferable to cobbling together point solutions from multiple vendors.
For any medical practice exploring AI for the first time, the recommended approach is to start with the highest-value, lowest-risk use cases and build on a foundation that can scale.
Phase 1: Stand up the compliant infrastructure. Execute a BAA with the cloud provider. Build the core application with authentication, audit logging, and encryption. Deploy initial use cases: administrative document drafting, an internal knowledge base for protocols and procedures, and communication template generation. These use cases deliver immediate value to staff while the compliance foundation is validated.
Phase 2: Add clinical documentation support — note structuring, dictation cleanup, SOAP note generation. Integrate coding optimization to review documentation against billing codes. These use cases directly impact provider efficiency and practice revenue. The compliance infrastructure built in Phase 1 supports them with minimal additional work.
Phase 3: Begin EHR integration (read-only initially). Implement patient record summarization with de-identification pipelines. Add voice transcription if ambient documentation is a priority. These use cases require more engineering but deliver the highest per-encounter value.
Phase 4: Patient-facing chatbots, write-back EHR integration, advanced clinical decision support tools, and specialty-specific capabilities. Each is built on the compliance foundation established in Phase 1 and validated through Phases 2 and 3.
The key takeaway from this entire guide is this: AI in healthcare is not a technology problem — it is a compliance and architecture problem. The AI models themselves are ready. The cloud infrastructure is ready. The missing piece for most practices is the custom application layer that connects these capabilities to their specific workflows while maintaining the regulatory requirements that protect their patients and their practice.
That application layer is what we build.