Cloud Architecture for Medical AI: Compliance, Latency and Tradeoffs

Building cloud infrastructure for clinical AI is a different discipline from building cloud infrastructure for standard enterprise software. The compliance requirements are more stringent, the failure modes are more consequential, and the operational constraints — particularly around latency and audit trails — interact with architectural choices in ways that aren't obvious until you're deep in implementation.

This piece draws from direct experience building medical-grade data infrastructure and covers the requirements and tradeoffs that actually matter when you're building in this space.

The Regulatory Layer You Can't Ignore

If you're processing protected health information (PHI) in the United States, HIPAA governs your architecture. If you're handling personal data of people in the EU, GDPR applies. If your AI tool qualifies as Software as a Medical Device (SaMD), broadly software intended for a medical purpose such as diagnosis or treatment without being part of a hardware device, you're also looking at FDA 21 CFR Part 11 for electronic records and signatures, a likely premarket pathway such as 510(k) clearance or De Novo classification, and IEC 62304 for the medical device software lifecycle.

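These aren't checkbox items to address at the end of development; they shape fundamental architectural decisions. HIPAA's Security Rule requires audit controls that record and examine activity in systems containing PHI, and its integrity requirements mean those records must be protected from improper alteration, which in practice means tamper-evident log storage. Your logging infrastructure is therefore compliance-critical, not just operational. Encryption at rest and in transit is formally an "addressable" specification rather than an absolute mandate, but skipping it requires a documented justification and an equivalent safeguard, so treat it as required. Business associate agreements (BAAs) must be in place with every cloud provider and third-party service that creates, receives, stores, or transmits PHI on your behalf. AWS, Azure, and GCP all offer HIPAA-eligible services and will sign BAAs, but a BAA covers only the provider's eligible services and infrastructure, not your implementation: you're still responsible for configuring those services securely.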

Data Residency and Sovereignty

Many healthcare organisations have data residency requirements: clinical data cannot leave the country or region. This constrains your multi-region architecture and, with it, your disaster recovery design. If UK patient data can't be replicated to US regions, your primary-secondary failover has to be built using only regions inside the permitted geography. Azure's UK regions (UK South and UK West) and AWS's European regions, including London, address most cases, but you need to verify every data flow, including logging, monitoring, and backup pipelines, not just the primary data stores.
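
Verifying "every data flow" is easier when the flows are written down as an inventory and checked mechanically. The sketch below assumes a hypothetical inventory format (`DataFlow` with a name and a region) and a hypothetical policy that permits only Azure's UK regions; it is not a provider API, just a pattern you might run in CI against your infrastructure definitions.

```python
from dataclasses import dataclass

# Hypothetical residency policy: only Azure's UK regions are permitted.
ALLOWED_REGIONS = frozenset({"uksouth", "ukwest"})


@dataclass(frozen=True)
class DataFlow:
    """One pipeline that stores or moves patient data (hypothetical inventory record)."""
    name: str
    region: str


def residency_violations(flows, allowed=ALLOWED_REGIONS):
    """Return every flow whose destination region is outside the permitted set."""
    return [flow for flow in flows if flow.region not in allowed]


if __name__ == "__main__":
    inventory = [
        DataFlow("primary-database", "uksouth"),
        DataFlow("dr-replica", "ukwest"),
        DataFlow("log-export", "eastus"),  # a monitoring sink in a US region
    ]
    for flow in residency_violations(inventory):
        print(f"Residency violation: {flow.name} -> {flow.region}")
```

The check is only as good as the inventory, so logging, monitoring, and backup destinations belong in it alongside the primary stores.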

Data Interoperability: HL7 and FHIR

Clinical data lives in electronic health record (EHR) systems such as Epic, Cerner (now Oracle Health), and Meditech, which exchange data using healthcare-specific standards. HL7 version 2 (HL7 v2) is a messaging standard dating from the late 1980s that remains the dominant format for operational messages in most hospital systems: orders, results, and ADT (admission/discharge/transfer) events. It's text-based, with segments separated by carriage returns and fields delimited by pipe characters, and genuinely archaic by modern API standards.
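
To make the format concrete, here is a minimal sketch of splitting an HL7 v2 message into segments and fields. It assumes the default delimiters (`|` for fields, `^` for components) and ignores repetitions, escape sequences, and custom separators declared in MSH, all of which a production parser or interface engine must handle. The sample message and the function names are illustrative.

```python
def parse_hl7v2(message: str) -> list[tuple[str, list[str]]]:
    """Split an HL7 v2 message into (segment_name, fields) pairs."""
    segments = []
    # Segments end with carriage returns; tolerate \n or \r\n from files too.
    normalised = message.replace("\r\n", "\r").replace("\n", "\r")
    for raw in normalised.split("\r"):
        if not raw:
            continue
        fields = raw.split("|")
        segments.append((fields[0], fields))
    return segments


def components(field: str) -> list[str]:
    """Split one field into its ^-delimited components."""
    return field.split("^")


if __name__ == "__main__":
    msg = (
        "MSH|^~\\&|LAB|HOSP|AI|HOSP|202401010830||ORU^R01|MSG0001|P|2.5.1\r"
        "PID|1||123456^^^HOSP^MR||DOE^JANE\r"
        "OBX|1|NM|2160-0^Creatinine^LN||1.2|mg/dL|0.6-1.3|N|||F"
    )
    for name, fields in parse_hl7v2(msg):
        print(name, fields[1:4])
```

One quirk worth knowing: in the MSH segment the first `|` is itself field MSH-1, so after splitting, list index `n` holds MSH-(n+1); the message type MSH-9 (`ORU^R01` here) sits at index 8. Other segments don't have this offset.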

FHIR (Fast Healthcare Interoperability Resources) is the modern standard promoted by ONC and CMS for structured data exchange; R4 is the version most widely deployed. FHIR represents clinical data as JSON or XML resources (Patient, Observation, Condition, MedicationRequest) exposed through RESTful APIs. Major EHR vendors now publish FHIR R4 APIs, driven largely by ONC certification requirements under the 21st Century Cures Act Final Rule and by CMS interoperability rules.
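
As an illustration of the RESTful style, the sketch below builds a FHIR search URL for one patient's lab results. The base URL and patient ID are hypothetical, and real EHR endpoints also require authorisation (typically SMART on FHIR), which is omitted here. The `code` parameter uses FHIR's `system|code` token syntax; 2160-0 is the LOINC code for serum creatinine.

```python
from urllib.parse import urlencode


def observation_search_url(base_url: str, patient_id: str, loinc_code: str) -> str:
    """Build a FHIR R4 search for one patient's Observations with a given LOINC code."""
    # Token search parameters take the form system|code; urlencode
    # percent-escapes ':', '/' and '|' so the query string stays valid.
    params = {"patient": patient_id, "code": f"http://loinc.org|{loinc_code}"}
    return f"{base_url.rstrip('/')}/Observation?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical server base URL and patient ID.
    print(observation_search_url("https://fhir.example.org/r4", "123", "2160-0"))
```

A GET on that URL returns a Bundle resource of type `searchset` whose entries are the matching Observation resources.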

Your ingest pipeline needs to handle both. In practice, real-time clinical event streams (lab results arriving as the analyser completes, vitals streaming from monitoring equipment) come via HL7 v2 or proprietary feeds. Structured patient history, problem lists, and medication records are increasingly available via FHIR. An ETL layer that normalises both into a common internal representation — typically FHIR R4 internally — is standard architecture for clinical AI infrastructure.
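
A minimal sketch of one normalisation step: mapping a numeric HL7 v2 result segment (OBX) onto a FHIR R4 Observation. It assumes the OBX has already been split on `|`, so index 2 is the value type, 3 the code (`code^text^system`), 5 the value, 6 the units, and 11 the result status. Only numeric (`NM`) results and LOINC codes are mapped, and the status table covers just a few OBX-11 values; a real pipeline needs far fuller mapping tables and validation against a FHIR profile.

```python
# Map HL7 v2 coding-system IDs to FHIR system URIs; only LOINC is handled here.
CODING_SYSTEMS = {"LN": "http://loinc.org"}
# OBX-11 result status -> FHIR Observation.status (subset only).
STATUS_MAP = {"F": "final", "P": "preliminary", "C": "corrected"}


def obx_to_observation(obx: list[str], patient_id: str) -> dict:
    """Convert one pipe-split OBX segment (numeric results only) to a FHIR R4 Observation dict."""
    if obx[0] != "OBX":
        raise ValueError("expected an OBX segment")
    if obx[2] != "NM":
        raise NotImplementedError(f"value type {obx[2]!r} not handled in this sketch")
    code, display, system = (obx[3].split("^") + ["", "", ""])[:3]
    coding = {"code": code, "display": display}
    if system in CODING_SYSTEMS:  # unknown systems are left off rather than guessed
        coding["system"] = CODING_SYSTEMS[system]
    status = obx[11] if len(obx) > 11 else ""
    return {
        "resourceType": "Observation",
        "status": STATUS_MAP.get(status, "unknown"),
        "code": {"coding": [coding]},
        "subject": {"reference": f"Patient/{patient_id}"},
        "valueQuantity": {"value": float(obx[5]), "unit": obx[6]},
    }
```

Serialising the returned dict with `json.dumps` gives a resource shaped like one fetched from a FHIR API, which is the point of normalising both feeds to one internal representation.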

Inference Latency Requirements

Latency requirements vary dramatically by clinical use case and need to drive your serving architecture decisions explicitly.

Synchronous Clinical Decision Support

If your AI tool is integrated into a clinician's workflow — flagging abnormal findings during an active read session, alerting during medication ordering — the round-trip latency budget is tight. Radiologists reading studies expect near-instant feedback; anything over 2–3 seconds feels slow. For integrated decision support within EHR ordering workflows, sub-second is preferable. This means GPU-backed inference servers, model optimisation (quantisation, ONNX export, TensorRT compilation), and serving infrastructure close to the clinical site — either same-region cloud or on-premise edge compute.
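
Budgets like "sub-second" are best checked against tail latency rather than the mean, since the slow requests are the ones clinicians notice. A small sketch, assuming you already collect end-to-end latencies in milliseconds; the choice of the 95th percentile and the function names are assumptions, not a standard.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def within_budget(latencies_ms: list[float], budget_ms: float, p: float = 95) -> bool:
    """True if the p-th percentile end-to-end latency fits the clinical budget."""
    return percentile(latencies_ms, p) <= budget_ms
```

For an ordering-workflow alert you might call `within_budget(samples, 1000)`; for a radiology read assist, a budget in the 2000 to 3000 ms range.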

Asynchronous Batch Processing

Many clinical AI applications don't need real-time inference. Overnight analysis of the prior day's imaging, population health risk scoring, longitudinal monitoring reports — these are batch workloads where you have hours, not milliseconds. Batch pipelines can use spot/preemptible instances for cost efficiency, scale to zero when not processing, and prioritise cost over latency. The architectural pattern is completely different: message queues, job schedulers, and autoscaling compute pools rather than always-on inference servers.
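
The batch pattern can be sketched in-process: here `queue.Queue` stands in for a managed message queue (such as Amazon SQS or Google Cloud Pub/Sub) and a thread pool stands in for an autoscaled pool of spot instances. Each worker pulls jobs until the queue is empty and then exits, which is the behaviour that lets real compute scale to zero. The `process` scoring function and study IDs are hypothetical.

```python
import queue
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def _drain(jobs: queue.Queue, process: Callable[[str], float]) -> dict[str, float]:
    """Pull study IDs until the queue is empty, then return (the worker scales to zero)."""
    done = {}
    while True:
        try:
            study_id = jobs.get_nowait()
        except queue.Empty:
            return done
        try:
            done[study_id] = process(study_id)
        finally:
            jobs.task_done()


def run_batch(jobs: queue.Queue, process: Callable[[str], float], max_workers: int = 4) -> dict[str, float]:
    """Fan a queue of jobs out to a worker pool and merge each worker's results."""
    results: dict[str, float] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(_drain, jobs, process) for _ in range(max_workers)]
        for future in futures:
            results.update(future.result())  # re-raises any exception from a worker
    return results
```

With a managed queue, a failed job would typically become visible again for retry rather than being lost, which is one reason to prefer it over an in-memory queue in production.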

The mistake is applying real-time infrastructure to batch workloads or vice versa. Both are expensive in different ways. Map your use cases to their actual latency requirements before choosing an architecture.

Model Versioning in Clinical Settings

Clinical AI models must be version-controlled with the same rigour as medical device hardware. A model update that changes sensitivity or specificity is, in regulatory terms, a modification to the device, and depending on its impact it may require a new premarket submission unless it falls within a predetermined change control plan (PCCP) authorised by the FDA. Every deployed model version must be tracked, and the data and code used to produce it must be reproducible. MLflow, DVC, or similar tools integrated with your CI/CD pipeline are not optional; they're compliance infrastructure.
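
Whatever tool you use, the core of a reproducible release is a record tying a version label to content hashes of the exact weights and training-data manifest, plus the code commit. The sketch below shows that idea with the standard library only; `ModelRelease` and its fields are a hypothetical schema, not the MLflow or DVC data model, and in practice such a record would be stored by those tools or your CI pipeline.

```python
import hashlib
import io
import json
from dataclasses import asdict, dataclass
from typing import BinaryIO


def sha256_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in chunks so large weight files never load fully into memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


@dataclass(frozen=True)
class ModelRelease:
    """Hypothetical release record linking a model version to its exact inputs."""
    model_name: str
    version: str
    git_commit: str
    weights_sha256: str
    training_manifest_sha256: str

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


if __name__ == "__main__":
    release = ModelRelease(
        model_name="chest-xray-triage",  # hypothetical model name
        version="2.3.0",
        git_commit="<commit hash from CI>",
        weights_sha256=sha256_stream(io.BytesIO(b"example weights")),
        training_manifest_sha256=sha256_stream(io.BytesIO(b"example manifest")),
    )
    print(release.to_json())
```

In real use you would pass `open(path, "rb")` handles for the weights file and data manifest instead of in-memory bytes.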

Audit Trail Architecture

HIPAA requires audit controls covering access to PHI. For a clinical AI system, the audit trail should in practice cover every inference request and its inputs, every model prediction and its confidence score, every clinician interaction with an AI-generated alert, and every data access query. These logs must be tamper-evident, retained for at least six years (the retention period HIPAA sets for required documentation, commonly applied to audit logs), and queryable for compliance investigations.
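
One common way to make application logs tamper-evident is a hash chain: each entry stores the hash of the previous entry, so editing or deleting any record breaks every hash after it. The sketch below is a minimal in-memory illustration with hypothetical event fields; it is not a complete audit system, and a real deployment would persist entries to write-once storage.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(prev_hash: str, event: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) so the hash is reproducible.
    payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class AuditLog:
    def __init__(self) -> None:
        self.entries: list[dict] = []

    def append(self, event: dict) -> str:
        """Record an event, chaining it to the previous entry's hash."""
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        record = dict(event, logged_at=datetime.now(timezone.utc).isoformat())
        entry_hash = _entry_hash(prev, record)
        self.entries.append({"event": record, "prev": prev, "hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        """Recompute the chain; any edited, reordered, or removed entry makes this False."""
        prev = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or _entry_hash(prev, entry["event"]) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```

A chain alone is tamper-evident, not tamper-proof: someone who can rewrite the whole log can recompute every hash. Periodically anchoring the latest hash in WORM storage closes that gap.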

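The volume is substantial. A radiology AI system processing 500 studies per day, logging each with input metadata, model version, prediction scores, and user actions, accumulates roughly 1.1 million study records over a six-year retention window (500 × 365 × 6 = 1,095,000), each potentially carrying several events. A common AWS pattern pairs CloudTrail, which records API activity against your AWS resources, with application-level audit events written to S3 with Object Lock (WORM storage). Azure Blob Storage immutability policies achieve the same. The cost of tamper-evident long-term log retention is not zero and needs to be in your architecture budget from the start.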
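
On AWS, a default retention rule can be set on the audit bucket so every new object version is locked for the retention period. The sketch below uses boto3's `put_object_lock_configuration`; the bucket name is hypothetical, and it assumes the bucket already has versioning and Object Lock enabled (creating it with `ObjectLockEnabledForBucket=True` is the simplest route). COMPLIANCE mode means no user, including the account root, can overwrite or delete a locked version until retention expires, so test with short retention first.

```python
def object_lock_config(years: int = 6, mode: str = "COMPLIANCE") -> dict:
    """Build an S3 Object Lock default-retention configuration."""
    if mode not in ("COMPLIANCE", "GOVERNANCE"):
        raise ValueError("mode must be COMPLIANCE or GOVERNANCE")
    if years < 1:
        raise ValueError("retention must be at least one year")
    return {
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": mode, "Years": years}},
    }


def apply_object_lock(bucket: str, years: int = 6) -> None:
    """Apply default retention to an existing Object Lock-enabled bucket (needs AWS credentials)."""
    import boto3  # third-party AWS SDK, imported lazily so the config builder works without it

    s3 = boto3.client("s3")
    s3.put_object_lock_configuration(Bucket=bucket, ObjectLockConfiguration=object_lock_config(years))
```

For example, `apply_object_lock("clinical-ai-audit-logs")` with a hypothetical bucket name. The retention figure should come from your compliance team, not from this default.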

The Tradeoffs in Practice

Several tensions in clinical AI infrastructure don't have clean solutions and require explicit decisions:

Centralisation vs. data locality: Centralising training data from multiple hospital sites gives you larger, more diverse datasets and better models. Moving that data faces consent, data sharing agreement, and regulatory barriers that can be insurmountable. Federated learning — training on data without centralising it — addresses this in principle but is operationally complex and hasn't yet matched centralised training performance at scale in most clinical domains.
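
The central idea of federated learning fits in a few lines: each site trains locally and shares only model parameters, and a coordinator combines them. The sketch below shows the weighted-averaging step of Federated Averaging (FedAvg), where each site's parameters count in proportion to its number of training examples; flat lists of floats stand in for real model weights, and local training, communication, and privacy protections are all omitted.

```python
def federated_average(site_updates: list[tuple[list[float], int]]) -> list[float]:
    """Combine (parameters, n_examples) pairs from each site into one weighted average."""
    total = sum(n for _, n in site_updates)
    if total == 0:
        raise ValueError("no training examples reported")
    dim = len(site_updates[0][0])
    averaged = [0.0] * dim
    for weights, n in site_updates:
        if len(weights) != dim:
            raise ValueError("all sites must share the same model shape")
        for i, w in enumerate(weights):
            averaged[i] += w * n / total
    return averaged
```

The aggregation is simple; the operational complexity the text refers to lies around it: coordinating training rounds across hospital networks, handling sites that drop out, and data that differs systematically between sites.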

Model accuracy vs. explainability: High-accuracy models (large transformers, ensemble methods) are often less interpretable than simpler models. In clinical settings, explainability isn't just nice to have — it's required for clinician acceptance and may be required for regulatory approval. Attention visualisations and gradient-based saliency maps are imperfect but often sufficient for imaging applications. For tabular clinical data, SHAP values are widely used and FDA-accepted in several clearances.
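
SHAP values are built on Shapley values from game theory: a feature's attribution is its average marginal contribution to the prediction across all subsets of the other features. The sketch below computes them exactly by brute force, treating a feature as "absent" by replacing it with a single baseline value (an assumption; the `shap` library offers several definitions and efficient approximations). The cost grows exponentially with feature count, so this is for understanding, not production.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def exact_shapley(f: Callable[[list[float]], float], x: Sequence[float], baseline: Sequence[float]) -> list[float]:
    """Exact Shapley attributions for f at x, using baseline values for absent features."""
    n = len(x)

    def value(present: set[int]) -> float:
        # Features in `present` take their real values; the rest take baseline values.
        return f([x[i] if i in present else baseline[i] for i in range(n)])

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for subsets S of the other features.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi
```

The attributions always sum to `f(x) - f(baseline)`, which is what lets a clinician read them as "how much each input moved this risk score".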

Cost vs. reliability: Multi-region, multi-AZ deployment with zero-RPO failover is expensive. Hospital IT budgets are not infinite. The right answer is a formal risk assessment: what's the clinical impact of a system being unavailable for four hours? For continuous ICU monitoring, the impact is serious. For next-day reporting of routine scans, it may be acceptable. Architect to the clinical risk level, not to a default "maximum reliability" setting.
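
It helps to translate availability targets into hours before the risk discussion, because each extra "nine" is what drives the cost. The arithmetic below assumes a 365-day year; the targets shown are illustrative, not recommendations. An annual figure also hides when an outage happens, so the clinical impact question still decides which target is appropriate.

```python
HOURS_PER_YEAR = 24 * 365


def annual_downtime_hours(availability_pct: float) -> float:
    """Hours of unavailability per year permitted by an availability target."""
    if not 0 < availability_pct <= 100:
        raise ValueError("availability must be in (0, 100]")
    return (1 - availability_pct / 100) * HOURS_PER_YEAR


if __name__ == "__main__":
    outage_hours = 4  # the four-hour outage from the risk question above
    for target in (99.0, 99.9, 99.99):
        budget = annual_downtime_hours(target)
        verdict = "fits" if outage_hours <= budget else "exceeds"
        print(f"{target}%: {budget:.2f} h/year; a {outage_hours} h outage {verdict} the budget")
```

At 99.9% a single four-hour outage fits within the roughly 8.8 hours allowed per year; at 99.99% the allowance is about 53 minutes, and meeting it generally requires multi-region failover.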

At Neurivvy Intelligenx, our cloud practice specifically addresses this combination of compliance, latency, and clinical workflow constraints. We've seen the consequences of retrofitting HIPAA controls onto architectures designed without them, and the pattern is consistent — it's more expensive, more time-consuming, and produces worse outcomes than building compliance in from day one.
