Optum
Production ML for Healthcare at UnitedHealth Group
Role
Software Engineer, AI/ML
Duration
Feb 2026 – Present
Team
Enterprise AI/ML Team
Status
Current
Overview
Most ML in healthcare dies in a notebook. A model performs well on historical data, gets handed to engineering, and never survives contact with production traffic, schema drift, or a compliance audit. I work on the systems that keep models alive after deployment — pipelines ingesting millions of records daily, monitoring that catches drift before it reaches patients, infrastructure that makes all of it auditable under HIPAA. The challenge is not building the model. It is building the system around it that runs at 150M-patient scale.
Problem
A care gap detected six months late is a care gap that sent someone to the ER. Clinical records, claims, pharmacy data, and lab results live in separate systems with different schemas, update cadences, and access controls. Models trained on clean historical data encounter missing fields, delayed records, and format changes in production. The failure mode is not a bad prediction — it is a prediction that looked correct at training time and degrades silently in production until a downstream clinician makes a decision on stale confidence scores.
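To make the failure mode concrete, here is a minimal sketch of an ingestion-time check that flags missing fields and late-arriving records before they reach a model. It is an illustration only, not the production implementation: the field names, the `MAX_LAG` window, and the `validate_record` helper are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical required fields and lateness window, chosen for illustration.
REQUIRED_FIELDS = ("patient_id", "source", "event_time")
MAX_LAG = timedelta(days=2)


def validate_record(record: dict, now: datetime) -> list[str]:
    """Return a list of problems found in one incoming record.

    An empty list means the record passed both checks.
    """
    problems = []

    # Missing or empty fields are reported instead of silently defaulted.
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            problems.append(f"missing:{field}")

    # A record whose event happened too long ago counts as late.
    event_time = record.get("event_time")
    if isinstance(event_time, datetime) and now - event_time > MAX_LAG:
        problems.append("late")

    return problems


if __name__ == "__main__":
    now = datetime(2026, 3, 1, tzinfo=timezone.utc)
    stale = {
        "patient_id": "p1",
        "source": "claims",
        "event_time": now - timedelta(days=5),
    }
    print(validate_record(stale, now))
```

Returning a list of named problems rather than a single pass/fail flag lets each kind of failure be counted and alerted on separately.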
Approach
- 01 Own the full lifecycle from data ingestion through feature engineering, training, deployment, and post-deployment monitoring, with no handoff gap between research and production
- 02 Deploy on AWS with Kubernetes orchestration and automated rollback triggered by performance degradation, not just infrastructure failure
- 03 Build processing jobs against strict SLAs: healthcare records that arrive late or process late compound downstream into missed care windows
- 04 Instrument every model with drift detection, prediction confidence tracking, and anomaly alerting. If a model's output distribution shifts, we know before a clinician sees the result
- 05 Translate research models into production services with latency and throughput guarantees. A model that takes 30 seconds per prediction is a model that will not be used
- 06 Enforce HIPAA compliance as infrastructure, not policy: encryption at rest and in transit, row-level access controls, immutable audit trails. Compliance is not a checklist item; it is an architectural constraint
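As one illustration of the drift detection and degradation-triggered rollback described above, the sketch below compares a live window of prediction scores against a training-time baseline using the Population Stability Index (PSI). This is a generic technique shown under stated assumptions, not a description of the actual monitoring stack: scores are assumed to lie in [0, 1], the 0.2 threshold is a common rule of thumb rather than a tuned value, and `psi` and `should_roll_back` are hypothetical names.

```python
import math
from typing import Sequence


def psi(expected: Sequence[float], actual: Sequence[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between two samples of scores in [0, 1]."""
    if not expected or not actual:
        raise ValueError("both samples must be non-empty")

    def bin_fractions(scores: Sequence[float]) -> list[float]:
        counts = [0] * bins
        for s in scores:
            # Clamp so that 1.0 (or slightly out-of-range values) stays in range.
            idx = min(max(int(s * bins), 0), bins - 1)
            counts[idx] += 1
        # Floor each fraction at eps so the log below is always defined.
        return [max(c / len(scores), eps) for c in counts]

    e = bin_fractions(expected)
    a = bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def should_roll_back(baseline: Sequence[float], live: Sequence[float],
                     threshold: float = 0.2) -> bool:
    """Hypothetical rollback rule: flag when PSI exceeds the threshold."""
    return psi(baseline, live) > threshold


if __name__ == "__main__":
    baseline = [i / 100 for i in range(100)]        # evenly spread scores
    live = [0.9 + i / 1000 for i in range(100)]     # scores piled up near 1.0
    print(should_roll_back(baseline, baseline))
    print(should_roll_back(baseline, live))
```

Keying the decision to the score distribution rather than to pod health is what separates this kind of rollback from one triggered only by infrastructure failure.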
Design Decisions
Technology Stack
Languages
ML/AI
Infrastructure
Data
Compliance
Impact
Scale
150M+
Patients served across UnitedHealth Group — every pipeline decision compounds at this scale
Records
Millions/day
Healthcare records processed daily with strict SLA guarantees
Uptime
99.9%+
Production model availability with automated rollback on degradation