COVID Reporting Portal
The problem
In March 2020 the system did not have a single place to answer the question "how many COVID patients are on a vent right now, system-wide." Every hospital had its own EHR install, its own admit/discharge feeds, its own slightly different definition of an ICU bed. Reporting up to ops, the state, and CMS was a nightly Excel exercise stitched together by hand from forty separate inputs. By the time the numbers were on a single page, they were a day stale.
Leadership needed something durable enough to use for the next two years and shippable in roughly two weeks. The brief was small in words and large in implication: one source of truth, refreshed fast, role-scoped so a unit charge nurse and a regional VP saw the same numbers but not the same screen.
What I built
I designed and led the ingestion → storage → dashboard stack end to end. Three layers, kept deliberately boring.
Ingestion. HL7 v2 feeds (ADT^A01/A02/A03, ORU^R01) from each member hospital landed on an MLLP listener that wrote raw segments to S3 first, then queued a parse. Doing the durable raw-write before the parse meant we could re-derive everything when a definition changed mid-pandemic — and it changed often. Airflow ran the normalization DAGs on a five-minute cadence. Hospitals that could not stand up an HL7 feed in time fell back to an SFTP drop of nightly extracts; same downstream shape, slower SLA.
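The write-raw-first pattern is small enough to sketch. This is an illustrative reduction, not the production listener: `RAW_ARCHIVE` and `PARSE_QUEUE` stand in for the S3 bucket and the parse work queue, and the sample message is abbreviated. The MLLP framing bytes (VT to open, FS+CR to close) are the standard ones.

```python
# MLLP wraps each HL7 v2 message in a start byte (0x0b) and an end
# sequence (0x1c 0x0d); everything between is the raw message.
MLLP_START, MLLP_END = b"\x0b", b"\x1c\x0d"

def unwrap_mllp(frame: bytes) -> bytes:
    """Strip MLLP framing from one inbound HL7 v2 message."""
    if not (frame.startswith(MLLP_START) and frame.endswith(MLLP_END)):
        raise ValueError("not a complete MLLP frame")
    return frame[len(MLLP_START):-len(MLLP_END)]

RAW_ARCHIVE = []   # stands in for the durable S3 raw-segment store
PARSE_QUEUE = []   # stands in for the queued parse jobs

def ingest(frame: bytes) -> None:
    raw = unwrap_mllp(frame)
    RAW_ARCHIVE.append(raw)                    # durable raw write happens FIRST
    PARSE_QUEUE.append(len(RAW_ARCHIVE) - 1)   # then parsing is queued by reference

# Abbreviated ADT^A01 (admit) message for illustration.
ingest(b"\x0bMSH|^~\\&|EHR|HOSP_A|PORTAL|SYS|202003291200||ADT^A01|42|P|2.3\x1c\x0d")
```

Because the queue holds a reference into the archive rather than a parsed payload, replaying history under a new definition is just re-running the parser over the archive.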
Normalization. The hardest problem was not parsing HL7 — it was reconciling that "ICU" at hospital A was three different unit codes at hospital B and a free-text field at hospital C. We kept a versioned mapping table, owned by a small clinical informatics team, that translated local codes into a system-wide vocabulary. Every fact row in Postgres carried the mapping version that produced it, so when the definition of "COVID suspected" shifted we could replay history under the new rule without losing the old one.
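The mechanics of the versioned mapping reduce to a lookup keyed on version plus local code, with the version stamped onto every fact row it produces. The sketch below uses a dict where the real system used a Postgres table; the hospital names, codes, and column names are hypothetical.

```python
# (mapping_version, hospital, local_code) -> system-wide vocabulary code.
# Version 2 models a mid-pandemic definition change at hospital B.
MAPPINGS = {
    (1, "HOSP_B", "3W-ICU"): "ICU",
    (1, "HOSP_B", "CCU"):    "ICU",
    (2, "HOSP_B", "3W-ICU"): "ICU",
    (2, "HOSP_B", "CCU"):    "CARDIAC_ICU",
}

def normalize(hospital: str, local_code: str, version: int) -> dict:
    """Build a fact row stamped with the mapping version that produced it."""
    system_code = MAPPINGS[(version, hospital, local_code)]
    return {"hospital": hospital, "unit": system_code, "mapping_version": version}

# Replay the same raw event under both rules; neither overwrites the other.
old_row = normalize("HOSP_B", "CCU", version=1)
new_row = normalize("HOSP_B", "CCU", version=2)
```

The point of the stamp is auditability: when a number on the executive view changed overnight, you could tell whether the world changed or the definition did.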
Dashboards. A React front-end on a thin FastAPI layer. Three audiences, three views: a unit-level operational view (beds, vents, staffing flags), a hospital/regional roll-up, and a system-wide executive view that fed the state and CMS reports. Role scoping was enforced server-side off Active Directory groups; the front-end just rendered what came back. Numbers refreshed every five minutes, with a visible "as of" timestamp on every tile because clinicians do not trust dashboards that hide their staleness.
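The scoping rule itself is framework-agnostic and worth showing: the client never asks for a view, the server derives one from the caller's AD groups. This is a minimal sketch of that logic, with hypothetical group names, separate from the FastAPI plumbing that wrapped it.

```python
# Hypothetical AD group -> dashboard view entitlements.
GROUP_VIEWS = {
    "HS-Unit-Charge":  "unit",       # beds, vents, staffing flags
    "HS-Regional-Ops": "regional",   # hospital/regional roll-up
    "HS-Exec":         "system",     # system-wide executive view
}
VIEW_RANK = {"unit": 0, "regional": 1, "system": 2}

def resolve_view(ad_groups: list) -> str:
    """Return the broadest view the caller's groups entitle them to."""
    views = [GROUP_VIEWS[g] for g in ad_groups if g in GROUP_VIEWS]
    if not views:
        raise PermissionError("no dashboard entitlement")
    return max(views, key=VIEW_RANK.__getitem__)
```

Doing this server-side meant the front-end could stay dumb: it rendered whatever rows came back, and a tampered request bought you nothing.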
What went well
We hit the two-week target for v1 and the system was load-bearing inside a month. Daily active users settled north of a thousand across clinicians and ops, and the same pipes ended up powering supply-chain (PPE burn rate), staffing, and eventually a vaccine-rollout tracker — the data lake outlived its original use case, which is the result you want.
The decision I am still proud of is the raw-segment archive. It cost us about 20% more storage and saved us an enormous amount of pain every time the definitions moved. Replay-from-source is cheap; re-pulling from forty hospitals is not.
What I would change
Two things, in retrospect.
First, I underweighted the data-quality surface. We built the dashboards before we built the "is the data lying to you" tooling, and for the first month the on-call rotation was mostly chasing feed gaps from individual hospitals. A simple per-hospital freshness and volume monitor, shipped on day one, would have moved a lot of pain off the dashboard team and onto the right local IT contact.
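The monitor we should have shipped on day one is not much code. A sketch of the idea, with illustrative thresholds and hypothetical field names: flag a hospital if its feed has gone quiet, or if it is emitting far fewer messages than its own baseline.

```python
from datetime import datetime, timedelta

def feed_alerts(last_seen: dict, hourly_counts: dict, baseline: dict,
                now: datetime,
                max_lag: timedelta = timedelta(minutes=15),
                min_volume_ratio: float = 0.5) -> list:
    """Flag hospitals whose feed is stale or well below its usual volume."""
    alerts = []
    for hospital, seen in last_seen.items():
        if now - seen > max_lag:
            alerts.append((hospital, "stale"))
        elif hourly_counts.get(hospital, 0) < min_volume_ratio * baseline[hospital]:
            alerts.append((hospital, "low_volume"))
    return alerts

now = datetime(2020, 4, 1, 12, 0)
alerts = feed_alerts(
    last_seen={"HOSP_A": now - timedelta(minutes=30),   # feed gone quiet
               "HOSP_B": now - timedelta(minutes=2),    # fresh but thin
               "HOSP_C": now - timedelta(minutes=1)},   # healthy
    hourly_counts={"HOSP_A": 100, "HOSP_B": 10, "HOSP_C": 90},
    baseline={"HOSP_A": 100, "HOSP_B": 100, "HOSP_C": 100},
    now=now,
)
```

The output here would route "HOSP_A is stale" and "HOSP_B is quiet" to the local IT contact instead of landing as a mystery on the dashboard team's pager.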
Second, the ETL was Airflow because Airflow was what the team knew. With another month of runway I would have leaned harder on streaming for the acute-care views. Five-minute latency was fine for ops but tight for the unit-level screens; clinicians sometimes refreshed the page expecting real-time and got something three minutes old. A small Kafka tier in front of the warehouse would have closed that gap.
The takeaway
This was the project that taught me how much of a healthcare data system is governance, not engineering. The pipeline was the easy part. The hard part was getting forty hospitals to agree on what "ICU bed" meant, and then keeping that agreement alive while the definitions changed under us weekly. Versioned mappings and a durable raw archive turned out to be the two architectural decisions that made everything else negotiable.