Bayesian MCMC Site-Level Patient Enrollment Forecasting - Clinical Trial Supply Chain

PyMC3 · Gamma-Poisson Bayesian Inference · AWS Glue · S3 · MLflow · SAP IBP · Veeva Vault

Client: Global Pharmaceutical Client (Top-10 Oncology Biopharma)
Role: ML Engineer & Data Engineering Lead · MLOps Architect
Programme Scale: 8 Phase 2/3 Oncology Trials · 40 Countries · 80+ Trial Sites · 2-Year Delivery

This case study demonstrates core ML engineering principles applicable across any domain requiring probabilistic demand forecasting under extreme data sparsity: Bayesian hierarchical modelling, uncertainty-aware supply planning, and production MLOps for regulated enterprise environments. The model output was adopted at VP level - replacing a methodology that had been in place for over a decade - validating both the technical rigour and the stakeholder engagement approach required to drive change at this scale.


Executive Summary

Metric | Value
Estimated supply waste avoidable over 10 years (model vs 2× heuristic) | ~$2B USD
Safety stock heuristic replaced | 2× enrolled patients (over-ordering baseline)
Supply planning horizon extended | 3 years (site-level, probabilistic)
Forecast accuracy (MAE, monthly) | 63% - first quantitative forecast in client’s oncology trial history
Trials monitored | 8 Phase 2/3 oncology trials
Geographic coverage | 40 countries
Prior method | None - manual heuristic with no ML or statistical modelling
Enterprise integration | SAP IBP + Veeva Vault via flat-file pipeline

This system replaced the client’s 2× enrolled-patient safety stock heuristic, a blunt over-ordering rule estimated to generate roughly $2B of drug and placebo waste over a 10-year horizon, with a probabilistic, site-level, 3-year enrollment forecast grounded in Bayesian inference. For large-molecule oncology drugs in blinded Phase 2/3 trials, per-patient supply costs are substantial and supply must be allocated as matched drug/placebo sets, so reducing systematic over-ordering had material commercial impact.

The system produced the first statistically grounded supply forecast in the client’s oncology trial history.


1. Business Problem

The Clinical Trial Supply Chain Problem

Pharmaceutical supply chain planning for clinical trials is fundamentally a probabilistic enrollment forecasting problem made structurally harder by:

The Heuristic That Was Replaced

The client’s pre-existing approach to trial drug supply was a 2× multiplier on expected enrolled patients: order twice the drug quantity implied by the planned enrollment. This heuristic:

The Objective

Build a site-level probabilistic enrollment forecasting system that:

  1. Produces monthly enrollment predictions per site, per country, per trial - with 80% confidence bands
  2. Accounts for patient attrition and site-level dropout rates
  3. Handles patient transfers between sites (tracked via IRT reference IDs)
  4. Updates as new IRT actuals arrive in each monthly batch cycle (Bayesian updating)
  5. Feeds into SAP IBP for 3-year supply planning, replacing the 2× heuristic

2. Why Bayesian MCMC - Not Classical Forecasting

The Core Challenge: Extreme Data Sparsity at the Right Level of Granularity

The forecasting problem requires site-level, monthly predictions. In practice:

Classical time-series approaches (ARIMA, Prophet) require sufficient within-series observations to identify patterns. Site-level enrollment series - with monthly counts of 1–5 patients - violate this requirement entirely.

The Bayesian Advantage

Requirement | Classical ML | Bayesian MCMC
Site-level monthly counts of 1–5 | Insufficient signal | ✅ Informative priors encode historical rates
Data sparsity at site level | Model collapses | ✅ Prior from country-level observations
Uncertainty quantification | Ad-hoc post-hoc CIs | ✅ Native posterior distributions
Hierarchical structure (indication → TA → country) | Manual stratification | ✅ Hierarchical prior fallback
Bayesian updating with actuals | Requires retraining | ✅ Conjugate posterior update (no retraining)
Interpretability for supply chain planners | Black box | ✅ Explicit probabilistic statement

Why Gamma-Poisson

Patient enrollment at a site is naturally modelled as a Poisson process: patients arrive independently at a rate λ (patients per month). The Poisson parameter λ itself is site-specific and uncertain - modelling λ as a Gamma-distributed random variable gives the Gamma-Poisson (Negative Binomial) compound distribution, which:


3. Data Architecture & Engineering

Data Sources

Source | System | Contents
Veeva Vault | Clinical Operations | Site activation dates, site status, historical enrollment rates (MICE-imputed), trial metadata, TA/Phase/Indication classification
IRT System | Interactive Response Technology | Patient randomisation records, site-level monthly actuals, patient tracking IDs, dropout reason codes
CTA Forecast | Clinical Trial Agreement | Country-level planned enrollment curve (Cohort-0 baseline), monthly expected totals

AWS Data Platform

The data engineering architecture was designed and operated on AWS:

IRT-Based Patient Transfer Tracking

A non-trivial data engineering challenge: patients occasionally transfer between sites. Without correction, a transferred patient appears as a dropout at the source site and a new enrollment at the destination - inflating attrition and enrollment rates simultaneously.

Resolution: IRT stores a patient tracking ID and a reason-for-dropout field. By joining on tracking IDs across site records within a protocol, transfers were identified and excluded from attrition counts, with enrollment credited to the correct receiving site. This required building a patient-level reconciliation layer in Glue above the site-level aggregation.
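
The reconciliation idea can be illustrated with a minimal sketch. It is not the production Glue job: the field names (tracking_id, site_id, month, kind) and the rule of crediting a transferred patient to the latest site are assumptions made for illustration, and plain Python is used here instead of Glue/Spark.

```python
from collections import defaultdict

def reconcile_transfers(events):
    """Return per-site (enrolled, dropouts) counts with transfers netted out.

    events: iterable of dicts with hypothetical keys
    tracking_id, site_id, month (int), kind ('enrolled' or 'dropout').
    """
    by_patient = defaultdict(list)
    for ev in events:
        by_patient[ev["tracking_id"]].append(ev)

    enrolled = defaultdict(int)
    dropouts = defaultdict(int)
    for history in by_patient.values():
        history.sort(key=lambda ev: ev["month"])  # stable sort keeps input order within a month
        enrol_events = [ev for ev in history if ev["kind"] == "enrolled"]
        if not enrol_events:
            continue
        # Assumption: one patient is one enrollment, credited to the latest (receiving) site.
        last_enrol = enrol_events[-1]
        current_site = last_enrol["site_id"]
        enrolled[current_site] += 1
        # A dropout at an earlier site followed by enrollment elsewhere is a transfer,
        # so only dropouts at the current site count as attrition.
        for ev in history:
            if (ev["kind"] == "dropout" and ev["site_id"] == current_site
                    and ev["month"] >= last_enrol["month"]):
                dropouts[current_site] += 1
    return dict(enrolled), dict(dropouts)

events = [
    {"tracking_id": "P1", "site_id": "S1", "month": 1, "kind": "enrolled"},
    {"tracking_id": "P1", "site_id": "S1", "month": 3, "kind": "dropout"},   # transfer out
    {"tracking_id": "P1", "site_id": "S2", "month": 3, "kind": "enrolled"},  # transfer in
    {"tracking_id": "P2", "site_id": "S1", "month": 2, "kind": "enrolled"},
    {"tracking_id": "P2", "site_id": "S1", "month": 4, "kind": "dropout"},   # genuine dropout
    {"tracking_id": "P3", "site_id": "S2", "month": 2, "kind": "enrolled"},
]
enrolled, dropouts = reconcile_transfers(events)
```

In this toy data, patient P1 is counted once (at S2) and contributes no attrition, while P2's dropout at S1 is kept.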


4. Modeling Architecture

Overall Design: Two Operating Modes

┌────────────────────────────────────────────────────────────┐
│  MODE 1: BASELINE FORECAST (Trial in Planning Stage)       │
│  Inputs: Veeva (historical) + CTA plan                     │
│  Output: Site-level enrollment projections from t=0        │
│  → Country_Forecast_Adaption.py                            │
├────────────────────────────────────────────────────────────┤
│  MODE 2: REFORECAST (Trial Active, IRT Actuals Available)  │
│  Inputs: Veeva + CTA plan + IRT actuals                    │
│  Output: Updated projections incorporating observed data   │
│  → Reforecast.py (Bayesian conjugate update)               │
└────────────────────────────────────────────────────────────┘

Step 1 - Prior Elicitation (Country-Level Distribution Fitting)

For each Country × TA × Phase combination, historical enrollment rates from completed or closing trials (excluding the current protocol) were extracted. A Gamma distribution was fit to these rates using maximum likelihood estimation (SciPy):

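from scipy.stats import gamma

# Fit a Gamma distribution to country-level historical enrollment rates (MLE).
# loc is fixed just below zero so that zero-valued rates stay inside the support.
a, loc, scale = gamma.fit(metric_observed_values, floc=-1e-10)
param_dict = {'alpha': a, 'beta': 1 / scale, 'loc': loc}  # beta is the rate (1/scale)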

Two prior variants were generated per site:

Step 2 - MCMC Posterior Sampling (Site-Level, PyMC3)

For each site with historical observations, the Gamma prior was updated against site-level observed enrollment rates by MCMC posterior sampling in PyMC3, using the No-U-Turn Sampler (NUTS):

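import pymc3 as pm

# Normal priors truncated at zero, since Gamma parameters must be positive
BoundedNormal = pm.Bound(pm.Normal, lower=0.0)
n_draws = 2000  # posterior draws per chain

with pm.Model() as mcmc_model:
    # Informative priors centred on the country-level Gamma fit
    alpha = BoundedNormal('alpha', mu=alpha_mean, sigma=1, testval=10)
    beta  = BoundedNormal('beta',  mu=beta_mean,  sigma=1, testval=10)

    # Likelihood: site-level enrollment rates are Gamma-distributed
    observed = pm.Gamma('obs', alpha=alpha, beta=beta,
                        observed=site_enrollment_rates)

    # The 200 tuning draws are discarded by PyMC3 before the trace is returned
    trace = pm.sample(n_draws, step=pm.NUTS(), chains=2, tune=200)

# Use the final 10% of each chain (post burn-in) for posterior estimates
alpha_posterior = trace.get_values('alpha', burn=int(0.9 * n_draws))
beta_posterior  = trace.get_values('beta',  burn=int(0.9 * n_draws))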

Hierarchical Fallback Logic (handles cold-start sites):

Site has indication-level data?
  └─ YES → MCMC update with indication-level prior
  └─ NO  → Site has TA-level data?
              └─ YES → MCMC update with TA-level prior
              └─ NO  → Sample directly from country-level Gamma prior
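
The fallback order above can be expressed as a small selection function. This is a sketch only: the dictionary structure for a site's history and the returned action labels are hypothetical, and the actual MCMC update or prior sampling would be invoked by the caller.

```python
def select_prior_level(site_history):
    """Pick the prior level for a site, following indication -> TA -> country.

    site_history: dict that may hold 'indication' and/or 'ta' lists of
    historical enrollment rates (hypothetical structure).
    Returns (level, action).
    """
    for level in ("indication", "ta"):
        rates = site_history.get(level)
        if rates:  # non-empty history at this level
            return level, "mcmc_update"
    # Cold-start site: no site history at either level
    return "country", "sample_prior"

plan_indication = select_prior_level({"indication": [1.2, 0.9], "ta": [1.0]})
plan_ta = select_prior_level({"indication": [], "ta": [0.5]})
plan_cold = select_prior_level({})
```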

Step 3 - Poisson Process Simulation

With each site’s posterior Gamma parameters, enrollment trajectories were simulated using an inverse-method Poisson process:

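import numpy as np

# Simulate patient inter-arrival times by inverse-transform sampling of the
# exponential distribution (lam = site rate in patients per month; `lambda` is reserved in Python)
inter_event_time = -np.log(1 - uniform_random) / lam

# Run 440 independent sample paths per site
# Generates monthly enrollment counts over the forecast horizon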

This simulation naturally captures the discrete, stochastic nature of patient arrivals - months with zero patients, burst months, and long-term variance are all represented in the sample paths.
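
As a rough illustration of how such sample paths could be generated end to end, the sketch below draws one rate per path from a site's Gamma posterior and bins the simulated arrivals into months. Drawing a single rate per path, the function name, and the example parameters (alpha=4, beta=2, 36 months) are assumptions for illustration; only the standard library is used.

```python
import math
import random

def simulate_site_paths(alpha, beta, horizon_months, n_paths=440, seed=0):
    """Simulate monthly enrollment counts for one site.

    alpha, beta: Gamma shape and rate of the site's enrollment rate (patients/month).
    Returns n_paths lists, each holding horizon_months counts.
    """
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        # random.gammavariate takes shape and scale, so scale = 1 / rate
        lam = rng.gammavariate(alpha, 1.0 / beta)
        counts = [0] * horizon_months
        t = 0.0
        while True:
            # Inverse-transform draw of an exponential inter-arrival time
            t += -math.log(1.0 - rng.random()) / lam
            if t >= horizon_months:
                break
            counts[int(t)] += 1  # arrival falls in month floor(t)
        paths.append(counts)
    return paths

paths = simulate_site_paths(alpha=4.0, beta=2.0, horizon_months=36)
overall_mean = sum(sum(p) for p in paths) / (len(paths) * 36)
```

With these example parameters the prior mean rate is alpha / beta = 2 patients per month, so the average monthly count across paths should sit close to 2.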

Step 4 - CTA Plan Alignment

Raw MCMC-derived site rates are anchored to the country-level CTA forecast. The adjustment:

  1. Compute each site’s proportional share of the country-level MCMC enrollment rate
  2. Compute the residual between the CTA plan and the sum of all site MCMC rates
  3. Distribute the residual to each site proportionally - preserving site rankings while ensuring country totals match the agreed CTA plan

# Each site's share of the country-level MCMC rate
psm_ratio = site_samples.mean() / country_level_mean
# Shift the site's samples by its share of the CTA residual
site_adjusted = site_samples + (psm_ratio * cta_residual)
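
A small worked example may make the alignment concrete. It assumes the country-level figure used for the shares is the sum of the site means (otherwise the adjusted totals would not match the plan exactly); the function name and the numbers are illustrative only.

```python
def align_to_cta(site_means, cta_total):
    """Shift site rates so that their sum equals the country-level CTA plan.

    site_means: dict of site_id -> mean MCMC enrollment rate.
    """
    country_total = sum(site_means.values())
    residual = cta_total - country_total
    adjusted = {}
    for site, mean_rate in site_means.items():
        share = mean_rate / country_total  # shares sum to 1 across sites
        adjusted[site] = mean_rate + share * residual
    return adjusted

# Site means sum to 5.0 while the CTA plan expects 6.0, so the residual is 1.0
adjusted = align_to_cta({"A": 3.0, "B": 1.0, "C": 1.0}, cta_total=6.0)
adjusted_total = sum(adjusted.values())
```

Because each site receives the residual in proportion to its own share, the adjustment is equivalent to scaling every site by the same factor, which is why site rankings are preserved.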

Step 5 - Reforecast: Bayesian Conjugate Update

Once trials are active, IRT actuals enable a Bayesian posterior update without re-running MCMC:

Given:

Conjugate posterior: Gamma(α + k_i, β + v_i)

This is exact Bayesian updating for a Gamma-Poisson model. The posterior becomes the new sampling distribution for Poisson process simulation over the remaining forecast horizon.
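
The update itself is a two-line calculation. The sketch below assumes the standard Gamma-Poisson reading of the symbols, with k_i as the number of patients observed at site i and v_i as the site's exposure in months; the numeric values are made up for illustration.

```python
def conjugate_update(alpha, beta, k, v):
    """Gamma-Poisson conjugate update.

    alpha, beta: prior Gamma shape and rate for the monthly enrollment rate.
    k: observed enrollments (assumed meaning of k_i).
    v: observed exposure in months (assumed meaning of v_i).
    """
    return alpha + k, beta + v

# Prior mean 4 / 2 = 2.0 patients per month; actuals show 9 patients in 6 months (1.5 per month)
alpha_post, beta_post = conjugate_update(alpha=4.0, beta=2.0, k=9, v=6.0)
posterior_mean = alpha_post / beta_post
```

The posterior mean lands between the prior mean and the observed rate, and moves closer to the observed rate as exposure accumulates.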

Confidence Interval Construction

80% confidence intervals were computed via t-distribution over the 440 sample paths per site:

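from scipy.stats import t

# Two-sided 80% interval uses the 90th-percentile critical value (df=10000 is effectively normal)
t_val = t.ppf((1 + 0.80) / 2, df=10000)  # ~1.282
lower = max(0, mean_enrollments - t_val * std_enrollments)  # counts cannot be negative
upper = mean_enrollments + t_val * std_enrollments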

5. MLOps & Model Governance

Model Tracking

Model runs were tracked in a proprietary MLOps platform with MLflow-based model tracking:

Inference Pipeline (Batch)

The system operated as a monthly batch pipeline:

Monthly Trigger
    → AWS Glue: Extract IRT actuals, refresh S3 Curated zone
    → MCMC: Re-sample posteriors for active sites (actuals updated)
    → Reforecast: Run Poisson process simulations (440 samples × N sites)
    → Output: Flat-file per trial (site × month × mean/low/high)
    → Delivery: SMTP notification + flat file to SAP IBP staging

The flat-file integration to SAP IBP was a deliberate design choice: IBP has a defined intake schema for supply plan inputs, and a file-based interface decoupled the ML system from IBP’s internal release cycles while maintaining auditability of every number that entered the supply plan.
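
For illustration, a flat file of the shape described in the pipeline (site × month × mean/low/high) could be written as follows. The column names, the month format, and the two-decimal rounding are hypothetical; the real IBP intake schema is not reproduced here.

```python
import csv
import io

# Hypothetical column layout; the actual IBP intake schema may differ
FIELDS = ["trial_id", "site_id", "month", "mean", "low", "high"]

def write_forecast_flat_file(rows, handle):
    """Write one line per trial, site and month with mean and interval bounds."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({
            "trial_id": row["trial_id"],
            "site_id": row["site_id"],
            "month": row["month"],
            # Fixed two-decimal formatting keeps the file stable for audit diffs
            "mean": f"{row['mean']:.2f}",
            "low": f"{row['low']:.2f}",
            "high": f"{row['high']:.2f}",
        })

buffer = io.StringIO()
write_forecast_flat_file(
    [{"trial_id": "T001", "site_id": "S01", "month": "2024-01",
      "mean": 2.1, "low": 0.0, "high": 4.3}],
    buffer,
)
flat_file_lines = buffer.getvalue().splitlines()
```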

Monitoring & Alerting


6. Cross-Functional Delivery

This programme required orchestration across 15 people spanning 4 organisations:

Workstream | Organisation | Responsibility
ML Modelling | Client / Vendor | Bayesian model design, MCMC implementation
Data Engineering & MLOps | Client | AWS Glue pipelines, S3 data lake, MLflow governance
Clinical Operations | Client | Veeva data ownership, IRT configuration, trial metadata
Supply Chain Planning | Client | SAP IBP integration, safety stock methodology, demand signals

My direct team (4 engineers + self) owned: AWS data pipelines, S3 lake architecture, Glue ETL, MLflow model governance, flat-file IBP integration, CloudWatch monitoring, and end-to-end delivery coordination.

Programme delivery: 2 years from requirements to production - requirements definition, data architecture design, Glue pipeline build, model integration, SAP IBP integration, and trial operations team training.

Stakeholder engagement: Programme success metrics were reviewed with Supply Chain VP and Clinical Operations leadership on a quarterly basis. The decision to integrate model outputs into SAP IBP as the system of record for safety stock planning was made at VP level - replacing a methodology that had been in place for over a decade.


7. Domain Complexity: Blinded Trial Supply

For blinded, controlled oncology trials (e.g., active drug vs. placebo), supply planning has an additional constraint: the drug-to-placebo ratio must be maintained at the site level throughout the trial to preserve blinding. This means:

The system produced separate enrollment projections for each trial arm via the site-level outputs - allowing supply planners to calculate arm-specific safety stock quantities in SAP IBP rather than applying a blanket 2× multiplier to total trial enrollment.


8. Results & Business Impact

Quantitative Outcomes

Outcome | Value
Forecast accuracy (first-ever quantitative forecast) | 63% MAE - monthly site-level
Estimated supply waste eliminated | $2B over 10-year planning horizon
Safety stock heuristic | Replaced: 2× enrolled-patient blanket rule → probabilistic 80% CI
Planning horizon | Extended from reactive to 3-year forward-looking
Trials operationalised | 8 Phase 2/3 oncology trials
Countries covered | 40 countries
SAP IBP integration | Monthly batch · flat-file · automated

Why 63% Is a Strong Result

In clinical trial enrollment forecasting, 63% MAE accuracy at the monthly site level is a materially strong result given:

The relevant comparison is not 63% vs. 90% - it is 63% vs. 0% (the prior state: a fixed 2× heuristic with no forecast capability at all).

The $2B Impact Logic

The 2× safety stock heuristic was shown to systematically over-order relative to actual enrollment. For oncology biologics:


9. Technical Alternatives Evaluated and Rejected

Alternative | Reason Rejected
Prophet / ARIMA | Monthly site-level series of 1–5 events: insufficient for time-series pattern identification
XGBoost regression | No native uncertainty quantification; requires large training dataset per site
Simple Gamma MLE (no MCMC) | No posterior uncertainty; point estimate doesn’t propagate to supply confidence intervals
Neural Bayesian (e.g., Pyro) | Overkill for data volume; NUTS convergence preferable at this problem scale
Single country-level forecast disaggregated proportionally | Cannot capture site-level activation timing or site-specific attrition patterns

10. Lessons Learned


System Architecture

[Figure: System Architecture Diagram]


Technology Stack

Category | Technology
Bayesian Inference | PyMC3 · NUTS Sampler · Metropolis-Hastings
Statistical Modelling | SciPy (Gamma MLE) · NumPy · Pandas
Data Engineering | AWS Glue · Amazon S3 (3-zone lake)
Imputation | MICE (Multiple Imputation by Chained Equations)
ML Governance | MLflow (experiment tracking · model registry · artefacts)
Orchestration | AWS Glue Workflow · CloudWatch Events
Monitoring | Amazon CloudWatch · SMTP alerts
Enterprise Integration | SAP IBP (flat-file) · Veeva Vault
Clinical Data Systems | Veeva Vault · IRT (Interactive Response Technology)