Random Walk: Models, Models Everywhere

Another post today that deals, to a degree, in arcana: the latest iteration of the now-famous models of the outbreak of SARS-CoV-2 and its travelling companion, COVID-19. The Institute for Health Metrics and Evaluation (IHME) have been producing these models for some weeks now, updating them roughly every three days.

As noted, the model uses empirical data sourced from a number of locations to fit (by force) a sigmoidal function called the Gaussian error function (a fancy way of expressing the probability that a variable drawn from a bell curve will fall within a given range).
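Here is a minimal sketch of what such a fit involves. It assumes the commonly used erf form for cumulative deaths, D(t) = (p/2)(1 + erf(alpha(t - beta))), where p is the eventual total, alpha the growth rate, and beta the day of peak daily deaths. The names `erf_curve` and `fit_erf` are hypothetical, the "observed" data are synthetic (generated from the curve itself), and a coarse grid search stands in for the far more elaborate fitting machinery IHME actually uses. This is not IHME's code.

```python
import math


def erf_curve(t, alpha, beta, p):
    """Cumulative deaths by day t under the assumed erf form (not IHME's code)."""
    return 0.5 * p * (1.0 + math.erf(alpha * (t - beta)))


def fit_erf(days, cumulative):
    """Least-squares fit of (alpha, beta, p) by coarse grid search.

    For fixed alpha and beta the curve is linear in p, so the best p has a
    closed form; only alpha and beta are searched over the grid.
    """
    best = None
    for a_step in range(1, 31):          # alpha = 0.01 .. 0.30 per day
        alpha = a_step / 100
        for beta in range(0, 121):       # candidate peak day 0 .. 120
            shape = [erf_curve(t, alpha, beta, 1.0) for t in days]
            denom = sum(g * g for g in shape)
            if denom == 0.0:
                continue                 # curve is flat zero over the data
            p = sum(y * g for y, g in zip(cumulative, shape)) / denom
            sse = sum((y - p * g) ** 2 for y, g in zip(cumulative, shape))
            if best is None or sse < best[0]:
                best = (sse, alpha, beta, p)
    return best[1:]


# Synthetic "observed" data (invented for illustration): the first 41 days
# of a curve with alpha = 0.1, a peak on day 30, and 1,000 eventual deaths.
days = list(range(41))
observed = [erf_curve(t, 0.1, 30, 1000.0) for t in days]

alpha, beta, p = fit_erf(days, observed)
print(f"alpha={alpha}, peak day={beta}, projected total={p:.1f}")
```

Because the derivative of the error function is itself a bell curve, this form forces daily deaths to rise and fall symmetrically around beta, and forces cumulative deaths to plateau at p. That is the "by force" part: the fit can only ever produce that shape, whatever the data look like.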

Though most of the popular press focuses on just one result, the projected total number of deaths, in reality the model projects mortality as well as resource utilisation. The latter covers items critical to health care: the number of people who will land in hospital, the number who will land in the ICU, and how many will require mechanical ventilation.

The prior iteration projected that, by August, just over 65,000 Americans would lose their lives, with deaths peaking on the 15th of April. That represented a reduction from just over 68,000 the week before. The latest projection is up slightly, to 65,976.

The model must be re-fitted with new data as they become available.

People are, to a degree, missing that models of this sort are not true epidemiological tools; more often than not, they have been deployed to project resource needs. In this case, the question is how great the demand for critical medical resources will be, and when it will peak.

Of course, forced curve-fitting is only one approach, and the IHME model has come under some criticism (some of which, I honestly suspect, is politically motivated, as this is the tool that Dr Deborah Birx and the US Coronavirus Task Force are using).

There are broadly three types of models commonly used in this field. The first are parametric "SEIR" models, which make estimates across populations based upon the number of people who are susceptible (not yet infected, but who could be), exposed (infected, but not yet infectious), infected, and removed (no longer at risk or ill, because they have either recovered or died). The second are agent-based models: in a nutshell, simulations not unlike the old game SimCity, in which a population is created with an initial number of infected people, transmission can occur whenever an infected and an uninfected person encounter one another, and the population is then followed over time until the outbreak resolves. The third are curve-fitting models, of the sort the IHME uses.
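For readers who have not met one, here is a minimal sketch of the SEIR bookkeeping described above, stepping the standard SEIR equations forward with a simple Euler scheme. The function name and every parameter value (a transmission rate of 0.25 per day, a 5-day latent period, and a 10-day infectious period, giving R0 = 0.25 × 10 = 2.5) are illustrative assumptions, not estimates for COVID-19. As in the description above, "removed" lumps recoveries and deaths together.

```python
def seir(population, initial_infected, beta, sigma, gamma, days, dt=0.1):
    """Euler integration of a basic SEIR model (illustrative sketch).

    beta:  transmission rate per day (S -> E)
    sigma: 1 / latent period in days (E -> I)
    gamma: 1 / infectious period in days (I -> R)
    Returns one (day, S, E, I, R) tuple per whole day.
    """
    s = float(population - initial_infected)
    e, i, r = 0.0, float(initial_infected), 0.0
    history = [(0, s, e, i, r)]
    steps_per_day = round(1 / dt)
    for step in range(1, days * steps_per_day + 1):
        new_exposed = beta * s * i / population * dt   # susceptible -> exposed
        new_infectious = sigma * e * dt                 # exposed -> infectious
        new_removed = gamma * i * dt                    # infectious -> removed
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_removed
        r += new_removed
        if step % steps_per_day == 0:
            history.append((step // steps_per_day, s, e, i, r))
    return history


# Illustrative parameters only, not COVID-19 estimates.
N = 1_000_000
history = seir(N, 10, beta=0.25, sigma=1 / 5, gamma=1 / 10, days=365)
peak_day, _, _, peak_infected, _ = max(history, key=lambda row: row[3])
print(f"peak on day {peak_day} with {peak_infected:,.0f} infectious")
print(f"removed after a year: {history[-1][4]:,.0f}")
```

Every assumption lives in three rates. That is why, as described below, a lack of information about movement between compartments can make this family of models hard to use.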

In my career as an epidemiologist and statistician, I have worked with all three types. (To be more precise, SEIR has a simpler relative, SIR, which drops the separate "exposed" compartment and treats newly infected people as immediately infectious.) Each type has its strengths and weaknesses. All require assumptions to be made.

Most recently, I worked on a team modelling the impact of introducing pre-exposure prophylaxis (PrEP) for HIV. We used an agent-based model, largely because we lacked enough information about movement between the compartments of an SEIR model. We also thought the agent-based approach allowed better hedging against some of the assumptions, and made it easier to test them through what is called probabilistic sensitivity analysis, or PSA: re-running thousands of mini-simulations in which the assumptions, rather than being held fixed, are themselves drawn from probability distributions.
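A toy version of the idea follows. It assumes a simple agent-based SIR model (random daily encounters between agents, with transmission at a fixed probability per contact between an infectious and a susceptible agent), not anything resembling the HIV model described above. `run_abm` and `psa` are hypothetical names, and the parameter ranges are invented for illustration. On every run, the PSA step draws the transmission probability and the infectious period from distributions, then summarises the spread of outcomes as a median and an interval.

```python
import random
import statistics


def run_abm(n_agents, initial_infected, contacts_per_day, p_transmit,
            infectious_days, max_days, rng):
    """One toy agent-based SIR run; returns the fraction ever infected."""
    state = ["S"] * n_agents          # S = susceptible, I = infectious, R = removed
    days_left = [0] * n_agents        # remaining infectious days per agent
    for a in rng.sample(range(n_agents), initial_infected):
        state[a] = "I"
        days_left[a] = infectious_days
    for _ in range(max_days):
        infectious = [a for a in range(n_agents) if state[a] == "I"]
        if not infectious:
            break                     # outbreak has resolved
        newly_infected = set()
        for a in infectious:
            for _ in range(contacts_per_day):
                b = rng.randrange(n_agents)      # a random encounter
                if state[b] == "S" and rng.random() < p_transmit:
                    newly_infected.add(b)
        for a in infectious:                     # progress existing cases
            days_left[a] -= 1
            if days_left[a] == 0:
                state[a] = "R"
        for b in newly_infected:                 # new cases start tomorrow
            state[b] = "I"
            days_left[b] = infectious_days
    return sum(1 for s in state if s != "S") / n_agents


def psa(n_runs, seed=0):
    """Probabilistic sensitivity analysis over two invented, uncertain inputs."""
    rng = random.Random(seed)
    attack_rates = []
    for _ in range(n_runs):
        # Rather than fixing the assumptions, draw them afresh for every run.
        p_transmit = rng.uniform(0.02, 0.08)
        infectious_days = rng.randint(4, 10)
        attack_rates.append(
            run_abm(300, 3, 5, p_transmit, infectious_days, 200, rng))
    cuts = statistics.quantiles(attack_rates, n=40, method="inclusive")
    # median, 2.5th percentile, 97.5th percentile
    return statistics.median(attack_rates), cuts[0], cuts[-1]


med, low, high = psa(100, seed=42)
print(f"attack rate: median {med:.2f}, 95% interval {low:.2f} to {high:.2f}")
```

Widening the ranges the assumptions are drawn from widens the resulting interval. This is how a PSA turns uncertainty about inputs into uncertainty bands on outputs.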

While I agree with, e.g., Dr Marc Lipsitch of Harvard (one of the world's leading epidemiologists) that curve-fitting models of this type are unorthodox in epidemiology, I think they are a useful tool.

A lot of hype surrounded the initial projections from Imperial College London (a group I have personally worked with in the past) that somewhere north of 2 million Americans would die without mitigation. That projection was derived from an individual-based simulation model. Of course, the US has since deployed a series of increasingly strict shelter-in-place initiatives, and that projection will almost surely prove "wrong" by an order of magnitude.

Statistician George Box once observed, decades ago, that all models are wrong, but some models are useful.

All of the competing models are going to be "wrong" in the end. But what use do they provide now?

First, they give us some bounds on where the pandemic is going. It's not certainty (as Box would remind us), but the projections give us some space to work with. Most also provide, in addition to the projections themselves, confidence intervals (or, for Bayesian models, credible intervals) that give a range of what is likely to happen. The less certainty, the wider the bands.

When reading the projections, you ignore these bands at your peril.

I don't want to weigh in on which is "right," because frankly, they are all going to be wrong. To paraphrase Tolstoy: projections that are right are all right in the same way, but every wrong model is wrong in its own way.

Here is a sampling of a few competing approaches.

Columbia University in New York have built an SEIR model to estimate mortality and ICU usage under differing mitigation scenarios. Estimates ranged from 6,800 deaths (with extreme social distancing) up to over 400,000 under strong mitigation.

Northeastern University have produced an agent-based model that offers a few scenarios for 'stay at home' mitigation. Its most recent projection is that, by mid-May, approximately 70,000 people will lose their lives, with an uncertainty range of 42,000 to as many as 127,000.

Unfortunately, they do not project beyond then, but given the rate of decline of the curves, and barring a second flare-up, final mortality is likely to be around that number.

A late entry comes from a team at MIT with an unorthodox approach: it is similar to an SEIR model, but rather than presuming priors for the initial transmission (the famous basic reproduction number, R0) and other key parameters, it "learns" them from the data.

This model is something of an outlier, in two ways. First, while other approaches stop projecting after the "first wave," which is estimated to be more or less over between late June and early July (even in the MIT model), the data scientists at MIT incorporate a second wave, one that will begin to grow around mid-July.

This echo will add roughly 25,000 to 30,000 deaths, and the total could reach as many as 280,000 when the dust settles.

Second, the first wave alone is projected, as of now, to kill 104,000 or so by the end of June, with a range anywhere from 70,000 to 170,000. That is well above the other models' estimates.

What then, are the take-aways?

My own preference is for agent-based models, for no particular reason other than familiarity. The lone agent-based simulation here, Northeastern's, pegs mortality at around 70,000 in the first wave. The team did not project a second, which isn't to say that there won't be one.

The IHME model is now projecting about 66,000 (range 45,000 to 125,000), which, if not the same as the Northeastern projection, is in the same neighbourhood.

One piece of good news is that the models, with the exception of MIT's AI model, are converging on similar stories. The fact that models built on different assumptions and different approaches net out to a similar result at this stage (we are still months from July) is a sort of empirical validation.

And it looks like, as of now, mortality will more than likely land somewhere between 60,000 and, at the top end, maybe 100,000.

It could be worse, of course. It could be much worse.

The second thing to glean is that social distancing and mitigation are working. As additional data come in, the projections are converging down, not up. The sacrifices we are making are, as of now, writing a very different (and brighter) narrative than the story that was unfolding a few weeks ago.

Keep staying home. Keep distancing yourself. Keep good hygiene.

It's working.

Finally, as the MIT model indicates, we need to be especially vigilant in late summer. If there is a second wave (and there is every reason to believe one could come), it will be absolutely essential that our public health professionals keep a damned close eye on it.

And it means our political leaders need to be ready to sound the alarm if there is even a whiff of an outbreak.

And it means we as citizens need to listen, and we need to obey, when we are asked to observe stay-at-home orders and not go to the hair salon. It's that simple.

Treatments are coming. Vaccines are coming. It will not be tomorrow.

Again, it's worth saying, by our actions, we are choosing our own future.

Thursday, 23 April 2020