Making Sense of the COVID-19 Data
By: Keith Devlin @profkeithdevlin
How dangerous is the novel coronavirus, and how does that risk compare with other risks we face in our lives? That’s a question many of us asked as the pandemic started to make its way across the Atlantic. Like a number of other mathematicians, I took data from online sources such as the CDC website and started to play around with the numbers. MAA members would have no trouble doing the same kinds of calculations, and I’m sure many did, and still are. It’s very straightforward. But presenting the results in a way that a layperson can understand is not easy. If you are not careful, the results you get can make the risk of death seem so low the threat is not worth worrying about, or so high we should all just give up hope. Neither is true, but how do you present the numbers in a way that makes that clear? That’s the focus of this month’s essay. It took me a while to come up with what I think does the trick.
I posted my explanation on my personal blog, profkeithdevlin.org, and the comments I received from readers (both mathematicians and laypersons) indicated that I’d hit the mark. I present it here purely as an example of mathematical communication, a topic I know interests a great many MAA members.
I was made aware of the need for a good form of presentation of COVID-19 data as a result of contributing to a couple of Twitter threads of mathematicians who were doing the same kind of low-level modeling I was. One post I made, comparing novel-coronavirus probabilities to some associated with more familiar risks, brought some strident attacks from non-mathematicians who accused me of making elementary mathematical errors. “You can’t do that,” they said.
Now, had I been doing what they thought I was, they would have been correct. But I wasn’t, though after a couple of attempts to explain to them what I was doing, and why their critique was misplaced, I gave up. The critics provided a classic illustration of someone viewing mathematics purely as a set of procedural rules to get answers. Moreover, rules that cannot be broken. Using mathematical concepts and methods to explore a problem and gain new insights and understanding was completely alien to them. Yet that is what many professional mathematicians do most of the time.
Still, while I had no problem dropping out of the exchange, it did spur me to think about how I could explain the process I was engaged in, in a way that a mathematical layperson could understand. Not just those Twitter critics — they did after all have mathematical knowledge and I am sure could handle procedural problems (they were not total laypersons) — but the far larger group of people whose knowledge of math is limited to basic arithmetic, but who are very likely eager to know what all the data means.
The point was, I wasn’t “computing probabilities” in the familiar educational context of an undergraduate course in probability theory, and I certainly wasn’t applying any rules from probability calculus. All I was doing was counting collections. Except that I wasn’t even doing that; others had already done the counting part. I was playing around with their numbers.
The specific question I was interested in was: “How much of a risk is COVID-19?” In particular, “How likely is it that I will die from this virus?” (This morbid reflection was occasioned by the news coming out at the time that the deaths in Italy were into the thousands.)
Computing a worst-case probability for that question was not difficult based on the best data available at the time. Here is the calculation I made, updated to reflect a more recent estimate of the mortality rate as 1% of infections, a figure given in the Congressional testimony of Dr. Anthony Fauci, director of the U.S. National Institute of Allergy and Infectious Diseases, on March 3.
Suppose the infection reaches 60% of the population. That is the point where the growth ceases to be exponential, because there are not enough non-immune hosts left to fuel such rapid growth. (The epidemiology experts agreed that it is realistic to assume it could get that high; some were suggesting even higher. Though if it did go higher, it would do so at a much slower rate, which means medical services would not be overwhelmed and more sick people could be saved.)
So, in a population of 327M (the USA), the number who will be infected and subsequently die is
327M x 0.6 x 0.01
which works out to 1.96M.
As a proportion of the entire population, that is 0.006. Six in every thousand, or 0.6%.
Putting it another way, 994 of every 1,000 people alive in the USA today will survive.
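For readers who want to check the arithmetic, here is a minimal sketch in Python. The variable names are my own, chosen for illustration; the inputs are simply the figures quoted above (a population of 327M, 60% infected, 1% mortality among the infected), all of which are worst-case assumptions rather than measurements.

```python
# Worst-case estimate using the figures quoted in the text.
# Variable names are illustrative; the inputs are assumptions, not measured data.
population = 327_000_000   # USA population figure used in the essay
infection_share = 0.60     # assumed fraction of the population that gets infected
fatality_rate = 0.01       # assumed mortality rate among those infected

deaths = population * infection_share * fatality_rate
death_proportion = deaths / population            # same as infection_share * fatality_rate
survivors_per_1000 = round(1000 * (1 - death_proportion))

print(f"Estimated deaths: {deaths / 1e6:.2f}M")
print(f"Proportion of population: {death_proportion:.3f}")
print(f"Survivors per 1,000: {survivors_per_1000}")
```

Changing either assumed rate shows how sensitive the final figure is to the inputs, which is worth keeping in mind when reading any single headline number.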
To any mathematician, and to anyone else, surely, those look pretty good odds in favor of survival. On the other hand, when you are being bombarded daily with news stories and media images of just how bad the COVID-19 sickness can be, you don’t want that .994 survival probability (which I was about to publish on my personal blog and now for the MAA) to create a sense that compliance with the societal distancing and personal hygiene recommendations that were coming out at the time was not necessary — not least because that has a massive impact at the community level, not just the personal.
One way to counter that would be to observe that, whereas a risk of 0.006 is low, it is a lot higher than the probabilities associated with things we are familiar with, where the risk concerns us, but we do it anyway. For example, the risk associated with taking a flight on a commercial airline. That’s a common baseline comparison that usually gets people’s attention.
The figure most commonly quoted for the risk of dying in an airplane crash is 1 in 5 million, or 0.000 000 2. Doing the math (dividing 0.006 by 0.000 000 2), you find that the probability you will be infected with the coronavirus and die is 30,000 times that of dying in a plane crash. That was not a figure I wanted to make available to people who might be looking for reassurance! But, in fact, it’s a meaningless number: a classic example of a naive application of the calculus of probabilities.
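The division itself is trivial, and the sketch below reproduces it only to show where the 30,000 figure comes from, not to endorse the comparison. The variable names are illustrative, and both inputs are the figures quoted in the text.

```python
# Reproduces the naive ratio discussed above.
# The essay's point is that this number, stripped of context, is meaningless.
covid_death_proportion = 0.006      # worst-case figure derived earlier in the essay
plane_crash_risk = 1 / 5_000_000    # commonly quoted figure cited in the text

naive_ratio = covid_death_proportion / plane_crash_risk
print(f"Naive ratio: {naive_ratio:,.0f}")
```

The arithmetic is legitimate; what fails is the interpretation, since the two inputs describe risks with entirely different contexts and exposure periods.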
Stripped of the contexts in which the numbers have meaning, simply quoting probabilities (i.e., abstract rational numbers), let alone dividing one by another, is meaningless. Critics, including a couple on Twitter who thought that’s what I was doing, often describe that process — which is all too common — as “comparing apples with oranges.” “You can’t do that!” they say.
Yet, people compare apples with oranges all the time, including mathematically in terms of weights, volumes, size, curvature, sell-by dates, etc. Do the critics maintain that such comparisons are verboten? Of course not. This is where they and I parted company.
The point is, the comparisons are just fine, all the time you are actually talking about apples and oranges. Where things can go wrong is when you take numbers about apples and numbers about oranges, then perform (legitimate) arithmetic operations on them as pure, abstract numbers, then apply the result to either apples, or to oranges, or to both. (Imagine dividing the weight of an apple by the sell-by date of an orange, and concluding that oranges are a better deal than apples.)
So how could I put the COVID-19 mortality figure of 0.006 into a meaningful context? After some reflection, here is what I came up with. [Much of what follows appeared in the March 17 post on my personal blog profkeithdevlin.org.]
God’s Book
Imagine God has a book with the names of all people living in America today. Against each name, he has entered the cause of their death. (He’s God, so he has made the death entries already.) He won’t let you see your entry, of course. (Mysterious ways.) You do, however, have a chance to look at the digital version. Not with enough resolution to read the individual entries, but enough to discern the separate entries as such. You ask him to re-order the list grouped by the listed causes of death.
The biggest group is labeled “heart disease”. Roughly 1 in 6 of the names on the list are in that group; as a decimal, approximately 0.167. [That’s a figure quoted regularly in the medical profession. See, for example, this article, which lists the ten most common causes, responsible for 74% of all deaths. But the exact value is not important to get an overall picture of the risks I am talking about.] Next comes “cancer”, just slightly smaller. Then accidents. And so on.
This year, God had to add a new category: COVID-19. Where does it come on God’s List? The answer is, pretty low down the (reordered) list. In fact, the group of names that have that label in the “cause of death” column is (roughly) just 1/28th the size of the heart disease group. [.006/.167 = 1/28 approx.]
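As a quick check on that bracketed calculation, here is a small Python sketch. The names are illustrative, and the two shares are the rough figures used in the text (1 in 6 for heart disease, 0.006 for the worst-case COVID-19 estimate), so the result is only as good as those approximations.

```python
# Compare the sizes of two groups in the same list (same population, same kind of entry).
# Both shares are the rough figures used in the essay, not precise statistics.
heart_disease_share = 1 / 6   # roughly 0.167
covid_share = 0.006           # worst-case estimate from earlier in the essay

size_ratio = heart_disease_share / covid_share
print(f"Heart disease group is about {size_ratio:.0f} times the COVID-19 group")
```

Unlike the airplane comparison, both numbers here are shares of the same list of names, which is what makes the ratio interpretable.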
But be cautious. COVID-19 presents a different kind of risk than heart disease. The latter is something that creeps up over your lifetime, and, if you get medically examined regularly and you are found to have it, you usually have plenty of time to modify your activities and habits to slow it down or even stop it developing further. These are most definitely apples and oranges!
Death from COVID-19 is a risk that (we all hope) you will only face in 2020 (because by 2021 we hope to have an effective vaccine), and the length of time you have to change your behavior to save yourself is measured in a few weeks (if you are following the news from a reliable source), or maybe just days. On the other hand, the behavioral changes required are very simple, albeit involving a major disruption of your daily life. Thoroughly wash your hands regularly, apply hand-sanitizer after coming into contact with any person, pet, or object, avoid touching your face, and keep at least six feet from any other person.
Nevertheless, in terms of what God has entered against your name as the cause of your death, it’s highly unlikely to be COVID-19. It’s 28 times more likely to be “heart disease.”
In other words, what the numbers tell us is that you are extremely unlikely to die from COVID-19, whatever you do. But, assuming you’d rather live than die (and would rather avoid what can be a truly terrible illness even if you survive it), then if there are actions you can take to improve your odds of not dying from this virus, you should take them.
As we just saw, the actions required are a nuisance, but you only have to do them until the virus simply goes away (not likely) or we find an effective vaccine (highly likely within a year, maybe a bit more). So COVID-19 presents a numerically small danger made acute by being a risk entirely within a single year or a year-and-a-half. Avoidance is simple and is required for a relatively short time, but it’s a nuisance. But, given how heavily your overall survival odds are stacked in your favor, the incentive to take COVID-19 avoidance measures is huge, since they are almost certain to be effective. Besides, with a virus epidemic, avoidance measures any one of us takes benefit all of society, including our families, friends, and loved ones.
In the case of heart disease, similar conclusions follow, but the time-scale is very different. In this case, it’s one of a small collection of illnesses (those ten I referred to) that you are highly likely to die from, but the threat lasts your entire life. Avoidance measures are also simple (principally diet, exercise, and not smoking), but they do need to be sustained.
The take-home message, then, is that during this year (and maybe into next), COVID-19 is a threat you should not treat lightly. But in terms of God’s List, this is most unlikely to be what he has entered against your name. (“Heart disease” is far more likely.)
In conclusion, if you do come through the COVID-19 epidemic, then, on January 1, 2021, with the novel coronavirus threat behind you (or at least soon to be so), you will be in a position to make a New Year’s resolution to make sure your diet and exercise habits can fend off heart disease. For that’s a deadly threat that continues throughout your life. And the odds that this killer is the one that gets you are a lot higher than for that virus you just dealt with. Why get through one threat and then squander your chances to avoid a much greater one?