COVID-19 and the Math of Life Expectancy
By Keith Devlin @profkeithdevlin
The tweet shown here caught my attention a couple of weeks ago.
It wasn’t the fact that the COVID-19 pandemic led to a drop in life expectancy that aroused my curiosity. That is to be expected. It was the magnitude of the drop that caused my “number-sense sensor” to prick up. A fall in life expectancy of a whole year in the first half of 2020 didn’t pass my number-sense sniff test.
There were around 400,000 COVID-19 deaths in all of 2020, out of a total US population of 328 million. That’s about 0.12% of the population. In human terms that is a massive tragedy, the more so given that with an efficient, science-based government response, the figure could have been lower by a factor of 10. But when deaths amounting to 0.12% of the population (and for the entire year, remember, not just the six months of the report) are averaged across the whole population, they could bring life expectancy down by at most a week or so, I thought.
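To make that number-sense estimate concrete, multiply the share of the population that died by the years of life each death cut short, then spread the result over everyone. The number of years lost per death is not given here, so the values 5, 10, and 15 below are illustrative assumptions, and `average_days_lost` is a hypothetical helper. This is a minimal sketch of the layperson's reasoning, not of any official method.

```python
def average_days_lost(deaths, population, years_lost_per_death):
    """Spread the total years of life lost evenly over the whole population.

    Returns the average loss per person, in days.
    """
    share_who_died = deaths / population
    return share_who_died * years_lost_per_death * 365.25

DEATHS_2020 = 400_000          # COVID-19 deaths in 2020, as quoted in the text
US_POPULATION = 328_000_000    # US population, as quoted in the text

print(f"share of population: {DEATHS_2020 / US_POPULATION:.2%}")

# ASSUMPTION: illustrative values for years of life lost per death.
for years in (5, 10, 15):
    days = average_days_lost(DEATHS_2020, US_POPULATION, years)
    print(f"{years:>2} years lost per death -> {days:.1f} days on average")
```

Even with the most generous of these assumptions, the average loss comes out at well under ten days, nowhere near a full year.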
Was there an error in that CNN tweet and the article it linked to? I checked The New York Times. It reported the same figure. A look at the generally reliable and more informative STATnews told the same story as CNN and The New York Times. “There is clearly an MAA article here,” I decided at that point.
It wasn’t that I thought the CDC was wrong. They were surely correct that there was a one-year drop in something. Rather, that something must not be what I took “life expectancy” to mean.
A quick look at my dictionary app told me that life expectancy is “the average period that a person may expect to live.” Insofar as that told me anything, it accorded with what I had thought. Unfortunately, the vague terms “average” and “may expect” left me unsettled, as they would any MAA member. I checked the CDC website itself for their definition of “life expectancy.” I was none the wiser. What I wanted to know was how one would calculate something one could reasonably call “life expectancy.” In particular, what had the CDC calculated?
I knew that companies, organizations, and government agencies the world over compute annual life expectancy figures, which are used in a variety of ways, including determining premiums for life insurance policies. There are well established procedures for calculating those figures. To go beyond that, I needed to examine the source of those news media stories: the original report from the CDC.
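For readers who want to see how such a figure can be produced, here is a minimal sketch of a textbook period life table. Take each age's probability of dying within the year, as observed in a single period, and follow a hypothetical cohort of newborns through those probabilities, adding up the years they live. The mortality schedule is a made-up Gompertz-style curve, not real data, and `period_life_expectancy` and `toy_schedule` are hypothetical names. Published tables refine details this sketch glosses over, such as the first year of life and the oldest ages.

```python
import math

def period_life_expectancy(q):
    """Life expectancy at birth from a simplified single-year period life table.

    q[x] is the probability that a person who reaches exact age x dies before
    age x + 1, estimated from deaths in one period (say, one calendar year).
    The hypothetical cohort experiences those same probabilities at every age.
    """
    alive = 1.0          # share of the cohort still alive at exact age x
    person_years = 0.0   # running total of years lived by the cohort
    for qx in q:
        dying = alive * qx
        # Assume those who die during the year live, on average, half of it.
        person_years += alive - dying / 2
        alive -= dying
    if alive > 0:
        raise ValueError("table not closed: last death probability must be 1")
    return person_years  # cohort started at 1.0, so this is years per newborn

def toy_schedule(scale=1.0, max_age=110):
    """HYPOTHETICAL Gompertz-style schedule: risk of death grows exponentially with age."""
    q = [min(1.0, scale * 0.00005 * math.exp(0.085 * x)) for x in range(max_age + 1)]
    q[-1] = 1.0   # close the table at max_age
    return q

print(f"baseline:         {period_life_expectancy(toy_schedule()):.1f} years")
print(f"rates 10% higher: {period_life_expectancy(toy_schedule(scale=1.1)):.1f} years")
```

Note that the result is a property of one period's death rates, summarized as the lifespan of an imaginary cohort that lives its whole life under those rates.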
The report’s title alone told a different story than those news headlines: “Provisional Life Expectancy Estimates for January through June, 2020.” What exactly did Provisional Life Expectancy mean? A first read through the CDC report still left me puzzled.
I was clearly missing something. Or, more likely, I was reading something into the CDC paper that was not justified, or perhaps was justified but not aligned with what the domain experts understood by that term.
I’d taken issue with something coming out of the CDC once before, namely its endorsement of the diagnostic use of the BMI (Body Mass Index), both in this forum and subsequently on NPR and in a number of radio interviews that followed. That time, understanding the (simple) math involved and being aware of the history of the BMI, I was confident about my argument, and eventually the CDC toned down its claims. Its website entry now leads with:
A high BMI can be an indicator of high body fatness. BMI can be used to screen for weight categories that may lead to health problems but it is not diagnostic of the body fatness or health of an individual.
[I think they should make a stronger statement, since popular news articles about health continue to refer to the BMI as a reliable diagnostic tool. But it’s not as misleading as it was.]
But to return to the topic at hand: with the life expectancy report, I was outside my domain of fingertip expertise. If I were to write my MAA article on the CDC report, I would need to spend some serious time digging into the details.
I was soon led to a fascinating paper in Nature, which I pass on for its own value since it goes into some detail on computing the loss of life due to COVID-19. But I was still not sure where exactly I was misunderstanding the CDC report.
Fortunately, just one week later, STATnews published a follow-up to their initial article on the topic (cited above) that answered all my questions. First, the author provided his own layperson’s calculation of the drop in US life expectancy in 2020 due to COVID-19 deaths, arriving (in the way I would have done it) at a figure of about five days. That is a bit less than my initial guess of about a week, but these are in any case ballpark figures, so I was relieved that my initial number-sense estimate was about right. More to the point, the author also explained what exactly the CDC calculation was getting at. I’ll leave you to read the article. It is short, well written in simple terms, and explains exactly what is going on. (They are not computing what we layfolk were all thinking of.)
I would quibble with just one claim the author makes; namely, he writes: “The CDC relied on an assumption it had to know was wrong.” I don’t think they are wrong; rather, they are just doing something different. The author’s next sentence provides the clue to what I think is the cause of the confusion: “The CDC’s life expectancy calculations are, in fact, life expectancy projections.” [His emphasis.]
My read of the situation is that the CDC has a method for calculating what they call “life expectancy projections” that they have found to be reliable and helpful. When they ran it for the first half of 2020 and compared the result with 2019, they found a big discrepancy of about a year.
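A period life table assumes that the death rates observed in the period hold at every age for the rest of a person's life. A minimal sketch of why that can produce a far bigger number than the loss any real person should expect: compare a temporary rise in death rates treated as lifelong (the period-style reading) with the same rise lasting just one year. The mortality schedule and the 20% rise (`SHOCK`) are hypothetical assumptions chosen for illustration, not CDC data or the CDC's model, and the function names are hypothetical too.

```python
import math

def remaining_life_expectancy(q, age):
    """Expected further years of life at exact `age`, given death probabilities q[x]."""
    alive, years = 1.0, 0.0
    for qx in q[age:]:
        dying = alive * qx
        years += alive - dying / 2   # deaths assumed to fall mid-year on average
        alive -= dying
    return years

# HYPOTHETICAL Gompertz-style schedule (illustrative only, not CDC data):
# q[x] is the chance of dying between exact ages x and x + 1.
baseline = [min(1.0, 0.00005 * math.exp(0.085 * x)) for x in range(111)]
baseline[-1] = 1.0   # close the table: nobody survives past age 110
SHOCK = 1.2          # ASSUMPTION: death rates 20% higher while the pandemic lasts

def shocked(q, age, years_of_shock):
    """Copy of q with rates raised by SHOCK for `years_of_shock` years from `age`."""
    return [min(1.0, qx * SHOCK) if age <= x < age + years_of_shock else qx
            for x, qx in enumerate(q)]

for age in (0, 70):
    base = remaining_life_expectancy(baseline, age)
    # Period-style reading: the elevated rates are assumed to last for life.
    lifelong = remaining_life_expectancy(shocked(baseline, age, len(baseline)), age)
    # Alternative: the elevated rates last one year, then baseline rates resume.
    one_year = remaining_life_expectancy(shocked(baseline, age, 1), age)
    print(f"age {age}: lifelong shock -{base - lifelong:.2f} years, "
          f"one-year shock -{(base - one_year) * 365.25:.1f} days")
```

The first column is the kind of figure a period calculation reports; the second is closer to what the everyday reading of "life expectancy" leads us to expect.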
That’s probably fine for the experts on the inside track. They know what they are doing and why. But when that report is issued as a press release, it is going to be interpreted and reported on by people who are not experts in health statistics, and it will likely be misinterpreted in fairly predictable ways. That, in my view, was the CDC’s error: not bad math, but bad math outreach.
I have little doubt that it was of considerable interest to the CDC to know how their regular mathematical model changed due to the pandemic. It is, after all, a model they use regularly to help the nation plan for the future, and that is surely why they developed it.
[Analogously, the BMI is an extremely useful tool for the purpose for which it was first developed: namely planning economic and health care development to account for changes in population health.]
But to the lay population (and here I too am a layperson), the story we read was not about the behavior of a mathematical model, but about our everyday understanding of “life expectancy.”
This confusion is not the first time a body of expert professionals has made results public in a way that inevitably misleads many. I wrote in this forum back in May 2020 about how the widespread publication of epidemic prediction graphs misled many, with the result that a number of influential individuals declared they had lost trust in the (important) predictive results reported by those organizations.
Then, as now, I see two lessons to be learned.
First, professional organizations of experts who produce reports that are likely to be of general interest should put some effort into providing a version that sets out the results and the claims in terms that will readily be understood by everyday folk; in particular, they should avoid using terms that will inevitably be read in a familiar everyday way, even though the experts use those terms in a specialized way in their professional activities.
Second, in today’s world, everyone needs a basic level of number sense and data sense. We all need a “sniff test” to be able to detect a misleading claim. Remember, the CDC is trying to be helpful in informing society. There is no shortage of actors out there with the far more ominous intention of misleading us. And when it comes to matters of health, our very lives may be at stake. Fellow mathematics educators, please take note.