MAA Blog: Helping People Understand Important Statistics

By Keith Devlin @profkeithdevlin

There was a flurry of reports and social media posts recently commenting on an Israeli study showing that nearly 60% of all patients hospitalized for COVID-19 (as of August 15, 2021) were fully vaccinated.

A common take, by both journalists and commentators, was to conclude that the power of the vaccine must have waned dramatically since the early days. (Israel was way ahead of most nations in getting the vaccine out, and almost 80% of all Israelis over 12 years of age are now fully vaccinated. So western nations have been viewing Israel’s experience as a good indicator of what’s to come for them.)

In fact, the data does not imply that at all. What is at play is a well-known statistical phenomenon called Simpson’s Paradox, though it would be more accurate to name it “Simpson’s Numerical Illusion.” (There is no paradox.)

Given the (literally) life-and-death stakes involved in fighting the pandemic — particularly here in the US with an insanely high percentage of people suspicious of science and very much against vaccines and masks, the two measures known to be highly effective in protecting against serious illness or death — those of us in the science and mathematics outreach business have a duty to try to counter such erroneous conclusions. But doing so is not at all easy. Simpson’s Paradox is tricky.

A number of math writers have attempted to explain it over the years, including me on this platform (most recently in 2016, with regard to the Brexit vote, and before that in 2004 in a discussion of how to interpret baseball batting averages).

More generally, how can mathematicians explain or communicate any mathematical issue to a general audience for whom that issue is important? In many cases, explaining even the relevancy can be problematical, let alone the math itself. In the case of life-and-death information about a dangerous pandemic, however, the relevance of data about the pandemic’s growth is self-evident, and the challenge is to explain what the issues are and how people can make use of the information the numbers provide.

In last month’s Devlin’s Angle post I took a stab at explaining what led some reporters for highly regarded major newspapers to misunderstand a probability calculation concerning the effectiveness of the COVID-19 vaccines, by falling victim to another commonly made error called the Prosecutor’s Fallacy.

Prof Jeffrey Morris of the University of Pennsylvania

Having published that piece, I was reluctant to follow up immediately afterwards with a second post this month about misreporting on, and misunderstanding, COVID-19 data, this time because of another well-known (to mathematicians) statistical fallacy (Simpson’s Paradox).

Any temptation I might have had to ignore that reluctance, and write my piece anyway, went away when I saw excellent coverage of exactly the issue of the Israeli data and Simpson’s Paradox by Jeffrey Morris, a statistical data scientist who is professor and Director of Biostatistics at the Perelman School of Medicine at the University of Pennsylvania.

Prof. Morris runs a blog called COVID-19 Data Science, and in his post on that blog on August 17 he focuses on exactly that issue. I would, I thought, simply add a footnote to my September Devlin’s Angle post pointing readers to Morris’s excellent article.

Prof Jordan Ellenberg of the University of Wisconsin

My decision to not discuss Simpson’s Paradox (again) in my MAA column seemed even more justified when, on August 31, University of Wisconsin mathematician Jordan Ellenberg published a shorter, simpler piece on the same issue in the Washington Post.

In his article, Ellenberg pointed (in addition) to another excellent blogpost on the topic, this one by mathematics writer Dana Mackenzie on July 6, 2020.

That’s three excellent sources on the same mathematical topic and its application to COVID-19 data. So why am I writing this article?

Well, one motivation is to make MAA readers aware of all three of those excellent resources. But there is another reason.

Of the three articles I just cited, Ellenberg’s is the most accessible, being written for a general newspaper audience. The other two are more like my Devlin’s Angle posts, being written for readers with some knowledge of, and interest in, mathematics. The average consumer of daily news, in contrast, is unlikely to read through and reflect on the numerical examples presented and discussed in such articles. That’s true even of Ellenberg’s piece — the Washington Post is very much an upper-end newspaper with serious, often in-depth coverage.

But how do we reach people who have interest (in this case, by virtue of an urge for self-preservation) in understanding pandemic data, but who are going to have little appetite for working through a numerical example, even a simple (contrived) one, and in some cases instinctively tune out when they come to a paragraph with math content?

Award-winning mathematics writer Dana Mackenzie

The answer is inescapable: don’t put any calculations in front of them. But what then do you do?

The answer is, you use words to create a simple story they can follow, a scenario they can relate to, or a mental image they can visualize. (You could maybe use an actual image, such as a photograph or a very simple diagram, but it’s better to stick with words if you can, since text allows the reader (or listener) to create their own image(s); indeed, it requires it.)

As it happens, when I was deciding on the topic for this month’s post a couple of days ago, I was on a Zoom meeting of the Advisory Board for the Museum of Mathematics in New York City. Because there were some new board members, we began by briefly introducing ourselves, and one of the suggested themes was what got us interested in trying to make mathematics more widely accessible. I recounted my experience early in my career, in the early 1980s when I was a university lecturer back in the UK. At that time, The Guardian newspaper published semi-regular articles on science written by domain-expert academics. There were, for instance, articles on physics by Paul Davies, pieces on genetics by Richard Dawkins, and the like. But no one wrote mathematics articles. One day early in 1983, on a whim I wrote a piece and sent it in as an unsolicited submission.

A day or so later, I got a telephone call from the Science Editor, Anthony Tucker. He said he liked my piece, and asked if I was interested in writing semi-regularly for them. I said I was. “Good,” he replied, “I like your light style and your use of simple, visual, everyday metaphors. So let me tell you why the piece you submitted won’t work in a daily newspaper.” Which he did.

Duly tutored, I agreed that I would keep pitching articles, and he (and subsequently his assistant editor and eventual replacement, Tim Radford) would work with me to guide me on what was required to make them work in a daily national newspaper.

It was not long before one of my submissions was accepted and published. Eventually, I went on to become a regular math columnist, writing a 700-word piece every two weeks. (After I moved to the US in 1987, most of my early articles were published by the MAA as the book All the Math That’s Fit to Print, which to this day remains “in print.”)

In the early days, my two Guardian editors were relentless. I had to learn how to cover a math story in words, by telling stories. No symbols or formulas. It was that painstakingly learned and practiced (under expert guidance!) skill that led to me becoming not just a regular Guardian columnist, but later the NPR Math Guy, after I moved to the US.

So how would I handle the Israeli COVID-19 data story (in print or on the radio)? [I say how would I do it, not how you should, since I developed my own style, guided by Tim Radford’s tuition. Others would perhaps do it differently. I don’t think there is just one, royal way.]

The main data point that comes from the study is that almost 60 percent of people hospitalized with severe COVID-19 were fully vaccinated. How do you persuade the average reader that this does not imply that the vaccine is no longer working as well as it did at the start? After all, when the vaccines first came out, they were touted (accurately, based on data) as being 95% effective at protecting against severe illness that requires hospitalization. 60% is a lot less than 95%, right?

ASIDE: Almost certainly, with just two percentages around, the average reader (or listener) will simply compare them directly in the way I just suggested, which makes no mathematical sense. They won’t even get to the point of making the comparison a “better” (but still meaningless) way: 95% effective at preventing hospitalization allows that 5% will require hospitalization, and 60% is more than 5%. But in the story I would write (or tell on NPR, if I were still doing those NPR slots), I would not bring in numbers at all. Doing so already risks losing the main point I want to get across, and likely losing most of my audience as well.

Instead, I would say this. Just imagine for a moment that everyone had been vaccinated. (I would likely add that in the case of Israel, that’s not a huge stretch, since their figure is 80% vaccinated.) That means, in particular, that everyone who has to go into hospital has been vaccinated. So in that case the proportion of vaccinated people in the hospital would be 100%.

The point is, the more people who are vaccinated, the larger the share of hospital patients who will be vaccinated. That’s where that 60% comes from. It reflects the fact that a lot of Israelis are vaccinated. (Around 4 out of 5, to be accurate, but this is a story about a concept, not numbers, so I don’t say that.) There are a lot of vaccinated people everywhere, in hospital or out of hospital.

And that’s where I would stop.

I note that Ellenberg makes that same “What if everyone was vaccinated” observation in his article, but he does so as a side remark to set the stage for his subsequent explanation of the pertinent mathematics. Most readers probably fly straight by that one sentence as they look for the meat of his article. (He’s writing for an audience who wants to have the math explained. His goal is to show them how Simpson’s Paradox can mislead. But for my audience, that one, almost throwaway, observation he made is the entire point I want to get across.)
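For readers of this blog who do want to see the arithmetic behind the “What if everyone was vaccinated” observation, here is a minimal Python sketch. It is only an illustration under a strong simplifying assumption: everyone has the same risk of hospitalization before vaccination. The function name `vaccinated_share_of_hospitalized` and the coverage values in the loop are my own hypothetical choices, not figures from the Israeli study; the 95% efficacy is the early headline figure mentioned above.

```python
def vaccinated_share_of_hospitalized(coverage: float, efficacy: float) -> float:
    """Fraction of hospitalized patients who are vaccinated.

    coverage: fraction of the population that is vaccinated (0 to 1).
    efficacy: fractional reduction in hospitalization risk due to the vaccine (0 to 1).
    Simplifying assumption: everyone has the same risk before vaccination.
    """
    if not (0.0 <= coverage <= 1.0 and 0.0 <= efficacy <= 1.0):
        raise ValueError("coverage and efficacy must lie between 0 and 1")
    # Relative numbers of hospitalized people in each group (baseline risk cancels out).
    vaccinated = coverage * (1.0 - efficacy)
    unvaccinated = 1.0 - coverage
    total = vaccinated + unvaccinated
    if total == 0.0:
        raise ValueError("no one is hospitalized in this scenario")
    return vaccinated / total


# Hypothetical coverage levels, with the vaccine held at 95% efficacy throughout.
for coverage in (0.0, 0.5, 0.8, 1.0):
    share = vaccinated_share_of_hospitalized(coverage, efficacy=0.95)
    print(f"{coverage:.0%} vaccinated -> {share:.1%} of hospital patients vaccinated")
```

With efficacy fixed, the vaccinated share of hospital patients climbs as coverage climbs, reaching 100% when everyone is vaccinated. The vaccine has not changed at all; only the makeup of the population has. This simple model ignores differences in age and health, so it should not be expected to reproduce the reported 60%.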

In the early days, my two Guardian editors told me repeatedly that if you keep your reader long enough to get across just one new significant thing, your article has succeeded. Sure, many readers (actually, a minority of readers, but never mind) will complain that you have left out a lot of important information. But as those editors would remind me, a reader who can complain does not need my article.

My goal as a Guardian columnist, and later an NPR commentator, was to reach those for whom the one idea I focused on was a revelation.

Did it make a difference that the person delivering that one idea had a Ph.D. in math and a long list of books and research publications? Yes. Not because I had to draw on that background to write my piece, but to give credibility to what I said.

To be sure, once I felt I had got my one point across, if there were space or time, I would frequently go a bit further (though not too far). In the case of the Israeli study, I would likely go on to mention that the first people to get the vaccine were the elderly and the immuno-compromised, and that these would end up being the largest sector of those vaccinated. Because they were far more likely to require hospitalization than younger age bands, that would also push the vaccinated percentage higher. So that 60% also reflects patients’ age and baseline medical situation.
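For those who want to see how age can distort the headline figure, the following Python sketch works through a made-up two-group example. Every number in it is hypothetical and chosen only to make the arithmetic easy to follow; none of it comes from the Israeli data, and the names `Group` and `efficacy` are mine. In this toy population the vaccine cuts hospitalization risk by 90% in each age group, yet the pooled figures make it look far less effective, because the older group is both more heavily vaccinated and more likely to be hospitalized.

```python
from dataclasses import dataclass


@dataclass
class Group:
    name: str
    vax_pop: int      # number of vaccinated people
    vax_hosp: int     # vaccinated people hospitalized
    unvax_pop: int    # number of unvaccinated people
    unvax_hosp: int   # unvaccinated people hospitalized


def efficacy(g: Group) -> float:
    """One minus the ratio of hospitalization rates (vaccinated / unvaccinated)."""
    vax_rate = g.vax_hosp / g.vax_pop
    unvax_rate = g.unvax_hosp / g.unvax_pop
    return 1.0 - vax_rate / unvax_rate


# Hypothetical counts: older people are mostly vaccinated and at much higher risk.
older = Group("older", vax_pop=900_000, vax_hosp=180, unvax_pop=100_000, unvax_hosp=200)
younger = Group("younger", vax_pop=1_000_000, vax_hosp=10, unvax_pop=1_000_000, unvax_hosp=100)

# Pool the two groups, as a headline figure that ignores age would.
pooled = Group(
    "pooled",
    vax_pop=older.vax_pop + younger.vax_pop,
    vax_hosp=older.vax_hosp + younger.vax_hosp,
    unvax_pop=older.unvax_pop + younger.unvax_pop,
    unvax_hosp=older.unvax_hosp + younger.unvax_hosp,
)

for g in (older, younger, pooled):
    print(f"{g.name:8s} apparent efficacy: {efficacy(g):.1%}")

vaccinated_share = pooled.vax_hosp / (pooled.vax_hosp + pooled.unvax_hosp)
print(f"vaccinated share of all hospital patients: {vaccinated_share:.1%}")
```

The design choice here is to compare like with like: the calculation is run once per age group and once on the pooled totals, so the gap between the within-group results and the pooled result is visible side by side.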

And maybe, just maybe, I’d end by noting — nothing more — that all of this is an illustration of something mathematicians study called Simpson’s Paradox, which looks at ways statistical data can mislead if you don’t handle the figures correctly. Not least, I’d do so because ending that way might pre-empt snide comments from readers who would inevitably begin their tirade, “I’m surprised that someone who calls himself a mathematician does not know about …” “Might pre-empt,” notice. It frequently didn’t. But in such cases, at least I could console myself by knowing I’d made their day by giving them the satisfaction of one-upping and correcting an expert!

How do we help people understand statistics that matter to them?