Comparing Apples with Pangolins: The Case of the Missing Deaths and the Perils of Big Data

Nigel Adams
May 10, 2020

As the COVID-19 story continues, we have been subjected to a crash course in data literacy: bombarded with facts and predictions, introduced to a wide range of new data visualisation graphics, and learning to love exponential growth curves plotted on a semi-log scale. Publishers try to present data in as simple and compelling a way as possible.

Unfortunately, COVID-19 is not simple. "Which country has suffered the most deaths from COVID-19?" Surely that is a simple question to answer, given the many readily available data sources, from government agencies to research and media organisations. In practice it is far from straightforward. Some countries record only deaths that occurred in hospital after a positive COVID-19 test, which may exclude people who died in nursing homes and were never tested. The dates may also be inconsistent, because some countries record a death on the date it occurred and others on the date it was registered.

In an age of big data, sourcing the data and creating the chart are no longer the problem; interpreting the result correctly is. This difficulty has drawn increased attention to excess, or unexplained, deaths: the number of deaths above what would normally be expected. Barring extenuating circumstances such as war or famine, a country's mortality rate is relatively predictable and leaves little room for misinterpretation. Even this measure is not perfect, however. For example, it is reasonable to expect transport-related deaths to have fallen substantially during this period as countries restricted all but essential travel, and that fall would partly offset, and so hide, deaths from other causes.
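The excess-deaths idea is simple arithmetic, and a small sketch makes the transport caveat concrete. It assumes the expected baseline is the average for the same week in previous years. This is one simple choice, not necessarily the method any particular agency uses. All figures, and the helper names `expected_deaths` and `excess_deaths`, are hypothetical and for illustration only.

```python
from typing import Sequence


def expected_deaths(previous_years: Sequence[int]) -> float:
    """Hypothetical baseline: mean deaths for the same week in earlier years."""
    return sum(previous_years) / len(previous_years)


def excess_deaths(observed: int, previous_years: Sequence[int]) -> float:
    """Deaths above (positive) or below (negative) the assumed baseline."""
    return observed - expected_deaths(previous_years)


# Hypothetical weekly totals for the same week in five earlier years.
history = [10_000, 10_200, 9_800, 10_100, 9_900]
observed_this_year = 14_000

excess = excess_deaths(observed_this_year, history)
print(excess)  # 4000.0

# Hypothetical: transport-related deaths ran 300 below normal during lockdown.
# The headline excess then understates extra deaths from other causes by 300.
transport_shortfall = 300
print(excess + transport_shortfall)  # 4300.0
```

The single headline number nets all causes together, which is why a fall in one cause can mask a rise in another.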

It is hard to tell a complex story in one or two charts for readers with shortened attention spans, and there are so many variables. What is the source of the data, and how is it calculated? Why has the author chosen a particular technique to visualise it? What do they want you to see and, perhaps more importantly, what don't they want you to see?

Given how time-poor we are, digging through mountains of data to answer these questions is not always practical. In some ways the simplest approach is to find a source you trust, adjust for any biases (e.g. political predisposition), and use it as a baseline. You still need to ask these questions:

What is the scale? Is it continuous or broken? Does it start at 0? If not, why not? Are right-hand and left-hand scales clearly marked?
What is missing? Is there a data element that should be there but is not, e.g. a regional map showing North, South, and East but no West?
Is data obscured? Is there a piece of commentary or some other graphic hiding part of the data?
Is it clear? Is the data clearly labelled? Is it uncluttered?
Is it current? What period does it represent? Are the time intervals appropriate? For example, annual data will not help with a trend that is moving daily.
Is it relevant? Does the graphic reflect the narrative?
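The question "Does it start at 0?" matters because a bar's drawn length is its value minus the axis minimum. A truncated axis can therefore make a small difference look large. The sketch below uses hypothetical values and illustrative function names to compare how much taller one bar looks than another with and without a truncated axis.

```python
def drawn_height(value: float, axis_min: float) -> float:
    """Visible length of a bar when the y-axis starts at axis_min."""
    return value - axis_min


def apparent_ratio(a: float, b: float, axis_min: float = 0.0) -> float:
    """How many times taller bar b looks than bar a on the chart."""
    return drawn_height(b, axis_min) / drawn_height(a, axis_min)


# Hypothetical values: b is only about 5% larger than a.
a, b = 95.0, 100.0

print(apparent_ratio(a, b))                 # about 1.05: honest axis from 0
print(apparent_ratio(a, b, axis_min=90.0))  # 2.0: b now looks twice as tall
```

A non-zero baseline is not always wrong; for a narrow-ranging series such as body temperature it can be sensible. That is why the question is followed by "If not, why not?"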

You may not find the answer you are looking for, but you are less likely to be hoodwinked by eye-catching graphics with spurious data sources.

For excellent examples of making big problems accessible and understandable through data, I'd suggest ourworldindata.org. To improve your data visualisation design skills, I'd suggest Cole Nussbaumer Knaflic's excellent book Storytelling with Data.
