Risk Intelligence

Data, numbers, charts and graphs – and what they won’t tell

Numbers don’t lie. We can trust the stats. Can we? Ever since Covid-19 began dominating headlines around the globe, statistical literacy has come out of its geek territory and is on everyone’s mind. Or is it?

Who questions what data feeds drip into the aggregates we rely on? Those tables, charts and numbers we have been waking up to every day for more than a couple of months, communicated by major news outlets such as the New York Times, the BBC, Die Welt and their many counterparts elsewhere, seem to have reached into every single household. They hold authority; they can cause anxiety and inaction; they move or immobilise – as we have seen. Data is power. Using data in misleading ways certainly is an abuse of power. But data being misleading per se? Not sure, most people say.

Johns Hopkins University data and charts appear authoritative. Even more so as the references, limitations, dates of data submission and availability and so on are usually buried somewhere in tiny font, nowhere obvious. The practice resembles the footnotes in financial statements and annual reports, where we do expect to find the vital fine print in the footnotes – but most of us do not expect this in health-related reports and data representations vital to decision-making and protective action during a pandemic.

Those less academically inclined, or more sceptical, depending on their self-perception, frequently refer to Wikipedia (I have an account too, but critical media literacy means knowing who edits those entries and questioning the limitations – and so should you). While often a useful starting point, it tends to be another source of misleading data: static where it should be dynamic, silent about what was left out, and liable to give a lopsided picture of a pretty complex matter. At best it tells half a story, when the other half comes down to cultural and national knowledge. Let me explain.

Most people, in Germany for instance, know little about the practices that underpin the issuing of a death certificate. Even fewer know anything about these practices outside Germany. A smaller number still would know how the German practices compare with, say, the British, French or Finnish ones. Or in which nations a post-mortem is mandatory (in Britain yes, but not in Germany) and which tests it includes. Or which cause of death appears, and is documented, on the certificate in the case of multiple or overlapping conditions, a cause attached to a social taboo, or one that could trigger a criminal investigation, to name only the most prevalent.

The list goes on. Many, alerted by the reporting during the pandemic, are by now vaguely aware that deaths occurring in care homes in various countries were not counted, or were reported with significant delay. In nations with lagging digitisation (such as Germany) and federal borders within national borders (such as the USA or Germany) there are clear reasons for such delays, but they are not accounted for in the statistical data we rely on. In fact, they did not even enter the mainstream media and social media debate.

In short, many citizens – not just those trained in data science and statistical analysis – have become a lot more aware of the numbers and of the underlying lack of certainty and reliability, which is due to differing definitions, practices, politics and reputational issues, but also to financial resources, blame and accountability and, of course, a whole range of other factors. By extension, this awareness should trigger a discussion, and a related awareness, of inbuilt bias – especially in machine learning and artificial intelligence, or any area that makes use of algorithms. Will it?