Luc Brunet – 27 July 2021 (updated August 2021 now including Russian data for 2020)
This article is dedicated to the statistical aspects of COVID. It tries to sort out which numbers we should trust and which we should not and, unfortunately, the second group is by far the bigger one!
A quotation attributed to W. Churchill makes a good introduction to the article:
“I only believe in statistics that I doctored myself”
Coming back to COVID, TV and other media regularly use a handful of figures to show how fast the epidemic is developing or, more rarely, how fast it is retreating. Let's review those numbers and ratios one by one, and see how much trust we can put in them.
The number of cases
The number of cases is very often used in the media. It is impressive because of the high values shown on charts, and because of how fast it can grow or decrease. This number is however the least reliable one, and it can be manipulated in many ways.
COVID is not the plague, and most infected people are either asymptomatic or have mild symptoms. The numbers we get in the media mostly come from the results of tests (generally PCR) carried out among the population. The number of positive cases detected during a given period largely depends on a few parameters:
– the number of tests done over the period. While tests were very rarely performed in spring 2020, even on sick individuals, their number increased considerably in winter 2020-2021, with massive testing campaigns. Holiday periods, when a negative PCR certificate is required to travel, also created peaks in the number of tests performed. More tests uncover more asymptomatic or mildly symptomatic cases, so the number of cases increases during such periods
– the result of a PCR test itself depends on the sensitivity level used by the testing authorities, a subject that was discussed quite often in the media. Leaving aside false positives, changing the sensitivity level of the tests can produce the result you want, either a sudden increase or a sudden decrease in the number of cases. Ironically, as there is no internationally respected standard of sensitivity, there have been cases (I personally know two) of people who flew out of one country with a negative PCR test and tested positive on arrival one day later.
The large uncertainties about the number of cases should not be a surprise anyway: reliable numbers could only be obtained by testing 100% of the population in one day, which is of course impossible. For the usual seasonal flu, for example, no tests were used, and the number of cases per year (used to calculate other parameters like fatality, as discussed later) is in fact an estimate. In the US, for example, the CDC registers the number of people declared and treated for the flu, then asks a sample of the population whether they had flu symptoms and whether they were declared. This gives the percentage of declared cases among all cases, and that percentage is used to estimate the total number of flu cases in the country. We are far from a fully reliable number.
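The scaling step described above can be sketched in a few lines. This is only an illustration of the principle as summarized in this article, not the CDC's actual model; the function name and all the figures are hypothetical.

```python
def estimate_total_cases(declared_cases: int,
                         surveyed_with_symptoms: int,
                         surveyed_declared: int) -> float:
    """Scale declared cases up by the share of symptomatic people who were declared.

    All inputs are hypothetical; this is a sketch of the principle only.
    """
    if not 0 < surveyed_declared <= surveyed_with_symptoms:
        raise ValueError("surveyed_declared must be between 1 and surveyed_with_symptoms")
    declared_share = surveyed_declared / surveyed_with_symptoms
    return declared_cases / declared_share


# Hypothetical example: 1,000,000 declared cases, and a survey in which
# 100 of the 500 people reporting flu symptoms had been declared (20%).
print(round(estimate_total_cases(1_000_000, 500, 100)))

# If the survey had found 125 declared out of 500 (25%), the estimate changes a lot.
print(round(estimate_total_cases(1_000_000, 500, 125)))
```

A shift of a few points in the surveyed share moves the estimated total by a million cases in this made-up example, which is why such a figure should be read as an order of magnitude.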
The number of dead
In the case of COVID, the confusion seems to have been huge in many countries. In France during spring 2020, many deaths in retirement homes were attributed to COVID even when no test had been done: COVID was the default cause for almost all deaths. In most countries, especially in the West, autopsies are rare and generally reserved for suspicious deaths, so the attending doctor records what he or she believes to be the cause of death. In the emergency and panic that almost overwhelmed many hospitals, and with relatives locked down at home and unable to get involved, anything was possible.
To make the link with the seasonal flu discussed above, the CDC in the US states that flu deaths are registered only for people who actually died from the flu infection, and not from a co-morbidity such as cancer or diabetes. This point is important, and it makes any comparison between flu mortality and COVID mortality tricky: for flu, only flu-related deaths were counted; for COVID, deaths with other underlying causes were counted as well.
The number of hospital patients and the number of ICU patients
Not all countries publish such data, but it is available for many, especially in the West. These numbers give the total number of patients, whatever the illness, and they are very informative, especially when rapid changes occur. As such, they should be quite reliable for measuring the load on the medical system, although the stated reason for hospitalization can be manipulated as well, just like the cause of death.
Fatality rate
This ratio is the number of dead as a percentage of the total number of cases. It is clearly highly unreliable, since the total number of cases and the number of deaths actually related to COVID are both unreliable numbers.
From the beginning of the epidemic, we have seen extreme situations where countries tested almost nobody, even in hospitals, and so had a very low number of cases per million, but registered every doubtful death as COVID related. The result was fatality ratios of 20 or 30% (out of 10 people infected, 2 or 3 would die), obviously very far from reality. But some media used such data in 2020 and scared the population big time!
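The effect of under-testing on the fatality ratio is easy to reproduce with made-up numbers. In the sketch below, the number of infections, the number of deaths and the tested shares are all hypothetical and chosen only to show the mechanism.

```python
def case_fatality_rate(deaths: float, confirmed_cases: float) -> float:
    """Deaths as a percentage of confirmed cases."""
    return 100.0 * deaths / confirmed_cases


# Hypothetical epidemic: 10,000 people actually infected, 50 of them die.
true_infections = 10_000
deaths = 50

# The fewer infections are confirmed by a test, the higher the apparent ratio.
for tested_share in (1.0, 0.10, 0.02):
    confirmed = true_infections * tested_share
    rate = case_fatality_rate(deaths, confirmed)
    print(f"{tested_share:>5.0%} of infections confirmed -> fatality ratio {rate:.1f}%")
```

With the same 50 deaths, the ratio goes from 0.5% when every infection is confirmed to 25% when only 2% of infections are confirmed.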
Relation with country population
When comparing countries, journalists should always have related the data to the total population of each country. They did not always respect that obvious rule. The recent crisis in India is a good example: reports, for example on French TV, mentioned a terrible number of more than 4,000 dead per day in India, surely impressive for French viewers who remembered 800 dead per day in France at the peak of the pandemic in April 2020. The usual “experts” explained that India was heading for an apocalyptic disaster. The problem is that the same numbers look very different once related to the total population. At the peak of its crisis this year, India reached 3 dead per million per day, while France reached 14 (almost 5 times more!) in 2020. Which country dropped the ball?
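The normalization itself is a one-line division. The sketch below uses the daily figures quoted above together with rounded population figures that are my own assumptions (about 1,380 million for India and 67 million for France), so the output is only indicative.

```python
def deaths_per_million(daily_deaths: float, population_millions: float) -> float:
    """Daily deaths per million inhabitants."""
    return daily_deaths / population_millions


# Assumed, rounded populations in millions (not taken from the article).
INDIA_POP_M = 1380.0
FRANCE_POP_M = 67.0

india = deaths_per_million(4000, INDIA_POP_M)
france = deaths_per_million(800, FRANCE_POP_M)

print(f"India:  {india:.1f} deaths per million per day")
print(f"France: {france:.1f} deaths per million per day")
print(f"France / India: {france / india:.1f}")
```

With these rounded inputs, India comes out near 3 per million and France near 12, slightly below the 14 quoted above, which presumably rests on a higher peak-day count than 800. Either way, the ranking of the two countries is the opposite of what the raw daily totals suggest.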
Is there any reliable data?
The most reliable approach is to look at the overall mortality rate in a given country and compare it with the average mortality rate over the previous few years (I generally use 10 years). Databases of such data exist, and international data can be accessed through the HMD (Human Mortality Database) at https://mpidr.shinyapps.io/stmortality/
Mortality data is provided by many countries, but not all, and sometimes with long delays; data for 2020 and 2021 can nevertheless be accessed for major countries. Accuracy clearly cannot be 100%, but we can consider such data much more reliable than COVID mortality data. It is relatively straightforward to assign a death to a false cause, while it is much more difficult to hide a death or create one from scratch!
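The comparison with a multi-year average can be sketched as follows. The yearly death counts are invented for the example, and a plain average is only the simplest possible baseline: it ignores trends such as population ageing.

```python
from statistics import mean


def excess_mortality(observed_deaths: float, baseline_deaths: list[float]) -> tuple[float, float]:
    """Return (excess deaths, excess as % of baseline) against a multi-year average."""
    baseline = mean(baseline_deaths)
    excess = observed_deaths - baseline
    return excess, 100.0 * excess / baseline


# Hypothetical yearly deaths for the 10 years before the year under study.
previous_ten_years = [
    590_000, 595_000, 600_000, 605_000, 610_000,
    590_000, 595_000, 600_000, 605_000, 610_000,
]

excess, percent = excess_mortality(660_000, previous_ten_years)
print(f"Excess deaths: {excess:,.0f} ({percent:.1f}% above the 10-year average)")
```

The same calculation can be repeated per age range or per week instead of per year to see where and when the excess is concentrated.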
Here is what we can see from the overall mortality data, limiting ourselves to a few countries. If you are interested in other countries, the site address is, again, https://mpidr.shinyapps.io/stmortality/
Overall mortality in 2020
The mortality rate in the database can also be analyzed by age range: 0-14 years, then 15-64, 65-74, 75-84, and 85 and over. Based on the results from the database, countries can be classified into a few categories.
Type 1.1 France, Belgium, Finland, Netherlands, Sweden
Such countries show an overall excess in mortality over 2020, but no excess in the 15-64 category. This means that COVID hit mainly older citizens: some younger people did lose their lives, but their number is too small to be visible at macro level. Statistically speaking, COVID would have gone unnoticed outside the elderly categories.
Type 1.2 Austria, Germany, Czech Republic, Denmark, Israel, Slovenia, Switzerland
Same as above, but most casualties in these countries occurred in winter 2020-2021, with very limited impact from the original Wuhan variant in spring 2020
Type 2.1 Bulgaria, UK, Hungary, Italy, Spain, Portugal
In those countries, the impact on the 15-64 age range is almost as strong as on elderly people
Type 2.2 Croatia, Greece, Poland, Slovakia, Russia
Same as above, but most casualties in these countries occurred in winter 2020-2021, with very limited impact from the original Wuhan variant in spring 2020
Type 3 Canada, Chile, USA
In those countries, the excess mortality is spread across the whole of 2020 and is high in the 15-64 range, sometimes higher than in the elderly categories
Type 4 Norway, Australia, Taiwan, South Korea, Japan
No real impact on mortality, as if there had been no COVID at all. China can also be classified in this group, as the COVID impact was very marginal outside the Wuhan region.
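The classification above boils down to three yes/no questions. The sketch below is my own reading of the type definitions; the function and parameter names are hypothetical, and the answers still have to be judged by eye from the database charts.

```python
def classify_country(overall_excess: bool,
                     working_age_excess: bool,
                     mostly_winter_2020_21: bool,
                     spread_over_year: bool = False) -> str:
    """Map three observations about 2020 mortality to the types used in this article.

    overall_excess:         is there an overall excess mortality?
    working_age_excess:     is the excess also visible in the 15-64 range?
    mostly_winter_2020_21:  did most of the excess occur in winter 2020-2021?
    spread_over_year:       is the excess spread over the year rather than in peaks?
    """
    if not overall_excess:
        return "Type 4"
    if working_age_excess and spread_over_year:
        return "Type 3"
    family = "2" if working_age_excess else "1"
    subtype = "2" if mostly_winter_2020_21 else "1"
    return f"Type {family}.{subtype}"


# Examples matching the descriptions of the types above.
print(classify_country(True, False, False))        # Type 1.1
print(classify_country(True, True, True))          # Type 2.2
print(classify_country(True, True, False, True))   # Type 3
print(classify_country(False, False, False))       # Type 4
```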
Conclusions and first hints about 2021
The data above obviously tells us about mortality in general but, in the absence of other pandemics or special events, we can reasonably assume that the excess mortality comes from the COVID epidemic, which in most countries had a larger impact than the usual flu. We should also keep in mind that only some of those deaths are of people who were infected; others are due to indirect consequences of the measures taken, such as delayed treatments and operations, stress during lockdowns and more.
Unfortunately, a number of interesting countries, such as Romania, are missing from the database. However, the existing results tend to show that the impact of COVID depends largely on the country, and that roughly 4 groups of countries can be identified. Interestingly, those groups also correspond to geographic and/or ethnic groups:
– in Continental Northern Europe, Western and Central Europe, the great majority of COVID casualties are in the age range above 65, while younger categories are almost untouched
– in the UK, Southern and Eastern Europe including Russia, the impact on the 15-64 range is smaller than on elderly people, but is still significant in 2020, and even growing in 2021
– in North America and probably South America (it would be interesting to get data on Brazil and Argentina), the impact is similar across all age ranges above 15, and the excess mortality is not concentrated in peaks but spread over the year 2020. In 2021, we see a clear evolution where the excess mortality is almost exclusively in the 15-64 range, while elderly people show a mortality deficit!
– a few countries show almost no excess mortality in 2020, such as Norway, Australia and most East Asian countries, including Japan, whose very high percentage of elderly people should have made it very vulnerable if the pattern observed in Europe applied.
– a number of countries, such as Sweden or Switzerland, do not show any excess mortality in 2021
As a comment, we should be careful about the increase in mortality in the 15-64 range in some countries. Part of that mortality may be the delayed result of other illnesses that remained untreated, or were treated too late, because of the pandemic peaks in 2020. Some of the additional dead may also be victims of extreme stress, for example during lockdowns. Only a more detailed survey including the cause of death could provide a better understanding of the numbers.
Now, what the reasons behind such differences are is a much more complex question, and one that will require detailed studies. Many causes can be put on the table:
– percentage of obese people in the population (Americas in particular)
– overall health situation, including junk food popularity and use of GMOs
– state of the medical system, including insurance coverage and the role given to prevention and prophylaxis (Asia)
– environmental criteria, in particular air pollution
– dominant COVID variant in each country at a given time
– genetic differences between racial groups
– efficiency or failure of sanitary measures taken by local governments (Asia)
– finally, as noted earlier, a part of the deaths certainly comes from the lack of timely treatment of diseases like cancer, due to the overload of the medical system in 2020.
All in all, better not to trust the data displayed by major media: be critical and do not panic. If you feel uncertain, look back at what your government or media said in early 2020, and enjoy the comedy of it!