# What's important in statistics

Mr. Dickhaus, these days there is often talk of exponential growth in connection with the spread of the coronavirus. What does that actually mean?

Thorsten Dickhaus: It means that an initial value is multiplied by the same factor at equal intervals. For the virus, it means that the number of infected people can be expected to rise dramatically and quickly, with the curve getting steeper from day to day. This differs from linear growth, in which the size of the increase always stays the same, for example always plus 100.
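The difference between the two kinds of growth can be sketched in a few lines of Python. The starting value of 100, the step of plus 100 and the doubling factor are illustrative assumptions, not figures from the pandemic data.

```python
def linear_growth(start, step, periods):
    """Add the same fixed amount in every period."""
    return [start + step * t for t in range(periods + 1)]


def exponential_growth(start, factor, periods):
    """Multiply by the same fixed factor in every period."""
    return [start * factor ** t for t in range(periods + 1)]


# Illustrative values only: start at 100, then either add 100 per period
# or double per period.
linear = linear_growth(100, 100, 10)
exponential = exponential_growth(100, 2, 10)

for t, (lin, exp) in enumerate(zip(linear, exponential)):
    print(f"period {t:2d}: linear {lin:5d}   exponential {exp:7d}")
```

In the linear series the difference between neighbouring values never changes; in the exponential series it is the ratio between neighbouring values that stays constant, so the differences themselves keep growing.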

Does the legend of the chessboard and the grains of rice illustrate this growth? A king grants a subject a wish. The subject asks for one grain of rice to be placed on the first square of a chessboard and for the number of grains to be doubled on each subsequent square: two grains on the second, four on the third, and so on. The king agrees because he cannot foresee what that means. On the 21st square there are already over a million grains of rice.

That is exactly what exponential growth looks like. In the spring, out of scientific interest, I also looked at the data on the spread of the coronavirus published by Johns Hopkins University. You could see the exponential increase there. The strict measures that were then taken clearly changed the shape of the curve and interrupted the exponential growth.
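The chessboard arithmetic from the legend can be checked directly. The sketch below assumes that the first square holds one grain and that each later square holds twice as many as the one before; the function names are chosen for illustration.

```python
def grains_on_square(n):
    """Grains on square n (1-based) when square 1 holds one grain
    and every further square holds twice as many as the previous one."""
    return 2 ** (n - 1)


def total_grains(n):
    """Total grains on squares 1..n (sum of a geometric series)."""
    return 2 ** n - 1


# First square on which the count exceeds one million.
first_over_million = next(
    n for n in range(1, 65) if grains_on_square(n) > 1_000_000
)

print(grains_on_square(21))   # 1048576
print(first_over_million)     # 21
print(total_grains(64))       # 18446744073709551615
```

Square 21 holds 2 to the power of 20 grains, which is 1,048,576, and it is the first square to pass the one-million mark.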

One of your colleagues, the retired stochastics professor Norbert Henze, says in "Spiegel" that "number blindness and statistical illiteracy" stand in the way of understanding current developments. Without a basic statistical education you cannot understand the corona statistics, he says, and you run the risk of being hoodwinked by demagogues. What do you make of that?

I would not disagree. Developing an understanding of statistics, and above all of chance and the calculation of probabilities, is not easy. This applies not only to laypeople who do not deal with it professionally, but also to scientists and students. Developing a feel for it, for example for the fluctuations and uncertainties in certain analyses or model calculations, takes some experience, practice and good intuition.

Do you see this experience and practice among those responsible in the current corona debate?

I think it is very important and right that the Federal Government seeks advice from experts who, of course, also carry out these kinds of calculations in order to forecast future developments. But it is also important for laypeople to be able to handle numbers, diagrams and statistics so that they can assess the situation. The exponential growth that is currently taking place, for example, should make clear to the population how quickly the numbers can explode and how close the virus is to them.

Apparently, there are different interpretations of all the numbers that are currently being collected on the virus, by laypeople and experts.

That's right, and that's a problem. If you wanted to form a completely independent judgment, you would have to look at the original data. But you can't ask that of everyone. That makes it all the more important to have a basic mathematical understanding, so that you can grasp the structures and consider which interactions there may be and how one quantity relates to another.

The absolute number of cases is an important factor, but so are the reference values it is measured against. This includes, for example, putting the number of newly infected people in relation to the number of tests. If ten out of 100 tests are positive (10 percent), the situation is of course more critical than if 50 out of 1,000 tests are positive (5 percent). When summary figures from a sample are published, the composition of the sample must be clear, otherwise little can be derived from them.
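The comparison of the two test scenarios can be written out as a small calculation. The helper `positivity_rate` is a hypothetical name used only for this sketch; the counts are the ones from the example above.

```python
def positivity_rate(positives, tests):
    """Share of positive results among all tests carried out."""
    if tests <= 0:
        raise ValueError("number of tests must be positive")
    return positives / tests


small_sample = positivity_rate(10, 100)    # ten positives out of 100 tests
large_sample = positivity_rate(50, 1000)   # 50 positives out of 1,000 tests

print(f"10 of 100:  {small_sample:.1%}")
print(f"50 of 1000: {large_sample:.1%}")
```

The second scenario has five times as many positive results in absolute terms, yet its rate is only half as high. That is why the absolute count alone says little without the number of tests behind it.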

Your colleague Henze also refers to the so-called Simpson's paradox in connection with the data on fatalities in Italy and China. In China, fewer people died in relation to the total population than in Italy. But in Italy the population is older, which distorts the statistics. Can you explain this paradox, named after a British statistician, for laypeople?

Roughly speaking, you can get very different and even contradictory results depending on whether you break a phenomenon down by subgroups, such as age, or look at it across all groups. You have to look closely at the peculiarities of the subgroups, see what is being compared with what, and let all of this flow into the overall view.

Do you have a clear example at hand?

Some time ago a large German newspaper claimed, under the headline “Methusalems make cash: A long course of study pays off in hard cash”, that long periods of study, i.e. many semesters, have a positive effect on a graduate's starting salary. In fact, looking at the overall data set, it did appear that the duration of study was positively related to the level of the starting salary.

If, however, this overall sample was broken down by subject studied, the correlation within each subject was negative. This means that in each subject, those who finished within the standard period of study achieved a higher starting salary on average than those who took longer, in some cases considerably longer. The explanation for this phenomenon, which at first appeared paradoxical, was that the general salary level differed greatly between the subjects examined.
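The reversal described here can be reproduced with a toy data set. All numbers below are invented for illustration and are not taken from the newspaper's data: two hypothetical subjects, one with short study durations and low salaries, the other with long durations and high salaries. The sketch fits a least-squares slope of starting salary against semesters, first within each subject and then on the pooled data.

```python
def slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Invented data: semesters studied and starting salary (thousand euros).
data = {
    "subject_a": {"semesters": [6, 7, 8], "salary": [32, 31, 30]},
    "subject_b": {"semesters": [10, 11, 12], "salary": [50, 49, 48]},
}

# Trend within each subject: longer study goes with lower salary.
within = {
    name: slope(d["semesters"], d["salary"]) for name, d in data.items()
}

# Trend across the pooled sample, ignoring the subject.
all_semesters = [s for d in data.values() for s in d["semesters"]]
all_salaries = [y for d in data.values() for y in d["salary"]]
pooled = slope(all_semesters, all_salaries)

print("within subjects:", within)
print("pooled:", round(pooled, 2))
```

Within each invented subject the slope is negative, while the pooled slope is positive. In this toy data the reversal arises because the better-paid subject also has the longer study durations, so pooling mixes the difference between subjects into the apparent effect of study duration.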

Are citizens currently being given enough background knowledge to make sense of R values and incidence figures?

Personally, I have the impression that those responsible are informing themselves thoroughly about how the current figures are to be interpreted. When it comes to communicating with the population, some things fall by the wayside. It is also difficult to explain these very complex issues in a generally understandable way, especially since it is currently particularly important to make people aware of the seriousness of the situation and to encourage acceptance of the restrictions.

Mathematical understanding is of course not only helpful in the current context of the pandemic, but also serves to understand all kinds of statistics. Do you share the finding that schools can no longer adequately convey this basic understanding to many students?

I cannot make a general assessment of this, but my personal experience in my own circles confirms the finding.

As a mathematician, how do you assess the current situation with regard to the potential for the spread of the virus?

If the situation continues to develop as it has so far, one must assume that we will have infection figures in December that no one wants. The daily increase in the number of cases is itself currently growing from one day to the next, which is a sign of a transition into an exponential growth curve.

The interview was conducted by Silke Hellwig.