Summary of “How to Lie with Statistics” by Darrell Huff

This book shows how we can be tricked with the tools of statistics. We meet statistics in daily life: in toothpaste advertisements, in company annual reports, in all kinds of surveys. If we don’t understand what these numbers really represent, we can easily be fooled by them. The book gives an idea of how to interpret such data more correctly.

Sample with the Built-in Bias

Any statistical figure is based on a sample, and that sample can be anything. If we interpret the figure without knowing the details of the sample, we can go wrong. The best example is average income. Many surveys carry a line like “The average income of this group of people is Rs. X”. If we accept it blindly we can be misled, because we don’t know who took part or whether they all belonged to the same group, and the biggest gamble of all is that we assume they aren’t lying. If they are lying, the data is of no use. In the same way, if the sample does not consist of an appropriate group of people, it can mislead anyone reading the result. The author gives several examples to explain this. We can take another example, the marks of students. Someone who wants to prove that a particular class has higher average marks than the others can simply include only the students with good scores in the sample.
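The marks example can be sketched in a few lines of Python. This is only an illustration: the marks and the cut-off of 75 are made-up values, not figures from the book.

```python
from statistics import mean

# Hypothetical marks for a class of ten students (illustrative only).
class_marks = [32, 45, 48, 51, 58, 63, 77, 88, 91, 97]

# A biased "sample": keep only the students who scored well.
# The cut-off of 75 is an arbitrary assumption for this sketch.
cherry_picked = [m for m in class_marks if m >= 75]

honest_average = mean(class_marks)
biased_average = mean(cherry_picked)

print("whole class:  ", honest_average)
print("chosen sample:", biased_average)
```

Both numbers are honest averages of the data they were computed from. Only the choice of sample differs, and that choice is exactly what a reader usually cannot see.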

Different Kinds of Averages

When anyone talks about an average, most people think of the simple average, the mean. But wait, there are two more averages: the median and the mode. The same data, with no change at all, can be summarised in three different ways. The mean tells one story, the median another, and the mode yet another. Statisticians, or anyone else presenting data, will use whichever average shows the data in the best light. The three mean different things, yet each of them can be called an average.

Here again we can take the marks of students as an example. The mean, median and mode can all differ, depending on how the marks are spread. So when asking what the average marks of the students are, one must also know which average was used.
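A small sketch with Python’s standard `statistics` module shows how far the three averages can drift apart on one data set. The marks below are made up for illustration.

```python
from statistics import mean, median, mode

# Hypothetical marks for nine students (illustrative only).
marks = [35, 40, 40, 40, 55, 60, 70, 95, 96]

print("mean:  ", mean(marks))    # sum of the marks divided by their count
print("median:", median(marks))  # middle value once the marks are sorted
print("mode:  ", mode(marks))    # the mark that occurs most often
```

Here the mean is 59, the median is 55 and the mode is 40, so “the average mark” can be reported as any of the three depending on the story someone wants to tell.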

Missing the Little Ones

When studying data there are many factors to look at. Some of them are small details that many people ignore, and that is where they are tricked. Suppose a company wants to show that its product is effective. It will run a survey, ask people to use the product, and collect their results. Because the company wants a favourable result, it may run the survey on a very small number of people, where a result in the company’s favour can turn up by chance. Even if things go wrong, it can run the survey again, because small surveys don’t cost much. We can take another example, tossing a coin. If we toss a coin ten times, heads may come up eight times, which would suggest that the probability of heads is 80%. But if the coin is tossed 100 times, the result will most likely be much closer to 50%.

One should also not follow a single number blindly. Say you are going camping and you choose the place by its average temperature. Here it is the range that matters: the temperature may swing from very low to very high around that average. Missing such small but important points can easily mislead anyone, because they are rarely given the attention they deserve.
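The coin example can be checked with a short calculation. The sketch below assumes a fair coin and uses the binomial formula to find the chance of seeing at least 80% heads in a small and in a larger number of tosses; `prob_at_least` is a helper name made up for this illustration.

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """Probability of at least k heads in n tosses of a coin
    that lands heads with probability p (binomial distribution)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

small_sample = prob_at_least(8, 10)    # 8 or more heads in 10 tosses
large_sample = prob_at_least(80, 100)  # 80 or more heads in 100 tosses

print(f"10 tosses:  {small_sample:.4f}")
print(f"100 tosses: {large_sample:.2e}")
```

With ten tosses a fair coin shows eight or more heads a little over 5% of the time, so a company that repeats a tiny survey a few times has a real chance of landing on an impressive result. With 100 tosses the same 80% result is practically impossible.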

Ignoring the Errors

Whenever a figure is calculated in statistics, there is usually something that was not taken into account. Because of this, the figure cannot be trusted completely; it may contain an error. For example, not everyone who answers a survey tells the truth, so there is always a margin of error. The error can come from ignoring qualities that are hard to measure, from people lying, and so on. When the IQ of students is calculated, qualities such as leadership and creativity are left out, so the IQ figure can carry an error as well. Hence the possible error should be kept in mind when studying any data point.

Playing with the Graphs

Many people use charts or graphs to present data more clearly; a trend, for instance, is easy to spot on a chart. But a chart can also be manipulated easily. The data it shows may be correct, yet changing the way it is presented changes the story. Let’s take an example. A person wants to show the increase in cases of a particular disease over a period of one year. Cases started at zero and reached 1000 by the end of the year. He plots months on the ‘x’ axis and cases on the ‘y’ axis. If he uses a scale of 100 cases to 1 centimetre on the ‘y’ axis, the viewer will see a steep rise in cases. But if he uses a scale of 1000 cases to 1 centimetre, it will look as if there was hardly any rise. In this way, charts can be manipulated.
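The effect of the scale is simple arithmetic, sketched below. The function name `drawn_height_cm` is made up for this illustration; the case count and the two scales are the ones from the example above.

```python
def drawn_height_cm(cases, cases_per_cm):
    """Height on paper of a point on the y axis, given the axis scale."""
    return cases / cases_per_cm

year_end_cases = 1000  # cases at the end of the year, as in the example

steep = drawn_height_cm(year_end_cases, cases_per_cm=100)
flat = drawn_height_cm(year_end_cases, cases_per_cm=1000)

print("100 cases per cm: ", steep, "cm")
print("1000 cases per cm:", flat, "cm")
```

The same 1000 cases climb 10 cm on the first chart and only 1 cm on the second, although the data has not changed at all.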

One-Dimensional Picture

Apart from line charts, bar charts can also be used to change the way data looks. Let’s take an example. You are showing the number of coronavirus patients in two different time periods, and the cases have doubled: 1500 in the first bar and 3000 in the second. Since the cases have doubled, the second bar is twice as tall, and the viewer correctly understands that the cases have doubled. But if the same idea is applied to a pictorial chart, it tells a very different story. The author uses cows to explain this. If the number of cows in a country has doubled and the pictorial shows two cows, the second drawn twice as tall as the first, then the second cow is also twice as wide and covers four times the area. The viewer gets an exaggerated impression, and may even think that it is the size of the cows that has increased and not their number.
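A quick calculation shows why a scaled picture misleads where a bar does not. It assumes the picture keeps its proportions, so that width grows by the same factor as height; the function names are made up for this sketch.

```python
def bar_area_ratio(height_factor):
    """A bar keeps its width, so its area grows only with its height."""
    return height_factor

def picture_area_ratio(height_factor):
    """A picture scaled in proportion grows in width as well as height,
    so its area grows with the square of the height factor."""
    return height_factor ** 2

doubling = 2  # the quantity being shown has doubled

print("bar looks    ", bar_area_ratio(doubling), "times bigger")
print("picture looks", picture_area_ratio(doubling), "times bigger")
```

A quantity that has doubled is shown by a bar with twice the area, but by a picture with four times the area.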

Semi-Attached Figure

In statistics there are numerous ways to misrepresent data. One of them is the semi-attached figure: the thing that was counted and the thing that is reported sound the same, but they are not the same. For example, a report says that “X” people died in rail accidents. Readers will assume that all of these people were travelling by rail. But the figure also includes people who were in a car or on a two-wheeler and were hit by a train. So the statement that this many people were killed in rail accidents sounds like one thing but means another. Semi-attached figures can easily mislead anyone. They appear in advertisements too, like a packet of chips labelled “10% extra”. Extra of what? One must be aware of this kind of trick and not fall for it. A number can be presented in many ways, and readers may not be able to grasp its actual gist.

Post Hoc Rides Again

A lot of data is presented as a correlation between two things. At first sight such a correlation looks unproblematic, and it is tempting to read it as meaning that one factor causes the other. But this reading can be wrong. For example, a study shows that students who smoke tend to have lower test scores. This can be read both ways: the student smokes because he scores low, or he scores low because he smokes so often. So the data can easily be interpreted wrongly. In a few cases the correlation may even be a pure coincidence.
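One way to see that a correlation says nothing about direction is to compute it both ways round. The sketch below implements the usual Pearson correlation coefficient by hand; the smoking and score figures are invented for illustration and are not from the book or from any study.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical data for six students (illustrative only).
cigarettes_per_day = [0, 2, 5, 8, 10, 15]
test_score = [82, 78, 74, 65, 60, 52]

smoking_vs_score = pearson(cigarettes_per_day, test_score)
score_vs_smoking = pearson(test_score, cigarettes_per_day)

print(smoking_vs_score, score_vs_smoking)
```

The two values are identical. The coefficient treats both variables alike, so the number alone cannot tell us whether smoking lowers scores, low scores lead to smoking, or something else drives both.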

Statistical Manipulation

Here the author focuses on the ways in which data can be manipulated, and shares various examples. One way is the choice of average: mean, median or mode. Different surveys can report different averages for the same data simply by choosing different kinds of average. Percentages can also be used to mislead readers. If someone calculating a percentage wants the figure to look higher, he can play with the base on which the percentage is calculated. Double-sided charts can also be used to mislead readers.
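The trick with the base can be shown with one small function. The prices are made-up values and `percent_change` is a helper name chosen for this sketch.

```python
def percent_change(base, new):
    """Change from base to new, expressed as a percentage of base."""
    return (new - base) / base * 100

old_price, new_price = 100, 50  # hypothetical prices

# The same gap of 50, measured against two different bases.
as_a_drop = percent_change(old_price, new_price)
as_a_rise = percent_change(new_price, old_price)

print("measured from 100:", as_a_drop, "%")
print("measured from 50: ", as_a_rise, "%")
```

The gap between the two prices is 50 in both cases, yet it is a 50% change on a base of 100 and a 100% change on a base of 50. Whoever picks the base picks how big the change sounds.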

How to Talk Back to a Statistic

There are many ways in which statisticians can lie to us or mislead us, but we can test their claims up to a point. A few questions help us decide whether the data is believable or not.

  1. Who Says So?

Check for conscious and unconscious bias in the source. Data may have been suppressed to show a specific result, or the units may have been switched. Charts can be manipulated too, so they need special attention.

  2. How Does He Know?

Surveys are run on a sample, and not everyone who is asked takes part. Check whether the sample is large enough to describe the data reliably. As we saw earlier, many correlations are reported as well; check whether the correlation is significant or not.

  3. What’s Missing?

Look for the things that are missing. If a correlation is given, is a measure of its reliability given too? If an average is given, which kind of average is it? Sometimes the factor that actually caused a change is the thing left out.

  4. Did Somebody Change the Subject?

Check whether the subject has changed along the way, for example because its definition has changed: earlier the word meant one thing and now it means another. The author gives the example of farms. The reported number of farms increased, but this was largely because the definition of a farm had changed, so that more holdings qualified as farms.

  5. Does it Make Sense?

Many numbers are just assumptions, or are derived from a formula that is not precise. Such a figure is just a number, not a real average, so ask whether it makes sense before accepting it.

Thus, Darrell Huff gives us (the readers) an idea of how we can be fooled with different kinds of statistical tools, and of how we can tackle them and interpret the data in a smarter way.
