Statistics tell stories. And people love stories.
That’s why writers like to include statistics in their work—a statistic can capture an interesting fact in a quantifiable, snackable, contagious format. Like when I tell you that in a 2014 study, 11% of people thought the coding language HTML was an STD.
The element of flair and credibility statistics add to your work can be both good and bad. It all depends on how accurate your statistics are.
If you use a statistic incorrectly, or if the statistic was wrong to begin with, you could be misleading readers. I’ve had my share of snafus when it comes to referencing statistics—for which I have been corrected. And I hope to keep it that way.
But for writers, navigating statspeak is no easy feat. How fortunate that last semester I took a statistics class :). I learned a lot of relevant stuff that writers should know. So here are some things to watch out for when referencing statistics so you don’t end up with your keyboard in your mouth.
1. Know Your Percents
Something I have been corrected for is confusing a change in percent with a change in percentage points. You may have made the same mistake. Let’s consider a hypothetical study to tease out our errors.
Suppose scientists are trying to determine whether carrying a sugar glider around in a necklace pouch is good for mental health. Our imaginary scientists find that 82% of people who spent a day with a sugar glider rated themselves as happy, compared to 50% of people who didn’t stroll about with a cute mammal hanging around their neck. (Let’s assume the researchers finished all their fancy calculations of statistical significance and everything is in the clear.)
In this case, would it be accurate to say that someone with a sugar glider would be 32% happier?
No. These percentages are indicative of probabilities. Essentially, the scientists are saying that, based on this experiment, someone who carries a sugar glider has a higher probability of rating themselves as happy (82% probability vs. 50%).
So because we’re talking about probabilities, we can say there’s a 32% increase in a person’s likelihood of being happy if they carry a sugar glider around?
Still no. “What?!” Let me explain.
There’s a difference between a percentage increase and an increase in percentage points. An increase in percentage points is more intuitive—in this case, we saw an increase of 32 percentage points.
A percentage increase, on the other hand, refers to the change from a measure of the original state (in this case, a proportion, 50%) to a new value (82%), relative to that original measure. Assuming we’re saying that the original likelihood of being happy was 50% (for those sad people without the sugar glider), an increase of 32 percentage points is actually a 64% increase, since 32 is 64% of 50.
A lot of gobbledygook, I know. Here’s a better way to think about it—not in terms of changes in proportions (e.g. 50% said they were happy), but changes in quantitative variables (e.g. each person rated their happiness on a scale of 1-10).
If the average happiness rating (given on a scale of 1-10) went from 6 to 9 when carrying a sugar glider, the change of 3 would be an increase of 3 rating points, representing a 50% increase in happiness ratings (the change / original measurement ⇒ 3/6 = 0.5 or 50%).
In this case, it would be accurate to say that on average people who carried a sugar glider around were 50% happier because the measurements quantified happiness. And in this case, a 50% increase makes more intuitive sense.
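If the arithmetic still feels slippery, here is a small Python sketch of both calculations using the made-up sugar glider numbers from above. The function names are my own, invented for illustration, and not from any library.

```python
def percentage_point_change(old_pct, new_pct):
    """Simple difference between two percentages, in percentage points."""
    return new_pct - old_pct


def percent_change(old_value, new_value):
    """Relative change from the old value, expressed as a percent."""
    return (new_value - old_value) / old_value * 100


# Hypothetical study: share of people who rated themselves as happy.
without_glider = 50  # percent
with_glider = 82     # percent

points = percentage_point_change(without_glider, with_glider)
relative = percent_change(without_glider, with_glider)
print(f"{points} percentage points, or a {relative:.0f}% increase")

# Hypothetical average happiness rating on a 1-10 scale.
rating_change = percent_change(6, 9)
print(f"Ratings rose by {rating_change:.0f}%")
```

Same two proportions, two very different-sounding numbers: 32 percentage points or a 64% increase. Whichever one you quote, label it correctly.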
2. Don’t Quote the Quoter
Statistics are interesting, so we like to include them in our writing. So we jump on Google and search “x statistics” to try and find interesting information about x topic. This is a perfectly legitimate practice, but there’s something you need to watch out for.
I call it: Quoting the quoter.
Other writers just like you reference statistics in their work. When you do your nice little Google search, you’ll likely come up with some of their work, sprinkled delicately with a few statistics here or there. While the writer’s statistics may be accurate, you should avoid referencing them directly. Here are a couple reasons why.
1. The telephone effect
Statistics are powerful within context, but as you move them away from that context, they can lose some of their meaning. Think of the telephone game we play in elementary school: As a basic phrase gets passed around, it gets unintentionally (or sometimes intentionally) changed, resulting in an amusing new phrase by the end.
With statistics, the change isn’t so amusing.
Like a game of telephone, the further you move away from the original source, the more likely you are to get a skewed version of the original statistic.
2. The statistics could be outdated
It’s rare to see dates posted next to statistics. So if you’re quoting the quoter, you run the risk of being on the end of a daisy-chain of quoters quoting quoters. As a result, you could be referencing statistics that are outdated and have since been debunked. An article published in 2018 doesn’t mean its statistics are from the same year.
Here’s an example on Forbes of a 2018 article referencing a statistic that they say was from 2017, but the article they link to is from 2015.
If you must quote the quoter, try to be only one step away from the data source. This might be your best bet because the actual study might only be raw data, and then someone else takes the data and works with it. Since you’re a writer, not a statistician, that’s probably your best source of truth.
3. Positive Bias
Do vaccines cause autism? Some say yes, some say no. What do scientists say? NO.
So why do some people still say yes? Partly because of positive bias.
Positive bias refers to “the preference for publishing research that has a positive (eventful) outcome, than an uneventful or negative outcome.” Publishers and readers want to share things that seem eventful. Because of this preference, positive bias can create a sharing loop in which information that is eventful, while maybe not true, can gain a lot of publicity quickly.
So writers need to be careful. Especially when writing for a client with an agenda, it can be easy to only seek out statistics that match our message (which is exactly what former—emphasis on “former”—researcher Andrew Wakefield did in his autism study).
This is a dangerous game to play. Everyone has the right to make an argument, but an ethical writer will either consider various studies or at least qualify their statements before making a bold claim with potentially serious consequences.
4. Statistics Can Be Misleading
Torture the data and it will tell you anything you want. Writers need to watch out for potential misuses of data where the statistic might be telling the truth, but the effect could be totally misleading. Here are some examples.
The beautiful thing about statistics is that you don’t need to get information from every individual in a population to know the characteristics of a population. Instead, you can take a sample and make inferences about the population based on that sample.
The trick here is that a sample should be relatively large and chosen at random in order to be representative. Below are some wrong ways to sample that you should watch out for.
The wrong way to sample
Convenience sampling. Using a company’s email list for a survey is an example of a convenience sample. This doesn’t mean that you can’t use an email list to get some insights into things like consumer preferences. But if you plan on publishing this information, you need to make clear the scope of the study.
For example, if 51% of the respondents to your company survey say they prefer chocolate ice cream, you can’t say that 51% of all people prefer chocolate. Just your audience.
Quota sampling. Like convenience sampling, quota sampling occurs when a researcher doesn’t select subjects at random. Instead, the researcher sets a quota for each group (say, 50 men and 50 women), fills it with whoever is easiest to reach, such as the first people that come through the door, and (falsely) assumes that this will produce a representative sample.
Volunteer sampling. When people volunteer for a study, you need to be wary of the results. People that volunteer information can be very different from people that don’t, so your sample might not be representative.
If you ever come across any of these issues with sampling, discount the information.
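To see why a convenience sample can mislead, here is a rough Python simulation. Everything in it is an assumption I made up for illustration: the population size, the share of people on the email list, and how much each group likes chocolate.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population of 100,000 people. Assume 10% are on the
# company's email list, and that subscribers prefer chocolate more often
# (70%) than everyone else (40%). All of these numbers are invented.
population = []
for _ in range(100_000):
    subscriber = random.random() < 0.10
    likes_chocolate = random.random() < (0.70 if subscriber else 0.40)
    population.append((subscriber, likes_chocolate))


def share_liking_chocolate(people):
    """Percent of the given people who prefer chocolate."""
    return 100 * sum(likes for _, likes in people) / len(people)


# Convenience sample: 1,000 people drawn only from the email list.
email_list = [person for person in population if person[0]]
convenience_sample = random.sample(email_list, 1000)

# Random sample: 1,000 people drawn from the whole population.
random_sample = random.sample(population, 1000)

print(f"Whole population:   {share_liking_chocolate(population):.1f}%")
print(f"Convenience sample: {share_liking_chocolate(convenience_sample):.1f}%")
print(f"Random sample:      {share_liking_chocolate(random_sample):.1f}%")
```

In this made-up world, the random sample should land close to the true figure, while the email list sample overshoots it by a wide margin. The survey isn’t lying. It just describes subscribers, not everyone.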
Means vs. Medians
Means and medians are grade school topics. Yet these simple concepts are some of the best ways to mislead readers, and are often used to do so intentionally.
As a quick recap, means and medians indicate the centers of a data set. A mean is an average: add up all the values of a specific characteristic of each individual in a group and divide by the number of individuals. The median is the middle value: put everyone in order based on a quantitative variable then find the person at the exact middle.
The way your data is distributed will determine whether the mean or median is more appropriate for measuring the center.
Distributions: Normal, Right-Skewed, Left-Skewed
A normal distribution is your standard bell curve, where most data points are clustered around the mean and then taper off the farther you move from the mean.
When data is normally distributed, either the mean or the median will be a good measure of the center (because they should be roughly the same). But when data isn’t normally distributed (meaning it’s right-skewed or left-skewed), you need to watch out.
A right-skewed data set is one that has most of its data clustered on the left with some data far to the right skewing the mean towards a higher value.
A good example of right-skewed data is income in America. Most people make less than $100,000, but some make more, and a select few make much, much more. Because of the outliers, using the mean wouldn’t give an accurate measure of the center, so the median is preferable. In income reports, the US Census Bureau doesn’t even offer the mean, just the median (in 2016 it was $59,039). A mean income in this case would be misleading.
A left-skewed data set is one that has most of its data clustered on the right with some data far to the left skewing the mean towards a lower value.
An example of left-skewed data would be age at retirement. People tend to retire more as they get older, but there are those lucky few who make their fortunes early and retire early to their private island in the Caribbean. In the US, the mean and median retirement ages aren’t far apart: the mean retirement age is 59.88, and the median is 62. Close, but still different stories.
(Note: above is an example of a 1st degree quoting of the quoter, where I relied on someone else to look at the raw data and do their own evaluations ;)).
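Here is a quick Python sketch of how skew pulls the mean away from the median. The incomes and retirement ages below are invented for illustration and are not real data. The `mean` and `median` functions come from Python’s standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical annual incomes in dollars: most are modest, one is huge.
# This is a right-skewed data set.
incomes = [32_000, 41_000, 48_000, 55_000, 63_000, 78_000, 2_500_000]

print(f"Mean income:   ${mean(incomes):,.0f}")
print(f"Median income: ${median(incomes):,.0f}")

# Hypothetical retirement ages: most cluster in the 60s, two retire early.
# This is a left-skewed data set.
retirement_ages = [35, 48, 61, 62, 63, 64, 65, 66, 67]

print(f"Mean retirement age:   {mean(retirement_ages):.1f}")
print(f"Median retirement age: {median(retirement_ages)}")
```

With these made-up numbers, one very high earner drags the mean income above $400,000 while the median sits at $55,000. The two early retirees pull the mean retirement age down to 59 while the median stays at 63.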
5. Studies vs. Experiments
You’ve likely heard the statement (perhaps in the political arena) that “correlation does not imply causation.” What exactly does this mean?
When researchers conclude that two quantitative variables tend to fluctuate together, this relationship is considered a correlation. For example, ice cream sales tend to have a positive correlation with temperature—the higher the temperature, the more ice cream sales.
This simple example might imply that greater temperatures cause greater ice cream sales (which is probably the case). But a statistician can’t say that with just a study of data. Just because two variables correlate does not mean that one causes the other.
For example, the number of people who drowned by falling into a pool correlates with the number of films Nicolas Cage appeared in each year.
Do you think you could argue that one caused the other? Probably not.
But there are less obvious examples. One article (incorrectly) referenced a study that found a correlation between teacher pay and student performance, arguing that raising pay would cause an increase in teacher performance.
While this may be true, it may not be, because correlation does not imply causation. Couldn’t there be some unknown variables in there? Perhaps teachers who get paid more live in more advanced countries, and therefore are more likely to teach better students. Something to think about. (Not to mention that this article was a textbook example of quoting the quoter. Just saying.)
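Here is a rough Python sketch of how an unknown variable can create a correlation between two things that don’t cause each other. The data is entirely made up: I’m assuming a world where temperature drives both ice cream sales and sunburn cases. The `pearson_r` helper is my own hand-rolled version of the standard Pearson correlation formula.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical daily temperatures. This is the lurking variable.
temperature = [60, 65, 70, 75, 80, 85, 90, 95]

# Invented noise so the two series are not perfect copies of temperature.
sales_noise = [3, -4, 2, -1, 4, -3, 1, -2]
sunburn_noise = [1, 0, -1, 1, 0, -1, 1, 0]

# In this made-up world, temperature drives both outcomes.
ice_cream_sales = [2 * t + e for t, e in zip(temperature, sales_noise)]
sunburn_cases = [t // 5 + e for t, e in zip(temperature, sunburn_noise)]

r = pearson_r(ice_cream_sales, sunburn_cases)
print(f"Correlation between ice cream sales and sunburns: {r:.2f}")
```

The correlation comes out strongly positive (above 0.9), yet nobody would argue that ice cream causes sunburn. Both simply follow the temperature.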
6. Bad Survey Structures
Bad survey structures can also create misleading statistics. One well-known culprit here is Colgate.
One Colgate advertisement stated that 80% of dentists recommend Colgate. That’s a pretty impressive number—so only 20% of dentists recommend a toothpaste other than Colgate?
Nope. The way the survey was conducted, dentists were allowed to choose multiple toothpastes they would recommend. So a dentist that recommended Colgate could also recommend Crest, Aquafresh, Pepsodent, Aim, and any other.
80% of dentists would probably recommend just about any product that was good for you, so it’s not a very helpful number. Completely accurate, just not helpful. When possible, be wary of how surveys are conducted.
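To see how a multiple-choice survey can produce a number like this, here is a small Python sketch. The ten responses are invented for illustration and are not the real survey data.

```python
from collections import Counter

# Hypothetical responses from ten dentists who could each name as many
# brands as they liked. The data is invented purely for illustration.
responses = [
    ["Colgate", "Crest", "Aquafresh"],
    ["Colgate", "Crest"],
    ["Crest", "Aquafresh"],
    ["Colgate", "Crest", "Aquafresh"],
    ["Colgate", "Aquafresh"],
    ["Colgate", "Crest"],
    ["Colgate", "Crest", "Aquafresh"],
    ["Crest"],
    ["Colgate", "Crest", "Aquafresh"],
    ["Colgate", "Crest"],
]

# Count how many dentists named each brand, then convert to percents.
counts = Counter(brand for answer in responses for brand in answer)
shares = {brand: 100 * n / len(responses) for brand, n in counts.items()}

for brand, share in sorted(shares.items()):
    print(f"{share:.0f}% of dentists recommend {brand}")

print(f"Shares add up to {sum(shares.values()):.0f}%")
```

In this made-up survey, 80% of dentists recommend Colgate, but 90% recommend Crest and 60% recommend Aquafresh. When the shares add up to well over 100%, you know respondents could pick more than one answer, and “80% recommend” stops sounding so exclusive.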
7. Misleading Wording
The way a statistic is worded can make it misleading without being false.
An example is a quote by Nick Schroer of the FBI, stating that the 2016 FBI Uniform Crime Report showed that “more than four times as many people were stabbed to death than were killed with rifles of any kind.” I’m not sure about the numbers here (because I’m quoting the quoter), but regardless, the wording is worth looking at.
First impressions? Unless you picked up on the subtle difference between “rifles of any kind” and “guns,” you may have interpreted the statement as saying that people are less likely to die from gun violence than from stabbings. And by extension, you might conclude that we should refocus efforts on curbing knife violence instead of gun violence, which, if you’re going by the numbers alone, wouldn’t be a reasonable decision.
Get Your Stats Straight
This may have been more stats than you were ever hoping to read as a writer. But I hope the message got through: Stats are powerful, either for good or not so good.
Use them well, not just because you want to make your content look good or make a point.
Use them wrong, and it could backfire.
Comment below if you’ve seen any bad examples of statistics. I’d love to see them!