P-values are frequently misinterpreted, which causes many problems. I won't rehash those problems here since my colleague Jim Frost has detailed them at some length, but the fact remains that the p-value is still one of the most frequently used tools for deciding whether a result is statistically significant.
You know the old saw about "Lies, damned lies, and statistics," right? It rings true because statistics really is as much about interpretation and presentation as it is about mathematics. That means we human beings who analyze data, with all our foibles and failings, have the opportunity to shade and shadow the way results get reported.
While I generally like to believe that people want to be honest and objective, especially smart people who do research and analyze data that can affect other people's lives, here are 500 pieces of evidence that fly in the face of that belief.
We'll come back to that in a minute. But first, a quick review…
What's a P-Value, and How Do I Interpret It?
Most of us first encounter p-values when we conduct simple hypothesis tests, although they are also integral to many more sophisticated methods. Let's use Minitab Statistical Software for a quick review of how they work (if you'd like to follow along and don't have Minitab, the full package is available free for 30 days). We're going to compare fuel consumption for two different types of furnaces to see if there is a difference between their means.
Go to File > Open Worksheet, and click the "Look in Minitab Sample Data Folder" button. Open the sample data set named Furnace.mtw, and choose Stat > Basic Statistics > 2 Sample t… from the menu. In the dialog box, enter "BTU.In" for Samples, and enter "Damper" for Sample IDs.
Press OK and Minitab returns the following output, in which I've highlighted the p-value.
In the majority of analyses, an alpha of 0.05 is used as the cutoff for significance. If the p-value is less than 0.05, we reject the null hypothesis that there is no difference between the means and conclude that a significant difference does exist. If the p-value is larger than 0.05, we cannot conclude that a significant difference exists.
That’s pretty straightforward, right? Below 0.05, significant. Over 0.05, not significant.
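If you'd rather follow along outside of Minitab, here's a minimal sketch of the same kind of two-sample t-test in Python with SciPy. The BTU readings below are made up for illustration; they are not the values from the Furnace.mtw sample data set.

```python
from scipy import stats

# Hypothetical BTU.In readings for two damper types (illustrative only;
# not the actual values from Minitab's Furnace.mtw sample data).
damper_1 = [9.5, 10.1, 8.7, 11.2, 9.9, 10.4, 9.0, 10.8]
damper_2 = [10.0, 9.4, 10.6, 9.8, 11.0, 9.2, 10.3, 9.7]

# Two-sample t-test without assuming equal variances (Welch's t-test).
t_stat, p_value = stats.ttest_ind(damper_1, damper_2, equal_var=False)

# Apply the usual 0.05 cutoff for significance.
alpha = 0.05
if p_value < alpha:
    print(f"p = {p_value:.3f}: reject the null; the means differ significantly.")
else:
    print(f"p = {p_value:.3f}: cannot conclude a significant difference.")
```

The decision rule is exactly the one described above: below 0.05 you reject the null hypothesis, above it you don't.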
“Missed It By That Much!”
In the example above, the result is clear: a p-value of 0.7 is so far above 0.05 that you can't put any spin on the results. But what if your p-value is really, really close to 0.05?
Like, what if you had a p-value of 0.06?
That’s not significant.
Oh. Okay, what about 0.055?
How about 0.051?
It's still not statistically significant, and data analysts shouldn't try to pretend otherwise. A p-value isn't a negotiation: if p > 0.05, the results aren't significant. Period.
So, what should I say when I get a p-value that's above 0.05?
How about saying this? "The results were not statistically significant." If that's what the data tell you, there's nothing wrong with saying so.
No Matter How Thin You Slice It, It’s Still Baloney.
Which brings me back to the blog post I referenced at the start. Do give it a read, but the bottom line is that the author cataloged 500 different ways that contributors to scientific journals have used language to obscure their results (or lack thereof).
As a student of language, I confess I find the list fascinating…but also upsetting. It isn't right: these contributors are educated people who certainly understand A) what a p-value above 0.05 signifies, and B) that manipulating words to soften that result is deliberately deceptive. Or, to put it in less soft words, it's a damned lie.
Nonetheless, it happens frequently.
Here are just a few of my favorites of the 500 different ways people have reported results that weren't significant, along with the p-values to which these creative interpretations applied:
A certain trend toward significance (p=0.08)
approached the borderline of significance (p=0.07)
At the margin of statistical significance