
Distinguishing ‘significant’ from ‘meaningful’: thinking about p-values in applied data work.
By izzy Thornton
It’s an easy trap to fall into: you run a laundry list of statistical analyses and immediately start scrolling through p-values to see what’s worth reporting. Most statistics packages even calculate these automatically.
Across many disciplines, statistical significance has slowly transformed from a specific inferential calculation into a kind of flag for truth. Readers may be inclined to think a finding is more correct, more truthful, or more important if it has an asterisk next to it connoting a certain p-value.
But a p-value really only tells us one thing: when analyzing data from a subset, how likely would a result at least this extreme be if nothing but random noise were at work? If that likelihood is very low (say, 0.05 or lower), then we add an asterisk to show that the result is statistically significant.
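For a concrete picture of what that looks like in practice, here is a minimal sketch in Python using scipy, with entirely made-up scores for two hypothetical groups (the numbers and the choice of a two-sample t-test are illustrative assumptions, not a prescription):

```python
# Minimal sketch: a p-value in practice, using made-up scores for a
# hypothetical intervention group and control group.
from scipy import stats

intervention = [78, 85, 90, 72, 88, 95, 81, 84]
control = [70, 75, 82, 68, 77, 80, 73, 79]

# Two-sample t-test: how surprising is a mean difference this large
# if only random noise separated the two groups?
t_stat, p_value = stats.ttest_ind(intervention, control)

print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
if p_value <= 0.05:
    print("Statistically significant at the conventional 0.05 threshold.")
else:
    print("Not statistically significant at the 0.05 threshold.")
```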
What this means in practice is that p-values are only useful under the following conditions:
- When there is a whole population of cases AND
- You are analyzing only a selection of them AND
- You’ve calculated a statistic AND
- All you want to know is the likelihood that a statistic that extreme would arise by chance alone
Looking at each point, you can determine whether p-values are actually valuable for the question you’re dealing with.
1: Are you studying a population?
Sometimes you aren’t trying to draw conclusions about everyone or everything in a group. If you’re looking at a single-subject design or a purposive sample, for instance, you may not be interested in whether your results are reflective of a whole group. If you do want to know about a whole group (like every class meeting in a semester, every duck in a pond, every portfolio in a brokerage, every participant in a trial) then carry on.
2: Do you have data from only a selection of the group?
If you have data from every member of the population and you aren’t partitioning the data according to some variable, then you do not report statistical significance. On a grand scale, for example, descriptive statistics from decennial census data should not have p-values reported. On a smaller scale, you should not report p-values if you are analyzing, for instance, every student in a class or every product from an assembly line. If you have collected your data from a sample, or if you’ve broken the population up into groups (e.g., students in the intervention group vs. students in the control group, or products made by the first, second, or third shift), then carry on.
3: Have you calculated a statistic?
If you have qualitative data, you are not dealing with statistical significance. If you eyeballed a trend on a graph but haven’t quantified it somehow, you aren’t dealing with statistical significance. If you have a descriptive finding about the whole population (e.g., median income for the whole county), then you don’t have a statistic, you have a parameter, and you are not dealing with statistical significance.
4: Are you trying to determine if that statistic is the result of random chance?
This is perhaps the most common pitfall in significance testing. Significance testing helps you determine if the statistic you calculated is so distinct that it’s really unlikely to happen just because of random group assignment.
In hypothesis testing, statistical significance has us reject the null hypothesis; it does not have us accept any other hypothesis. It is only claiming “the odds are that some kind of relationship exists between these variables, because if it didn’t, these data would look more random than they do.” In inferential statistics, statistical significance has us infer to the broader group, as if to say “there are really low odds of this finding happening only in the specific group we randomly picked and not also happening in the broader group we picked from.”
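One way to make the “random group assignment” idea tangible is a simple permutation test. Below is a rough sketch with the same kind of hypothetical, made-up data as above: shuffle the group labels many times and count how often chance alone produces a gap at least as large as the observed one.

```python
# Rough permutation-test sketch: how often does random group assignment
# alone produce a mean difference at least as large as the observed one?
import numpy as np

rng = np.random.default_rng(42)
intervention = np.array([78, 85, 90, 72, 88, 95, 81, 84])
control = np.array([70, 75, 82, 68, 77, 80, 73, 79])

observed_diff = intervention.mean() - control.mean()
pooled = np.concatenate([intervention, control])

n_permutations = 10_000
count_as_extreme = 0
for _ in range(n_permutations):
    shuffled = rng.permutation(pooled)
    fake_diff = shuffled[:len(intervention)].mean() - shuffled[len(intervention):].mean()
    if abs(fake_diff) >= abs(observed_diff):
        count_as_extreme += 1

p_value = count_as_extreme / n_permutations
print(f"Observed difference: {observed_diff:.2f}, permutation p = {p_value:.4f}")
# A small p-value says: if the group labels were just noise, a gap this
# big would be rare -- so we reject the null, without "accepting" any
# specific alternative explanation.
```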
This is where the interpretation gets tricky. Just because a result is statistically significant doesn’t mean the finding is meaningful for what you’re studying. Effect size, indirect effects, interaction effects, and dozens of other considerations shape the actual answer to your research or evaluation questions. Are you ready to conclude that the intervention is effective, even if its effect size is incredibly tiny? Can you say for sure that a disparity exists in the population just because two variables are consistently related? Maybe not just yet.
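To see how a result can clear the significance bar while remaining practically trivial, here is a simulated sketch (the data are fabricated on the spot): with a very large sample, even a tiny difference between groups earns a minuscule p-value, but the effect size stays negligible.

```python
# Simulated sketch: a huge sample makes a trivial difference "significant"
# while the effect size (Cohen's d, computed by hand) stays negligible.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.normal(loc=100.0, scale=15.0, size=200_000)
group_b = rng.normal(loc=100.5, scale=15.0, size=200_000)  # tiny true gap

t_stat, p_value = stats.ttest_ind(group_a, group_b)

# Cohen's d with a pooled standard deviation (equal group sizes)
pooled_sd = np.sqrt((group_a.var(ddof=1) + group_b.var(ddof=1)) / 2)
cohens_d = (group_b.mean() - group_a.mean()) / pooled_sd

print(f"p = {p_value:.2e}")           # far below 0.05
print(f"Cohen's d = {cohens_d:.3f}")  # roughly 0.03: a negligible effect
```

The asterisk would be earned here, but few evaluators would call a difference that small meaningful.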
Significance testing is useful and, in some cases, mission critical. It’s only one tool, though, in a whole set of tools for distinguishing your findings from your conclusions. Statistical significance alone can’t serve as a yes/no answer to complex evaluation and research questions, but it can act as a…guiding star…for where to look next.