Probability is starting with an animal, and figuring out what footprints it will make.
Statistics is seeing a footprint, and guessing the animal.
Probability is straightforward: you’ve got the bear. Measure the foot size, the leg length, and you can deduce the footprints. “Oh, Mr. Bubbles weighs 400 lbs and has 3-foot legs, and will make tracks like this.” More academically: “We have a fair coin. After 10 flips, here are the possible outcomes.”
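To make the coin version concrete, here is a minimal sketch in Python. It assumes the flips are independent and the coin is fair unless told otherwise; the function name `heads_distribution` is just a label made up for this example.

```python
from math import comb

def heads_distribution(flips, p_heads=0.5):
    """Return P(exactly k heads) for k = 0..flips, assuming independent flips."""
    return [
        comb(flips, k) * p_heads**k * (1 - p_heads) ** (flips - k)
        for k in range(flips + 1)
    ]

# We know the animal (a fair coin), so we can list every footprint it can leave.
dist = heads_distribution(10)
for k, prob in enumerate(dist):
    print(f"{k:2d} heads: {prob:.4f}")
```

The probabilities add up to 1, and 5 heads is the single most likely result, though it only happens about a quarter of the time.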
Statistics is harder. We measure the footprints and have to guess what animal it could be. A bear? A human? If we get 6 heads and 4 tails, what are the chances the coin is fair?
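One rough way to run the guess in reverse is to line up a few candidate coins and ask how well each one explains the footprint. The sketch below is only an illustration: the candidate biases are made up, and treating every candidate as equally plausible beforehand is an assumption, not something the data tells us.

```python
from math import comb

def likelihood(heads, flips, p_heads):
    """P(seeing exactly `heads` heads in `flips` flips) for a coin with bias p_heads."""
    return comb(flips, heads) * p_heads**heads * (1 - p_heads) ** (flips - heads)

# Hypothetical suspects: coins that land heads 30%, 50%, 60%, or 70% of the time.
candidates = [0.3, 0.5, 0.6, 0.7]

# How likely is the observed footprint (6 heads, 4 tails) under each suspect?
likes = {p: likelihood(6, 10, p) for p in candidates}

# Assumption: each suspect was equally plausible before we saw any flips.
total = sum(likes.values())
posterior = {p: like / total for p, like in likes.items()}

for p in candidates:
    print(f"p(heads)={p}: likelihood={likes[p]:.4f}, relative weight={posterior[p]:.3f}")
```

The 60% coin explains the data best, but the fair coin is not far behind. Ten flips is a small footprint, so several animals still fit.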
The Usual Suspects
Here’s how we “find the animal” with statistics:
Get the tracks. Each piece of data is a point in “connect the dots”. The more data, the clearer the shape (1 dot in connect-the-dots isn’t helpful. One data point makes it hard to find a trend.)
Measure the basic characteristics. Every footprint has a depth, width, and height. Every data set has a mean, median, variance, and so on. These universal, generic descriptions give a rough narrowing: “The footprint is 6 inches wide: a small bear, or a large man?”
Find the species. There are dozens of possible animals (probability distributions) to consider. We narrow it down with prior knowledge of the system. In the woods? Think horses, not zebras. Dealing with yes/no questions? Consider a Bernoulli distribution.
Look up the specific animal. Once we have the distribution (“bears”), we look up our generic measurements in a table. “A 6-inch wide, 2-inch deep pawprint is most likely a 3-year-old, 400-lb bear”. The lookup table is generated from the probability distribution, i.e. by making measurements while the animal is in the zoo.
Make additional predictions. Once we know the animal, we can predict future behavior and other traits (“According to our calculations, Mr. Bubbles will poop in the woods.”). Statistics helps us get information about the source of the data, from the data itself.
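The five steps above can be walked through for the coin. This is a sketch under stated assumptions: the `flips` list is made-up data (1 = heads, 0 = tails), and the “species” is assumed to be a Bernoulli distribution because each flip is a yes/no outcome.

```python
from statistics import mean, pvariance

# 1. Get the tracks (made-up data: 6 heads, 4 tails).
flips = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]

# 2. Measure the basic characteristics.
sample_mean = mean(flips)
sample_var = pvariance(flips)

# 3. Find the species: yes/no outcomes, so assume a Bernoulli distribution.
# 4. Look up the specific animal: for a Bernoulli coin, the usual estimate
#    of p(heads) is simply the fraction of heads observed.
p_hat = sample_mean

# Sanity check: a Bernoulli(p) variable has variance p * (1 - p).
model_var = p_hat * (1 - p_hat)

# 5. Make additional predictions: expected heads in the next 20 flips.
expected_heads_next_20 = 20 * p_hat

print(f"estimated p(heads) = {p_hat:.2f}")
print(f"sample variance = {sample_var:.2f}, model variance = {model_var:.2f}")
print(f"expected heads in next 20 flips = {expected_heads_next_20:.1f}")
```

With only ten tracks the estimate is shaky, which is the point of step 1: more data, clearer shape.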
Ok! The metaphor isn’t perfect, but it’s more palatable than “Statistics is the study of the collection, organization, analysis, and interpretation of data”. Need proof? Let’s see if we can ask intuitive “I get it!” questions:
What are the most common species? (Common distributions)
Are new ones being discovered?
Can we predict the next footprint? (Extrapolation)
Are the tracks following a path? (Regression / trend line)
Here are two tracks: which animal was faster? Bigger? (Data from two drug trials: which was more effective?)
Is one animal heading in the same direction as another? (Correlation)
Are two animals tracking a common source? (Causation: two bears chasing the same rabbit)
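A couple of these questions (the trend line, the next footprint, and correlation) fit in a few lines of Python. This is a rough sketch: the track positions are invented, and `fit_line` and `pearson_r` are hypothetical helper names written out by hand here, using ordinary least squares and the Pearson correlation coefficient.

```python
from math import sqrt

def fit_line(xs, ys):
    """Least-squares trend line: returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def pearson_r(xs, ys):
    """Pearson correlation: +1 means moving together, -1 means opposite."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up tracks: footprint number vs. distance along the trail (meters).
steps = [1, 2, 3, 4, 5]
position = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(steps, position)   # Are the tracks following a path?
next_position = slope * 6 + intercept          # Where might footprint 6 land?
r = pearson_r(steps, position)                 # How tightly do they line up?

print(f"trend: position = {slope:.2f} * step + {intercept:.2f}")
print(f"predicted position of footprint 6: {next_position:.2f}")
print(f"correlation: {r:.3f}")
```

A correlation near 1 says the two measurements move together. It does not say why, which is where the rabbit comes in.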