A sampling distribution is the probability distribution of a statistic obtained from a large number of samples drawn from a specific population. It describes how often each possible value of that statistic would occur across all the samples that could be drawn from the population.

Understanding the sampling distribution

Much of the data processed and used by academics, statisticians, researchers, marketers, analysts, and others comes from samples, not populations. A sample is a subset of a population. For example, a medical researcher who wants to compare the average weight of all children born in North America from 1995 to 2005 with that of children born in South America over the same period cannot, within a reasonable amount of time, gather data for the entire population of more than one million births over ten years. Instead, the researcher will use the weights of, say, 100 children on each continent to draw a conclusion. The weights of the 200 children used are the sample, and the average weight calculated from them is the sample average.

Now suppose that instead of taking just one sample of 100 baby weights from each continent, the medical researcher repeatedly takes random samples from the general population and calculates the average for each sample. For North America, the researcher extracts infant weights recorded in the United States, Canada, and Mexico as follows: four samples of 100 from selected hospitals in the United States, five samples of 70 from Canada, and three samples of 150 from Mexico, for a total of 1,200 infant weights in 12 groups. The researcher also collects a sample of 100 birth weights from each of the 12 South American countries.

The averages calculated for the sample sets, taken together, form the sampling distribution of the average. The average is not the only statistic that can be calculated from a sample. Other statistics, such as the standard deviation, variance, proportion, and range, can be calculated from the sample data as well, and each has its own sampling distribution. The standard deviation and the variance measure the variability of a sampling distribution.
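The repeated-sampling procedure from the North American example can be sketched in Python. This is a minimal simulation, not the researcher's actual data: the population of 100,000 weights (a smaller stand-in for the million-plus births in the example), its average of 7.0 pounds, and its standard deviation of 1.2 pounds are all assumed values chosen for illustration.

```python
import random
import statistics

# Hypothetical population of birth weights in pounds. The size, mean and
# standard deviation are illustrative assumptions, not real data.
rng = random.Random(42)
population = [rng.gauss(7.0, 1.2) for _ in range(100_000)]

# Sample sizes from the example: four of 100, five of 70, three of 150.
sample_sizes = [100] * 4 + [70] * 5 + [150] * 3

# Draw each sample without replacement and record its average.
sample_means = [statistics.mean(rng.sample(population, n)) for n in sample_sizes]

print(f"population average: {statistics.mean(population):.2f}")
for n, m in zip(sample_sizes, sample_means):
    print(f"n={n:3d}  sample average={m:.2f}")
print(f"standard deviation of the 12 averages: {statistics.stdev(sample_means):.3f}")
```

The 12 values in `sample_means` are a small empirical picture of the sampling distribution of the average. Each run with a different seed would give a different set of 12 averages.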

The number of observations in a population, the number of observations in a sample, and the procedure used to draw the sample sets determine the variability of a sampling distribution. The standard deviation of a sampling distribution is called the standard error. While the mean of the sampling distribution of the sample average is equal to the population average, the standard error depends on the population standard deviation, the population size, and the sample size.
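For the sample average under simple random sampling without replacement, this dependence is commonly written as below. The symbols are introduced here for illustration and do not appear in the text above: σ is the population standard deviation, N is the population size, and n is the sample size.

```latex
\mathrm{SE}(\bar{x}) = \frac{\sigma}{\sqrt{n}} \, \sqrt{\frac{N - n}{N - 1}}
```

When the population is much larger than the sample, the second factor is very close to 1 and the standard error is approximately σ/√n. With N = 1,000,000 and n = 100, for instance, that factor is about 0.99995.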

Knowing how much the averages of the sample sets differ from one another and from the population average indicates how close a sample average is likely to be to the population average. The standard error of the sampling distribution decreases as the sample size increases.
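The effect of sample size on the standard error can be checked with a short simulation. The sketch below assumes a hypothetical population of 100,000 weights with an average of 7.0 pounds and a standard deviation of 1.2 pounds. The helper `empirical_standard_error` is an illustrative name, not part of any library.

```python
import math
import random
import statistics

rng = random.Random(0)
SIGMA = 1.2  # assumed population standard deviation, in pounds
population = [rng.gauss(7.0, SIGMA) for _ in range(100_000)]

def empirical_standard_error(n, repeats=500):
    """Standard deviation of `repeats` sample averages, each from a sample of size n."""
    means = [statistics.mean(rng.sample(population, n)) for _ in range(repeats)]
    return statistics.stdev(means)

results = {}
for n in (25, 100, 400):
    results[n] = empirical_standard_error(n)
    print(f"n={n:3d}  empirical SE={results[n]:.3f}  "
          f"sigma/sqrt(n)={SIGMA / math.sqrt(n):.3f}")
```

Under these assumptions, quadrupling the sample size should roughly halve the standard error, so the empirical values should track σ/√n closely.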

Special considerations

A population, or a single set of sample observations drawn from it, will often be approximately normally distributed. A sampling distribution built from only a handful of sample sets, however, will not necessarily show a bell curve shape.

Following our example, the weights of children in North America and South America are approximately normally distributed because some children will be underweight (below average) or overweight (above average), with most children in between (around average). If the average weight of infants in North America is seven pounds, the sample average in each of the 12 sets of sample observations recorded for North America will also be close to seven pounds.

However, if you plot the averages calculated for each of the 12 sample groups, the resulting shape may look roughly uniform, and it is difficult to predict with certainty what the actual shape will be. The more samples the researcher draws from the population of more than one million weight records, the more the graph of their averages will begin to resemble a normal distribution.
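A rough way to see this is to compare text histograms of the sample averages for 12 samples and for 2,000 samples. The sketch below again assumes a hypothetical population of 100,000 weights (average 7.0 pounds, standard deviation 1.2 pounds) and samples of 100. The bin range of 6.6 to 7.4 pounds and the helper `histogram_of_means` are illustrative choices.

```python
import random
import statistics

rng = random.Random(1)
population = [rng.gauss(7.0, 1.2) for _ in range(100_000)]  # assumed values

LO, HI, BINS = 6.6, 7.4, 8
WIDTH = (HI - LO) / BINS

def histogram_of_means(num_samples, n=100):
    """Count the sample averages that fall into equal-width bins between LO and HI."""
    counts = [0] * BINS
    for _ in range(num_samples):
        m = statistics.mean(rng.sample(population, n))
        # Clamp rare out-of-range averages into the first or last bin.
        index = min(BINS - 1, max(0, int((m - LO) / WIDTH)))
        counts[index] += 1
    return counts

histograms = {k: histogram_of_means(k) for k in (12, 2000)}

for num_samples, counts in histograms.items():
    print(f"{num_samples} samples:")
    for i, c in enumerate(counts):
        left = LO + i * WIDTH
        bar = "#" * round(40 * c / max(counts))
        print(f"  {left:.1f}-{left + WIDTH:.1f} {bar}")
```

With only 12 averages the bars tend to look ragged, while with 2,000 they should pile up around seven pounds and taper off on both sides.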