What is the Empirical Rule?

This rule in statistics states that, for a normal distribution, nearly all observed data fall within three standard deviations of the mean. You might also know the empirical rule as the 68–95–99.7 rule or the three-sigma rule. According to the rule, about 68% of the data fall within one standard deviation of the mean, about 95% within two standard deviations, and about 99.7% within three standard deviations:

68% – (µ ± σ)

95% – (µ ± 2σ)

99.7% – (µ ± 3σ)

When normally distributed data are plotted on a graph, the bell curve is centered at the mean µ on the x-axis. The first standard deviation covers a positive half (from µ to µ + σ) and a negative half (from µ – σ to µ). Together, the two halves contain about 68% of the data; because the curve is symmetric, each half on its own contains about 34%. Similarly, the range within two standard deviations adds the band between one and two standard deviations on each side (about 13.5% each) to the central 68%, giving about 95%. The same reasoning extends to the third standard deviation, which brings the total to about 99.7%.
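The percentages above can be checked numerically. For any normal distribution, the probability of lying within k standard deviations of the mean equals erf(k/√2), where erf is the error function. The short Python sketch below uses only the standard library's `math.erf`; the helper names `within_k_sigma` and `one_sided_band` are illustrative, not from any library.

```python
import math

def within_k_sigma(k: float) -> float:
    """P(mu - k*sigma <= X <= mu + k*sigma) for any normal variable X."""
    return math.erf(k / math.sqrt(2))

def one_sided_band(k: int) -> float:
    """Probability of lying between (k - 1) and k standard deviations above
    the mean; by symmetry, the matching band below the mean is the same."""
    return (within_k_sigma(k) - within_k_sigma(k - 1)) / 2

for k in (1, 2, 3):
    print(f"within ±{k}σ: {within_k_sigma(k):.2%}   "
          f"band {k - 1}σ to {k}σ on one side: {one_sided_band(k):.2%}")
```

Running it prints about 68.27%, 95.45%, and 99.73% for the cumulative shares and about 34.13%, 13.59%, and 2.14% for the one-sided bands; rounding these gives the familiar 68–95–99.7 figures and the 34% halves.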

Normal Distribution

The normal distribution is arguably the most important probability distribution in statistics. For instance, data sets such as heart rate, blood pressure, height, and IQ scores are approximately normally distributed and form a bell-shaped curve.

The Symmetry of the Normal Distribution

The normal distribution applies to continuous variables, which can take infinitely many values within a range. A normal distribution describes how the values of such a variable are spread out. In a normal distribution, most observations cluster around the center, forming a peak, and become less frequent toward the tails. That is why every normal distribution has a bell shape.

Furthermore, in a normal distribution, the mean, median, and mode are equal. The peak of the curve lies at the center, which is the mean, and the curve is symmetric: the values to the left of the mean mirror those to the right. A normal distribution is fully defined by its mean and standard deviation. These are the two parameters that shape the curve: the mean sets its location and the standard deviation sets its spread. About 68% of the area under the curve lies within one standard deviation of the mean.
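These properties can be checked with Python's `statistics.NormalDist` class (available since Python 3.8). The sketch below assumes a hypothetical IQ-like distribution with mean 100 and standard deviation 15; any other mean and standard deviation would behave the same way.

```python
import math
from statistics import NormalDist  # Python 3.8+

# Hypothetical example: an IQ-like scale with mean 100 and standard deviation 15.
dist = NormalDist(mu=100, sigma=15)

# Symmetry: the density at the same distance above and below the mean is equal.
for d in (5, 15, 30):
    assert math.isclose(dist.pdf(100 + d), dist.pdf(100 - d))

# Median = mean: exactly half of the probability lies below the mean.
print(dist.cdf(100))  # 0.5

# Mode = mean: the density is highest at the mean.
print(dist.pdf(100) > dist.pdf(99) and dist.pdf(100) > dist.pdf(101))  # True

# About 68% of the area lies within one standard deviation of the mean.
print(round(dist.cdf(115) - dist.cdf(85), 4))  # 0.6827
```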

Parameters of the Normal Distribution

Mean

We can find the mean of the data set by adding all the values and dividing the total by the number of values.

Median

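When you order the data set from the smallest to the largest value, the middle value is the median. If the data set has an even number of values, the median is the average of the two middle values.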

Mode

The mode is the value that appears most often in the data set.
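A minimal Python sketch of the three definitions above, using a small hypothetical data set. It computes each value by hand and cross-checks the mean and median against the standard library's `statistics` module.

```python
import statistics

# Hypothetical sample data (not from the article).
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Mean: sum of the values divided by how many there are (40 / 8).
mean = sum(data) / len(data)

# Median: middle value of the sorted data; with an even count,
# average the two middle values ((4 + 5) / 2 here).
ordered = sorted(data)
n = len(ordered)
mid = n // 2
median = ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

# Mode: the most frequent value.
mode = statistics.mode(data)

# Cross-check against the standard library.
assert mean == statistics.mean(data)
assert median == statistics.median(data)
print(mean, median, mode)  # 5.0 4.5 4
```

This small sample is skewed, so its mean, median, and mode differ; in a normal distribution the three coincide.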

Standard Deviation

Standard deviation measures how spread out the values in a data set are. It is denoted by the Greek letter sigma (σ). The standard deviation is the square root of the variance. For instance, in finance, the standard deviation of an investment's annual rate of return is used as a statistical measure of the investment's historical volatility.

Variance

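Variance also measures the spread of values. It is the average of the squared differences between each value and the mean, so it reflects how far the numbers in the data set lie from the mean and, therefore, from one another. Because the differences are squared, variance is expressed in squared units, and its square root is the standard deviation.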
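The relationship between variance and standard deviation can be sketched in Python as follows. The data are hypothetical, and the code uses the population formulas (dividing by the number of values), which match `statistics.pvariance` and `statistics.pstdev`.

```python
import math
import statistics

# Hypothetical sample data.
data = [2, 4, 4, 4, 5, 5, 7, 9]
mu = sum(data) / len(data)  # 5.0

# Population variance: the average squared distance from the mean (32 / 8).
variance = sum((x - mu) ** 2 for x in data) / len(data)

# Standard deviation: the square root of the variance, in the data's own units.
sigma = math.sqrt(variance)

# Same results from the standard library's population functions.
assert math.isclose(variance, statistics.pvariance(data))
assert math.isclose(sigma, statistics.pstdev(data))

# For a sample estimate, statistics.variance / statistics.stdev divide by n - 1 instead.
print(variance, sigma)  # 4.0 2.0
```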

Z-Scores

A Z-score is a numeric representation of how a value relates to the mean of its group. It expresses the distance between the value and the mean in standard deviations: z = (x – µ) / σ. When the Z-score equals zero, the data value equals the mean. Z-scores can be negative or positive: a negative Z-score indicates a value below the mean, and a positive Z-score indicates a value above the mean.
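A short Python sketch of the Z-score formula, applied to a hypothetical data set whose mean is 5 and whose population standard deviation is 2. The function name `z_score` is illustrative.

```python
import statistics

def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Hypothetical sample data with mean 5 and population standard deviation 2.
data = [2, 4, 4, 4, 5, 5, 7, 9]
mu = statistics.fmean(data)
sigma = statistics.pstdev(data)

for x in sorted(set(data)):
    print(x, z_score(x, mu, sigma))
# 2 -1.5
# 4 -0.5
# 5 0.0
# 7 1.0
# 9 2.0
```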

Understanding the Concept of the 68–95–99.7 Rule

The 68–95–99.7 rule applies to normally distributed data. About 68% of the data lie within one standard deviation of the mean, about 95% within two standard deviations, and about 99.7% within three standard deviations.

Probability Density Function

To see where these percentages come from, you should know what the probability density function (PDF) means. For a continuous random variable, the PDF lets you find the probability that the variable falls within a specific range of values, rather than taking any one exact value. You calculate that probability by integrating the variable's PDF over the range. In other words, the probability equals the area under the density curve and above the horizontal axis, between the lowest and highest values of the range.
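To illustrate that a probability is an area under the PDF, the sketch below integrates the normal density f(x) = exp(−(x − µ)² / (2σ²)) / (σ√(2π)) numerically over one standard deviation on each side of the mean. It uses Simpson's rule, a standard numerical-integration method chosen here for simplicity; the function names `normal_pdf` and `simpson` are illustrative.

```python
import math

def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def simpson(f, a: float, b: float, n: int = 1000) -> float:
    """Approximate the integral of f over [a, b] with Simpson's rule (n must be even)."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # Odd interior points get weight 4, even interior points weight 2.
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# Probability that a standard normal value lands in [-1, 1]: the area under the PDF.
area = simpson(normal_pdf, -1.0, 1.0)
print(round(area, 4))  # 0.6827
```

The result agrees closely with the exact value erf(1/√2); integrating from -2 to 2 or from -3 to 3 instead gives about 0.9545 and 0.9973.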

About 68% of the data lie within one standard deviation of the mean. So if you want to find the probability that a random data point lands within one standard deviation, you need to calculate the area under the PDF from -1 to 1 standard deviations.

About 95% of the data lie within two standard deviations of the mean. So if you want to find the probability that a random data point lands within two standard deviations, you need to calculate the area under the PDF from -2 to 2 standard deviations.

About 99.7% of the data lie within three standard deviations of the mean. So if you want to find the probability that a random data point lands within three standard deviations, you need to calculate the area under the PDF from -3 to 3 standard deviations.
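The same probabilities can be computed for any normal distribution as a difference of cumulative distribution function (CDF) values: P(µ − kσ ≤ X ≤ µ + kσ) = F(µ + kσ) − F(µ − kσ). The Python sketch below uses `statistics.NormalDist` (Python 3.8+); the helper name `prob_within` and the three distributions are hypothetical, and they show that the result depends only on k, not on the mean or standard deviation.

```python
from statistics import NormalDist  # Python 3.8+

def prob_within(dist: NormalDist, k: float) -> float:
    """P(mean - k*stdev <= X <= mean + k*stdev) as a difference of CDF values."""
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    return dist.cdf(upper) - dist.cdf(lower)

# Hypothetical distributions with very different means and standard deviations.
for dist in (NormalDist(0, 1), NormalDist(100, 15), NormalDist(-3.5, 0.2)):
    print([round(prob_within(dist, k), 3) for k in (1, 2, 3)])
# Each line prints [0.683, 0.954, 0.997]
```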

Conclusion

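The 68–95–99.7 rule gives a quick, rough estimate of probabilities for a data set. You can use it as a simple shortcut when the population of the data is normally distributed. If you are not sure whether the population is normal, you can also use the rule as a rough normality check: if the shares of the data within one, two, and three standard deviations differ markedly from 68%, 95%, and 99.7%, the data are probably not normally distributed.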
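A rough version of this normality check might look like the Python sketch below. The function name `empirical_rule_fractions`, the random seed, and both data sets are hypothetical; this is an informal comparison, not a formal statistical test.

```python
import random
import statistics

def empirical_rule_fractions(data):
    """Fraction of values within 1, 2 and 3 standard deviations of the mean."""
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)
    n = len(data)
    return [sum(abs(x - mu) <= k * sigma for x in data) / n for k in (1, 2, 3)]

rng = random.Random(42)  # fixed seed so the run is repeatable

# Roughly normal data: fractions should be close to 0.68, 0.95 and 0.997.
normal_sample = [rng.gauss(50, 10) for _ in range(100_000)]

# Clearly non-normal data (evenly spaced values between 0 and 1).
uniform_sample = [i / 999 for i in range(1000)]

print([round(f, 3) for f in empirical_rule_fractions(normal_sample)])
print([round(f, 3) for f in empirical_rule_fractions(uniform_sample)])
```

The normal sample should give fractions close to 0.68, 0.95, and 0.997, while the evenly spaced data give only about 0.58 within one standard deviation and 1.0 within two, a clear sign that those data are not normally distributed.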