A box plot, or box and whisker plot, displays the distribution of a data set using a five-number summary: the minimum, the first quartile (Q1), the median, the third quartile (Q3), and the maximum. A box plot helps you find outliers and their values. It also shows whether your data is symmetrical or skewed, and whether it is tightly grouped or spread out.
What is a box plot?
A box and whisker plot uses a box and lines to divide the data into sections. The box holds the central 50% of the data, and a line inside the box marks the median. The lines, called whiskers, extend from each end of the box to capture the remaining data. Points plotted individually beyond the ends of the whiskers are the outliers.
Some Important Terms you should know:
The minimum score is the lowest score, excluding outliers. It sits at the end of the left whisker.
Twenty-five percent of the scores fall below the lower quartile value. This is the first quartile.
The median is the data’s mid-point. It is shown as a line that divides the box into two parts. You can also call it the second quartile. Half of the scores are less than or equal to the median, and half are greater than or equal to it.
Below the upper or third quartile value lies seventy-five percent of the data. The remaining data, i.e., 25% of the data, will remain above the value.
The maximum score is the highest score, excluding outliers. It sits at the end of the right whisker.
The whiskers represent the scores outside the central 50%: the lower 25% and the upper 25% of the scores.
The Interquartile Range (or IQR)
The interquartile range is shown by the box and covers the middle 50% of the data, from the 25th percentile (Q1) to the 75th percentile (Q3). You calculate it as IQR = Q3 - Q1.
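As a small illustration, the interquartile range is the third quartile minus the first quartile. The sketch below plugs in Q1 = 29 and Q3 = 37, the quartiles of the raisin example in this article. The function name `interquartile_range` is a hypothetical helper, not part of any library.

```python
def interquartile_range(q1, q3):
    """Return the width of the box: the distance from Q1 to Q3."""
    return q3 - q1


# Quartiles of the raisin example: Q1 = 29, Q3 = 37.
iqr = interquartile_range(29, 37)
print(iqr)  # 8
```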
Box Plot Example: Finding the five-number summary
Here are the weights, in grams, of a sample of 10 boxes of raisins. You need to find the five-number summary of this sample.
30, 29, 37, 35, 38, 37, 35, 28, 25, 34
Make a Box Plot of the Data
Arrange all the data points from the smallest to the largest:
25, 28, 29, 30, 34, 35, 35, 37, 37, 38
Now, you need to find the median. With an even number of data points, the median is the mean of the middle two numbers. So the median of this data is:
25, 28, 29, 30, (34), (35), 35, 37, 37, 38
= (34 + 35) / 2 = 34.5
This means the median is 34.5.
Next, you need to find the quartiles. The first quartile is the median of the data points to the left of the median.
25, 28, (29), 30, 34
Q1 = 29
The third quartile is the median of the data points to the right of the median.
35, 35, (37), 37, 38
Q3 = 37
Now, complete the five-number summary by finding the minimum and maximum values.
The smallest data point is the minimum value. In the above box plot example, that value is 25.
The largest data point is the maximum value. In the above box plot example, that value is 38.
Hence the five-number summary is:
25, 29, 34.5, 37, 38
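The steps above can be sketched in a few lines of Python. This is a minimal sketch that assumes the median-of-halves method used in this example (quartiles are the medians of the values to the left and right of the overall median). Other tools define quartiles slightly differently and may return other values. The function name `five_number_summary` is a hypothetical helper.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves method.

    Expects at least two values.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]   # values to the left of the median
    upper = data[-half:]  # values to the right of the median
    return (data[0], median(lower), median(data), median(upper), data[-1])


# Raisin box weights in grams, from the example above.
weights = [30, 29, 37, 35, 38, 37, 35, 28, 25, 34]
print(five_number_summary(weights))  # (25, 29, 34.5, 37, 38)
```

When the sample has an odd number of values, the slicing leaves the middle value out of both halves.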
Comparison of the Box and Whisker Plot
A box and whisker plot enables you to visualize differences between various groups and samples. Comparing box and whisker plots gives you substantial statistical information, such as medians, ranges, and outliers.
Step 1: Comparing the Medians
You need to compare the medians of the individual box plots. If the median line of one box plot lies outside the box of the other, the two groups are likely to be different.
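One way to express this check in code is sketched below. It assumes quartiles are computed with the median-of-halves method from the raisin example. The helper names `box` and `median_outside_box` are hypothetical, and `group_b` holds made-up values used only for illustration.

```python
from statistics import median


def box(values):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    data = sorted(values)
    half = len(data) // 2
    return (median(data[:half]), median(data), median(data[-half:]))


def median_outside_box(sample_a, sample_b):
    """True if the median of sample_a falls outside the Q1..Q3 box of sample_b."""
    _, median_a, _ = box(sample_a)
    q1_b, _, q3_b = box(sample_b)
    return median_a < q1_b or median_a > q3_b


group_a = [25, 28, 29, 30, 34, 35, 35, 37, 37, 38]  # raisin weights from the example
group_b = [40, 41, 43, 44, 45, 46, 48, 49, 50, 52]  # made-up values for illustration
print(median_outside_box(group_a, group_b))  # True
```

A result of `True` only suggests a difference between the groups. It is a visual rule of thumb, not a statistical test.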
Step 2: Comparing the Whiskers and Interquartile Ranges of Box Plots
You need to compare the box lengths, which show the interquartile ranges. This way, you can analyze how the data is dispersed within each sample. The longer the box, the more dispersed the data. The shorter the box, the less dispersed the data.
You also need to check the overall spread between the two whiskers, as it indicates the extreme values. It shows the range of scores, which is another measure of dispersion. A larger range means a wider distribution, that is, more scattered data.
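The comparison of box lengths and ranges can be sketched as follows. Both samples are made-up values chosen for illustration, the helper name `spread` is hypothetical, and quartiles again assume the median-of-halves method. The range here runs from the smallest to the largest value, which matches the whisker-to-whisker spread only when there are no outliers.

```python
from statistics import median


def spread(values):
    """Return (IQR, range) for a sample, with quartiles from the median-of-halves method."""
    data = sorted(values)
    half = len(data) // 2
    q1, q3 = median(data[:half]), median(data[-half:])
    return (q3 - q1, data[-1] - data[0])


tight = [30, 31, 31, 32, 32, 33, 33, 34]  # made-up values, closely grouped
loose = [20, 24, 28, 31, 33, 37, 41, 46]  # made-up values, widely scattered

print(spread(tight))  # (2.0, 4)
print(spread(loose))  # (13.0, 26)
```

The second sample has both the longer box and the larger range, so it is the more dispersed of the two.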
Step 3: Looking for the Potential Outliers
When reviewing a box plot, an outlier is a data point located outside the whiskers.
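The article does not say how far the whiskers reach. A common convention, assumed here, is to flag any value more than 1.5 times the interquartile range beyond the box. The sketch below applies that rule. The function name `find_outliers` and the parameter `k` are hypothetical, and the value 60 is made up to show a flagged point.

```python
from statistics import median


def find_outliers(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3."""
    data = sorted(values)
    half = len(data) // 2
    q1, q3 = median(data[:half]), median(data[-half:])
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low_fence or x > high_fence]


weights = [25, 28, 29, 30, 34, 35, 35, 37, 37, 38]  # raisin weights from the example
print(find_outliers(weights))         # []
print(find_outliers(weights + [60]))  # [60]
```

For the raisin data the limits are 29 - 12 = 17 and 37 + 12 = 49, so none of the ten boxes is an outlier.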
Step 4: Looking for Signs of Skewness
Now you need to look at the shape of the data. Check whether each box plot is symmetric or not. Then go through the samples and see whether they show the same kind of asymmetry.
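One rough way to check for asymmetry is to compare the two parts of the box on either side of the median. The sketch below is a heuristic under that assumption, not a formal skewness measure. The function name `box_skew` is hypothetical, and quartiles use the median-of-halves method from the raisin example.

```python
from statistics import median


def box_skew(values):
    """Compare the two parts of the box to suggest the direction of skew."""
    data = sorted(values)
    half = len(data) // 2
    q1, med, q3 = median(data[:half]), median(data), median(data[-half:])
    lower_part, upper_part = med - q1, q3 - med
    if lower_part > upper_part:
        return "left-skewed"   # median sits nearer Q3; longer lower part
    if upper_part > lower_part:
        return "right-skewed"  # median sits nearer Q1; longer upper part
    return "roughly symmetric"


weights = [25, 28, 29, 30, 34, 35, 35, 37, 37, 38]  # raisin weights from the example
print(box_skew(weights))  # left-skewed
```

For the raisin data the median (34.5) is 5.5 above Q1 but only 2.5 below Q3, so the lower part of the box is longer.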
With the help of a box plot, you can show a five-number summary in a chart. The main purpose of the chart is to show the middle part of the data, which is the interquartile range. The first quartile is at the left end of the box, at the 25% mark, and the third quartile is at the right end, at the 75% mark.
The minimum is at the far left of the chart, at the end of the left whisker. The minimum is the smallest number, while the maximum, which is at the far right, is the largest number. Inside the box, a vertical bar marks the median. You may not use the box and whisker plot a lot in everyday life. Nevertheless, it is a useful tool for getting a quick summary of the data.