In descriptive statistics, a box plot or boxplot is a method for graphically depicting groups of numerical data through their quartiles. Box plots may also have lines extending vertically from the boxes (whiskers) indicating variability outside the upper and lower quartiles, hence the terms box-and-whisker plot and box-and-whisker diagram. Outliers may be plotted as individual points. Box plots are non-parametric: they display variation in samples of a statistical population without making any assumptions of the underlying statistical distribution (though Tukey’s boxplot assumes symmetry for the whiskers and normality for their length). The spacings between the different parts of the box indicate the degree of dispersion (spread) and skewness in the data, and show outliers. In addition to the points themselves, they allow one to visually estimate various L-estimators, notably the interquartile range, midhinge, range, mid-range, and trimean. Box plots can be drawn either horizontally or vertically. Box plots received their name from the box in the middle.
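The L-estimators named above can all be derived from the five values a box plot displays (minimum, lower quartile, median, upper quartile, maximum). The following is a minimal Python sketch, not a definitive implementation. The function name `l_estimators` and the sample data are illustrative assumptions. The quartiles use the `inclusive` method of `statistics.quantiles`, which is only one of several quartile conventions.

```python
import statistics


def l_estimators(data):
    """Return the L-estimators that can be read off a box plot."""
    values = sorted(data)
    # Assumption: quartiles use the "inclusive" interpolation method;
    # other quartile conventions give slightly different values.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    lowest, highest = values[0], values[-1]
    return {
        "interquartile_range": q3 - q1,
        "midhinge": (q1 + q3) / 2,
        "range": highest - lowest,
        "mid_range": (lowest + highest) / 2,
        "trimean": (q1 + 2 * q2 + q3) / 4,
    }


if __name__ == "__main__":
    # Hypothetical sample, chosen only for illustration.
    print(l_estimators([1, 2, 3, 4, 5, 6, 7, 8, 9]))
```

For a symmetric sample such as this one, the midhinge, mid-range, and trimean all coincide with the median.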
Figure 2. Boxplot with whiskers from minimum to maximum
Figure 3. The same box plot with whiskers extending at most 1.5 IQR from the quartiles
In a box-and-whisker plot, the bottom and top of the box are always the first and third quartiles, and the band inside the box is always the second quartile (the median). The ends of the whiskers, however, can represent several alternative values, among them:
the minimum and maximum of all of the data (as in figure 2)
the lowest datum still within 1.5 IQR of the lower quartile, and the highest datum still within 1.5 IQR of the upper quartile (often called the Tukey boxplot) (as in figure 3)
one standard deviation above and below the mean of the data
the 9th percentile and the 91st percentile
the 2nd percentile and the 98th percentile.
Any data not included between the whiskers should be plotted as an outlier with a dot, small circle, or star, but occasionally this is not done.
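As one concrete instance of the whisker conventions listed above, the Tukey rule (whiskers at the most extreme data still within 1.5 IQR of the quartiles, everything beyond plotted as an outlier) might be sketched as follows. The function name `tukey_whiskers`, the parameter `k`, and the sample data are illustrative assumptions, and the quartile method is again only one of several in use.

```python
import statistics


def tukey_whiskers(data, k=1.5):
    """Return (lower whisker, upper whisker, outliers) under the Tukey rule."""
    values = sorted(data)
    # Assumption: "inclusive" quartiles; other conventions shift the fences.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    # Whiskers end at the most extreme data points inside the fences.
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    # Anything outside the fences is reported as an outlier.
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return inside[0], inside[-1], outliers


if __name__ == "__main__":
    # Hypothetical sample with one unusually large value.
    print(tukey_whiskers([1, 2, 3, 4, 5, 6, 7, 8, 9, 30]))
```

Note that the whiskers stop at actual data values rather than at the fences themselves, which is why the two whiskers can have different lengths.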
Some box plots include an additional character to represent the mean of the data.
On some box plots a crosshatch is placed on each whisker, before the end of the whisker.
Rarely, box plots can be presented with no whiskers at all.
Because of this variability, it is appropriate to describe the convention being used for the whiskers and outliers in the caption for the plot.
The unusual percentiles 2%, 9%, 91%, and 98% are sometimes used for whisker cross-hatches and whisker ends to show the seven-number summary. If the data are normally distributed, the locations of the seven marks on the box plot will be approximately equally spaced.
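The spacing claim for the seven-number summary can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library to locate the seven marks for a normal distribution and the gaps between them. The names `seven_number_marks` and `gaps` are illustrative assumptions.

```python
from statistics import NormalDist

# Percentiles of the seven-number summary: the two whisker ends, the two
# cross-hatches, the two quartiles, and the median.
SEVEN_NUMBER_PERCENTILES = (0.02, 0.09, 0.25, 0.50, 0.75, 0.91, 0.98)


def seven_number_marks(mu=0.0, sigma=1.0):
    """Locations of the seven marks for a normal distribution."""
    dist = NormalDist(mu, sigma)
    return [dist.inv_cdf(p) for p in SEVEN_NUMBER_PERCENTILES]


def gaps(marks):
    """Distances between consecutive marks."""
    return [upper - lower for lower, upper in zip(marks, marks[1:])]


if __name__ == "__main__":
    marks = seven_number_marks()
    print([round(m, 3) for m in marks])
    print([round(g, 3) for g in gaps(marks)])
```

For the standard normal distribution, the six gaps all come out close to 0.7 standard deviations, so the spacing is even to a good approximation rather than exactly.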
The basic form of the box plot, using a box to convey the interquartile range, was introduced by Mary Eleanor Spear in 1952 and again in 1969.
Since the mathematician John W. Tukey popularized this type of visual data display in 1969, several variations on the traditional box plot have been described. Two of the most common are variable width box plots and notched box plots (see Figure 4).
Variable width box plots illustrate the size of each group whose data is being plotted by making the width of the box proportional to the size of the group. A popular convention is to make the box width proportional to the square root of the size of the group.
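The square-root convention can be illustrated with a short sketch. The function name `box_widths`, the `max_width` parameter, and the choice to scale so that the largest group receives the maximum width are assumptions made for this example.

```python
import math


def box_widths(group_sizes, max_width=0.8):
    """Box widths proportional to the square root of each group's size.

    Assumption: widths are scaled so the largest group gets `max_width`.
    """
    largest = math.sqrt(max(group_sizes))
    return [max_width * math.sqrt(n) / largest for n in group_sizes]


if __name__ == "__main__":
    # Hypothetical group sizes: the second group has four times as many
    # observations as the first, so its box is twice as wide.
    print(box_widths([25, 100]))
```

Using the square root rather than the raw size keeps a very large group from visually dominating the plot.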
Notched box plots apply a “notch” or narrowing of the box around the median. Notches are useful in offering a rough guide to the significance of a difference of medians; if the notches of two boxes do not overlap, this offers evidence of a statistically significant difference between the medians. The width of the notches is proportional to the interquartile range (IQR) of the sample and inversely proportional to the square root of the size of the sample. However, there is uncertainty about the most appropriate multiplier (as this may vary depending on the similarity of the variances of the samples).
One convention is to extend the notch a fixed multiple of IQR/√n on either side of the median, where n is the sample size.
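A notch of this form (proportional to the IQR and inversely proportional to the square root of the sample size) might be computed as in the sketch below. The function names `notch_interval` and `notches_overlap` are illustrative assumptions. The multiplier is left as a required argument because the text does not fix one, and the value passed in the usage lines is purely illustrative.

```python
import math
import statistics


def notch_interval(data, multiplier):
    """Return (lower, upper) notch limits around the median.

    The half-width is multiplier * IQR / sqrt(n); the multiplier is a
    caller-supplied assumption, not a value taken from the text.
    """
    values = sorted(data)
    # Assumption: "inclusive" quartiles, one of several conventions.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    half_width = multiplier * (q3 - q1) / math.sqrt(len(values))
    return median - half_width, median + half_width


def notches_overlap(a, b):
    """True if two (lower, upper) notch intervals share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


if __name__ == "__main__":
    # Hypothetical samples and an illustrative multiplier of 1.5.
    first = notch_interval(range(1, 10), multiplier=1.5)
    second = notch_interval(range(11, 20), multiplier=1.5)
    print(first, second, notches_overlap(first, second))
```

Non-overlapping notches are only a rough visual guide to a difference between medians, not a formal test.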
The box plot allows quick graphical examination of one or more data sets. Box plots may seem more primitive than a histogram or kernel density estimate, but they do have some advantages. They take up less space and are therefore particularly useful for comparing distributions between several groups or sets of data (see Figure 1 for an example). The choice of the number and width of bins can heavily influence the appearance of a histogram, and the choice of bandwidth can heavily influence the appearance of a kernel density estimate.
As looking at a statistical distribution is more commonplace than looking at a box plot, comparing the box plot against the probability density function (theoretical histogram) for a normal N(0,σ²) distribution may be a useful tool for understanding the box plot (Figure 5).
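The comparison with the normal density can be made concrete by computing where the box plot landmarks fall for N(0,σ²). The sketch below assumes the Tukey convention of fences at 1.5 IQR beyond the quartiles. The function name `normal_box_plot_landmarks` and its parameters are illustrative assumptions.

```python
from statistics import NormalDist


def normal_box_plot_landmarks(sigma=1.0, k=1.5):
    """Quartiles, IQR, and Tukey fences for a normal N(0, sigma^2) population."""
    dist = NormalDist(0.0, sigma)
    q1 = dist.inv_cdf(0.25)
    q3 = dist.inv_cdf(0.75)
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    # Probability mass lying between the fences, i.e. not flagged as outliers.
    inside = dist.cdf(upper_fence) - dist.cdf(lower_fence)
    return {
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "fraction_inside_fences": inside,
    }


if __name__ == "__main__":
    for name, value in normal_box_plot_landmarks().items():
        print(f"{name}: {value:.4f}")
```

With σ = 1 the quartiles sit near ±0.67σ and the fences near ±2.7σ, so about 99.3% of a normal population lies between the fences and roughly 0.7% would be drawn as outliers.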