Lean Manufacturing and Six Sigma Definitions

Glossary terms, history, people and definitions about Lean and Six Sigma

Box Plot

Box plots, also called boxplots or box-and-whisker plots, are diagrams that display a summary of a data set. They were first introduced in 1969 by John Tukey.

Box plots are nonparametric, in that you don’t need to make any assumptions or have any knowledge of the underlying statistical distribution of the data.

A box plot can be confusing to users who see it for the first time, but once understood, it is a popular graph due to its simplicity and its ability to convey a lot of information in a small space. Box plots are useful for comparing distributions between several groups or sets of data, as we will show in the example at the bottom of the page. They can be drawn either horizontally or vertically.



On a vertical box plot, the top of the box represents the 75th percentile (Q3), the value below which 75% of the data fall. One common way to find it is to take the median of the upper half of the data, that is, the points between the overall median and the largest point (maximum). The line across the middle of the box is the median, which is the 50th percentile. The bottom of the box represents the 25th percentile (Q1), the value below which 25% of the data fall. It can be found as the median of the lower half of the data, the points between the smallest point (minimum) and the overall median.

The interquartile range (IQR) is defined as Q3 – Q1, which is the height of the box.

The whiskers can reach at most 1.5 × IQR beyond each end of the box, a distance 50% greater than the height of the box. The upper whisker extends from the 75th percentile (Q3) up to the largest data point that is no greater than Q3 + 1.5 × IQR, and the lower whisker extends from the 25th percentile (Q1) down to the smallest data point that is no less than Q1 – 1.5 × IQR. Any data points beyond the whiskers are marked with an asterisk as outliers.
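The calculation behind the box and whiskers can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the common convention in which IQR = Q3 – Q1 and the whiskers are limited to 1.5 × IQR beyond the box, and it uses the "inclusive" quartile method from Python's standard library (other software may use a slightly different quartile rule and give slightly different Q1 and Q3 values). The function name box_plot_summary and the sample hours are hypothetical.

```python
import statistics


def box_plot_summary(data, k=1.5):
    """Return the values a box plot would display for one data set.

    Assumes the common convention: IQR = Q3 - Q1, whiskers limited to
    k * IQR beyond the box (k = 1.5 by default).
    """
    values = sorted(data)
    # Quartiles; the "inclusive" method is one of several conventions.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_limit = q1 - k * iqr
    upper_limit = q3 + k * iqr
    # Whiskers end at the most extreme data points still inside the limits.
    inside = [v for v in values if lower_limit <= v <= upper_limit]
    outliers = [v for v in values if v < lower_limit or v > upper_limit]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "lower_whisker": inside[0],
        "upper_whisker": inside[-1],
        "outliers": outliers,
    }


# Made-up daily hours, for illustration only.
summary = box_plot_summary([6, 7, 7.5, 8, 8, 8, 8.5, 9, 9, 12])
for name, value in summary.items():
    print(f"{name}: {value}")
```

With this made-up data, the 12-hour day falls beyond the upper limit, so it would be drawn as an asterisk rather than as part of the whisker.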

To understand the box and the lines (whiskers) in more detail with an example, check out the article “What do all the lines and boxes mean on a boxplot?”

According to Wikipedia, there are multiple approaches to determine the ends of the whiskers. The suggested article above showed you the 2nd option (in bold). The first graph on this page used a different method, which is why only one asterisk (outlier) is present.
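To see why the choice of whisker method changes how many outliers appear, here is a small sketch comparing two conventions: whiskers drawn to the minimum and maximum of all the data, and whiskers limited to 1.5 × (Q3 – Q1) beyond the box. These two are chosen only as examples and are not necessarily the methods used for the graphs on this page. The function names and the data are hypothetical.

```python
import statistics


def whisker_ends(data, method="tukey"):
    """Return (lower, upper) whisker ends under one of two conventions."""
    values = sorted(data)
    if method == "minmax":
        # Whiskers reach every point, so nothing is flagged as an outlier.
        return values[0], values[-1]
    if method == "tukey":
        q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
        reach = 1.5 * (q3 - q1)
        inside = [v for v in values if q1 - reach <= v <= q3 + reach]
        return inside[0], inside[-1]
    raise ValueError(f"unknown method: {method}")


def count_outliers(data, method):
    """Count the points that fall beyond the whisker ends."""
    low, high = whisker_ends(data, method)
    return sum(1 for v in data if v < low or v > high)


# Made-up values, for illustration only.
sample = [5, 7, 7.5, 8, 8, 8, 8.5, 9, 9, 12]
for method in ("minmax", "tukey"):
    print(method, whisker_ends(sample, method), count_outliers(sample, method))
```

The same data produces no outliers under the first convention and two under the second, so it helps to know which method your software uses before interpreting the asterisks.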

Let’s take a look at an example, where box plots can quickly convey information. If you wanted to compare the number of hours worked each day, you could create a box plot for each worker on the same graph.

For the three workers, you can quickly conclude the following:

  1. Sara works fewer hours than Seth and Mary, because her blue box is lower than the other two. Her median line is lower, and the entire blue box is lower (which is the middle 50% of her data).
  2. Seth is very consistent around 8 hours a day, since his blue box is the shortest (more of his data fits into a smaller range of values, so he has less variation above or below 8 hours).
  3. Seth has more outliers in his data, represented by the asterisks. This is because he is more consistent around 8 hours, so anytime he varies more than an hour above or below 8 hours, it is flagged as an outlier (unusual), given his past history of consistency. Mary also has readings in the range of 10 or 11 hours a day, but since her hours vary more around 8 hours, they are not deemed to be outliers as often.
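The same three conclusions can be reached numerically. The sketch below uses made-up hours for the three workers (not the data behind the graph) and assumes the common convention of IQR = Q3 – Q1 with outliers flagged beyond 1.5 × IQR from the box. The summarize function is a hypothetical helper.

```python
import statistics

# Hypothetical daily hours, made up for illustration only.
hours = {
    "Sara": [5, 5.5, 6, 6, 6.5, 7, 7, 7.5],
    "Seth": [7.5, 8, 8, 8, 8, 8, 8.5, 10.5],
    "Mary": [6, 7, 7.5, 8, 8.5, 9, 10, 11],
}


def summarize(data):
    """Median, box height (IQR), and outliers for one worker."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in data if v < low or v > high]
    return {"median": median, "iqr": iqr, "outliers": outliers}


summaries = {name: summarize(data) for name, data in hours.items()}

# Lowest median: who works the fewest hours on a typical day.
lowest_median = min(summaries, key=lambda n: summaries[n]["median"])
# Smallest IQR: whose box is shortest, meaning the most consistent hours.
most_consistent = min(summaries, key=lambda n: summaries[n]["iqr"])

for name, s in summaries.items():
    print(name, s)
print("Lowest median:", lowest_median)
print("Most consistent:", most_consistent)
```

In this made-up data, Seth's narrow box means his 7.5, 8.5, and 10.5 hour days are all flagged as outliers, while Mary's 10 and 11 hour days are not, because her wider box pushes the outlier limits further out.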

There are other interpretations to make, but these stand out at a quick glance.

You can download this diagram for FREE by clicking on the image, or going to the Boxplot Reference Guide page.