
Understanding Quartiles, IQR, and Boxplots

What are Quartiles and IQR?

Quartiles in statistics are the three points that divide a sorted data-set into 4 equal parts.

While variance and standard deviation measure dispersion around the mean, quartiles measure variability around the median. An advantage of quartiles is that they depend on the rank order of the data rather than on the magnitude of every value, so a few extreme values barely affect them. Let’s understand this with an example. In the following diagram, a data-set is shown with its quartiles marked as Q1, Q2, and Q3.

[Figure: a data-set with quartiles Q1, Q2, and Q3 marked]

The first quartile Q1 (also called the lower quartile) is the number below which the bottom 25% of the data lie. The second quartile Q2 (also called the median) divides the data into two equal parts and has 50% of the data below it. The third quartile Q3 (also called the upper quartile) has 75% of the data below it and the top 25% of the data above it.

Interquartile Range (IQR): the interquartile range is the range of values in which the middle 50% of the data points lie. Technically, it is the difference between the 3rd and 1st quartiles, IQR = Q3 - Q1.

How to find quartiles?

There are a few simple steps to find quartiles. Let’s take an example. We are given the following data.

  • 2, 5, 6, 7, 10, 17, 13, 14, 16, 20, 18, 12

Before starting, the data must be sorted in ascending order.

  • 2, 5, 6, 7, 10, 12, 13, 14, 16, 17, 18, 20

Once the given data-set is sorted, the quartiles can be found using the steps shown in the following picture.

[Figure: Steps to find quartiles]

In the first step, the data are divided into two equal parts; that is, the median (which is the same as Q2) is calculated. This gives two halves of the data, each of which is further divided into two equal parts by calculating its own median. The median of the first half becomes Q1 and the median of the second half becomes Q3. Here there are 12 values, so the median is the average of the 6th and 7th values (12 and 13), and each half has 6 values whose median is the average of its 3rd and 4th values. In this example,

  • 1st quartile = 6.5
  • 2nd quartile or median = 12.5
  • 3rd quartile = 16.5
  • IQR = Q3 - Q1 = 16.5 - 6.5 = 10
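
The steps above can be sketched in a few lines of Python. This is only a minimal illustration of the median-of-halves method described in this post; the function name `quartiles` is made up for the example, and the handling of an odd number of values (leaving the middle value out of both halves) is an assumption, since the post only covers an even count. Note that library routines often use other interpolation rules and may return slightly different quartiles for the same data.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    data = sorted(values)            # step 0: sort in ascending order
    n = len(data)
    half = n // 2
    lower = data[:half]              # first half of the sorted data
    # Assumption: with an odd count, the middle value belongs to neither half.
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), median(data), median(upper)

data = [2, 5, 6, 7, 10, 17, 13, 14, 16, 20, 18, 12]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1
print(q1, q2, q3, iqr)  # 6.5 12.5 16.5 10.0
```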

Visualization of quartiles and boxplots

A boxplot is a visual representation of the quartiles along with the minimum and maximum values of the data points. It also shows any outliers present in the dataset. A boxplot is also known as a Box and Whisker Plot. Mathematician John Tukey first introduced the “Box and Whisker Plot” in 1969 as a visual diagram of the “Five Number Summary” of any given data set. Those five numbers are listed below.
  1. Minimum
  2. First Quartile
  3. Median (Second Quartile)
  4. Third Quartile
  5. Maximum

In the last section we worked through an example; let’s now look at the spread of that data using a boxplot, which shows the five-number summary as well. Please note that 50% of the data points (6 points in this case) lie within the IQR, that is, between Q1 = 6.5 and Q3 = 16.5.

[Figure: boxplot of the example data-set]
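
As a rough check of the numbers a boxplot of this data-set would display, the five-number summary can be computed directly. The sketch below reuses the median-of-halves method from the previous section; `five_number_summary` is a hypothetical helper name, not a library function, and the treatment of an odd count is again an assumption.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) via median-of-halves."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # Assumption: with an odd count, the middle value belongs to neither half.
    upper = data[half + 1:] if n % 2 else data[half:]
    return data[0], median(lower), median(data), median(upper), data[-1]

data = [2, 5, 6, 7, 10, 12, 13, 14, 16, 17, 18, 20]
minimum, q1, q2, q3, maximum = five_number_summary(data)

# Points that fall inside the box, i.e. between Q1 and Q3
inside_box = [x for x in data if q1 <= x <= q3]

print(minimum, q1, q2, q3, maximum)  # 2 6.5 12.5 16.5 20
print(inside_box)                    # [7, 10, 12, 13, 14, 16]
```

The six values inside the box are exactly half of the twelve data points, which matches the 50% noted above.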

Boxplot and Outliers

What is an Outlier?

Outliers are data points that are unusual and usually do not tell us much about most of the data or its spread.
For example, suppose 25, 60, 65, 66, 68, 70, 71, 99 are the test scores of students in a class. Clearly, 25 is an extremely low score and 99 is an extremely high score compared to the others. Hence, these two scores are outliers.

How to find outliers in a box plot?

A common practice is to set a “limit” that is 1.5 times the width of the IQR beyond each end of the box. Any point below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR is treated as an outlier. In a box and whisker plot, outliers are shown as individual points beyond the whiskers, and the whiskers end at the smallest and largest values that are not outliers.
Let’s change our previous example to include an outlier and see its boxplot.
  • 2, 5, 6, 7, 10, 12, 13, 14, 16, 17, 18, 33
Here Q1 = 6.5 and Q3 = 16.5 are unchanged, so the upper limit is 16.5 + 1.5 × 10 = 31.5, and 33 lies beyond it. The box plot for this distribution is shown below. Please note that, apart from showing the five-number summary (min, Q1, Q2, Q3, max), the box plot also shows the outlier (33 in this case).

[Figure: boxplot of the example data-set with an outlier]
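
The 1.5 × IQR rule can be sketched as follows. This is an illustrative example rather than a definitive implementation: `iqr_outliers` and its parameter `k` are names invented here, and the quartiles are computed with the median-of-halves method from earlier in this post (with the assumption that an odd count leaves the middle value out of both halves). Plotting libraries may compute quartiles differently and so place the limits slightly differently.

```python
from statistics import median

def iqr_outliers(values, k=1.5):
    """Return (lower_limit, upper_limit, outliers) using the k * IQR rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    q1 = median(data[:half])
    # Assumption: with an odd count, the middle value belongs to neither half.
    q3 = median(data[half + 1:] if n % 2 else data[half:])
    iqr = q3 - q1
    lower_limit = q1 - k * iqr
    upper_limit = q3 + k * iqr
    outliers = [x for x in data if x < lower_limit or x > upper_limit]
    return lower_limit, upper_limit, outliers

# Modified example from this section
print(iqr_outliers([2, 5, 6, 7, 10, 12, 13, 14, 16, 17, 18, 33]))
# (-8.5, 31.5, [33])

# Test scores from the "What is an Outlier?" example
print(iqr_outliers([25, 60, 65, 66, 68, 70, 71, 99]))
# (50.5, 82.5, [25, 99])
```

Both results agree with the text: 33 is the only outlier in the modified data-set, and the scores 25 and 99 fall outside the limits for the test-score example.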

How is sex-ratio of Indian states distributed?

Each post in this statistics basics series includes the “sex ratio of Indian states” example. The picture below shows the box plot of the sex ratio of Indian states/UTs. Just by looking at the boxplot, you can extract the following information.
  • the minimum value of the sex ratio among Indian states/UTs is 820
  • the first quartile is 900
  • the second quartile (median) is 946
  • the third quartile is around 975
  • 50% of the states have a sex ratio between 900 and 975 (that is, the IQR is about 975 - 900 = 75)
  • the maximum value of the sex ratio is 1084
  • there are two outliers
[Figure: box plot of the sex ratio of Indian states/UTs]

You may refer to the following video as a supplement to this post. It explains these concepts briefly.
