
All About Range, Variance, and Standard Deviation

What is Dispersion?

In statistics, dispersion is the extent to which a data distribution is squeezed or stretched out.

It is a measure of the spread of a data distribution. To understand what dispersion means, let’s take an example. We are given two data-sets:

  • D1: 7, 8, 9, 12 (mean = 9)
  • D2: 1, 6, 10, 17 (mean = 8.5)

In the data-sets D1 and D2, the measure of central tendency, the mean, is nearly the same (9 and 8.5). But that does not explain the difference in the spread, shape, or dispersion of the two data-sets. Is D1 more dispersed than D2? To answer this question, one needs some mechanism that measures the spread of the data; such mechanisms are called measures of dispersion.

Let’s take another, real example. We are given the data-set of the sex ratio of the Indian states. The mean sex ratio in India is 940, but that does not tell us how far the sex ratio of Kerala or Daman and Diu is from the mean. Measures of dispersion answer such questions. The following diagram presents the problem of measuring dispersion in the “Sex Ratio” data-set.


[Figure: Mean_SexRatio]

There are several ways to measure dispersion in statistics:
  • Range
  • Variance
  • Standard Deviation
  • Quartiles and Interquartile Range (IQR)

In this post I will discuss the first three; quartiles and the IQR will be covered in another post.

Range

Range measures the difference between the largest and smallest data-points in a data-set. In our sex-ratio example, the lowest sex ratio in India is 618 (Daman and Diu) and the highest is 1084 (Kerala). Hence, the range is
  • range = 1084 - 618 = 466
The limitation of the range is that it depends only on the two extreme values: it says nothing about how the remaining values are spread, and a single outlier can change it drastically.
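
As a minimal sketch of the idea, the range of the two small data-sets D1 and D2 from the first section can be computed in plain Python. The helper name value_range is only an illustration, not part of any library.

```python
def value_range(data):
    """Return the difference between the largest and smallest value."""
    return max(data) - min(data)

d1 = [7, 8, 9, 12]
d2 = [1, 6, 10, 17]

print(value_range(d1))  # 5
print(value_range(d2))  # 16

# The range looks only at the two extremes: changing the middle
# values of D2 leaves the result unchanged.
print(value_range([1, 9, 9, 17]))  # 16
```

The last call shows the limitation mentioned above: very different data-sets can share the same range.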

Variance

Variance is defined as the average of the squared differences from the mean.

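Variance is computed by taking the squared difference of every individual data-point from the mean of the distribution and then averaging those squared differences. The mathematical formula for the variance of a population of N values x_1, ..., x_N with mean μ is given below, followed by the sample version. If the data-set is a sample, the denominator is N-1 instead of N (and the sample mean x̄ replaces μ); this is called Bessel's correction.

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2 \qquad\qquad s^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2$$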

Let’s illustrate variance with the example from the first section.

  • D1: 7, 8, 9, 12 (mean = 9)
  • D2: 1, 6, 10, 17 (mean = 8.5)

By using the population variance formula as given above, it is found that

  • Var(D1)=3.5
  • Var(D2)=34.25
So, as Var(D2) > Var(D1), it can be concluded that the D2 data-set is more spread out than D1. While in practice we rarely calculate variance by hand, it is good to know the method to gain a better intuitive understanding. In the following diagram, one more example of variance calculation is given.

[Figure: variance-calculation]
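
The calculation for D1 and D2 can also be reproduced step by step in Python. This is a small sketch that follows the definition directly; the function name variance and its sample flag are illustrative choices, not a library API.

```python
def variance(data, sample=False):
    """Average squared deviation from the mean.

    sample=False divides by N (population variance);
    sample=True divides by N - 1 (Bessel's correction).
    """
    n = len(data)
    mean = sum(data) / n
    squared_diffs = [(x - mean) ** 2 for x in data]
    return sum(squared_diffs) / (n - 1 if sample else n)

d1 = [7, 8, 9, 12]
d2 = [1, 6, 10, 17]

print(variance(d1))  # 3.5
print(variance(d2))  # 34.25

# Sample variances are larger because the divisor is N - 1.
print(variance(d1, sample=True))
print(variance(d2, sample=True))
```

The population results match Var(D1) = 3.5 and Var(D2) = 34.25 quoted above.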

Standard Deviation

The problem with variance is that it does not have the same unit as the data. If the population consists of lengths measured in meters, the variance is expressed in meters squared. Standard deviation solves this problem and is defined as follows.

The square root of the variance is called the standard deviation.

Mathematically, for a population of N values x_i with mean μ, it is given as

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$$

It is often useful to look at the values that lie within one standard deviation of the mean. In the sex-ratio data-set example, the standard deviation is calculated as 79.74. In the following diagram, the blue line denotes the mean value and the brown lines represent the boundary of one standard deviation. You may observe that most of the data-points lie within one standard deviation of the mean.
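
Since the sex-ratio data-set itself is not reproduced here, the sketch below applies the same idea to the small data-sets D1 and D2 from the earlier sections: it computes the population standard deviation and lists the values that fall within one standard deviation of the mean. The function names are illustrative only.

```python
import math

def population_std(data):
    """Square root of the population variance (divides by N)."""
    mean = sum(data) / len(data)
    return math.sqrt(sum((x - mean) ** 2 for x in data) / len(data))

def within_one_std(data):
    """Values whose distance from the mean is at most one standard deviation."""
    mean = sum(data) / len(data)
    sd = population_std(data)
    return [x for x in data if abs(x - mean) <= sd]

d1 = [7, 8, 9, 12]
d2 = [1, 6, 10, 17]

print(population_std(d1), within_one_std(d1))
print(population_std(d2), within_one_std(d2))
```

Unlike the variance, the result is in the same unit as the data, so it can be compared directly with the distance of each value from the mean.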

Calculation of Variance and Standard Deviation in Excel

Variance can be calculated in Excel using the following functions.

  • Sample Variance - VAR.S(Array of Data)
  • Population Variance - VAR.P(Array of Data)

Standard deviation can be calculated in Excel using the following functions.
  • Sample Standard Deviation - STDEV.S(Array of Data)
  • Population Standard Deviation - STDEV.P(Array of Data)

Calculation of Variance and Standard Deviation in Python

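Variance and standard deviation can be calculated using the var and std methods of a pandas Series, as below (assuming StateWise_sex_ratio is a pandas DataFrame with a 'Sex_Ratio' column). Note that both methods return the sample versions by default (ddof=1); pass ddof=0 to get the population versions.
  • StateWise_sex_ratio['Sex_Ratio'].var()
  • StateWise_sex_ratio['Sex_Ratio'].std()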
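
If pandas is not available, Python's standard-library statistics module offers the same sample/population distinction as the Excel functions above. The sketch below uses the small D2 data-set from earlier in the post rather than the sex-ratio data, which is not reproduced here.

```python
import statistics

d2 = [1, 6, 10, 17]

# Population versions divide by N (like VAR.P / STDEV.P in Excel).
pop_var = statistics.pvariance(d2)
pop_sd = statistics.pstdev(d2)

# Sample versions divide by N - 1 (like VAR.S / STDEV.S in Excel).
samp_var = statistics.variance(d2)
samp_sd = statistics.stdev(d2)

print(pop_var, pop_sd)
print(samp_var, samp_sd)
```

Whichever tool is used, it is worth checking which divisor it applies by default before comparing results across tools.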
The following video explains these concepts in detail.
