All About Normal Distribution

What is a Normal Distribution?

A Normal distribution is a continuous probability distribution in which most values cluster around the middle of the range and fewer values fall towards the extreme ends. You can appreciate this intuitively by looking at the graph of a Normal distribution.

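Mathematically, a Normal distribution with mean μ and standard deviation σ is described by the probability density function below.

f(x) = (1 / (σ√(2π))) · exp(-(x - μ)² / (2σ²))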
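As a minimal sketch of the density, the snippet below writes out the standard Normal density formula, with mean μ and standard deviation σ, term by term and cross-checks it against Python's built-in `statistics.NormalDist`. The function name `normal_pdf` and the values mean = 8 and standard deviation = 3 are illustrative assumptions, not something taken from the original post.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a Normal(mu, sigma) distribution at x, written out term by term."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)


# Compare the hand-written formula with the standard library at a few points.
# mu = 8 and sigma = 3 are illustrative values only.
for x in (2.0, 8.0, 11.0):
    print(x, normal_pdf(x, 8.0, 3.0), NormalDist(8.0, 3.0).pdf(x))
```

The density is highest at the mean and takes the same value at equal distances on either side of it, which is the bell shape described above.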

Why is the Normal Distribution Important?

Many real-life quantities approximately follow a Normal distribution. A few examples are:

  • People’s heights and weights
  • Population’s blood pressure
  • Test Scores
  • Measurement errors
  • Noise in a signal
These data tend to cluster around a central value with no bias to the left or right, so their distribution comes close to a Normal distribution like the graph shown in the previous section.

Properties of a Normal Distribution

A normal distribution has some characteristic properties. It is worth remembering them because they make analysis simpler. These properties are listed below.
  • All Normal distributions are symmetrical around the mean.
  • The mean, the median, and the mode of a normal distribution are the same.
  • 68.27% of the values in a normal distribution lie within one standard deviation of the mean.
  • 95.45% of the values in a normal distribution lie within 2 standard deviations of the mean.
  • 99.73% of the values lie within 3 standard deviations of the mean.
  • In a continuous distribution, we always talk about the probability of a range of values. The probability of any single specific value is always zero.
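The 68.27%, 95.45%, and 99.73% figures can be checked numerically. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the helper name `share_within` is hypothetical, introduced only for illustration.

```python
from statistics import NormalDist


def share_within(k, mu=0.0, sigma=1.0):
    """Probability that a Normal(mu, sigma) value lies within k standard deviations of mu."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k) * 100:.2f}%")

# The result does not depend on the mean or standard deviation chosen:
print(f"{share_within(1, mu=8, sigma=3) * 100:.2f}%")
```

Because each share is computed as a difference of two cumulative probabilities, this also illustrates the last property: probabilities are assigned to ranges, and a range of zero width has probability zero.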

Standard Normal Distribution

The Standard Normal distribution is a special case of the Normal distribution in which
  • mean = median = mode = 0
  • Standard deviation = 1
  • Area under the Standard Normal distribution curve is equal to 1.
All other properties of the Normal distribution carry over unchanged. It is also called the Z distribution. The standard normal distribution is depicted in the following graph.

Standardization of a Normal Distribution

We can transform any normal distribution into a standard normal distribution by using Z scores. Standardization makes calculations and the interpretation of results easier. If we determine that a population approximately follows a normal distribution, we can make some powerful inferences about it once we know its mean and standard deviation. In other posts on Statistics, we will use sampling, standard error, and hypothesis testing to evaluate experiments, and a large part of that process is understanding how to “standardize” a normal distribution.

What is a Z score?

A Z score is a point on the standard normal distribution that tells us how far from the mean our data point is. Technically, it represents how many standard deviations away from the mean the data point lies. A Z score can take any real value, but about 99.73% of Z scores fall between -3 and +3. Once the Z score is known, we can easily calculate the percentile by using a Z table, Excel, or any statistical programming language. The percentile is the value of the cumulative distribution function at that Z score. A value from any normal distribution can be transformed into its corresponding value on the standard normal distribution using the following formula:

Z = (X - μ) / σ

where Z is the value on the standard normal distribution, X is the value on the original distribution, μ is the mean of the original distribution, and σ is the standard deviation of the original distribution.

If all the values in a distribution are transformed to Z scores, then the distribution will have a mean of 0 and a standard deviation of 1. This process of transforming a distribution to one with a mean of 0 and a standard deviation of 1 is called standardizing the distribution. 
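A minimal sketch of standardizing a whole data set is shown here. The data values are made up for illustration, the function name `standardize` is hypothetical, and the population standard deviation (`statistics.pstdev`) is assumed as σ.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Convert every value to a Z score: (x - mean) / standard deviation."""
    mu = fmean(values)
    sigma = pstdev(values, mu)  # population standard deviation around mu
    return [(x - mu) / sigma for x in values]


data = [2.0, 5.0, 6.0, 8.0, 9.0, 11.0, 15.0]  # made-up values for illustration
z_scores = standardize(data)

# After standardizing, the mean is 0 and the standard deviation is 1
# (up to floating-point rounding).
print(fmean(z_scores), pstdev(z_scores))
```

Standardizing rescales and shifts the data but does not change the shape of the distribution, so the Z scores follow a standard normal distribution only if the original values were normally distributed.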

An example of standardization of a Normal distribution is given in the following picture. In this example, the mean is 8, the standard deviation is 3, and a particular value in the normal distribution is x = 6. We want to know what percentage of the data lies below x = 6 (that is, its percentile). Applying the formula gives Z = (6 - 8) / 3 ≈ -0.67.

Inference of a Z Score

By the rule of thumb (the empirical rule) for a standard normal distribution, we know that
  • 68.27% of values lie within one standard deviation of the mean, so the probability that a value lies between Z = -1 and Z = +1 is 68.27%.
  • 95.45% of values lie within two standard deviations of the mean, so the probability that a value lies between Z = -2 and Z = +2 is 95.45%.
  • 99.73% of values lie within three standard deviations of the mean, so the probability that a value lies between Z = -3 and Z = +3 is 99.73%.
From a single Z score, we can also calculate the percentile, as described in the next section. In our example, the percentile corresponding to Z = -0.67 is 25.14. This means that 25.14% of the values in a standard normal distribution are less than -0.67. Equivalently, 25.14% of the area under the standard normal curve lies to the left of -0.67, and there is a 25.14% probability that a data point has a value less than -0.67.
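The probability reading of the percentile can be illustrated with a small simulation. The sketch below draws values from a standard normal distribution and counts the share that falls below Z = -0.67; the sample size and the fixed random seed are arbitrary choices made for this illustration.

```python
import random
from statistics import NormalDist

random.seed(0)  # fixed seed so the run is repeatable
n = 200_000     # arbitrary sample size

# Draw n values from a standard normal distribution (mean 0, standard deviation 1).
draws = [random.gauss(0.0, 1.0) for _ in range(n)]

# Share of simulated values below Z = -0.67, next to the exact CDF value.
share_below = sum(z < -0.67 for z in draws) / n
exact = NormalDist().cdf(-0.67)

print(f"simulated share: {share_below:.4f}, exact CDF: {exact:.4f}")
```

The simulated share should land close to the exact cumulative probability, and the two get closer as the sample size grows.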

Percentile/Area under the Z distribution/CDF Calculation

Using Excel

  • Calculate the Z score using the formula Z = (x - μ) / σ (in the last example, Z = -0.67).
  • Once the Z score is determined, the percentile, or area under the standard normal distribution, can be calculated as below.
=NORM.S.DIST(B3, TRUE)
where B3 is the cell that contains the Z score and TRUE means that we are calculating the cumulative distribution function. In other words, we are interested in the total area that lies below the Z score. This also gives the probability of getting a value that is less than the Z score in a standard normal distribution.

Using Python

Calculate the Z score using the formula Z = (x - μ) / σ (in the last example, Z = -0.67), and then calculate the percentile using the following code.
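The original code listing is not available here, so the following is a minimal sketch of one way to do the calculation, using `statistics.NormalDist` from the Python standard library. If SciPy is installed, `scipy.stats.norm.cdf(z)` should give the same cumulative probability. The variable names are illustrative.

```python
from statistics import NormalDist

x, mu, sigma = 6, 8, 3   # values from the worked example
z = (x - mu) / sigma     # -0.666..., which a Z table rounds to -0.67

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Percentile from the Z score rounded to two decimals, as read from a Z table.
percentile_table = standard_normal.cdf(round(z, 2)) * 100

# Percentile from the unrounded Z score; slightly different because of rounding.
percentile_exact = standard_normal.cdf(z) * 100

print(f"Z = {z:.2f}")
print(f"percentile (Z rounded to -0.67): {percentile_table:.2f}")
print(f"percentile (unrounded Z):        {percentile_exact:.2f}")
```

With Z rounded to -0.67, the result matches the 25.14 quoted in the example. The unrounded Z score gives a value about 0.1 higher, which is only a rounding difference.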

