Statistics For Data Science Course

All About Hypothesis Testing

What is a Hypothesis?

A hypothesis is a theory or a guess which may or may not be true. A hypothesis is something more than a wild guess but less than a well-established theory. Let’s try to understand Hypothesis through a question.

Do people who play video games have a quicker response time?

We can convert this question into a hypothesis, that is, into a statement that may or may not be true. For example,
H0: People who play video games do not have a quicker response time.

or

H1: People who play video games do have a quicker response time.

What is Hypothesis Testing?

Hypothesis testing is a way to decide, using statistical methods, whether the data give enough evidence to reject a hypothesis. The first statement put to the test is that “there is no relationship or no effect”; this statement is called the Null Hypothesis. Let’s convert the question in the previous section into a Null Hypothesis. The question was:
Do people who play video games have a quicker response time?
and the corresponding null hypothesis is:

H0: People who play video games do not have a quicker response time.

The null hypothesis states there is no relationship between the measured phenomenon (the dependent variable) and the independent variable.

We can either reject or fail to reject a Null Hypothesis. If the Null Hypothesis is rejected, the data suggest that there is a relationship between the observed events. If we fail to reject the Null Hypothesis, any observed difference in phenomena or populations could plausibly be due to chance alone; this does not prove that the Null Hypothesis is true. The Null Hypothesis is denoted by H0 and the alternate hypothesis by H1. If the null hypothesis is rejected, we turn to the alternate hypothesis in its place.

How to frame a Null Hypothesis?

A question well understood is half solved, so it is essential to frame the null hypothesis correctly. To frame a Null Hypothesis, restate the question in a way that assumes no relationship between the variables. A few examples are mentioned in the following table.

[Table: examples of null hypotheses]

Framing Null Hypothesis

Defining a problem is as important as understanding it. Now let’s look at how we can define a hypothesis for a problem that asks whether any relationship exists between a cause and an effect.

The Study and The Hypothesis

A neurologist is testing the effect of a drug on response time. He injects each of 100 rats with a unit dose of the drug and records its response time, giving a sample of 100 measurements. It is given that,

  • Mean response time of rats NOT injected with the drug (μ) = 1.2 seconds
  • Mean response time of rats injected with the drug (x̄) = 1.05 seconds
  • Standard deviation of the sample (s) = 0.5 seconds
  • Sample size (n) = 100

In this study, the neurologist wants to answer the following question:
Does the drug affect the response time? The neurologist defines a Null Hypothesis as,

H0: μ = 1.2 (even with the drug)

that is, he assumes that his drug does not change the response time. He also defines an alternate hypothesis as below:

H1: μ ≠ 1.2
that is, his drug either shortens or lengthens the response time.

Testing a hypothesis


The first step in testing a hypothesis is to assume that the NULL Hypothesis is true. That is, in our example, the drug has no effect on response time and the mean response time remains 1.2 seconds. If the sample size is sufficiently large (say > 30), we can assume that

  • the sampling distribution of the mean response time is approximately normal around the population mean (as per the Central Limit Theorem)
  • the population standard deviation is almost the same as the sample standard deviation
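
The first assumption can be checked with a small simulation. The sketch below is only an illustration: the exponential population, the number of repeats, and the fixed seed are assumptions made here, not part of the neurologist's study. It draws many samples of size 100 from a clearly non-normal population with mean 1.2 and looks at how the sample means behave.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

POP_MEAN = 1.2      # population mean from the example (seconds)
SAMPLE_SIZE = 100   # sample size from the example
REPEATS = 2000      # hypothetical number of repeated experiments


def sample_mean():
    # Exponential draws are an illustrative assumption: a skewed,
    # non-normal population whose mean is POP_MEAN.
    draws = [random.expovariate(1 / POP_MEAN) for _ in range(SAMPLE_SIZE)]
    return statistics.fmean(draws)


means = [sample_mean() for _ in range(REPEATS)]

mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)

# For an exponential population the standard deviation equals the mean,
# so the predicted standard error is POP_MEAN / sqrt(SAMPLE_SIZE) = 0.12.
expected_se = POP_MEAN / SAMPLE_SIZE ** 0.5

# Share of sample means within 1.96 standard errors of the population mean;
# for a normal sampling distribution this should be close to 0.95.
within = sum(abs(m - POP_MEAN) <= 1.96 * expected_se for m in means) / REPEATS

print(f"mean of sample means: {mean_of_means:.3f}")
print(f"sd of sample means:   {sd_of_means:.3f} (theory: {expected_se:.3f})")
print(f"share within 1.96 SE: {within:.3f}")
```

Even though individual draws are heavily skewed, the sample means cluster around 1.2 with a spread close to σ/√n, which is what the Z statistic relies on.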

Now that we have assumed the Null Hypothesis is true, how likely is a sample mean as far from 1.2 seconds as the one we observed (1.05 seconds)? That is, assuming that the Null Hypothesis is true, what is the probability of the observed data?

$P(\text{data} \mid H_0) = ?$

In other words, how extreme is our observation when it is assumed that the Null is true?

If our observation falls among the most extreme 5% of values expected under the null, we say that, given the null is true, our data are highly unlikely, and we REJECT the null hypothesis.

In this case,
$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{1.05 - 1.2}{0.5 / \sqrt{100}} = \frac{-0.15}{0.05} = -3$$
p-value (two-tailed) = 0.0027
Let’s choose alpha (explained later in this article) to be 0.05.

  • if p < alpha, we can reject the null hypothesis and assume alternate to be true
  • if p >= alpha, we would FAIL to reject the Null Hypothesis.

Since p = 0.0027 is less than alpha = 0.05, we reject the null hypothesis. So in our example, as per the hypothesis test, the drug does affect the response time.
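
The calculation above can be reproduced in a few lines of Python. This is a minimal sketch using only the standard library; the variable names are chosen here for illustration, and the standard deviation is taken as 0.5, the value used in the Z calculation above.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 1.2      # mean response time under the null hypothesis (seconds)
x_bar = 1.05   # observed sample mean (seconds)
s = 0.5        # standard deviation, as used in the Z calculation above
n = 100        # sample size

standard_error = s / sqrt(n)
z = (x_bar - mu0) / standard_error

# Two-tailed p-value, because the alternate hypothesis is "mu != 1.2":
# the probability of a Z at least this far from 0 in either direction.
p_value = 2 * NormalDist().cdf(-abs(z))

alpha = 0.05
reject_null = p_value < alpha

print(f"Z = {z:.2f}, p = {p_value:.4f}, reject H0: {reject_null}")
```

Running it prints `Z = -3.00, p = 0.0027, reject H0: True`, matching the hand calculation.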

What is P-value?

The P-value is the probability of getting a value at least as extreme as the observed one, given that the null hypothesis is true.

Mathematically we can represent it as,
$P(\text{sample statistic at least as extreme as observed} \mid H_0)$

The smaller the p-value, the greater the evidence against the null hypothesis.

In the last example, we assumed that the drug has no effect on the response time. We then studied the sample and calculated a p-value of 0.0027. Such a low p-value tells us that, if the null hypothesis were true, there would be only a 0.27% chance of observing a sample mean at least as far from 1.2 seconds as ours (1.05 seconds). This makes the null hypothesis look implausible. Taking the level of significance to be α = 0.05, we have p < α, so we reject the null hypothesis and accept the alternate hypothesis.

What is level of significance alpha?

The level of significance α is the area inside the tail(s) of the distribution assumed under the null hypothesis. It answers the question: how extreme must our observation be, or in other words how small must the p-value be, for us to reject the null hypothesis? As a rule of thumb, α is usually taken as 0.05, which for a two-tailed test corresponds to roughly 2 standard deviations away from the distribution mean. But α can be chosen differently depending on the problem at hand. If the observed p-value is lower than α, we conclude that the result is statistically significant.

  • If p < α, reject H0 ⇒ accept H1
  • If p ≥ α, fail to reject H0
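
To see how α = 0.05 relates to "about 2 standard deviations", the sketch below converts between a two-tailed α and a cut-off on the Z scale, and wraps the decision rule in a small function. The function names are hypothetical helpers written for this illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1


def two_tailed_critical_z(alpha):
    """Z distance beyond which the two tails together hold an area of alpha."""
    return std_normal.inv_cdf(1 - alpha / 2)


def two_tailed_area_beyond(z):
    """Total area in both tails lying further than |z| from the mean."""
    return 2 * (1 - std_normal.cdf(abs(z)))


def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 only when p is below alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


print(f"critical Z for alpha = 0.05: {two_tailed_critical_z(0.05):.2f}")
print(f"area beyond 2 standard deviations: {two_tailed_area_beyond(2):.4f}")
print(decide(0.0027))
```

The exact two-tailed cut-off for α = 0.05 is about 1.96 standard deviations, while exactly 2 standard deviations leaves an area of about 0.0455 in the tails, which is why the two are treated as interchangeable in the rule of thumb.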

The Tails in Hypothesis Testing

If α = 0.05 and the alternative hypothesis states that the parameter is less than the null value, the test is left-tailed: the left tail of our probability curve has an area of 0.05.

If α = 0.05 and the alternative hypothesis states that the parameter is more than the null value, the test is right-tailed: the right tail of our probability curve has an area of 0.05.

If α = 0.05 and the alternative hypothesis states that the parameter is not equal to the null value, the test is two-tailed: the two tails of our probability curve share an area of 0.05, that is, 0.025 in each tail.
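
The choice of tail changes both the cut-off and the p-value. As a rough sketch, assuming a standard normal test statistic, the hypothetical helpers below compute the critical value(s) and the p-value for each kind of alternative hypothesis, using Z = -3 from the drug example.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1


def critical_values(alpha, tail):
    """Cut-off(s) on the Z scale that leave an area of alpha in the tail(s)."""
    if tail == "left":
        return (std_normal.inv_cdf(alpha),)
    if tail == "right":
        return (std_normal.inv_cdf(1 - alpha),)
    if tail == "two":
        upper = std_normal.inv_cdf(1 - alpha / 2)  # alpha/2 in each tail
        return (-upper, upper)
    raise ValueError("tail must be 'left', 'right' or 'two'")


def p_value(z, tail):
    """Probability, under the null, of a Z at least as extreme as observed."""
    if tail == "left":
        return std_normal.cdf(z)
    if tail == "right":
        return 1 - std_normal.cdf(z)
    if tail == "two":
        return 2 * std_normal.cdf(-abs(z))
    raise ValueError("tail must be 'left', 'right' or 'two'")


z_observed = -3  # Z statistic from the drug example
for tail in ("left", "right", "two"):
    cuts = ", ".join(f"{c:.3f}" for c in critical_values(0.05, tail))
    print(f"{tail:>5}-tailed: critical Z = {cuts}; p = {p_value(z_observed, tail):.5f}")
```

With Z = -3, a left-tailed or two-tailed test rejects the null at α = 0.05, while a right-tailed test does not, because the observation lies in the opposite direction from the one that alternative predicts.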

We can summarize the whole Hypothesis testing process using the following flowchart diagram.
[Flowchart: the hypothesis testing process]

The following video explains Hypothesis Testing in detail.

https://youtu.be/SCiC2-E5LWQ
