
# Maths Behind ML – Logistic Regression

## The classification problem and logistic regression

Recall that classification is a technique for identifying the category of a new observation using a classifier. The classifier is trained on data where the class of each observation is already known. Let’s work through a classification problem; it will anchor the rest of the concepts explained in this article.
The problem – We want to classify whether or not a person will default on his/her credit card payment based on his/her credit card balance. We use data from a simulated default dataset and, for the sake of understanding, assume that credit card default (YES/NO) depends on the credit card balance only.

## From the problem to a math problem

What is the probability that a person will default on his/her credit card payment when his/her credit card balance is known? Mathematically, we want to know
$P\left(default=yes\mid balance\right)$ – (1)
For simplicity, let’s denote this conditional probability by p(balance). This probability ranges from 0 to 1. Then, for any given value of balance, we can predict default. If we choose a threshold of 0.5, we predict that the person will default on his/her credit card payment if p(balance) > 0.5.

## Conditional probability as a logistic model

Let’s denote the probability of default as p(X), where X is the credit card balance, and write default=yes as Y=1 and default=no as Y=0. Now we need to establish a relationship between p(X) and X. How does the probability of default depend on the credit card balance? Since probability values must lie between 0 and 1, any such relationship must also satisfy this requirement. In logistic regression, we use the logistic function, an S-shaped (sigmoid) curve, to define the relationship between the probability and the predictor:
$p\left(X\right)=P\left(Y=1\mid X\right)=\frac{{e}^{{\beta}_{0}+{\beta}_{1}X}}{1+{e}^{{\beta}_{0}+{\beta}_{1}X}}$ – (2)

The following figure depicts the shape of a typical logistic function. Note that it is an S-shaped curve whose output lies between 0 and 1 and depends on the input X.
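The shape of this curve can be checked numerically. Below is a minimal sketch of the logistic function from equation (2); the coefficient values used here (0 and 1) are arbitrary choices for illustration, not the fitted ones.

```python
import math

def logistic(x, beta0, beta1):
    """Logistic function p(X) = e^(b0 + b1*x) / (1 + e^(b0 + b1*x))."""
    z = beta0 + beta1 * x
    return math.exp(z) / (1.0 + math.exp(z))

# The output always lies in (0, 1): large negative inputs give values
# near 0, large positive inputs give values near 1, and 0 maps to 0.5.
for x in (-10, -2, 0, 2, 10):
    print(f"x = {x:4d}  ->  p = {logistic(x, 0.0, 1.0):.4f}")
```

Evaluating the function over a grid of inputs like this traces out the S shape the figure describes.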

If we rearrange the logistic function, we get

$\frac{p\left(X\right)}{1-p\left(X\right)}={e}^{{\beta}_{0}+{\beta}_{1}X}$

The quantity $\frac{p\left(X\right)}{1-p\left(X\right)}$ is called the odds and can take any value from 0 to infinity. Values close to 0 and to infinity indicate very low and very high chances of default, respectively. Taking the log of both sides of the above equation, we find that

$\mathrm{log}\left[\frac{p\left(X\right)}{1-p\left(X\right)}\right]={\beta}_{0}+{\beta}_{1}X$

The quantity on the left-hand side is called the log-odds or logit and is linearly dependent on X.
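This equivalence between the logistic form and the linear log-odds form can be verified directly. The sketch below uses hypothetical coefficient values (not the fitted ones from this article) and checks that the logit of the model probability recovers the linear term exactly.

```python
import math

def odds(p):
    """Odds: p / (1 - p); ranges from 0 to infinity as p goes from 0 to 1."""
    return p / (1.0 - p)

def logit(p):
    """Log-odds (logit): log(p / (1 - p)); linear in X under the model."""
    return math.log(odds(p))

# Hypothetical coefficients, chosen only to demonstrate the algebra.
beta0, beta1 = -4.0, 0.002
x = 1500.0
p = math.exp(beta0 + beta1 * x) / (1.0 + math.exp(beta0 + beta1 * x))

print(logit(p))  # recovers beta0 + beta1 * x, i.e. about -1.0
```

The round trip (logistic, then logit) returning the linear predictor is exactly the rearrangement shown above.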

## Estimation of the logistic regression coefficients and maximum likelihood

In the previous section, we saw that the probability of default, modeled as a logistic function, is given by
$p\left(X\right)= \frac{{e}^{{\beta}_{0}+{\beta}_{1}X}}{1+{e}^{{\beta}_{0}+{\beta}_{1}X}}$

In this equation, while X is known (the credit card balance in our problem), there are two unknown parameters, $\beta_{0}$ and $\beta_{1}$, known as the logistic regression coefficients. These coefficients need to be estimated from the available training data.

Maximum likelihood estimation is used to estimate the coefficients. The motivation is to find values of $\beta_{0}$ and $\beta_{1}$ such that the probability estimated by the above equation matches each individual's actual default status as closely as possible. In other words, we choose $\beta_{0}$ and $\beta_{1}$ so that the model assigns a probability close to 1 to individuals who defaulted and a probability close to 0 to individuals who did not. This intuition can be represented by the likelihood function

$l\left(\beta_{0},\beta_{1}\right)=\prod_{i:y_{i}=1}p\left({x}_{i}\right)\prod_{i':y_{i'}=0}\left(1-p\left({x}_{i'}\right)\right)$
The coefficient estimates are chosen to maximize this likelihood function. The modeling and estimation of the regression coefficients can be carried out in software such as R and Python. For the default dataset, the coefficients as determined using Python are shown in the following picture.

Since the p-value for both coefficients is very low, both are statistically significant. Using this table, we can conclude that there is a relationship between credit card default and credit card balance, given by

$\hat{p}\left(x\right)=\frac{{e}^{-10.6513+0.0055x}}{1+{e}^{-10.6513+0.0055x}}$
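To make the estimation step concrete, here is a sketch of maximum likelihood fitting via Newton–Raphson, the standard iterative scheme behind most logistic regression software. The data below are simulated from made-up "true" coefficients purely for illustration; they are not the default dataset, and the recovered values will not match the table above.

```python
import math
import random

def logistic(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Simulate data from assumed "true" coefficients (illustrative only).
random.seed(0)
true_b0, true_b1 = -4.0, 0.004
data = []
for _ in range(5000):
    x = random.uniform(0.0, 2500.0)          # a synthetic "balance"
    y = 1 if random.random() < logistic(true_b0 + true_b1 * x) else 0
    data.append((x, y))

# Newton-Raphson: repeatedly solve H d = g, where g is the gradient of
# the log-likelihood, g = [sum(y_i - p_i), sum((y_i - p_i) * x_i)], and
# H is the negated Hessian built from weights w_i = p_i * (1 - p_i).
b0, b1 = 0.0, 0.0
for _ in range(25):
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y in data:
        p = logistic(b0 + b1 * x)
        w = p * (1.0 - p)
        g0 += y - p
        g1 += (y - p) * x
        h00 += w
        h01 += w * x
        h11 += w * x * x
    det = h00 * h11 - h01 * h01
    d0 = (h11 * g0 - h01 * g1) / det         # solve the 2x2 system
    d1 = (h00 * g1 - h01 * g0) / det
    b0, b1 = b0 + d0, b1 + d1

print(f"estimated b0 = {b0:.3f}, b1 = {b1:.5f}")  # should land near -4 and 0.004
```

In practice one would call a library routine (for example, `statsmodels` or `scikit-learn` in Python) rather than hand-roll the solver; the loop above only shows what those routines maximize.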

## Making predictions of the class

In the last section, we modeled the probability of default when the balance is given as
$\hat{p}\left(X\right)=\frac{{e}^{-10.6513+0.0055X}}{1+{e}^{-10.6513+0.0055X}}$
From this model, we can predict that an individual will default if $\hat{p}(X) > 0.5$ and will not default if $\hat{p}(X) \le 0.5$.
Let’s predict the default status of an individual whose credit card balance is \$1,000:

$\hat{p}\left(1000\right)=\frac{{e}^{-10.6513+0.0055\times 1000}}{1+{e}^{-10.6513+0.0055\times 1000}}=0.00576$

Since the probability of default is only 0.576%, we predict that this individual will not default. An individual with a \$2,000 balance, however, would have $\hat{p}\left(2000\right)=0.586$, and we would conclude that he or she would default, as our chosen decision boundary is 0.5.

## Conclusion

In this example, we assumed that default depends only on the credit card balance for the sake of understanding. We can extend the model to more than one predictor, in which case the logistic function is given as follows.

$p\left(X\right)=P\left(Y=1\mid X\right)=\frac{{e}^{{\beta}_{0}+{\beta}_{1}X_{1}+{\beta}_{2}X_{2}+\dots}}{1+{e}^{{\beta}_{0}+{\beta}_{1}X_{1}+{\beta}_{2}X_{2}+\dots}}$
While this example showed binary classification, logistic regression can also handle classification with more than two classes; this extension is known as multinomial logistic regression.
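The multi-predictor form above generalizes directly in code. The sketch below evaluates a logistic model with two predictors; the coefficient values and the second predictor (an "income" column) are hypothetical, chosen only to illustrate the formula.

```python
import math

def logistic_multi(x, betas):
    """Logistic model with several predictors:
    p(X) = e^(b0 + b1*x1 + ... + bk*xk) / (1 + e^(...))."""
    z = betas[0] + sum(b * xi for b, xi in zip(betas[1:], x))
    return math.exp(z) / (1.0 + math.exp(z))

# Hypothetical coefficients for (balance, income) -- illustrative only.
betas = [-10.9, 0.0057, 0.00003]
print(logistic_multi([1500.0, 40000.0], betas))
```

Each additional predictor simply contributes one more term to the linear part of the exponent, leaving the rest of the machinery unchanged.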

Reference: James, Gareth, et al. *An Introduction to Statistical Learning*. Vol. 112. New York: Springer, 2013.