Tuesday, 4 September 2018

Math of Intelligence : Logistic Regression

Logistic Regression

Some JavaScript to enable auto-numbering of mathematical equations. Reference

In [20]:
%%javascript
MathJax.Hub.Config({
    TeX: { equationNumbers: { autoNumber: "AMS" } }
});

MathJax.Hub.Queue(
  ["resetEquationNumbers", MathJax.InputJax.TeX],
  ["PreProcess", MathJax.Hub],
  ["Reprocess", MathJax.Hub]
);

Here, we will be figuring out the math for a binary logistic classifier.

Logistic Regression is similar to Linear Regression, but instead of a real-valued output $y$, the output will be either 0 or 1, since we need to classify each example into one of two categories.

In the linear regression post, we defined our hypothesis function as:

$$h_\theta(x) = \theta_0 + \theta_1 x \tag{1}$$

Now, we can also have multiple input features, i.e. $x_1, x_2, x_3, \dots$ and so on, so in that case our hypothesis function becomes:

$$h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \dots \tag{2}$$

We have added $x_0 = 1$ alongside $\theta_0$ for simplification. Now, the hypothesis function can be expressed as a combination of just two vectors, $X = [x_0, x_1, x_2, x_3, \dots]$ and $\theta = [\theta_0, \theta_1, \theta_2, \dots]$:

$$h_\theta(x) = \theta^T X \tag{3}$$

Still, the output of this function will be a real value, so we'll apply an activation function to convert the output to 0 or 1. We'll use the sigmoid function g(z) for this purpose. TODO: Explore other activation functions

$$g(z) = \frac{1}{1 + e^{-z}} \tag{4}$$

$$h(X) = g(\theta^T X) = \frac{1}{1 + e^{-\theta^T X}} \tag{5}$$
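
As a quick illustration, here is a minimal NumPy sketch of the sigmoid and the hypothesis function above (the helper names and example values are made up for this post, not part of any library):

In [ ]:
import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, X):
    # h(X) = g(theta^T X); X can be a single feature vector or an (m, n) design matrix
    return sigmoid(X @ theta)

# Example with two features plus the bias feature x0 = 1
x = np.array([1.0, 2.0, -1.5])        # [x0, x1, x2]
theta = np.array([0.5, -0.3, 1.0])    # [theta0, theta1, theta2]
hypothesis(theta, x)                  # a probability strictly between 0 and 1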

The most commonly used loss function for logistic regression is log-loss (or cross-entropy). TODO: Why log-loss? Explore other loss functions.

So, the loss function $l(\theta)$ for $m$ training examples is:

$$l(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[\, y^{(i)}\log\big(h(x^{(i)})\big) + \big(1 - y^{(i)}\big)\log\big(1 - h(x^{(i)})\big)\right] \tag{6}$$

which can also be represented as:

$$l(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[\, y^{(i)}\log\big(g(\theta^T x^{(i)})\big) + \big(1 - y^{(i)}\big)\log\big(1 - g(\theta^T x^{(i)})\big)\right] \tag{7}$$
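
As a rough sketch, reusing the `sigmoid` and `hypothesis` helpers above, the loss can be computed in NumPy as below; in practice the predictions are usually clipped away from 0 and 1 so the logs stay finite:

In [ ]:
def log_loss(theta, X, y, eps=1e-12):
    # Average cross-entropy over m examples.
    # X: (m, n) design matrix whose first column is all ones (x0 = 1)
    # y: (m,) vector of 0/1 labels
    h = np.clip(hypothesis(theta, X), eps, 1 - eps)   # keep log() finite
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))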

Now, similar to linear regression, we need to find out the value of θ that minimizes the loss. We can again use gradient descent for that. TODO: Explore other methods to minimize the loss function.

$$\theta_j = \theta_j - \alpha \frac{\partial}{\partial \theta_j} l(\theta) \tag{8}$$

where α is the learning rate.

From (8), we see that we need to find $\frac{\partial}{\partial \theta_j} l(\theta)$ to derive the gradient descent rule. Let's start by working with just one training example.

$\frac{\partial l(\theta)}{\partial \theta}$ can be broken down as follows:

$$\frac{\partial l(\theta)}{\partial \theta} = \frac{\partial l(\theta)}{\partial h(x)} \cdot \frac{\partial h(x)}{\partial \theta} \tag{9}$$

$$\frac{\partial l(\theta)}{\partial \theta} = \frac{\partial l(\theta)}{\partial g(\theta^T x)} \cdot \frac{\partial g(\theta^T x)}{\partial \theta} \tag{10}$$

Calculating $\frac{\partial g(\theta^T x)}{\partial \theta}$ first:

$$\frac{\partial g(\theta^T x)}{\partial \theta} = \frac{\partial}{\partial \theta}\left(\frac{1}{1 + e^{-\theta^T x}}\right)$$

$$= \frac{\partial}{\partial \theta}\left(1 + e^{-\theta^T x}\right)^{-1}$$

Using the chain rule of derivatives,

$$= -\left(1 + e^{-\theta^T x}\right)^{-2} \cdot \left(e^{-\theta^T x}\right) \cdot (-x)$$

$$= \frac{e^{-\theta^T x}}{\left(1 + e^{-\theta^T x}\right)^2} \cdot x$$

$$= \frac{1 + e^{-\theta^T x} - 1}{\left(1 + e^{-\theta^T x}\right)^2} \cdot x$$

$$= \left(\frac{1 + e^{-\theta^T x}}{\left(1 + e^{-\theta^T x}\right)^2} - \frac{1}{\left(1 + e^{-\theta^T x}\right)^2}\right) \cdot x$$

$$= \left(\frac{1}{1 + e^{-\theta^T x}} - \frac{1}{\left(1 + e^{-\theta^T x}\right)^2}\right) \cdot x$$

$$= \left(g(\theta^T x) - g(\theta^T x)^2\right) \cdot x$$

$$\frac{\partial g(\theta^T x)}{\partial \theta} = g(\theta^T x)\left(1 - g(\theta^T x)\right) x \tag{11}$$
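
In scalar form this is the well-known identity $g'(z) = g(z)\left(1 - g(z)\right)$, which is easy to sanity-check numerically with a central finite difference (a throwaway check, not part of the derivation):

In [ ]:
z = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)   # central difference
analytic = sigmoid(z) * (1 - sigmoid(z))                      # g(z) * (1 - g(z))
np.max(np.abs(numeric - analytic))                            # should be roughly 1e-10 or smaller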

Now, calculating $\frac{\partial l(\theta)}{\partial g(\theta^T x)}$:

$$\frac{\partial l(\theta)}{\partial g(\theta^T x)} = \frac{\partial}{\partial g(\theta^T x)}\left[-\Big(y\,\log\big(g(\theta^T x)\big) + (1 - y)\,\log\big(1 - g(\theta^T x)\big)\Big)\right]$$

Again, using the chain rule,

$$= -\left(\frac{y}{g(\theta^T x)} + \frac{1 - y}{1 - g(\theta^T x)} \cdot (-1)\right)$$

$$= -\left(\frac{y - y\,g(\theta^T x) - g(\theta^T x) + y\,g(\theta^T x)}{g(\theta^T x)\left(1 - g(\theta^T x)\right)}\right)$$

$$= -\left(\frac{y - g(\theta^T x)}{g(\theta^T x)\left(1 - g(\theta^T x)\right)}\right)$$

$$\frac{\partial l(\theta)}{\partial g(\theta^T x)} = -\frac{y - g(\theta^T x)}{g(\theta^T x)\left(1 - g(\theta^T x)\right)} \tag{12}$$

Finally, combining (10), (11), and (12), we get:

$$\frac{\partial l(\theta)}{\partial \theta} = -\frac{y - g(\theta^T x)}{g(\theta^T x)\left(1 - g(\theta^T x)\right)} \cdot g(\theta^T x)\left(1 - g(\theta^T x)\right) x$$

$$\frac{\partial l(\theta)}{\partial \theta} = -\left(y - g(\theta^T x)\right) x$$

$$\frac{\partial l(\theta)}{\partial \theta} = -\left(y - h(x)\right) x$$
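
Before plugging this back into the update rule, here is a quick finite-difference check of the per-example gradient $-(y - h(x))\,x = (h(x) - y)\,x$ (a throwaway sketch reusing the helpers defined earlier; the example values are arbitrary):

In [ ]:
x = np.array([1.0, 0.7, -1.2])        # first entry is the bias feature x0 = 1
y = 1.0
theta = np.array([0.1, -0.4, 0.3])

analytic = (hypothesis(theta, x) - y) * x   # the gradient we just derived

eps = 1e-6
numeric = np.zeros_like(theta)
for j in range(len(theta)):
    step = np.zeros_like(theta)
    step[j] = eps
    loss_plus = log_loss(theta + step, x[None, :], np.array([y]))
    loss_minus = log_loss(theta - step, x[None, :], np.array([y]))
    numeric[j] = (loss_plus - loss_minus) / (2 * eps)

np.max(np.abs(numeric - analytic))          # should be very small, around 1e-9 or less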

Plugging this back into (8),

$$\theta_j = \theta_j + \alpha\left(y - h(x)\right) x_j$$
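
Putting everything together, here is a minimal batch gradient descent sketch (the function name, learning rate, and synthetic data below are invented for illustration; the update averages the per-example rule above over all $m$ training examples):

In [ ]:
def train_logistic_regression(X, y, alpha=0.1, iterations=1000):
    # X: (m, n) design matrix with a leading column of ones, y: (m,) 0/1 labels
    theta = np.zeros(X.shape[1])
    for _ in range(iterations):
        h = hypothesis(theta, X)                          # predicted probabilities, shape (m,)
        theta = theta + alpha * X.T @ (y - h) / len(y)    # average of alpha * (y - h(x)) * x
    return theta

# Tiny synthetic example: one feature plus the bias column
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
X = np.column_stack([np.ones(100), x1])
y = (x1 > 0).astype(float)                                # separable labels for the toy data
theta = train_logistic_regression(X, y)
theta, log_loss(theta, X, y)

Because the rule was derived for a single example, the same update also works in stochastic form, applying $\theta_j = \theta_j + \alpha\,(y - h(x))\,x_j$ after each individual example instead of averaging over the batch.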
