A Guide to Logistic Regression
On Sat Jan 28 2023
Introduction
In this post I will take you through how logistic regression works. I will assume that you already have some prior knowledge of machine learning and Python. It will also help to know some math in order to follow the derivations needed for gradient descent. I will try to explain the math as thoroughly as I can, since I struggle with it a bit myself.
What is Logistic Regression?
Logistic regression is a binary classification method, meaning it distinguishes between two different categories. That might seem at odds with the second part of the name, "regression". The reason it is called regression is that, unlike methods such as KNN, it produces a probability for one of the two labels rather than just a hard class assignment.
NOTE! You can use this method to classify samples into more than two classes, but that is not within the scope of this post. Look up the softmax function and/or multi-class classification (with logistic regression) if you want to research that.
Components of Logistic Regression
One core component of logistic regression is the sigmoid function, which we use for predictions. It looks like this:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

where $z = w^T x = \sum_i w_i x_i$ is the weighted sum of the features $x_i$ and their corresponding weights $w_i$.
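As a quick illustration (a minimal sketch, not part of the implementation further down), the sigmoid squashes any real-valued $z$ into the open interval $(0, 1)$, which is what lets us read the output as a probability:

import numpy as np

def sigmoid(z):
    # Squash any real number into (0, 1)
    return 1 / (1 + np.exp(-z))

print(sigmoid(np.array([-5.0, 0.0, 5.0])))  # approximately [0.0067, 0.5, 0.9933]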
Loss function
We need a loss function to be able to improve the model as we train it. For logistic regression it is called the binary cross-entropy loss. For a single sample it can be written as two cases:

$$\text{loss}(\hat{y}, y) = \begin{cases} -\log(\hat{y}) & \text{if } y = 1 \\ -\log(1 - \hat{y}) & \text{if } y = 0 \end{cases}$$
We can convert this function to a single line by utilizing $y$ and $(1 - y)$, since $y$ can only be $0$ or $1$:

$$\text{loss}(\hat{y}, y) = -\big(y \log(\hat{y}) + (1 - y)\log(1 - \hat{y})\big)$$

Averaged over all $m$ training samples, the full loss is

$$J(w) = -\frac{1}{m} \sum_{i=1}^{m} \Big(y_i \log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i)\Big)$$
This is the function we will use moving forward.
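To make the formula concrete, here is a minimal NumPy sketch of the binary cross-entropy (the label and probability values are made up for illustration):

import numpy as np

y = np.array([1, 0, 1])             # true labels
y_hat = np.array([0.9, 0.2, 0.6])   # predicted probabilities

# Binary cross-entropy, averaged over the samples
bce = -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
print(bce)  # roughly 0.28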
Gradient Descent
Training the model is all about minimizing the loss function (while avoiding overfitting). We will use gradient descent to do just that.
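In case the update step itself is unfamiliar: gradient descent repeatedly nudges each weight a small step in the direction that decreases the loss, where $\alpha$ is the learning rate (called learning_rate in the implementation below):

$$w_i \leftarrow w_i - \alpha \frac{\partial J}{\partial w_i}$$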
To be able to use gradient descent we need the derivatives of the loss function and its parts.
By focusing only on the part inside the sum of our loss function (think of it as using a single sample), we can start differentiating:

$$y \log(\hat{y}) + (1 - y)\log(1 - \hat{y})$$

A problem we have is that $\hat{y} = \sigma(z)$ is itself a function of the weights (through $z$). But thanks to the chain rule we can still find the derivatives.
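Spelled out, the chain rule splits the derivative with respect to a weight $w_i$ into three factors, which are exactly the three parts derived below:

$$\frac{\partial}{\partial w_i}\Big(y \log(\hat{y}) + (1 - y)\log(1 - \hat{y})\Big) = \underbrace{\frac{\partial}{\partial \hat{y}}\Big(y \log(\hat{y}) + (1 - y)\log(1 - \hat{y})\Big)}_{\text{Part One}} \cdot \underbrace{\frac{\partial \hat{y}}{\partial z}}_{\text{Part Two}} \cdot \underbrace{\frac{\partial z}{\partial w_i}}_{\text{Part Three}}$$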
Part One
The derivative of the natural logarithm is this:

$$\frac{d}{dx}\ln(x) = \frac{1}{x}$$
By using that rule and the chain rule, we can find the derivative of the extracted part mentioned earlier.
First we divide it into two parts:

$$y \log(\hat{y}) \quad \text{and} \quad (1 - y)\log(1 - \hat{y})$$

Then the first part can easily be derived:

$$\frac{\partial}{\partial \hat{y}}\, y \log(\hat{y}) = \frac{y}{\hat{y}}$$

And the second part as well, although it is slightly more complicated (the inner derivative of $1 - \hat{y}$ contributes a factor of $-1$):

$$\frac{\partial}{\partial \hat{y}}\, (1 - y)\log(1 - \hat{y}) = -\frac{1 - y}{1 - \hat{y}}$$

Finally, we add them together again:

$$\frac{y}{\hat{y}} - \frac{1 - y}{1 - \hat{y}}$$

And simplify, with the help of some tool like Wolfram Alpha or whatever you prefer:

$$\frac{y}{\hat{y}} - \frac{1 - y}{1 - \hat{y}} = \frac{y(1 - \hat{y}) - (1 - y)\hat{y}}{\hat{y}(1 - \hat{y})} = \frac{y - \hat{y}}{\hat{y}(1 - \hat{y})}$$
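If you do not have Wolfram Alpha at hand, a quick SymPy sketch (not part of the post's implementation) performs the same differentiation and simplification:

import sympy as sp

y, y_hat = sp.symbols("y y_hat")
expr = y * sp.log(y_hat) + (1 - y) * sp.log(1 - y_hat)

# Differentiate with respect to y_hat and simplify
derivative = sp.simplify(sp.diff(expr, y_hat))
print(derivative)  # (y - y_hat)/(y_hat*(1 - y_hat)), possibly printed in an equivalent form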
Part Two
The next step is to differentiate $\hat{y} = \sigma(z)$ (the sigmoid function). If you want all the steps, you can google them or take a look at this article. I skip multiple steps here as the derivation is pretty long, but essentially it looks like this:

$$\frac{\partial \hat{y}}{\partial z} = \sigma(z)\big(1 - \sigma(z)\big) = \hat{y}(1 - \hat{y})$$
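A small numerical sketch (the step size h and test points are arbitrary) to convince yourself that the closed form matches a finite-difference slope of the sigmoid:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

z = np.array([-2.0, 0.0, 1.5])
h = 1e-6

# Finite-difference approximation vs. the closed-form derivative
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
analytic = sigmoid(z) * (1 - sigmoid(z))
print(np.allclose(numeric, analytic))  # True (up to floating-point error)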
Part Three
The final part of the puzzle is to find the derivative of $z = \sum_i w_i x_i$. What is important to notice is that we want the derivative of $z$ with respect to the feature weights. This means that for each weight $w_i$, the derivative of $z$ with respect to $w_i$ is simply its corresponding feature $x_i$:

$$\frac{\partial z}{\partial w_i} = x_i$$
This leads to our final, complete derivative. All we need to do is multiply the different parts with one another in accordance with the chain rule and we are done!

$$\frac{\partial}{\partial w_i}\Big(y \log(\hat{y}) + (1 - y)\log(1 - \hat{y})\Big) = \frac{y - \hat{y}}{\hat{y}(1 - \hat{y})} \cdot \hat{y}(1 - \hat{y}) \cdot x_i = (y - \hat{y})\, x_i$$

Bringing back the minus sign and the average over the $m$ samples in the full loss, the gradient we actually descend along is

$$\frac{\partial J}{\partial w_i} = \frac{1}{m} \sum_{j=1}^{m} (\hat{y}_j - y_j)\, x_{j,i}$$
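As a sanity check (a quick sketch with made-up data, not part of the post's implementation), we can compare this analytic gradient against a finite-difference approximation of the averaged loss:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def loss(X, y, w):
    y_hat = sigmoid(X @ w)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
y = np.array([1.0, 0.0, 1.0, 1.0, 0.0])
w = rng.normal(size=3)

# Analytic gradient: (1/m) * X^T (y_hat - y)
analytic = X.T @ (sigmoid(X @ w) - y) / len(X)

# Finite-difference gradient, one weight at a time
h = 1e-6
numeric = np.zeros_like(w)
for i in range(len(w)):
    step = np.zeros_like(w)
    step[i] = h
    numeric[i] = (loss(X, y, w + step) - loss(X, y, w - step)) / (2 * h)

print(np.allclose(analytic, numeric))  # True (up to floating-point error)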
Now that we have the derivative, we can implement gradient descent!
Implementing in Python
# Required
import numpy as np

class LogisticRegression:
    def sigmoid(self, z):
        return 1 / (1 + np.exp(-z))

    def cost_function(self, X, y, weights):
        # Multiply each x_i with its corresponding weight
        z = np.dot(X, weights)
        # Calculate the prediction
        y_hat = self.sigmoid(z)
        # Calculate the loss
        predict_1 = y * np.log(y_hat)
        predict_0 = (1 - y) * np.log(1 - y_hat)
        # Return the average loss
        return -np.sum(predict_1 + predict_0) / len(X)

    def fit(self, X, y, learning_rate=0.05, epochs=25):
        # Initialize the weights to random values, then an array to record the loss
        weights = np.random.rand(X.shape[1])
        loss = []
        for _ in range(epochs):
            # Calculate the prediction
            z = np.dot(X, weights)
            y_hat = self.sigmoid(z)
            # Gradient of the loss: (1/m) * X^T (y_hat - y); dividing by len(X) averages over the samples
            gradient_vector = np.dot(X.T, (y_hat - y)) / len(X)
            # Update the weights
            weights -= learning_rate * gradient_vector
            # Record the loss
            loss.append(self.cost_function(X, y, weights))
        self.weights = weights
        self.loss = loss

    def predict(self, X):
        # Predict label 1 when the estimated probability exceeds 0.5
        z = np.dot(X, self.weights)
        return [1 if i > 0.5 else 0 for i in self.sigmoid(z)]
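To round things off, here is a hedged usage sketch on synthetic data (the dataset, hyperparameters, and expected accuracy are made up for illustration; note that the class above has no bias term, so this toy data is built so the decision boundary passes through the origin):

import numpy as np

# Synthetic, linearly separable data with a boundary through the origin
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

model = LogisticRegression()
model.fit(X, y, learning_rate=0.5, epochs=100)

predictions = model.predict(X)
accuracy = np.mean(np.array(predictions) == y)
print(f"training accuracy: {accuracy:.2f}")  # should be close to 1.0 on this easy data
print(f"final loss: {model.loss[-1]:.4f}")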