Implement gradient descent in Python

6 minute read

Gradient Descent

Gradient Descent is the most common optimization algorithm in machine learning and deep learning.

Gradient descent is a first-order iterative optimization algorithm for finding a local minimum of a differentiable function. This means it only takes into account the first derivative when performing the updates on the parameters. The idea is to take repeated steps in the opposite direction of the gradient (or approximate gradient) of the function at the current point, because this is the direction of steepest descent.
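To make the update rule concrete, here is a minimal, self-contained sketch (not part of the dataset example below) that minimizes the simple function f(x) = (x - 3)^2, whose gradient is 2(x - 3); the starting point and learning rate are arbitrary choices for illustration.

# Minimal gradient descent on f(x) = (x - 3)^2, whose gradient is 2*(x - 3)
def gradient(x):
    return 2 * (x - 3)

x = 0.0          # arbitrary initial guess
learnrate = 0.1  # arbitrary step size

for step in range(50):
    x -= learnrate * gradient(x)  # step in the opposite direction of the gradient

print(x)  # approaches the minimum at x = 3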

Implementing the Gradient Descent Algorithm

In this post, we’ll implement the basic functions of the Gradient Descent algorithm to find the boundary in a small dataset.

Reading and plotting the data

Assume that we have two features and one output variable that takes the values 0 and 1. Our goal is to find the line that best separates the two classes, and we will use gradient descent to learn the weights of that line.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

data = pd.read_csv('data.csv', header=None)
data.head()
         0         1  2
0  0.78051 -0.063669  1
1  0.28774  0.291390  1
2  0.40714  0.178780  1
3  0.29230  0.421700  1
4  0.50922  0.352560  1

First, we’ll start with some functions that will help us plot and visualize the data.

# Some helper functions for plotting and drawing lines
def plot_points(X, y):
    # np.argwhere returns the indices where y==1
    admitted = X[np.argwhere(y==1)]
    rejected = X[np.argwhere(y==0)]
    # x and y coordinates of the rejected and admitted points are plotted in different colors
    plt.scatter([s[0][0] for s in rejected], [s[0][1] for s in rejected], s = 25, color = 'red', edgecolor = 'k')
    plt.scatter([s[0][0] for s in admitted], [s[0][1] for s in admitted], s = 25, color = 'green', edgecolor = 'k')

def display(m, b, color='g--'):
    plt.xlim(-0.05,1.05)
    plt.ylim(-0.05,1.05)
    x = np.arange(-10, 10, 0.1)
    plt.plot(x, m*x+b, color)

X = np.array(data[[0,1]])
y = np.array(data[2])
plot_points(X,y)
plt.show()

[Plot: scatter of the two classes of data points]

# Activation (sigmoid) function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Model output: predicted probability that a point belongs to class 1
def output_formula(features, weights, bias):
    return sigmoid(np.dot(features, weights) + bias)

# Cross-entropy error for a single prediction
def error_formula(y, output):
    return - y*np.log(output) - (1 - y) * np.log(1-output)

# Gradient descent update of the weights and bias for a single data point
def update_weights(x, y, weights, bias, learnrate):
    output = output_formula(x, weights, bias)
    d_error = y - output
    weights += learnrate * d_error * x
    bias += learnrate * d_error
    return weights, bias
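A note on update_weights: for the sigmoid output and cross-entropy error defined above, the derivative of the error with respect to each weight simplifies to (output - y) * x, so stepping in the opposite direction of the gradient is exactly the weights += learnrate * (y - output) * x line. If you want to convince yourself, here is an optional finite-difference check (not part of the original post; the test values are arbitrary).

# Optional sanity check: the analytic gradient (output - y) * x should match
# a numerical (finite-difference) estimate of the error's slope
x_test = np.array([0.4, -0.2])   # arbitrary example point
w_test = np.array([0.1, 0.3])    # arbitrary weights
b_test, y_test, eps = 0.0, 1, 1e-6

analytic = (output_formula(x_test, w_test, b_test) - y_test) * x_test

numeric = np.zeros_like(w_test)
for i in range(len(w_test)):
    w_plus, w_minus = w_test.copy(), w_test.copy()
    w_plus[i] += eps
    w_minus[i] -= eps
    numeric[i] = (error_formula(y_test, output_formula(x_test, w_plus, b_test)) -
                  error_formula(y_test, output_formula(x_test, w_minus, b_test))) / (2 * eps)

print(analytic, numeric)  # the two gradients should agree closely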

Training function

This function runs the gradient descent algorithm over the whole dataset for a given number of epochs.

An epoch is one iteration over the dataset.

It will also plot the data, and some of the boundary lines obtained as we run the algorithm.

np.random.seed(44)

epochs = 100
learnrate = 0.01

def train(features, targets, epochs, learnrate, graph_lines=False):

    errors = []
    n_records, n_features = features.shape
    last_loss = None

    # Start with small random weights and a bias of 0.
    # For example, the weights might be: array([ 0.58383321, -0.85546737])
    weights = np.random.normal(scale=1 / n_features**.5, size=n_features)
    bias = 0

    for e in range(epochs):

        # iterate over the entire dataset
        for x, y in zip(features, targets):
            weights, bias = update_weights(x, y, weights, bias, learnrate)

        # Printing out the log-loss error on the training set
        out = output_formula(features, weights, bias)
        loss = np.mean(error_formula(targets, out))
        errors.append(loss)
        if e % (epochs / 10) == 0:
            print("\n========== Epoch", e,"==========")
            if last_loss and last_loss < loss:
                print("Train loss: ", loss, "  WARNING - Loss Increasing")
            else:
                print("Train loss: ", loss)
            last_loss = loss

            # Converting the output (float) to boolean as it is a binary classification
            # e.g. 0.95 --> True (= 1), 0.31 --> False (= 0)
            predictions = out > 0.5

            accuracy = np.mean(predictions == targets)
            print("Accuracy: ", accuracy)
        if graph_lines and e % (epochs / 100) == 0:
            display(-weights[0]/weights[1], -bias/weights[1])


    # Plotting the solution boundary
    plt.title("Solution boundary")
    display(-weights[0]/weights[1], -bias/weights[1], 'black')

    # Plotting the data
    plot_points(features, targets)
    plt.show()

    # Plotting the error
    plt.title("Error Plot")
    plt.xlabel('Number of epochs')
    plt.ylabel('Error')
    plt.plot(errors)
    plt.show()
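One detail worth spelling out: inside train, the boundary line is drawn with display(-weights[0]/weights[1], -bias/weights[1]). The boundary is the set of points where the model output is exactly 0.5, i.e. where w1*x1 + w2*x2 + b = 0; solving for x2 gives the slope -w1/w2 and the intercept -b/w2. Here is a quick check with made-up weights (not part of the original post):

# The decision boundary w1*x1 + w2*x2 + b = 0 rewritten as x2 = -(w1/w2)*x1 - b/w2
w_demo = np.array([0.6, -0.8])   # arbitrary weights for illustration
b_demo = 0.2                     # arbitrary bias

m = -w_demo[0] / w_demo[1]       # slope of the boundary line
c = -b_demo / w_demo[1]          # intercept of the boundary line

x1 = 0.5                         # any x1 value
x2 = m * x1 + c                  # the corresponding point on the boundary
print(output_formula(np.array([x1, x2]), w_demo, b_demo))  # prints 0.5 on the boundary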

Time to train the algorithm!

When we run the function, we’ll obtain the following:

  • 10 updates with the current training loss and accuracy
  • A plot of the data and some of the boundary lines obtained. The final one is in black. Notice how the lines get closer and closer to the best fit, as we go through more epochs.
  • A plot of the error function. Notice how it decreases as we go through more epochs.
train(X, y, epochs, learnrate, True)
========== Epoch 0 ==========
Train loss:  0.7135845195381634
Accuracy:  0.4

========== Epoch 10 ==========
Train loss:  0.6225835210454962
Accuracy:  0.59

========== Epoch 20 ==========
Train loss:  0.5548744083669508
Accuracy:  0.74

========== Epoch 30 ==========
Train loss:  0.501606141872473
Accuracy:  0.84

========== Epoch 40 ==========
Train loss:  0.4593334641861401
Accuracy:  0.86

========== Epoch 50 ==========
Train loss:  0.42525543433469976
Accuracy:  0.93

========== Epoch 60 ==========
Train loss:  0.3973461571671399
Accuracy:  0.93

========== Epoch 70 ==========
Train loss:  0.3741469765239074
Accuracy:  0.93

========== Epoch 80 ==========
Train loss:  0.35459973368161973
Accuracy:  0.94

========== Epoch 90 ==========
Train loss:  0.3379273658879921
Accuracy:  0.94

[Plot: data points with the boundary lines drawn during training; the final boundary in black]

[Plot: training error vs. number of epochs]

Effect of learning rate

A learning rate that is too low leads to slow training, while a learning rate that is too high overshoots the minimum and can make the loss oscillate or even increase. A good learning rate produces a steady drop in the loss.
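As an aside (not from the original dataset), the overshooting effect is easiest to see on a simple function such as f(x) = x^2, whose gradient is 2x; the step sizes below are chosen only for illustration.

# Overshooting illustration on f(x) = x^2 (gradient = 2x)
def run(learnrate, steps=10, x=1.0):
    for _ in range(steps):
        x -= learnrate * 2 * x
    return x

print(run(0.01))  # too small: x moves towards 0 only slowly
print(run(0.4))   # reasonable: x converges quickly towards 0
print(run(1.1))   # too large: |x| grows each step, so the loss diverges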

Learning rate == 0.001

Observe how the training loss decreases very slowly.

train(X, y, epochs, 0.001, True)
========== Epoch 0 ==========
Train loss:  0.6495239376128851
Accuracy:  0.5

========== Epoch 10 ==========
Train loss:  0.6391563617203656
Accuracy:  0.52

========== Epoch 20 ==========
Train loss:  0.6301091879292645
Accuracy:  0.55

========== Epoch 30 ==========
Train loss:  0.6218071298873118
Accuracy:  0.6

========== Epoch 40 ==========
Train loss:  0.6139683824498773
Accuracy:  0.65

========== Epoch 50 ==========
Train loss:  0.6064535530882055
Accuracy:  0.7

========== Epoch 60 ==========
Train loss:  0.5991919024617941
Accuracy:  0.72

========== Epoch 70 ==========
Train loss:  0.5921458373969366
Accuracy:  0.74

========== Epoch 80 ==========
Train loss:  0.5852939054750084
Accuracy:  0.76

========== Epoch 90 ==========
Train loss:  0.5786226410856905
Accuracy:  0.78

[Plot: data points and boundary lines for learning rate 0.001]

[Plot: training error vs. number of epochs]

Learning rate == 0.0001

As you can see, the training loss barely decreases, which means we would need many more epochs to converge.

train(X, y, epochs, 0.0001, True)
========== Epoch 0 ==========
Train loss:  0.6703381397426891
Accuracy:  0.56

========== Epoch 10 ==========
Train loss:  0.6694428706994785
Accuracy:  0.56

========== Epoch 20 ==========
Train loss:  0.6685502003171803
Accuracy:  0.56

========== Epoch 30 ==========
Train loss:  0.6676600994818186
Accuracy:  0.57

========== Epoch 40 ==========
Train loss:  0.6667725408444339
Accuracy:  0.57

========== Epoch 50 ==========
Train loss:  0.6658874986883481
Accuracy:  0.57

========== Epoch 60 ==========
Train loss:  0.6650049488064382
Accuracy:  0.57

========== Epoch 70 ==========
Train loss:  0.6641248683876563
Accuracy:  0.58

========== Epoch 80 ==========
Train loss:  0.6632472359121028
Accuracy:  0.58

========== Epoch 90 ==========
Train loss:  0.6623720310540064
Accuracy:  0.58

[Plot: data points and boundary lines for learning rate 0.0001]

[Plot: training error vs. number of epochs]

Learning rate == 1

It is unusual to use a learning rate of 1 or above. The training loss still drops sharply here, but note that it starts from a much higher value: such large steps can overshoot the minimum and make training unstable.

train(X, y, epochs, 1, False)
========== Epoch 0 ==========
Train loss:  1.5832613137744482
Accuracy:  0.5

========== Epoch 10 ==========
Train loss:  0.48954703287518425
Accuracy:  0.75

========== Epoch 20 ==========
Train loss:  0.35356664325017617
Accuracy:  0.83

========== Epoch 30 ==========
Train loss:  0.30926691716477195
Accuracy:  0.9

========== Epoch 40 ==========
Train loss:  0.2883437035236865
Accuracy:  0.91

========== Epoch 50 ==========
Train loss:  0.27646553775922106
Accuracy:  0.91

========== Epoch 60 ==========
Train loss:  0.26895278307881415
Accuracy:  0.92

========== Epoch 70 ==========
Train loss:  0.2638508772751001
Accuracy:  0.93

========== Epoch 80 ==========
Train loss:  0.26020737267954447
Accuracy:  0.93

========== Epoch 90 ==========
Train loss:  0.25750654572153553
Accuracy:  0.93

[Plot: data points and the final solution boundary for learning rate 1]

[Plot: training error vs. number of epochs]

What happens when we want a non-linear boundary?

Until now, our goal was to find a straight line that separates the points. But what if we want to find a non-linear boundary between these points? We will see that in the next post.