Implementing Gradient Descent for the Linear Regression Algorithm

Liger
3 min read · Oct 22, 2021



Recently my cousin asked me to help her with a math problem involving trigonometry (proving an equality involving trigonometric functions). Trigonometry was a little difficult for me back then, and I was never really good at reaching the final “LHS = RHS” at the end of a problem. Hence I was reluctant at first, since I don’t remember many of the common steps used to prove an equality, and my ego was telling me to back away from a possible failure. Nevertheless, I gave it a shot and was able to solve it without much effort.

That’s when I realized I had assumed the level of difficulty based on my experience with such problems when I was her age. What I had failed to appreciate was that as we learn and grow, all those difficult problems seem rather simple in hindsight. With that thought, I took a step further and tried implementing the “Hello World” of machine learning, the Linear Regression algorithm, from scratch, using gradient descent optimization and applying concepts from linear algebra and derivatives. Needless to say, it was quite gratifying, just as it was with that math problem.

Here I will walk you through an implementation of Linear Regression from scratch using the Gradient Descent optimization algorithm.

Let’s start by identifying the three core pillars of machine learning:

Algorithm

Linear Regression Algorithm — the target variable is modeled as a linear function of the independent variables (‘n’ is the number of features)

ŷ = θ₀ + θ₁x₁ + θ₂x₂ + … + θₙxₙ

Linear Regression equation
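To make the notation concrete, here is a minimal NumPy sketch of the hypothesis. The function name predict and the bias-column trick for handling the intercept θ₀ are my own illustration, not necessarily how the original listing was written.

```python
import numpy as np

def predict(X, theta):
    """Return y_hat = theta0 + theta1*x1 + ... + thetan*xn for each sample in X."""
    # Prepending a column of ones folds the intercept theta0 into a single dot product.
    X_b = np.c_[np.ones((X.shape[0], 1)), X]
    return X_b @ theta  # shape: (m,)

# Example: 3 samples, 1 feature, theta = [intercept, slope]
X = np.array([[1.0], [2.0], [3.0]])
theta = np.array([0.5, 2.0])
print(predict(X, theta))  # [2.5 4.5 6.5]
```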

Cost Function

Ordinary Least Squares — the mean square of the error terms, as given by the following equation (‘m’ is the number of samples/observations)

J(θ) = (1/m) Σᵢ₌₁ᵐ (ŷ⁽ⁱ⁾ − y⁽ⁱ⁾)²

Ordinary Least Squares cost function
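As a quick sketch of this cost in code (the name compute_cost and the toy numbers are illustrative, not from the original post), the mean squared error over the m samples can be computed as:

```python
import numpy as np

def compute_cost(X_b, y, theta):
    """Mean squared error of the predictions X_b @ theta against the targets y."""
    m = len(y)
    errors = X_b @ theta - y          # (y_hat - y) for every sample
    return (errors ** 2).sum() / m    # mean of the squared error terms

X_b = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # bias column + one feature
y = np.array([3.0, 5.0, 7.0])
print(compute_cost(X_b, y, np.zeros(2)))  # cost with theta = 0: (9 + 25 + 49) / 3 ≈ 27.67
```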

Optimizer

Gradient Descent — a first-order iterative optimization algorithm for finding a local minimum of a differentiable function. Since the cost function here is convex, the local minimum is also the global minimum.

θⱼ := θⱼ − α · ∂J(θ)/∂θⱼ  (where α is the learning rate)

Gradient Descent optimization equation

Once we substitute the equation for the cost function and find its derivative, we get the gradient with respect to the weights as follows.

∂J(θ)/∂θⱼ = (2/m) Σᵢ₌₁ᵐ (ŷ⁽ⁱ⁾ − y⁽ⁱ⁾) xⱼ⁽ⁱ⁾

Gradient
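Putting the gradient and the update rule together, a single Gradient Descent step could look roughly like this in vectorised form (gradient_step and alpha are illustrative names; the 2/m factor comes from differentiating the squared error):

```python
import numpy as np

def gradient_step(X_b, y, theta, alpha):
    """Perform one Gradient Descent update on the parameter vector theta."""
    m = len(y)
    errors = X_b @ theta - y              # prediction errors (y_hat - y)
    grad = (2.0 / m) * (X_b.T @ errors)   # gradient of the MSE cost w.r.t. each theta_j
    return theta - alpha * grad           # step against the gradient

X_b = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # bias column + one feature
y = np.array([3.0, 5.0, 7.0])                          # true relation: y = 1 + 2x
theta = np.zeros(2)
theta = gradient_step(X_b, y, theta, alpha=0.05)
print(theta)  # parameters after a single update step, moving towards [1, 2]
```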

Now that we have established these and obtained the necessary equations, let’s implement them in code.
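The original code listing is not reproduced here, so the following is a minimal sketch of how the pieces above fit together into a full training loop; the function and variable names (gradient_descent_lr, alpha, n_iters) are my own assumptions, not necessarily those used in the original post.

```python
import numpy as np

def gradient_descent_lr(X, y, alpha=0.05, n_iters=1000):
    """Fit a linear regression by batch gradient descent.

    X: (m, n) feature matrix, y: (m,) targets.
    Returns the learned parameters [theta_0, theta_1, ..., theta_n].
    """
    m = X.shape[0]
    X_b = np.c_[np.ones((m, 1)), X]   # prepend bias column for the intercept
    theta = np.zeros(X_b.shape[1])    # start from all-zero parameters

    for _ in range(n_iters):
        errors = X_b @ theta - y                 # y_hat - y
        grad = (2.0 / m) * (X_b.T @ errors)      # gradient of the MSE cost
        theta -= alpha * grad                    # gradient descent update
    return theta

# Tiny synthetic example: y = 4 + 3x + noise
rng = np.random.default_rng(42)
X = 2 * rng.random((100, 1))
y = 4 + 3 * X[:, 0] + rng.normal(scale=0.5, size=100)

theta = gradient_descent_lr(X, y)
print(theta)  # should be close to [4, 3]
```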

The plot below shows the line predicted for a simple Linear Regression problem based on the above implementation. We can also see the line predicted by the sklearn library for comparison.
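The plot itself is not reproduced here, but a comparison along those lines could be generated roughly as follows, reusing the gradient_descent_lr sketch from above together with sklearn’s standard LinearRegression; the synthetic data is only for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

def gradient_descent_lr(X, y, alpha=0.05, n_iters=1000):
    """Batch gradient descent for linear regression (same sketch as above)."""
    m = X.shape[0]
    X_b = np.c_[np.ones((m, 1)), X]
    theta = np.zeros(X_b.shape[1])
    for _ in range(n_iters):
        grad = (2.0 / m) * (X_b.T @ (X_b @ theta - y))
        theta -= alpha * grad
    return theta

# Synthetic simple linear regression data (one feature): y = 4 + 3x + noise
rng = np.random.default_rng(0)
X = 2 * rng.random((100, 1))
y = 4 + 3 * X[:, 0] + rng.normal(scale=0.5, size=100)

theta = gradient_descent_lr(X, y)        # from-scratch fit
sk_model = LinearRegression().fit(X, y)  # sklearn fit for comparison

x_line = np.linspace(0, 2, 100).reshape(-1, 1)
plt.scatter(X, y, s=10, alpha=0.5, label="data")
plt.plot(x_line, theta[0] + theta[1] * x_line, label="gradient descent (from scratch)")
plt.plot(x_line, sk_model.predict(x_line), "--", label="sklearn LinearRegression")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.show()
```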
