Founding assumption of Machine Learning algorithms

Liger
Jan 14, 2022


The most rewarding aspect of learning any new concept lies in the Eureka moment (or ‘Aha!’ moment, as it is called) when suddenly all the jumbled pieces fall into place and reveal the big picture. It is even sweeter when the picture was already complete and all you needed was a slight change in perspective. I had an ‘Aha!’ moment recently and thought of sharing it with you.

What we already knew
The objective of any supervised machine learning algorithm is to model the relationship between the dependent variable and the independent variables based on the data provided. It does so by producing either a mathematical function or a set of rules that maps the feature set (independent variables) to the target variable (dependent variable). We can then use the model generated by the algorithm to predict the target variable for a given set of feature inputs.
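To make this concrete, here is a minimal sketch of that fit-and-predict workflow. The library (scikit-learn), the linear model, and the toy data are my own assumptions for illustration, not the only choice; any supervised learner follows the same pattern.

```python
# A minimal fit/predict sketch, assuming scikit-learn and hypothetical data.
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical dataset: X holds the independent variables, y the dependent one.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.1, 3.9, 6.2, 8.1])

model = LinearRegression()
model.fit(X, y)                  # learn a mapping from features to target
print(model.predict([[5.0]]))    # predict the target for a new feature input
```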

Revelation
“The dataset that is used by the machine learning algorithm to generate the model is only a subset of the actual real-world data.”

Give it some time and let it sink in. Though this looks like a trivial and well-known statement, it has several implications and forms the basis for many other concepts and activities in a machine learning workflow.

This assumption means that the data at hand is only a sample (say S) of the whole data, the population (P), and what a supervised ML algorithm is trying to do is model the relationship in the population using the sample data we have.
That is, if the actual function that maps the independent variables (X) to the dependent variable (Y) across the whole real-world data is represented as F(X), the ML algorithm tries to model that relationship and generates the function f(X).

The actual function, which is unknown (based on the whole population):
Y = F(X)
The function generated by the ML algorithm (based on the dataset provided):
Ŷ = f(X)

Thus our objective becomes creating a generalizable model (‘f’) of the population from the sample S and predicting the target variable as Ŷ = f(X) such that Ŷ is close to Y. (By generalizable, I mean those patterns in the sample that can be extended to the whole population.)
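To see the F-versus-f distinction in code, we can cheat in a way reality never allows: define a “true” population function F ourselves, draw a noisy sample S from it, and fit f. The particular F, the noise level, and the use of scikit-learn are all assumptions made purely for illustration.

```python
# Toy illustration of F (true, unknown) vs f (learned): assumed setup.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

def F(x):                        # the "true" function; unknown in practice
    return 3 * x + 2

X = rng.uniform(0, 10, size=(50, 1))        # sample S drawn from population P
y = F(X).ravel() + rng.normal(0, 1, 50)     # real observations carry noise

f = LinearRegression().fit(X, y)            # f approximates F from S alone
print(f.coef_[0], f.intercept_)             # should land close to 3 and 2
```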

Now that we have established the above assumption, let us look at some other ML concepts in this new light.

Train-test split
This is an obvious one. Based on the above assumption, the ML algorithm tries to learn generalizable patterns from the sample. The model therefore needs to be evaluated on how well those patterns carry over to the population, and hence it should be assessed on a subset of the population different from the initial sample.
In this context, rather than using the whole dataset for learning and gathering new samples from the population for evaluation, we keep aside a part of the already gathered data (S) to serve as that different subset of the population, as the sketch below shows.
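This sketch assumes scikit-learn's train_test_split and toy data like the above; the 80/20 ratio is just a common convention, not a rule.

```python
# Hold out part of S as a stand-in for "unseen" data from the population.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3 * X.ravel() + 2 + rng.normal(0, 1, 100)   # assumed population pattern

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)       # keep 20% aside for evaluation

model = LinearRegression().fit(X_train, y_train)
print(mean_squared_error(y_test, model.predict(X_test)))  # generalization estimate
```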

Bias and Variance
Based on the above assumption, bias and variance can be interpreted as below.
Bias: a measure of the difference between the predicted value (Ŷ) and the actual value (Y).
Variance: a measure of how much ‘f’ changes when a different sample (S’) from the population (P) is used.

Looking back at our objective, the first part (a generalizable model) indicates low variance, and the second part (Ŷ being close to Y) indicates low bias.

In the above statement, the bias term is straightforward and easy to understand. A low bias means the predicted value is close to the actual value, and vice versa. If the difference is large, the model is said to be underfitting, meaning the generated model is not even able to fit the data in sample S correctly.

Coming to variance, a low variance means the model generated by the ML algorithm has captured the generalizable relationship (between the target and feature variables) of the population rather than patterns specific to the sample S. Hence, if a different dataset from the same population is used by the algorithm, the generated model would be more or less the same. If the patterns learnt by the model are specific to the sample S, the model is said to be overfitting, and any small change in the data (S) will result in a drastic change in the model. This is also why, in the case of overfitting, the model performs great on the training data but poorly on the test data.
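One rough way to watch variance at work is to refit the same model class on many different samples S’ from the same population and measure how much its prediction at one fixed point moves. Everything below (the sine-shaped population, the noise level, the two polynomial degrees) is an illustrative assumption.

```python
# Variance sketch: how much does f's prediction move across samples S'?
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
x_query = np.array([[5.0]])          # fixed point where predictions are compared

def draw_sample(n=30):               # draws a fresh sample S' from an assumed P
    X = rng.uniform(0, 10, size=(n, 1))
    y = np.sin(X).ravel() + rng.normal(0, 0.3, n)
    return X, y

for degree in (1, 12):               # simple vs highly flexible model class
    preds = []
    for _ in range(50):              # 50 different samples from the population
        X, y = draw_sample()
        f = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X, y)
        preds.append(f.predict(x_query)[0])
    print(degree, np.var(preds))     # the flexible model varies far more
```

The high-degree model changes drastically from sample to sample, which is exactly the overfitting behaviour described above.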

Note: Since we are trying to create the model ‘f’ from the sample S, there will always be some error associated with any machine learning model. The portion of that error which no model can eliminate, because it arises from noise inherent in the data itself, is known as the irreducible error.
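A quick sketch of why no model can drive the error to zero: even if we used the true F itself for prediction, the noise in the observations would remain. The linear F and unit-variance Gaussian noise below are assumptions for illustration.

```python
# Irreducible error sketch: even the true F cannot predict the noise.
import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(0, 10, size=1000)
noise = rng.normal(0, 1, size=1000)   # randomness inherent in the data itself
y = 3 * X + 2 + noise                 # observations: Y = F(X) + noise

y_hat = 3 * X + 2                     # predict with the true F itself
print(np.mean((y - y_hat) ** 2))      # ~1.0: the noise variance stays behind
```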
