We are all aware of the two common types of Regression: logistic and linear Regression. Both are basic Machine Learning concepts. We experience overfitting in a model when we increase its degrees of freedom, and we can overcome overfitting with the help of regularization techniques. The two common techniques for reducing overfitting are Lasso and Ridge Regression. Below, we will understand the concept of Lasso Regression and go over how it is similar to and different from Ridge Regression.

What is Regression?

A regression model predicts a continuous value. For instance, you can predict real estate prices depending on the size, location, and features of a house. This is the simplest example for understanding regression. Regression is a supervised learning technique.
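
To make this concrete, here is a minimal sketch of a regression model, assuming scikit-learn and a few made-up houses; the sizes, room counts, and prices are purely illustrative.

import numpy as np
from sklearn.linear_model import LinearRegression

# Each row is a house: [size in square feet, number of rooms] (hypothetical data)
X = np.array([[1400, 3], [1600, 3], [1700, 4], [1875, 4], [2350, 5]])
y = np.array([245000, 312000, 279000, 308000, 405000])  # sale prices

model = LinearRegression().fit(X, y)
print(model.predict(np.array([[2000, 4]])))  # predicted price for an unseen house

The model learns one coefficient per feature and outputs a continuous price for any new house.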

What is Regularization?

The term regularization means making conditions regular or acceptable, which is why we commonly use this technique in machine learning. In machine learning, regularization means shrinking the model's coefficients towards zero. In simple words, you can use regularization to avoid overfitting by limiting the learning capability or flexibility of a machine learning model.

Types of Regularization

There are two basic types of regularization techniques: Ridge Regression and Lasso Regression. They differ in how they penalize the coefficients, but both techniques help in reducing overfitting in a model.
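
As a quick sketch of what the two look like in code, assuming scikit-learn (its alpha parameter plays the role of the penalty strength, and 1.0 is an arbitrary choice here):

from sklearn.linear_model import Ridge, Lasso

ridge = Ridge(alpha=1.0)  # penalizes the sum of the squared coefficients (L2 penalty)
lasso = Lasso(alpha=1.0)  # penalizes the sum of the absolute coefficients (L1 penalty)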

Lasso Regression

This technique is a type of linear regression that shrinks the model's coefficients. The coefficients shrink towards zero to avoid overfitting the data. Using Ridge Regression as a point of comparison, we will understand this technique in detail, in simple words, below.

Understanding the Concept of Lasso Regression

How Ridge and Lasso Regression are Similar

Lasso Regression is very similar to Ridge Regression. We can understand Lasso Regression by considering an example. Suppose we have a bunch of mice. We start by making a graph of the weight and size of each mouse. On the vertical axis of the graph, we take the size, and on the horizontal axis, we take the weight.

Now we split this data into two different sets. We highlight the training data as red dots on the graph and the testing data as green dots. Then we use Least Squares to fit a line to the training data.

In simple words, we minimize the sum of the squared residuals. After we fit the line, we can see that it matches the training data closely, so it has low Bias. However, the Least Squares line does not fit the testing data, which means the Variance is high.
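
A small numeric sketch of this step, assuming scikit-learn and invented mouse weights and sizes, shows the same low-Bias, high-Variance behaviour:

import numpy as np
from sklearn.linear_model import LinearRegression

X_train = np.array([[1.0], [2.0]])        # weights of the two training mice (red dots)
y_train = np.array([1.2, 2.8])            # their sizes
X_test = np.array([[1.5], [2.5], [3.0]])  # weights of the testing mice (green dots)
y_test = np.array([1.6, 2.1, 2.4])        # their sizes

ls = LinearRegression().fit(X_train, y_train)
print(y_train - ls.predict(X_train))  # essentially zero residuals: the line fits the training data perfectly
print(y_test - ls.predict(X_test))    # large residuals: high Variance on the testing data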

Now, we can use Ridge Regression and fit a line to the data. By doing this, we are minimizing the sum of the squared residuals plus lambda times the slope squared. In other words, Ridge Regression is Least Squares plus the Ridge Regression Penalty:

The sum of squared residuals + λ × the slope²

From the graph, we can see that the Ridge Regression line does not fit the training data as well as the Least Squares line does. We can say that Least Squares has lower Bias than Ridge Regression. However, in exchange for that small increase in Bias, you will see a huge drop in the Variance of Ridge Regression.
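
Continuing the sketch with the same invented mice, and again assuming scikit-learn with an arbitrary penalty strength, we can see the Ridge slope being shrunk and the testing residuals getting smaller:

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

X_train = np.array([[1.0], [2.0]])        # the two training mice
y_train = np.array([1.2, 2.8])
X_test = np.array([[1.5], [2.5], [3.0]])  # the testing mice
y_test = np.array([1.6, 2.1, 2.4])

ls = LinearRegression().fit(X_train, y_train)
ridge = Ridge(alpha=1.0).fit(X_train, y_train)  # alpha plays the role of lambda

print(ls.coef_[0], ridge.coef_[0])     # the Ridge slope is shrunk towards zero
print(y_test - ridge.predict(X_test))  # overall much smaller testing residuals than Least Squares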

At this point, we can understand from the graph that by starting with a slightly worse fit to the training data, Ridge Regression gives us better long-term predictions than the overfit Least Squares line. Now let's consider the equation again:

The sum of squared residuals + λ × the slope²

Now, if we replace the square on the slope with the absolute value, we get Lasso Regression:

The sum of squared residuals + λ × │the slope│

Lasso Regression also has a little Bias, just like Ridge Regression, but it has less Variance than Least Squares. Both of these regressions look similar and perform the same function of making the fit less sensitive to the training data. Furthermore, you can apply both regressions for the same purpose.
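
Here is a brief sketch of that similarity on the same invented mouse data, with arbitrary penalty strengths; both penalties shrink the slope in the same direction:

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

X_train = np.array([[1.0], [2.0]])
y_train = np.array([1.2, 2.8])

print(LinearRegression().fit(X_train, y_train).coef_)  # steep Least Squares slope
print(Ridge(alpha=1.0).fit(X_train, y_train).coef_)    # slope shrunk towards zero
print(Lasso(alpha=0.3).fit(X_train, y_train).coef_)    # slope also shrunk towards zero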

How Ridge and Lasso Regression are Different

To understand the difference between Ridge and Lasso Regression, we need to go back to the two-sample training data and increase lambda:

The sum of squared residuals + λ × │the slope│

Minimizing this is the same as minimizing the sum of squares subject to the constraint Σ│βj│ ≤ s. Some of the βs are shrunk to exactly zero, resulting in a regression model that is easier to interpret.
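
To see the βs being shrunk to exactly zero, here is a sketch on made-up data where the second feature is pure noise, assuming scikit-learn and arbitrary penalty strengths:

import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3.0 * X[:, 0] + rng.normal(scale=0.5, size=100)  # only the first feature matters

print(Ridge(alpha=1.0).fit(X, y).coef_)  # the useless coefficient is small, but not zero
print(Lasso(alpha=0.5).fit(X, y).coef_)  # the useless coefficient is exactly 0.0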

A tuning parameter, λ, controls the strength of the L1 penalty. λ is basically the amount of shrinkage (the sketch after these points illustrates its effect):

When λ = 0, no parameters are eliminated. The estimate is equal to the one found with linear regression.

As λ increases, more and more coefficients are set to zero and eliminated (theoretically, when λ = ∞, all coefficients are eliminated).

As λ increases, bias increases.

As λ decreases, variance increases.
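
The sketch below, on the same kind of made-up data with one useless feature, illustrates this behaviour as λ (alpha in scikit-learn) grows:

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3.0 * X[:, 0] + rng.normal(scale=0.5, size=100)  # the second feature is pure noise

for alpha in [0.01, 0.1, 0.5, 1.0, 5.0]:
    print(alpha, Lasso(alpha=alpha).fit(X, y).coef_)
# Small alpha: coefficients close to plain Least Squares.
# Large alpha: first the useless coefficient, then every coefficient, is driven to zero.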

Conclusion

From the above explanation, we can understand that Lasso Regression can eliminate useless variables from the equation. This makes it work better than Ridge Regression when a model contains many useless variables, and it helps in reducing the Variance of a machine learning model that would otherwise have a lot of Variance.