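The objective functions of regularized models are the same as for OLS, except that they have an added penalty term. Hence, the objective becomes

$$\text{Loss} = \text{OLS loss} + \text{penalty}$$

For Ridge Regression, the additional penalty term is $\lambda \sum_{j=1}^{p} \beta_j^2$. The loss function becomes

$$L(\beta) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$$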
Indexing and notation
  • The index $i$ runs over the $n$ observations. $y_i$ is the actual ‘target’ value of the $i$-th observation. $\hat{y}_i$ is the predicted value for the $i$-th observation.
  • The index $j$ runs over the $p$ predictors.
  • $\beta_j$ is the coefficient of the $j$-th predictor.
  • $\lambda$ is the Ridge penalty hyper-parameter. Note that when $\lambda$ is 0, there is no regularization left and the model becomes a normal OLS regression.

$\lambda$ can take any real value from $0$ to $\infty$. As $\lambda$ increases, it forces the coefficients $\beta_j$ toward 0 in order to minimize the loss function.
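The shrinking effect can be seen in a minimal sketch. It assumes the simplest possible case, a single predictor and no intercept, where minimizing the Ridge loss has the closed-form solution $\beta = \sum_i x_i y_i / (\sum_i x_i^2 + \lambda)$. The function names and the toy data are illustrative only.

```python
def ridge_loss(beta, xs, ys, lam):
    """Ridge loss for one predictor, no intercept: SSE + lam * beta^2."""
    sse = sum((y - beta * x) ** 2 for x, y in zip(xs, ys))
    return sse + lam * beta ** 2


def ridge_coefficient(xs, ys, lam):
    """Closed-form minimizer of ridge_loss for one predictor, no intercept."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


# Toy data (made up for illustration): y is roughly 2 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

for lam in [0.0, 1.0, 10.0, 100.0]:
    beta = ridge_coefficient(xs, ys, lam)
    print(f"lambda={lam:>5}: beta={beta:.3f}, loss={ridge_loss(beta, xs, ys, lam):.3f}")
```

With a penalty of 0 the coefficient is the plain OLS estimate; larger penalties pull it toward 0 without ever reaching it exactly.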

Regression models

When modeling for regression, we measure the distance between our predictions and the actual observed values. When comparing models, we usually want to keep the model that gives the smallest sum of distances.


It should be noted that quite a few of these concepts have deeper connections in ML, as they are not only ‘success’ metrics but also loss functions of ML algorithms.


This is probably the most well-known measure when comparing regression models. Because we are squaring the distance between the predicted and the observed values, it heavily penalizes predictions that are far off the real values. Hence this measure is used when we want to avoid ‘outlier’ predictions (predictions that are far off).


The SSE (sum of squared errors, i.e. this measure without the square root and the average) is also the loss function of the linear regression algorithm. It is a convex function.
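A minimal sketch of how these quantities relate, assuming the measure discussed above is the root mean squared error (RMSE). The helper names and the toy values are illustrative only. Both sets of predictions have the same total absolute error, but the one with a single far-off prediction is penalized much more once the errors are squared.

```python
import math


def sse(actual, predicted):
    """Sum of squared errors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))


def mse(actual, predicted):
    """Mean squared error: SSE averaged over the observations."""
    return sse(actual, predicted) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: square root of the MSE."""
    return math.sqrt(mse(actual, predicted))


actual = [3.0, 5.0, 7.0, 9.0]
close_preds = [4.0, 6.0, 8.0, 10.0]    # four errors of size 1
outlier_preds = [3.0, 5.0, 7.0, 13.0]  # a single error of size 4

for name, preds in [("close", close_preds), ("outlier", outlier_preds)]:
    print(f"{name:>7}: SSE={sse(actual, preds)}, "
          f"MSE={mse(actual, preds)}, RMSE={rmse(actual, preds)}")
```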


  • It is differentiable everywhere (even at the junction between its MAE-like and MSE-like parts), meaning it can be used with gradient descent algorithms as well.
  • Because the Huber loss switches from quadratic to linear behaviour for large errors, its gradient stays bounded, whereas the MSE gradient keeps growing with the size of the error. This can help avoid the very large gradient steps that outliers may cause with MSE.

    The main disadvantage of the Huber Loss function is that its threshold parameter has to be tuned.
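The properties above can be sketched with the usual textbook form of the Huber loss. This is an illustration under assumptions: the threshold is called `delta` here (a common convention, not a name taken from this post), and the function names are made up.

```python
def huber_loss(residual, delta=1.0):
    """Huber loss for one residual (actual - predicted).

    Quadratic (MSE-like) for |residual| <= delta,
    linear (MAE-like) beyond that threshold.
    """
    abs_r = abs(residual)
    if abs_r <= delta:
        return 0.5 * residual ** 2
    return delta * (abs_r - 0.5 * delta)


def huber_gradient(residual, delta=1.0):
    """Derivative of huber_loss with respect to the residual.

    Equals the residual inside the threshold and is capped at
    +/- delta outside it, so large errors give bounded gradients.
    """
    if abs(residual) <= delta:
        return residual
    return delta if residual > 0 else -delta


for r in [0.5, 1.0, 3.0, 10.0]:
    print(f"residual={r:>4}: loss={huber_loss(r)}, gradient={huber_gradient(r)}")
```

Both branches agree at the threshold in value and in slope, which is what makes the loss differentiable at the junction. A smaller `delta` makes the loss behave more like MAE, a larger one more like MSE, which is why this parameter needs tuning.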

    One of the very first ML algorithms I introduce (because of its simplicity) is KNN. In this post, we’ll learn about KNN using Python (with the scikit-learn package) and using R with packages from the tidymodels framework.


    KNN stands for K Nearest Neighbors.