diff --git a/docs/blog.xml b/docs/blog.xml
index 4a89097..85a7c69 100644
--- a/docs/blog.xml
+++ b/docs/blog.xml
@@ -12,7 +12,7 @@
 quarto-1.3.450
 Sun, 21 Apr 2024 17:00:00 GMT
- Lasso and Ridge Regressions
+ Regularized Regressions
 Francois de Ryckel
 https://fderyckel.github.io/blog.html/posts/machine-learning-part1/04-lasso-ridge/index.html
 Linear models obtained by minimizing the SSR (Sum of Squared Residuals) are great and easy to grasp. However, rarely are all conditions met, and as the number of predictors increases, the conditions of linear regression start to break: multicollinearity between variables, breaking of homoskedasticity, etc. To address these issues, we introduce regularized regression, where the coefficients of the predictors (aka the estimated coefficients) receive a given penalty. The goal of that penalty is to reduce the variance of the model (with many predictors, models tend to overfit the data and perform poorly on test data).

+

The objective functions of regularized models are the same as for OLS except they have a penalty term. Hence, it becomes

$$\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 + \text{penalty}$$

+

For Ridge Regression, the additional penalty term is $\lambda \sum_{j=1}^{p} \beta_j^2$. The loss function becomes

$$\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$$

Indexing and notation

  • The index $i$ refers to the number of observations: $y_i$ is the actual ‘target’ value of the $i^{th}$ observation, and $\hat{y}_i$ is the predicted value for the $i^{th}$ observation.
  • The index $j$ refers to the number of predictors.
  • $\beta_j$ is the coefficient of the $j^{th}$ predictor.
  • $\lambda$ is the Ridge penalty hyper-parameter. Note that when $\lambda$ is 0, there is no more regularized regression and it becomes just a normal OLS regression.

$\lambda$ can take any real value from $0$ to $+\infty$. As $\lambda$ increases, it forces the $\beta_j$ toward 0 in order to minimize the loss function.
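To make the loss function concrete, here is a minimal sketch in R (not from the original post; ridge_loss, X, beta and lambda are illustrative names) that computes the Ridge loss for a given value of the penalty.

ridge_loss <- function(y, X, beta, lambda) {
  y_hat   <- X %*% beta               # predicted values for each observation
  sse     <- sum((y - y_hat)^2)       # OLS part: sum of squared residuals
  penalty <- lambda * sum(beta^2)     # Ridge part: penalty on the squared coefficients
  sse + penalty
}

In the tidymodels framework, the same model is specified with linear_reg(penalty = lambda, mixture = 0) and the glmnet engine, where mixture = 0 corresponds to a pure Ridge penalty.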

@@ -44,10 +68,12 @@

Regression models

When modeling for regression, we somehow measure the distance between our prediction and the actual observed value. When comparing models, we usually want to keep the model which gives the smallest sum of distances.

+

It has to be noted that quite a few of these concepts have deeper connections in ML, as they are not only ‘success’ metrics but also loss functions of ML algorithms.

RMSE

This is probably the most well-known measure when comparing regression models. Because we are squaring the distance between the predicted and the observed values, it penalizes predicted values that are far off the real values. Hence this measure is used when we want to avoid ‘outlier’ predictions (predictions that are far off).
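For reference, the usual textbook definition (not transcribed from the post) over $n$ observations is

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}$$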

+

The SSE (aka sum of squared errors; that is, the RMSE without the square root and the averaging) is also the loss function of the linear regression algorithm. It is a convex function.
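As a minimal sketch (made-up numbers, not the post's data), both quantities can be computed by hand in R; the rmse_vec() helper from yardstick gives the same RMSE.

y     <- c(3.1, 2.4, 5.8, 4.0)       # observed values (made-up)
y_hat <- c(2.9, 2.7, 5.1, 4.4)       # predicted values (made-up)
sse   <- sum((y - y_hat)^2)          # SSE: the loss minimized by linear regression
rmse  <- sqrt(mean((y - y_hat)^2))   # RMSE: root of the averaged squared errors
# yardstick::rmse_vec(truth = y, estimate = y_hat) returns the same value as rmse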

MAE

@@ -67,7 +93,7 @@
  • differentiable everywhere (even at the junction of the MAE and MSE), meaning it can be used with Gradient Descent algorithms as well.
  • The transition from quadratic to linear behaviour in Huber loss results in a smoother optimization landscape compared to MSE. This can prevent issues related to gradient explosions and vanishing gradients, which may occur in certain cases with MSE.
  • -

    The main disadventage of the Huber Loss function is how to tune that parameters.

    +

    The main disadvantage of the Huber Loss function is how to tune that parameter.
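    As a minimal sketch (illustrative, not the post's code; huber_loss is a made-up name), the Huber loss with threshold parameter delta can be written in R as follows; the ifelse() switch is where the quadratic and linear regimes meet.

    huber_loss <- function(y, y_hat, delta = 1) {
      r         <- y - y_hat                        # residuals
      quadratic <- 0.5 * r^2                        # MSE-like part used when |r| <= delta
      linear    <- delta * (abs(r) - 0.5 * delta)   # MAE-like part used when |r| > delta
      mean(ifelse(abs(r) <= delta, quadratic, linear))
    }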

    @@ -3891,16 +3917,16 @@
    FALSE)
    library(tidymodels)
    -── Attaching packages ────────────────────────────────────── tidymodels 1.1.1 ──
    +── Attaching packages ────────────────────────────────────── tidymodels 1.2.0 ──
    -✔ broom        1.0.5     ✔ rsample      1.2.0
    -✔ dials        1.2.0     ✔ tibble       3.2.1
    -✔ infer        1.0.5     ✔ tidyr        1.3.0
    -✔ modeldata    1.2.0     ✔ tune         1.1.2
    -✔ parsnip      1.1.1     ✔ workflows    1.1.3
    -✔ purrr        1.0.2     ✔ workflowsets 1.0.1
    -✔ recipes      1.0.9     ✔ yardstick    1.2.0
    +✔ broom        1.0.5      ✔ rsample      1.2.1
    +✔ dials        1.2.1      ✔ tibble       3.2.1
    +✔ infer        1.0.7      ✔ tidyr        1.3.1
    +✔ modeldata    1.3.0      ✔ tune         1.2.1
    +✔ parsnip      1.2.1      ✔ workflows    1.1.4
    +✔ purrr        1.0.2      ✔ workflowsets 1.1.0
    +✔ recipes      1.0.10     ✔ yardstick    1.3.1
    ── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
    @@ -3909,7 +3935,7 @@
    library(tidymodels)
    ✖ dplyr::lag()      masks stats::lag()
    ✖ yardstick::spec() masks readr::spec()
    ✖ recipes::step()   masks stats::step()
    -• Use suppressPackageStartupMessages() to eliminate package startup messages
    +• Learn how to get started at https://www.tidymodels.org/start/
    '2017-09-01')]
    -

    One of the very first ML algorithm (because it’s ease) to be exposed to is KNN. In this post, we’ll learn about KNN using Python (with the Sklearn package) and using R with packages from the tidymodel framework.

    +

    One of the very first ML algorithms (because of its ease) I expose is KNN. In this post, we’ll learn about KNN using Python (with the Sklearn package) and using R with packages from the tidymodels framework.

    Introduction

    KNN stands for K Nearest Neighbor.
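    As a minimal sketch of the tidymodels side (assuming the kknn engine is installed; knn_spec is an illustrative name), a KNN model specification looks like this:

    library(tidymodels)
    knn_spec <- nearest_neighbor(neighbors = 5) |>
      set_engine("kknn") |>
      set_mode("classification")

    The specification can then be bundled with a preprocessing recipe in a workflow() and fitted on the training data.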