
Implementations of machine learning algorithms from scratch using NumPy.


blazeAssault26/ML-from-Scratch


Overview

Boosting is an ensemble learning technique that improves model performance by sequentially combining weak learners, such as shallow decision trees. Unlike traditional ensemble methods like bagging, which train their members independently, boosting fits each new model to correct the errors made by the models before it. This adaptive approach lets boosting capture complex relationships in the data and achieve high predictive accuracy. This notebook demonstrates Gradient Boosting Machines (GBM), which improve predictions by fitting each new decision tree to the residuals of the current ensemble. GBM uses additive modeling: every new tree is added to the running prediction so that the residuals shrink and the training error is progressively reduced over successive models, yielding predictive power well beyond any individual weak learner.
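As a concrete illustration of this residual-fitting loop, here is a minimal sketch of a GBM regressor. It borrows scikit-learn's DecisionTreeRegressor as the weak learner for brevity; the class name SimpleGBM and the default hyperparameters are illustrative, not taken from the notebook.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

class SimpleGBM:
    def __init__(self, n_estimators=100, learning_rate=0.1, max_depth=3):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self.max_depth = max_depth
        self.trees = []

    def fit(self, X, y):
        # Start from the mean prediction; each tree then fits the residuals.
        self.f0 = y.mean()
        pred = np.full(y.shape, self.f0)
        for _ in range(self.n_estimators):
            residuals = y - pred                      # errors of the current ensemble
            tree = DecisionTreeRegressor(max_depth=self.max_depth)
            tree.fit(X, residuals)                    # weak learner fit to residuals
            pred += self.learning_rate * tree.predict(X)  # additive update
            self.trees.append(tree)
        return self

    def predict(self, X):
        pred = np.full(X.shape[0], self.f0)
        for tree in self.trees:
            pred += self.learning_rate * tree.predict(X)
        return pred
```

The learning rate deliberately shrinks each tree's contribution, which slows the reduction of residuals but typically generalizes better than taking full-size steps.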

Linear regression is a foundational statistical technique used to model the relationship between a dependent variable and one or more independent variables by fitting a linear equation to observed data. The objective is to find the best-fitting line that minimizes the sum of squared residuals between observed and predicted values. Mathematically, it involves estimating the coefficients β in the equation 𝑦 = 𝛽₀ + 𝛽₁𝑥₁ + ⋯ + 𝛽ₙ𝑥ₙ + ϵ using the Ordinary Least Squares (OLS) method, which minimizes the sum of the squared differences between observed and predicted values by solving the normal equation 𝛽 = (𝑋ᵀ𝑋)⁻¹𝑋ᵀ𝑦, where 𝑋 is the matrix of input features and 𝑦 is the vector of output values. This project demonstrates linear regression by implementing it from scratch and comparing it with the scikit-learn library: the custom implementation calculates the slope and intercept manually, showcasing the underlying math, while scikit-learn solves the same least-squares problem efficiently internally.
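The normal-equation solution above translates almost line for line into NumPy. The sketch below is illustrative (the helper name ols_fit and the synthetic data are invented here); it uses np.linalg.lstsq, which is numerically safer than explicitly inverting 𝑋ᵀ𝑋, and checks the result against scikit-learn's LinearRegression.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def ols_fit(X, y):
    # Prepend a column of ones so the intercept β₀ is estimated jointly.
    Xb = np.c_[np.ones(X.shape[0]), X]
    # Solve the least-squares problem β = (XᵀX)⁻¹Xᵀy via lstsq.
    beta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return beta[0], beta[1:]          # intercept, coefficients

# Synthetic data with known coefficients: y = 3 + 2·x₁ − 1·x₂ + noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

intercept, coefs = ols_fit(X, y)
sk = LinearRegression().fit(X, y)
print(intercept, coefs)               # ≈ 3.0, [2.0, -1.0]
print(sk.intercept_, sk.coef_)        # should closely match the manual fit
```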

K-Nearest Neighbors (KNN) is a simple, non-parametric algorithm used for classification and regression. It makes no assumptions about the underlying data distribution; instead it memorizes the training set and relies on it directly to make predictions. As a lazy learner, KNN has no explicit training phase and defers all computation until a prediction is requested, which makes prediction comparatively expensive. Neighbors are found with the Euclidean distance d(x₁, x₂) = √(∑ᵢ₌₁ⁿ (x₁ᵢ − x₂ᵢ)²). For classification, KNN identifies the k nearest neighbors and assigns the most common class among them; for regression, it averages the values of the k nearest neighbors. KNN is sensitive to the curse of dimensionality, since the distance metric becomes less informative as the number of dimensions increases, and to outliers, which can significantly skew the results.
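A minimal NumPy sketch of this distance-then-vote procedure for classification (the function name knn_predict and the default k=5 are illustrative, not the notebook's):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, X_test, k=5):
    """Classify each test point by majority vote among its k nearest neighbors."""
    preds = []
    for x in X_test:
        # Euclidean distance from x to every training point.
        dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))
        # Indices of the k smallest distances.
        nearest = np.argsort(dists)[:k]
        # Majority class among those neighbors (y_train is a NumPy array).
        preds.append(Counter(y_train[nearest]).most_common(1)[0][0])
    return np.array(preds)
```

Note that fit-time work is essentially zero, as described above: all the cost sits in the per-query distance computation, which scales with the size of the training set.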

KMeans clustering is a fundamental unsupervised learning technique that partitions a dataset into distinct clusters by minimizing the variance within each cluster, so that points in the same cluster are more similar to one another than to points in different clusters. The algorithm initializes cluster centroids randomly, assigns each data point to its nearest centroid, and updates each centroid to the mean of its assigned points; this process repeats until the centroids no longer change or a maximum number of iterations is reached. This project demonstrates KMeans clustering by implementing it from scratch and applying it to both synthetic data and the real-world dataset student_clustering.csv.
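A minimal NumPy sketch of the assign-and-update loop described above (the function name kmeans and its defaults are illustrative; the notebook's implementation may differ):

```python
import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize centroids as k randomly chosen data points.
    centroids = X[rng.choice(X.shape[0], size=k, replace=False)]
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points
        # (keeping the old centroid if a cluster happens to be empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the assignments no longer move the centroids.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels
```

Because the initialization is random, different seeds can converge to different local optima; running the loop several times and keeping the lowest within-cluster variance is a common remedy.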
