ML-basics-theory

The repository contains implementations of basic machine learning algorithms written in MATLAB.

Technology:
- MATLAB m-scripts.
- Jupyter notebooks.
Laboratories legend:
(1) Implementations of basic algorithms and simulations.
(2) Basic usage of matrix operations (for better performance).
(3) Optimization process on the Rastrigin function (see the sketch after this list).
(4) Linear regression (LR) for the laminar cooling process of a cylindrical sheet.
(5) Logistic regression (LogitR) for predicting the share of ferrite in steel after the cooling process.
(6) The same logistic regression with multiclass prediction.
(7) The same logistic regression with two-class prediction and regularization.
(8) PCA for reducing the dimensionality of dynamic system simulation results (from 5 to 2).
(9) Recommendation engine: completing movie ratings.
(10) Research 0: Project: LogitR and GD vs SGD vs data sampling (on SPAM data).
(11) Research 1: (similar) LogitR and GD vs SGD vs data sampling (on Bank Marketing data).
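
The Rastrigin function referenced in lab (3) is a standard multimodal benchmark; a minimal MATLAB definition (not necessarily the exact script used in the lab) looks like this:

```matlab
% Rastrigin benchmark: f(x) = A*n + sum(x.^2 - A*cos(2*pi*x)), A = 10.
% Its global minimum f(0) = 0 is surrounded by many local minima,
% which makes it a hard test for optimizers.
function f = rastrigin(x)
    A = 10;
    f = A * numel(x) + sum(x .^ 2 - A * cos(2 * pi * x));
end
```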

The first research project has a more detailed description (including the mathematics), but in Polish, not English. The second one was carried out on a bigger dataset and is more polished.

Labs 4-6

After hot rolling, sheets with different phase compositions are subjected to a laminar cooling process. Sheet cooling scheme after hot rolling:

Lab 4
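
A minimal sketch of how such a linear regression fit can be obtained in MATLAB via the normal equation; the data below is a hypothetical stand-in for the lab's cooling measurements:

```matlab
% Illustrative data: cooling time t vs. measured temperature T.
t = (0:10)';                          % time samples
T = 900 - 35 * t + 5 * randn(11, 1);  % noisy linear cooling trend
X = [ones(numel(t), 1), t];           % design matrix with bias column
theta = (X' * X) \ (X' * T);          % normal-equation least squares
T_hat = X * theta;                    % fitted temperatures
```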

Result:

Lab 5

Creation of two models ($\theta_1$ and $\theta_2$) capable of predicting whether the ferrite phase fraction lies within range 1, $F_f \in [0.7; 0.85]$, and within range 2, $F_f \in [0.7; 0.8]$, respectively.
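
A minimal sketch of the logistic hypothesis behind each model, assuming the usual sigmoid formulation; the weights and samples below are purely illustrative:

```matlab
% Logistic hypothesis h_theta(x) = sigmoid(theta' * x); each trained
% model (theta1, theta2) outputs the probability that F_f falls in
% its range.
sigmoid = @(z) 1 ./ (1 + exp(-z));
h = @(theta, X) sigmoid(X * theta);   % X is an m-by-n design matrix

theta1 = [-3; 4];  theta2 = [-5; 6];  % hypothetical trained weights
X = [1, 0.9; 1, 0.5];                 % two samples, bias term first

in_range1 = h(theta1, X) >= 0.5;      % is F_f in [0.7; 0.85]?
in_range2 = h(theta2, X) >= 0.5;      % is F_f in [0.7; 0.8]?
```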

Results:

Lab 6

One vs. All

Division into classes depending on the range to which the share of ferrite belongs after cooling (a prediction sketch follows the class list below). The individual classes correspond to the following ranges:
Class 1: $F_f \in [0.8; 1]$
Class 2: $F_f \in [0.7; 0.8]$
Class 3: $F_f \in [0.6; 0.7]$
Class 4: $F_f \in [0.5; 0.6]$
Class 5: $F_f \in [0.4; 0.5]$
Class 6: $F_f \in [0.3; 0.4]$
Class 7: $F_f \in [0.2; 0.3]$
Class 8: $F_f \in [0.1; 0.2]$
Class 9: $F_f \in [0; 0.1]$
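
A minimal sketch of one-vs-all prediction, assuming one trained logistic model per class; all sizes and weights below are illustrative:

```matlab
% One-vs-all: Theta stores one column of logistic weights per class;
% each sample is assigned the class whose model is most confident.
sigmoid = @(z) 1 ./ (1 + exp(-z));

n = 3; K = 9; m = 5;                       % features, classes, samples
Theta = randn(n + 1, K);                   % hypothetical trained weights
X = [ones(m, 1), randn(m, n)];             % design matrix with bias

probs = sigmoid(X * Theta);                % m-by-K class probabilities
[~, predicted_class] = max(probs, [], 2);  % most confident class per row
```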

Result:

Confusion matrix for classification:

Lab 7

Regularization

Creating a model that predicts whether the vector $x$ belongs to one of the two classes. Training of the model takes regularization into account.
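
A minimal sketch of the regularized logistic cost, assuming the standard L2 penalty that skips the bias weight (names are illustrative):

```matlab
% Regularized logistic regression cost (save as cost_reg.m).
% The L2 penalty conventionally excludes the bias weight theta(1).
function J = cost_reg(theta, X, y, lambda)
    m = numel(y);
    h = 1 ./ (1 + exp(-X * theta));              % sigmoid hypothesis
    J = -(1 / m) * sum(y .* log(h) + (1 - y) .* log(1 - h)) ...
        + (lambda / (2 * m)) * sum(theta(2:end) .^ 2);
end
```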

Results:

  • $\lambda = 1$:

  • $\lambda = 0.00001$:

F-score as a function of $\lambda$:

Hypotheses written out during runtime:

Lab 8

The results from the simulation of a dynamic system consisting of five bogies were used.

Simulations were carried out for different values of the masses of the individual bogies. The result of each simulation was the maximum deflection of the first bogie. There are 4 sets of results: $(X^1, Y^1)$, $(X^2, Y^2)$, $(X^3, Y^3)$, $(X^4, Y^4)$.

Principal component analysis was carried out for each of the data sets, reducing the number of dimensions from 5 to 2.
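
A minimal PCA sketch via the SVD of the covariance matrix, reducing 5-dimensional inputs to 2 as in the lab; the random data is a stand-in for the $(X^i, Y^i)$ sets:

```matlab
% Reduce 5-D simulation inputs to their top 2 principal components.
X = randn(100, 5);                        % hypothetical stand-in data
mu = mean(X, 1);
X_norm = (X - mu) ./ std(X, 0, 1);        % feature normalization
Sigma = (X_norm' * X_norm) / size(X, 1);  % covariance matrix
[U, ~, ~] = svd(Sigma);                   % principal directions in U
Z = X_norm * U(:, 1:2);                   % projection onto 2 dimensions
```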

Sample results:

Lab 9

Collaborative filtering

The task was to complete the missing ratings and to suggest to each user two products that had not yet been rated. The collaborative filtering algorithm was used to fill in the missing ratings. The result of learning was a matrix $\Theta$ containing the preferences of individual users and a matrix $X$ containing the features of individual products.

This made it possible to complete the matrix $Y$ with the missing ratings:

$$Y = h_{\Theta}(X) = \Theta^T X$$
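
A minimal sketch of the prediction and recommendation step, assuming $\Theta$ is num_features-by-num_users and $X$ is num_features-by-num_products; all sizes and the random matrices are illustrative:

```matlab
% Predict the full ratings matrix Y = Theta' * X and, for one user,
% pick the two highest-predicted products not yet rated.
num_features = 10; num_users = 50; num_products = 200;
Theta = randn(num_features, num_users);   % learned user preferences
X = randn(num_features, num_products);    % learned product features
R = rand(num_users, num_products) > 0.8;  % 1 where a rating exists

Y_pred = Theta' * X;                      % completed ratings matrix
u = 1;                                    % example user
unrated = find(~R(u, :));                 % products user u skipped
[~, order] = sort(Y_pred(u, unrated), 'descend');
recommended = unrated(order(1:2));        % two top suggestions
```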

Research 0: Project

Gradient Descent (GD) vs Stochastic Gradient Descent (SGD) vs GD with Sampling

Source: https://archive.ics.uci.edu/ml/datasets/spambase

The project was prepared using the SPAM E-mail Database (shape: 4601x58) from 1998. The main purpose of the study was to compare Gradient Descent (GD), Stochastic Gradient Descent (SGD), and GD with Sampling in terms of time efficiency.
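
A minimal sketch of one update step for each of the three schemes, assuming logistic-regression gradients; sizes, step size, and data are illustrative:

```matlab
% grad() is the standard logistic-regression gradient:
% (1/m) * X' * (sigmoid(X*theta) - y).
sigmoid = @(z) 1 ./ (1 + exp(-z));
grad = @(theta, X, y) (X' * (sigmoid(X * theta) - y)) / size(X, 1);

m = 1000; n = 57;                          % illustrative sizes
X = [ones(m, 1), randn(m, n)];
y = double(rand(m, 1) > 0.5);
theta = zeros(n + 1, 1); alpha = 0.1;

% GD: gradient over the whole dataset per step.
theta_gd = theta - alpha * grad(theta, X, y);

% SGD: one randomly drawn sample per step.
i = randi(m);
theta_sgd = theta - alpha * grad(theta, X(i, :), y(i));

% Sampled GD: gradient on a random subset (here 10% of the data).
idx = randperm(m, round(0.1 * m));
theta_smp = theta - alpha * grad(theta, X(idx, :), y(idx));
```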

The main problem with this research: the dataset does not have well-prepared classification labels (misclassification error of at least 7%).

Example plots showing their performance are shown below.

  • final minimum value found:

  • minimum value found vs. the number of iterations:

  • execution time over 7 runs:

The classification accuracy obtained across the runs: ~68-70%.

Research 1: Project

Gradient Descent (GD) vs Stochastic Gradient Descent (SGD) vs GD with Sampling

Source: https://archive.ics.uci.edu/ml/datasets/bank+marketing

The project was prepared using the Bank Marketing Data Set (shape: 41188x21) from 2014. The main purpose of the study was to compare Gradient Descent (GD), Stochastic Gradient Descent (SGD), and GD with Sampling in terms of time efficiency.
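
A minimal sketch of how the timing comparison can be made in MATLAB with tic/toc; the pause is a stand-in workload, not the actual optimizers:

```matlab
% Time each optimizer over several runs and average the results.
methods = {'GD', 'SGD', 'SampledGD'};
runs = 5;
times = zeros(runs, numel(methods));
for k = 1:numel(methods)
    for r = 1:runs
        tic;
        % Stand-in workload; replace with one full optimization run
        % of the corresponding method on the Bank Marketing data.
        pause(0.01 * k);
        times(r, k) = toc;
    end
end
mean_time = mean(times, 1);   % average runtime per method
```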

Short project overview:
(more in the Jupyter notebook)
Example plots showing their performance are shown below.

  • final minimum value found:

  • minimum value found vs. the number of iterations:

  • execution time over 5 runs:

The classification accuracy for different algorithms:

  • GD: ~77%,
  • SGD: ~89.8%,
  • Sampled GD: ~77.7%.
