- MATLAB m-scripts.
- Jupyter notebooks.
(1) Implementations of basic algorithms and simulations.
(2) Basic usage of matrix operations (vectorized for better performance).
(3) Optimization process on the Rastrigin function.
(4) Linear regression (LR) for the laminar cooling process of a cylindrical sheet.
(5) Logistic regression (LogitR) for predicting the share of ferrite in steel after the cooling process.
(6) The same logistic regression with multiclass prediction.
(7) The same logistic regression with two-class prediction and regularization.
(8) PCA for reducing the dimensionality of dynamic system simulation results (from 5 to 2).
(9) Recommendation engine: completing missing movie ratings.
(10) Research 0: LogitR with GD vs SGD vs data sampling (on SPAM data).
(11) Research 1: (similar) LogitR with GD vs SGD vs data sampling (on Bank Marketing data).
The first project has a better (also mathematical) description, but it is written in Polish rather than English. The second one was done on a bigger dataset and is more thoroughly executed.
After hot rolling, sheets with different phase compositions are subjected to a laminar cooling process. Sheet cooling scheme after hot rolling:
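A minimal sketch of a linear regression fit for such a process, solved as a least-squares problem; the features and data below are placeholders, not the project's actual cooling inputs:

```python
import numpy as np

# Placeholder data: each row is one cooling run; the three feature columns
# (e.g. sheet thickness, water flux, line speed) are assumptions for
# illustration, not the project's actual inputs.
rng = np.random.default_rng(0)
X = rng.random((100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.standard_normal(100)

# Prepend a bias column and solve theta = argmin ||Xb @ theta - y||^2
# (equivalent to the normal equations, but numerically more stable).
Xb = np.hstack([np.ones((X.shape[0], 1)), X])
theta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
print("fitted parameters:", theta)
```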
Creation of two models: a multiclass One vs. All classifier and a two-class classifier with regularization.
One vs. All
Division into classes depends on the range in which the share of ferrite falls after cooling; classes 1 through 9 correspond to consecutive intervals of the ferrite share.
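A minimal sketch of One vs. All training with batch gradient descent, assuming the nine classes are encoded as integers 0 to 8; the learning rate and iteration count are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_one_vs_all(X, y, num_classes, lr=0.1, iters=500):
    """Train one binary logistic regression classifier per class."""
    m, n = X.shape
    Xb = np.hstack([np.ones((m, 1)), X])      # prepend a bias column
    Theta = np.zeros((num_classes, n + 1))
    for c in range(num_classes):
        yc = (y == c).astype(float)           # 1 for class c, 0 otherwise
        t = np.zeros(n + 1)
        for _ in range(iters):
            t -= lr * Xb.T @ (sigmoid(Xb @ t) - yc) / m
        Theta[c] = t
    return Theta

def predict(Theta, X):
    """Each row is assigned to the class whose classifier is most confident."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return np.argmax(sigmoid(Xb @ Theta.T), axis=1)
```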
Confusion matrix for classification:
Regularization
Creating a model that predicts whether a vector x belongs to one of two classes. Model training takes regularization into account.
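A minimal sketch of the regularized cost function and gradient for binary logistic regression; by convention the bias term theta[0] is excluded from the penalty, and the lambda value is illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_grad(theta, X, y, lam=1.0):
    """Regularized logistic regression cost J(theta) and its gradient.
    X already contains a leading bias column; theta[0] is not regularized."""
    m = X.shape[0]
    h = sigmoid(X @ theta)
    J = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h)) \
        + (lam / (2 * m)) * np.sum(theta[1:] ** 2)
    grad = X.T @ (h - y) / m
    grad[1:] += (lam / m) * theta[1:]
    return J, grad
```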
Results:
Hypotheses printed during runtime:
The results of simulating a dynamic system consisting of five carts were used.
Simulations were carried out for different values of the masses of the individual carts; the result of each simulation was the maximum deflection of the first cart. There are 4 sets of results.
Principal component analysis was carried out for each of the data sets, reducing the number of dimensions from 5 to 2.
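A minimal SVD-based PCA sketch for the 5-to-2 reduction; the data below is a random placeholder, and any feature normalization applied in the project is omitted:

```python
import numpy as np

def pca_reduce(X, k=2):
    """Project X (samples x features) onto its first k principal components."""
    Xc = X - X.mean(axis=0)                     # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:k].T                           # coordinates in reduced space
    explained = (S[:k] ** 2).sum() / (S ** 2).sum()
    return Z, explained

# Placeholder: 200 simulation runs x 5 cart masses.
rng = np.random.default_rng(1)
X = rng.random((200, 5))
Z, ev = pca_reduce(X, k=2)
print(Z.shape, f"variance retained: {ev:.2%}")
```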
Collaborative filtering
The task was to complete the missing ratings and to recommend to each user two products that had not yet been rated. A collaborative filtering algorithm was used to fill in the missing ratings: learning produced feature and parameter matrices whose product approximates the rating matrix Y, which allowed the missing entries of Y to be completed.
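A minimal sketch of collaborative filtering as a low-rank factorization fitted by gradient descent on the observed entries only; the number of features, regularization strength, and learning rate are illustrative assumptions:

```python
import numpy as np

def collaborative_filtering(Y, R, num_features=10, lam=1.0, lr=0.005, iters=2000):
    """Factor Y ~ X @ Theta.T using only the rated entries.
    Y: (movies x users) ratings; R: same shape, 1 where a rating exists."""
    rng = np.random.default_rng(0)
    X = 0.1 * rng.standard_normal((Y.shape[0], num_features))
    Theta = 0.1 * rng.standard_normal((Y.shape[1], num_features))
    for _ in range(iters):
        E = (X @ Theta.T - Y) * R               # error on rated entries only
        X_grad = E @ Theta + lam * X
        Theta_grad = E.T @ X + lam * Theta
        X -= lr * X_grad
        Theta -= lr * Theta_grad
    return X @ Theta.T                          # completed rating matrix

# Hypothetical usage (Y and R assumed given): Y_completed = collaborative_filtering(Y, R)
```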
Gradient Descent (GD) vs Stochastic Gradient Descent (SGD) vs GD with Sampling
Source: https://archive.ics.uci.edu/ml/datasets/spambase
The project was prepared using the SPAM E-mail Database (shape: 4601x58) from 1998. The main purpose of the study was to compare Gradient Descent (GD), Stochastic Gradient Descent (SGD), and GD with Sampling in terms of time efficiency.
The main problem with this research: the dataset does not have well-prepared classification labels (misclassification error of at least 7%).
Example plots showing their performance are shown below.
The classification accuracy obtained across runs: ~68-70%.
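A sketch of the three training variants being compared, assuming logistic regression with a fixed learning rate; the epoch count and sampling fraction are illustrative, not the study's actual settings:

```python
import time
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad(theta, X, y):
    return X.T @ (sigmoid(X @ theta) - y) / X.shape[0]

def fit(X, y, mode="gd", lr=0.5, epochs=50, sample=0.1, seed=0):
    """Logistic regression trained three ways: full-batch GD, per-example
    SGD, or GD on a fresh random subsample each epoch (GD with Sampling)."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1])
    t0 = time.perf_counter()
    for _ in range(epochs):
        if mode == "gd":
            theta -= lr * grad(theta, X, y)
        elif mode == "sgd":
            for i in rng.permutation(X.shape[0]):
                theta -= lr * grad(theta, X[i:i+1], y[i:i+1])
        elif mode == "sampled":
            idx = rng.choice(X.shape[0], int(sample * X.shape[0]), replace=False)
            theta -= lr * grad(theta, X[idx], y[idx])
    return theta, time.perf_counter() - t0

# Hypothetical usage (X_train, y_train are assumed preprocessed arrays):
# theta, seconds = fit(X_train, y_train, mode="sgd")
```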
Gradient Descent (GD) vs Stochastic Gradient Descent (SGD) vs GD with Sampling
Source: https://archive.ics.uci.edu/ml/datasets/bank+marketing
The project was prepared using the Bank Marketing Data Set (shape: 41188x21) from 2014. The main purpose of the study was to compare Gradient Descent (GD), Stochastic Gradient Descent (SGD), and GD with Sampling in terms of time efficiency.
A short project overview (more detail in the Jupyter notebook):
Example plots showing their performance are shown below.
The classification accuracy for different algorithms:
- GD: ~77%,
- SGD: ~89.8%,
- Sampled GD: ~77.7%.