Machine Learning
- Course: COMP 4432-1, Class time: Mon, Wed 5:00-6:50 p.m., Engineering and Computer Science - 410
- Instructor: Pooran Singh Negi, pooran.negi@gmail.com office 470, Office Hours: T, Th, 3.00 p.m. - 4.30 p.m. Email for 1-on-1 help.
- Head TA: Lombe Chileshe (lombe.chileshe@du.edu), Office ECS 358
- TA: Daniel Parada (daniel.parada1@gmail.com), Office ECS 358, M, Tue, W 8-11 a.m.
- TA: Nidhi Madabhushi (nidhi.madabhushi@du.edu), Office ECS 358, M, W 3-5 p.m.
Credit: Content on this page contains links to various external resources and images from the book Machine Learning: A Probabilistic Perspective by Kevin Patrick Murphy.
- Linear algebra, probability, statistics, optimization, and programming experience in Python and its scientific libraries.
- linear algebra overview
- Read chapter 2 of Kevin Murphy for a probability and statistics review, or any other text you have used in the past
- Required:
- Machine Learning: a Probabilistic Perspective by Kevin Patrick Murphy. Online version is available in the DU library.
- Optional: An Introduction to Statistical Learning by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani. It is available online in pdf format
- Optional: Deep learning by Ian Goodfellow and Yoshua Bengio and Aaron Courville. It is available online.
- Optional: For a more advanced treatment, The Elements of Statistical Learning. It is available online in pdf format
- Optional: Python Machine Learning: Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow, 2nd Edition by Sebastian Raschka and Vahid Mirjalili (Daniel Parada suggested this)
- Optional: Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems by Aurélien Géron
- Essence of linear algebra by 3Blue1Brown (Daniel Parada suggested this)
- Linear algebra MIT OCW by Prof. Gilbert Strang
- Ali Ghodsi, Lec 2: Machine learning, classification, linear and quadratic discriminant analysis (provided by Logan)
- Generative vs. discriminative models by Andrew Ng @ Stanford (provided by Logan)
- Deep learning, Stanford
- CS231n: Convolutional Neural Networks for Visual Recognition Stanford
- Natural Language Processing with Deep Learning, Stanford
We will go through the theory behind machine learning using tools from probability, linear algebra, and optimization. We will use Python, its scientific libraries (numpy, scipy, matplotlib, pandas, etc.), and scikit-learn: Machine Learning in Python during the course. For the deep neural network part, we will use the highly popular TensorFlow machine intelligence library from Google. For assignments, starter code or hints will be given. At the end of the course, you should have a unifying probabilistic perspective on most machine learning algorithms and be comfortable using open-source tools for building machine learning systems.
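As a flavor of the workflow, here is a minimal scikit-learn sketch; the dataset and classifier below are illustrative choices, not part of any assignment.

```python
# Minimal scikit-learn workflow: load data, split, fit a generative classifier, score it.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = GaussianNB()                # a simple generative classifier
clf.fit(X_train, y_train)
print(accuracy_score(y_test, clf.predict(X_test)))
```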
There are a couple of choices for running the code in this class. Option 1 is the most straightforward and supports most of scientific Python, including tensorflow and keras for deep learning (a quick import check is sketched after the list below).
- Google colab. https://colab.research.google.com/notebooks/welcome.ipynb
- or install the Anaconda Distribution. See the youtube link Installing Anaconda, Jupyter Notebook. For deep neural networks, we will go over tensorflow and keras installation instructions later in the course.
- Try a Python notebook online without installing anything
- Runs and visualizes your Python code
- The Python Tutorial
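Whichever option you pick, a quick way to confirm the core stack is importable (the package list here is a suggestion; tensorflow and keras are only needed for the deep learning part later):

```python
# Check that the scientific Python stack is installed and print the versions.
import numpy, scipy, matplotlib, pandas, sklearn

for pkg in (numpy, scipy, matplotlib, pandas, sklearn):
    print(pkg.__name__, pkg.__version__)
```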
This syllabus is subject to change at the discretion of the instructor. Here are the main topics for the class; more topics can be added as per class interest and available time.
- Basic idea of machine learning, and probability
- Generative models, parametric estimation and supervised learning.
- Naive Bayes classifier etc.
- Gaussian models
- Linear and logistic regression
- Support vector machine, Kernels
- Decision tree.
- Probabilistic graphical model.
- Bias-Variance tradeoff and model selection etc.
- Ensemble methods, bagging and boosting
- Unsupervised learning
- Clustering, topic modelling etc.
- Deep learning
- Artificial Neural Networks (ANN), end-to-end learning, cost function
- Convolutional Neural Networks (CNN) for classification (image) and regression
- Recurrent Neural Networks for natural language processing (NLP) and time-series data
- Generative adversarial networks (GANs)
There will be one midterm, a final exam, homework assignments, and in-class quizzes. A final machine-learning-related project and presentation will be due at the end of the quarter. We'll drop your worst homework assignment grade and your worst quiz grade. We'll allow 2 late homework submissions, with a cutoff of 36 hours. For late assignments submitted via email, we'll give
ceil(total_marks_obtained*exp(-(minutes late)/(24*60))) marks.
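For concreteness, the late-penalty formula above can be computed as follows (the example numbers are made up):

```python
from math import ceil, exp

def late_score(total_marks_obtained, minutes_late):
    """Late-submission score: ceil(total_marks_obtained * exp(-minutes_late / (24*60)))."""
    return ceil(total_marks_obtained * exp(-minutes_late / (24 * 60)))

# Example: a 10-point assignment submitted 12 hours (720 minutes) late
print(late_score(10, 720))  # exp(-0.5) ~ 0.61, so ceil(6.1) = 7
```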
Component | Weight |
---|---|
Homework + Quizzes | 35% (25 + 10) |
Midterm exam (22 July, in class, closed book and notes) | 20% |
Final exam, comprehensive (14th August, in class, closed book and notes) | 27% |
ML competition, notebook submission (17 August, 11.59 p.m.) | 18% |

Note: the extra class on Friday, 16th August, is cancelled.
Grade range:

Grade | Range |
---|---|
A | >= 93 |
A- | >= 89 |
B+ | >= 85 |
B | >= 81 |
B- | >= 77 |
C+ | >= 73 |
C | >= 69 |
C- | >= 65 |
D+ | > 61 |
D | >= 57 |
D- | >= 53 |
F | < 53 |
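A small helper for looking up the letter grade from the table above; it treats every cutoff as inclusive, whereas the table lists D+ as strictly greater than 61.

```python
def letter_grade(percent):
    """Look up a letter grade from the syllabus table.
    Note: treats all cutoffs as inclusive; the table lists D+ as strictly > 61."""
    cutoffs = [('A', 93), ('A-', 89), ('B+', 85), ('B', 81), ('B-', 77),
               ('C+', 73), ('C', 69), ('C-', 65), ('D+', 61), ('D', 57),
               ('D-', 53)]
    for grade, cutoff in cutoffs:
        if percent >= cutoff:
            return grade
    return 'F'

print(letter_grade(90))  # A-
```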
Please respect the DU honor code: Honor Yourself, Honor the Code
quiz | sol |
---|---|
1 | sol |
2 | sol |
3 | sol |
Midterm | solution |
---|---|
practice midterm | sol |
Homework numbers are as per the Kevin Murphy ebook from the library.
Note that we will merge parts a and b of each homework to create its final grade, i.e. HW1a and HW1b will be merged to record the final grade for HW1.
HW | Part | Description | Due date | Sol |
---|---|---|---|---|
1 | 1a | Coding part: python_numpy questions | 3rd July 11.59 p.m. | |
1 | 1b | Written part: problem numbers are from the Kevin Murphy book (use the DU library version). Submit written solutions: chapter 2, 2.1 (use Bayes rule and condition on the event actually observed; as in part a, let N_b = number of boys, N_g = number of girls) (2 = 1+1 points), 2.3 (0.5 point), 2.4 (1 point), 2.6 (1 = 0.5+0.5 point), 2.16 (1.5 = 0.5+0.5+0.5 points). See chapter 2 for definitions, e.g. section 2.2.4 for independence and conditional independence. Explain the various steps in your work. | 5th July 11.59 p.m. | |
2 | 2b | Chapter 2, 2.13 (1 point, hint: I(X,Y) = H(X) + H(Y) - H(X,Y); a numeric check of this identity appears after this table); chapter 3, 3.6 (1 point), 3.7 (1 point each), 3.11 (0.5 point each), 3.20 (0.5 point each) | 12th July 11.59 p.m. | |
2 | 2a | Implementing naive Bayes, airlines sentiment | 22nd July 11.59 p.m. | |
3 | 3a | Implementing QDA notebook | 24th July 11.59 p.m. | |
3 | 3b | Q1 (2 points): prove that if the class-conditional covariance matrices are diagonal, then Gaussian discriminant analysis is equivalent to naive Bayes. From the book: 4.1 (1 point) (see section 2.5.1 for the definition of the correlation coefficient), 4.14 (2 points, 0.5 points each), 4.21 (2 = 1+1 points), 4.22 (1 = 0.5+0.5 point) | 20th July 11.59 a.m. | |
4 | a | Linear ridge regression using tensorflow | 31 July 11.59 | |
4 | b | (2 points) From the book, using equations 7.30 and 7.31, derive equation 7.32 (ridge regression); 7.2 (1 point) (check the formula for W in the book; X transpose is missing); 7.4 (2 points); 7.9 (2 = 1.5+0.5 points); 8.3 (2 = 0.5 + 1.5 + 1 points) | 2 August 11.59 p.m. | |
5 | a | Tensorflow multiclass logistic regression | 8th August 11.59 p.m. | |
5 | b | LDA, PCA | 10th August 11.59 p.m. | |
 | | ML competition notebook and sample code | 16th August 11.59 p.m. | |
6 | a | HW6a SVM sklearn questions | 15th August 11.59 p.m. | |
6 | b | Written homework: for the first part consider a 1x1 Gram matrix. Find vectors x, y (can be of any dimension > 2) such that … Note: dimension-1 vectors (scalars) are OK but trivial. | 13th August 11.59 | sol |
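The HW2b hint I(X,Y) = H(X) + H(Y) - H(X,Y) can be sanity-checked numerically; the joint distribution below is made up purely for illustration.

```python
import numpy as np

# A made-up 2x2 joint distribution p(x, y), used only to verify the identity.
p_xy = np.array([[0.3, 0.2],
                 [0.1, 0.4]])
p_x = p_xy.sum(axis=1)   # marginal of X
p_y = p_xy.sum(axis=0)   # marginal of Y

H = lambda p: -np.sum(p * np.log2(p))            # entropy in bits (all entries > 0 here)
mi_identity = H(p_x) + H(p_y) - H(p_xy)          # I(X,Y) = H(X) + H(Y) - H(X,Y)
mi_direct = np.sum(p_xy * np.log2(p_xy / np.outer(p_x, p_y)))
print(np.isclose(mi_identity, mi_direct))        # True
```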
Date | Required reading assignment | Uploaded slides/notebooks |
---|---|---|
24 June | Read chapter 1 of Kevin Murphy and the basics of probability from chapter 2, up to 2.4.1 and 2.4.6. Detailed Scipy Lecture Notes: practice 1.3.1, 1.3.2, and 1.4.1 to 1.4.2.8 in a Jupyter notebook. Review assignment: eigenvalues and eigenvectors, SVD, and positive definite matrices from your linear algebra notes. | Reviewed basic linear algebra: the notion of dot product and similarity (very fundamental; we'll use it a lot), properties of vectors and matrices and the connection between them, and the notion of linear combinations and spanned space. Reviewed common discrete random variables and continuous distributions like normal, multivariate normal, beta, and Dirichlet. |
26 June | Sections 2.2, 2.3, 2.4[.1-.6], 2.5[.1, .2, .4], 2.6.1, 2.8 of Kevin Murphy; 3.1-3.2.4 | Basic machine learning categories. Generative classifiers. Bayesian concept learning. ML motivation notebook, numpy basics notebook, generative models notebook. |
1st July | | Information theory, beta distribution, MLE, MAP. |
3rd July | Rest of chapter 3 | MLE and MAP estimation of parameters, selection of prior. Here is the link to the mechanics of Lagrange multipliers; for more detail see this link at metacademy (go over the free section). If you want to go over optimization theory in detail, here is the link to the book by Prof. Stephen Boyd and Lieven Vandenberghe; check out the related Stanford link. |
8th July | K. M. book 4.1 up to 4.2.5 | MVN demo. |
10th July | | Covered modelling class-conditional densities using the multivariate Gaussian distribution (Gaussian discriminant analysis, QDA, LDA). Idea of decision boundary and discriminant function. Polynomial fitting issue. |
15th July | K. M. book 7.1-7.3.3, 7.5.1 | Started linear models, MLE estimation of parameters. Here is the link to the pseudo-inverse I talked about: Least Squares, Pseudo-Inverses, SVD. |
17th July | 8.1, 8.2, 8.3.1-8.3.3 | Linear model, MAP estimate, Gaussian prior (ridge regression), Laplace prior (LASSO), geometric interpretation. How the linear model can be extended to a nonlinear model, and the polynomial fitting issue. Convex sets and functions. Started discriminative models (logistic model, ...). |
22nd July | | In-class, closed-notebook midterm. |
24th July | Optional reading: Understanding Learning Rates and How It Improves Performance in Deep Learning; Visualizing the Loss Landscape of Neural Nets; Snapshot Ensembles: Train 1, get M for free; Intro to optimization in deep learning: Momentum, RMSProp and Adam | Tensorflow overview and tensorflow examples. Finished MLE for logistic regression. Gradient descent, stochastic gradient descent, and mini-batch gradient descent in the context of convex and non-convex loss function optimization (see the sketch after this table); issues like getting out of local minima and handling saddle points. Started tensorflow for building machine learning models. |
29th July | 8.3.6, 8.3.7, 8.6.3-8.6.3.2 | Finished multiclass logistic regression, and PCA. |
31 July | 12.2 | Finished Fisher LDA and started kernels. |
5th August | 8.6.3, 14.1-14.5 | KNN link. Covered kernels, and classification and regression SVM. Please go through kernel ridge regression, kernel PCA, and classification SVM in the book, and read soft-margin SVM from the book. This is the paper we talked about in the context of the XOR problem; it is optional and not related to the coursework: XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks. |
7th August | K. M. book 5.7 up to 5.7.2.2 | Bayes decision theory notebook. Covered Bayesian decision theory, the confusion matrix, issues with accuracy, the idea of recall, precision and ways of merging them (F1, Fb score), and the ROC (AUC) curve. |
12th August | Look into these resources too: chapter 16, Adaptive basis function models (decision trees, random forests, boosting (AdaBoost), ensemble learning, etc.); chapter 25, Clustering (should be covered in data mining); chapter 11, Mixture models and the EM algorithm (can be covered in data mining). | Bias-variance tradeoff. ANN. SVM dual, kernels and regression. SVMs, Duality and the Kernel Trick. |
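The 24th July discussion of gradient descent variants is easy to see in code. Below is a minimal numpy sketch of mini-batch gradient descent on a least-squares objective; the synthetic data, learning rate, and batch size are illustrative choices only.

```python
import numpy as np

# Mini-batch gradient descent for least-squares linear regression on synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=200)

w = np.zeros(3)
lr, batch_size = 0.1, 20
for epoch in range(100):
    idx = rng.permutation(len(y))                 # reshuffle each epoch
    for start in range(0, len(y), batch_size):
        b = idx[start:start + batch_size]
        grad = X[b].T @ (X[b] @ w - y[b]) / len(b)  # gradient of 0.5 * mean squared error
        w -= lr * grad

print(w)  # should be close to w_true
```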