MohauMasukela/Udacity-Data-Science-Nanodegree

Udacity Data Science Nanodegree

This repo contains projects that I have worked on for the Udacity Data Scientist Nanodegree program.

Project 1: Introduction to Data Science - Consumer Behavior and Shopping Habits

This work aims to follow the CRISP-DM process, exhibit my technical abilities, and communicate insights in the form of a blog post. I chose this topic out of personal interest, to better understand the factors behind a well-run business and the methods used to influence consumer behavior.

Click here to view project and all corresponding files and code.

Project 2: Data Engineering - Disaster Response Pipeline

This project focused on the end-to-end data science process: building ETL, NLP, and machine learning pipelines to categorise emergency messages according to the needs expressed by the victims who sent them. The project is provided by Appen, and the model is built for an application that classifies disaster messages.
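
As a rough illustration of the ETL and text-cleaning steps, here is a minimal sketch using only the Python standard library. It is not the project's actual code: the function names, the `messages` table, and the `name-0/1` category format separated by `;` are all assumptions made for the example.

```python
import re
import sqlite3


def tokenize(text):
    """Lowercase a message and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def parse_categories(raw):
    """Turn 'water-1;food-0' into {'water': 1, 'food': 0} (assumed format)."""
    labels = {}
    for item in raw.split(";"):
        name, _, value = item.rpartition("-")
        labels[name] = int(value)
    return labels


def run_etl(messages, categories, conn):
    """Join messages to their labels by id and load them into SQLite."""
    labels_by_id = {row_id: parse_categories(raw) for row_id, raw in categories}
    names = sorted({n for labels in labels_by_id.values() for n in labels})
    columns = ", ".join(f'"{n}" INTEGER' for n in names)
    conn.execute(
        f"CREATE TABLE messages (id INTEGER PRIMARY KEY, message TEXT, {columns})"
    )
    placeholders = ", ".join("?" for _ in range(2 + len(names)))
    seen = set()
    for row_id, text in messages:
        if row_id in seen or row_id not in labels_by_id:
            continue  # drop duplicate and unlabelled rows
        seen.add(row_id)
        values = [labels_by_id[row_id].get(n, 0) for n in names]
        conn.execute(
            f"INSERT INTO messages VALUES ({placeholders})", [row_id, text, *values]
        )
    conn.commit()
    return names


# Hypothetical sample data; the second copy of message 1 is a duplicate.
sample_messages = [
    (1, "We need water and food"),
    (2, "Roads are blocked"),
    (1, "We need water and food"),
]
sample_categories = [
    (1, "water-1;food-1;shelter-0"),
    (2, "water-0;food-0;shelter-0"),
]

conn = sqlite3.connect(":memory:")
category_names = run_etl(sample_messages, sample_categories, conn)
```

The cleaned table can then feed a multi-label classifier, with `tokenize` serving as the text-normalisation step ahead of feature extraction.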

Click here to view project and all corresponding files and code.

Project 3: Experimental Design & Recommendations - Recommendations with IBM

This project analyses the interactions that users have with articles on the IBM Watson Studio platform and recommends new articles they are likely to enjoy. Deciding which articles to show each user involves a study of the data available on the platform.
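
One common way to turn user-article interactions into recommendations is user-user collaborative filtering. The sketch below is a minimal, assumption-laden illustration rather than the project's own method: the function names, the `(user_id, article_id)` pair format, and the overlap-count similarity are all chosen for the example.

```python
from collections import Counter


def build_user_articles(interactions):
    """Map each user to the set of articles they have interacted with."""
    user_articles = {}
    for user_id, article_id in interactions:
        user_articles.setdefault(user_id, set()).add(article_id)
    return user_articles


def similar_users(user_id, user_articles):
    """Rank other users by the number of articles shared with user_id."""
    seen = user_articles[user_id]
    overlap = {
        other: len(seen & articles)
        for other, articles in user_articles.items()
        if other != user_id
    }
    # Most shared articles first; ties broken by user id for determinism.
    return sorted(
        (u for u in overlap if overlap[u] > 0),
        key=lambda u: (-overlap[u], u),
    )


def recommend(user_id, user_articles, n=3):
    """Suggest up to n unseen articles, weighted by neighbour similarity."""
    seen = user_articles[user_id]
    scores = Counter()
    for other in similar_users(user_id, user_articles):
        weight = len(seen & user_articles[other])
        for article in user_articles[other] - seen:
            scores[article] += weight
    return sorted(scores, key=lambda a: (-scores[a], a))[:n]


# Hypothetical interaction log.
interactions = [
    ("u1", "a1"), ("u1", "a2"),
    ("u2", "a1"), ("u2", "a2"), ("u2", "a3"),
    ("u3", "a2"), ("u3", "a4"),
    ("u4", "a5"),
]
user_articles = build_user_articles(interactions)
print(recommend("u1", user_articles))
```

A user with no overlapping neighbours (such as `u4` here) gets an empty list, which is where a fallback such as most-popular articles would normally step in.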

Click here to view project and all corresponding files and code.

Project 4: Data Scientist Capstone Project

Using PySpark to predict customer churn for a music streaming service. The project involved:

  • Loading and cleaning a small subset (128 MB) of the full 12 GB dataset
  • Conducting Exploratory Data Analysis to understand the data and which features are useful for predicting churn
  • Feature Engineering to create features that will be used in the modelling process
  • Modelling using machine learning algorithms such as Logistic Regression, Random Forest, and Gradient Boosting
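
The feature engineering step above can be sketched as a per-user aggregation over the event log. The example below uses plain Python as a stand-in for the equivalent PySpark group-by, so that it runs without a Spark installation. The field names (`userId`, `sessionId`, `page`) and the event names (`NextSong`, `Cancellation Confirmation`) are assumptions for illustration, not details taken from this README.

```python
from collections import defaultdict

# Assumed name of the event that marks a user as churned.
CHURN_PAGE = "Cancellation Confirmation"


def user_features(events):
    """Aggregate raw events into one feature row per user."""
    stats = defaultdict(lambda: {"n_songs": 0, "sessions": set(), "churned": 0})
    for event in events:
        s = stats[event["userId"]]
        s["sessions"].add(event["sessionId"])
        if event["page"] == "NextSong":
            s["n_songs"] += 1
        if event["page"] == CHURN_PAGE:
            s["churned"] = 1

    rows = {}
    for user, s in stats.items():
        n_sessions = len(s["sessions"])
        rows[user] = {
            "n_songs": s["n_songs"],
            "n_sessions": n_sessions,
            "songs_per_session": s["n_songs"] / n_sessions,
            "churned": s["churned"],  # label for the classifier
        }
    return rows


# Hypothetical event log.
events = [
    {"userId": "u1", "sessionId": 1, "page": "NextSong"},
    {"userId": "u1", "sessionId": 1, "page": "NextSong"},
    {"userId": "u1", "sessionId": 2, "page": "NextSong"},
    {"userId": "u1", "sessionId": 2, "page": CHURN_PAGE},
    {"userId": "u2", "sessionId": 3, "page": "NextSong"},
    {"userId": "u2", "sessionId": 3, "page": "Home"},
]
features = user_features(events)
```

Each resulting row pairs a handful of numeric features with a binary `churned` label, which is the shape a classifier such as logistic regression expects.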

Click here to view project and all corresponding files and code.

Medium Blog
