This repo contains projects that I have worked on for the Udacity Data Scientist Nanodegree program.
This work aims to follow the CRISP-DM process, demonstrate my technical abilities, and communicate insights in the form of a blog post. I chose this topic out of personal interest, to better understand the factors behind a well-run business and the methods used to influence consumer behaviour.
Click here to view project and all corresponding files and code.
This project focused on the end-to-end data science process: building ETL, NLP, and machine learning pipelines to categorise emergency messages according to the needs expressed by the sender. The project is provided by Appen; the goal is to build the model behind an application that classifies disaster messages.
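As a rough illustration of the ETL and text-normalisation stages only (the classifier itself is omitted), here is a minimal sketch using the Python standard library. The `water-1;food-0` category string format, the table layout, and the function names `tokenize`, `parse_categories`, and `run_etl` are assumptions made for this example, not the project's actual code.

```python
import re
import sqlite3


def tokenize(text):
    """Lowercase a message and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def parse_categories(raw):
    """Turn a string such as 'water-1;food-0' into {'water': 1, 'food': 0}.

    The 'name-flag' format is an assumption for this sketch.
    """
    labels = {}
    for item in raw.split(";"):
        name, _, value = item.rpartition("-")
        labels[name] = int(value)
    return labels


def run_etl(rows, conn):
    """Deduplicate raw rows, normalise them, and load them into SQLite."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages "
        "(id INTEGER PRIMARY KEY, tokens TEXT, categories TEXT)"
    )
    seen_ids = set()
    for msg_id, message, raw_categories in rows:
        if msg_id in seen_ids:  # drop duplicate messages
            continue
        seen_ids.add(msg_id)
        labels = parse_categories(raw_categories)
        active = sorted(name for name, flag in labels.items() if flag == 1)
        conn.execute(
            "INSERT INTO messages VALUES (?, ?, ?)",
            (msg_id, " ".join(tokenize(message)), ",".join(active)),
        )
    conn.commit()


# Made-up example rows: (id, message, raw category string)
rows = [
    (1, "We need water and food!", "water-1;food-1;shelter-0"),
    (2, "Roof collapsed, need shelter", "water-0;food-0;shelter-1"),
    (1, "We need water and food!", "water-1;food-1;shelter-0"),
]
conn = sqlite3.connect(":memory:")
run_etl(rows, conn)
stored = conn.execute(
    "SELECT id, tokens, categories FROM messages ORDER BY id"
).fetchall()
```

The cleaned tokens and category flags stored this way would then feed a multi-label classifier in the machine learning stage.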
Click here to view project and all corresponding files and code.
This project analyses the interactions that users have with articles on the IBM Watson Studio platform and recommends new articles they are likely to enjoy. Deciding which articles to show each user starts with a study of the data available on the platform.
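One common way to make such recommendations is user-based collaborative filtering. The sketch below is a minimal standard-library version of that idea, not necessarily the method used in the project; the interaction data and the function names `build_user_articles` and `recommend` are hypothetical.

```python
from collections import defaultdict


def build_user_articles(interactions):
    """Map each user id to the set of article ids they interacted with."""
    user_articles = defaultdict(set)
    for user_id, article_id in interactions:
        user_articles[user_id].add(article_id)
    return dict(user_articles)


def recommend(user_id, user_articles, n=3):
    """Recommend up to n unseen articles, weighted by overlap with other users."""
    seen = user_articles.get(user_id, set())
    scores = defaultdict(int)
    for other_id, other_seen in user_articles.items():
        if other_id == user_id:
            continue
        overlap = len(seen & other_seen)  # similarity = number of shared articles
        if overlap == 0:
            continue
        for article_id in other_seen - seen:
            scores[article_id] += overlap
    # Highest score first; ties are broken by article id for determinism
    ranked = sorted(scores, key=lambda a: (-scores[a], a))
    return ranked[:n]


# Made-up (user, article) interactions
interactions = [
    ("u1", "a1"), ("u1", "a2"),
    ("u2", "a1"), ("u2", "a2"), ("u2", "a3"),
    ("u3", "a2"), ("u3", "a4"),
    ("u4", "a5"),
]
user_articles = build_user_articles(interactions)
recs = recommend("u1", user_articles)
```

A user with no overlap with anyone else gets an empty list here, so a real system would need a fallback such as recommending the most popular articles.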
Click here to view project and all corresponding files and code.
This project uses PySpark to predict customer churn for a music streaming service. The project involved:
- Loading and cleaning a small subset (128 MB) of the full dataset (12 GB)
- Conducting Exploratory Data Analysis to understand the data and what features are useful for predicting churn
- Feature Engineering to create features that will be used in the modelling process
- Modelling using machine learning algorithms such as Logistic Regression, Random Forest, and Gradient-Boosted Trees
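The labelling and feature engineering steps above can be sketched in plain Python, which keeps the example runnable without a Spark cluster; in PySpark the same aggregation would typically be written with `groupBy` and `agg`. The event fields, the page names (`NextSong`, `Cancellation Confirmation`), and the function name `build_user_features` are assumptions for illustration, not taken from the project.

```python
from collections import defaultdict

# Assumed event name that marks a churned user (hypothetical, not from the source)
CHURN_PAGE = "Cancellation Confirmation"


def build_user_features(events):
    """Aggregate an event log into one feature row per user, with a churn label."""
    features = defaultdict(lambda: {"n_events": 0, "n_songs": 0, "churned": 0})
    for event in events:
        row = features[event["user_id"]]
        row["n_events"] += 1
        if event["page"] == "NextSong":  # assumed page name for a song play
            row["n_songs"] += 1
        if event["page"] == CHURN_PAGE:
            row["churned"] = 1
    return dict(features)


# Made-up event log
events = [
    {"user_id": "u1", "page": "NextSong"},
    {"user_id": "u1", "page": "NextSong"},
    {"user_id": "u1", "page": CHURN_PAGE},
    {"user_id": "u2", "page": "NextSong"},
    {"user_id": "u2", "page": "Home"},
]
user_features = build_user_features(events)
```

Each resulting row (event count, song count, churn flag) is the kind of per-user record a classifier would be trained on.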
Click here to view project and all corresponding files and code.