Automobile
Project Link:https://nbviewer.jupyter.org/github/shoaib555/Unsupervised-Learning/blob/main/uscar.ipynb
The data concerns city-cycle fuel consumption in miles per gallon, to be predicted in terms of 3 multi-valued discrete and 5 continuous attributes.
Attribute Information:
- mpg: continuous
- cylinders(cyl): multi-valued discrete
- displacement(disp): continuous
- horsepower(hp): continuous
- weight(wt): continuous
- acceleration(acc): continuous
- model year(yr): multi-valued discrete
- origin: multi-valued discrete
- car name: string (unique for each instance)
Goal is to cluster the data, treat each cluster as an individual dataset, and train a regression model on each one to predict ‘mpg’.
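The cluster-then-regress idea can be sketched as follows. This is a minimal illustration, not the notebook's actual code: it assumes scikit-learn's `KMeans` and `LinearRegression`, the helper names `fit_cluster_regressors` and `predict_mpg` are hypothetical, and the data is synthetic stand-in data rather than the real car dataset.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler


def fit_cluster_regressors(X, y, n_clusters=3, random_state=0):
    """Cluster the rows of X, then fit one linear regression per cluster."""
    scaler = StandardScaler().fit(X)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labels = km.fit_predict(scaler.transform(X))
    # One regression model per cluster, trained only on that cluster's rows.
    models = {
        c: LinearRegression().fit(X[labels == c], y[labels == c])
        for c in range(n_clusters)
    }
    return scaler, km, models


def predict_mpg(X, scaler, km, models):
    """Route each row to its cluster's regression model."""
    labels = km.predict(scaler.transform(X))
    preds = np.empty(len(X))
    for c, model in models.items():
        mask = labels == c
        if mask.any():
            preds[mask] = model.predict(X[mask])
    return preds


# Synthetic stand-in data (assumption: NOT the real dataset).
rng = np.random.default_rng(0)
X_demo = np.vstack([
    rng.normal(loc=center, scale=0.5, size=(50, 2))
    for center in ([0, 0], [10, 10], [20, 0])
])
y_demo = 30.0 - 0.5 * X_demo[:, 0] + 0.2 * X_demo[:, 1] + rng.normal(scale=0.1, size=150)

scaler, km, models = fit_cluster_regressors(X_demo, y_demo)
mpg_pred = predict_mpg(X_demo, scaler, km, models)
```

The number of clusters is a free parameter here; in practice it would be chosen with something like an elbow plot or silhouette score.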
Manufacturing
Project Link:https://nbviewer.jupyter.org/github/shoaib555/Unsupervised-Learning/blob/main/wine.ipynb
Company X curates and packages wine across various vineyards spread throughout the country.
The data concerns the chemical composition of the wine and its respective quality.
- A, B, C, D: specific chemical composition measure of the wine
- Quality: quality of wine [ Low and High ]
Goal is to build a synthetic data generation model using the existing data provided by the company.
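One possible way to generate synthetic rows is to fit a density model per quality class and sample from it. The sketch below assumes scikit-learn's `GaussianMixture`; the approach, the helper names `fit_generators` and `sample_synthetic`, and the demo data are all illustrative assumptions and may differ from what the notebook does.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def fit_generators(X, quality, n_components=1, random_state=0):
    """Fit one Gaussian mixture per quality label."""
    generators = {}
    for label in np.unique(quality):
        gm = GaussianMixture(n_components=n_components, random_state=random_state)
        generators[label] = gm.fit(X[quality == label])
    return generators


def sample_synthetic(generators, n_per_class):
    """Draw n_per_class synthetic rows from each fitted mixture."""
    rows, labels = [], []
    for label, gm in generators.items():
        samples, _ = gm.sample(n_per_class)
        rows.append(samples)
        labels.extend([str(label)] * n_per_class)
    return np.vstack(rows), np.array(labels)


# Made-up stand-in for the four chemical measures A, B, C, D.
rng = np.random.default_rng(0)
X_demo = np.vstack([
    rng.normal(0.0, 1.0, size=(100, 4)),
    rng.normal(5.0, 1.0, size=(100, 4)),
])
quality_demo = np.array(["Low"] * 100 + ["High"] * 100)

generators = fit_generators(X_demo, quality_demo)
syn_X, syn_quality = sample_synthetic(generators, n_per_class=20)
```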
Automobile
Project Link:https://nbviewer.jupyter.org/github/shoaib555/Unsupervised-Learning/blob/main/vehicle.ipynb
The purpose is to classify a given silhouette as one of three types of vehicle, using a set of features extracted from the silhouette. The vehicle may be viewed from one of many different angles.
The data contains features extracted from the silhouettes of vehicles at different angles. Four "Corgi" model vehicles were used for the experiment: a double-decker bus, a Chevrolet van, a Saab 9000 and an Opel Manta 400. This particular combination of vehicles was chosen with the expectation that the bus, the van and either one of the cars would be readily distinguishable, but that it would be more difficult to distinguish between the cars.
All the features are numeric, i.e. geometric features extracted from the silhouette.
Goal is to apply a dimensionality reduction technique (PCA) and train a model on the principal components instead of training it on just the raw data.
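A minimal sketch of training on principal components, assuming scikit-learn. The choice of `SVC` as the classifier, the 95% variance threshold, the helper name `build_pca_classifier`, and the synthetic data are assumptions made for illustration; the notebook may use different settings.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_pca_classifier(variance_to_keep=0.95):
    """Scale features, keep enough components for the given variance, then classify."""
    return make_pipeline(
        StandardScaler(),
        PCA(n_components=variance_to_keep, svd_solver="full"),
        SVC(),
    )


# Synthetic stand-in: 10 correlated features driven by 3 latent factors.
rng = np.random.default_rng(0)
latent = np.vstack([rng.normal(loc=c, size=(60, 3)) for c in (-4.0, 0.0, 4.0)])
mixing = rng.normal(size=(3, 10))
X_demo = latent @ mixing + rng.normal(scale=0.1, size=(180, 10))
y_demo = np.repeat(["bus", "van", "car"], 60)

X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0, stratify=y_demo
)
clf = build_pca_classifier().fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
n_kept = clf.named_steps["pca"].n_components_
```

Putting the scaler and PCA inside the pipeline means both are fitted on the training split only, which avoids leaking test data into the projection.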
Sports management
Project Link:https://nbviewer.jupyter.org/github/shoaib555/Unsupervised-Learning/blob/main/IPL.ipynb
Company X is a sports management company for international cricket.
The data collected belongs to batsmen from the IPL series conducted so far. Attribute Information:
- Runs: Runs scored by the batsman
- Ave: Average runs scored by the batsman per match
- SR: strike rate of the batsman
- Fours: number of fours (boundaries) scored
- Six: number of sixes scored
- HF: number of half centuries scored so far
Goal is to build a data-driven batsman ranking model for the sports management company to make business decisions.
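One simple way to turn the six attributes into a ranking is a composite of standardized scores. The sketch below is only an assumption about how such a model could look: the equal weights, the function name `rank_batsmen`, and the player rows are made up for illustration and are not taken from the notebook or the real data.

```python
import numpy as np


def rank_batsmen(names, stats, weights=None):
    """Rank players by a weighted sum of z-scored attributes (higher is better)."""
    stats = np.asarray(stats, dtype=float)
    # Standardize each column so attributes on different scales are comparable.
    # Assumes no column is constant (non-zero standard deviation).
    z = (stats - stats.mean(axis=0)) / stats.std(axis=0)
    if weights is None:
        weights = np.full(stats.shape[1], 1.0 / stats.shape[1])
    scores = z @ np.asarray(weights, dtype=float)
    order = np.argsort(-scores)
    return [(names[i], float(scores[i])) for i in order]


# Hypothetical rows; columns are Runs, Ave, SR, Fours, Six, HF.
names_demo = ["Player A", "Player B", "Player C", "Player D"]
stats_demo = [
    [700, 50.0, 150.0, 60, 30, 6],
    [500, 35.0, 130.0, 45, 20, 4],
    [400, 30.0, 140.0, 40, 15, 3],
    [200, 20.0, 110.0, 15, 5, 1],
]
ranking = rank_batsmen(names_demo, stats_demo)
```

Weights could instead come from domain judgment or from the loadings of the first principal component; equal weighting is just the simplest starting point.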
Project Link:https://nbviewer.jupyter.org/github/shoaib555/Unsupervised-Learning/blob/main/image.ipynb
Questions:
- List of all possible dimensionality reduction techniques that can be implemented using Python.
- Dimensionality reduction illustration on Text Data
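For the text-data question, one common illustration is latent semantic analysis: build a TF-IDF matrix and reduce it with truncated SVD. The sketch below assumes scikit-learn; the four example sentences are made up, and the choice of two components is arbitrary.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up toy corpus for illustration only.
docs = [
    "the batsman scored a quick half century",
    "the bowler conceded many runs in the match",
    "red wine quality depends on chemical composition",
    "the vineyard packages wine across the country",
]

# Each document becomes a sparse vector with one dimension per vocabulary term.
tfidf = TfidfVectorizer()
X_tfidf = tfidf.fit_transform(docs)

# Project the sparse term space down to 2 dense latent dimensions.
svd = TruncatedSVD(n_components=2, random_state=0)
X_reduced = svd.fit_transform(X_tfidf)
```

`TruncatedSVD` is used rather than `PCA` because it works directly on sparse matrices without centering them.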