Clustering Algorithm Implementation and Visualization from Scratch with Python

Overview

This project implements four popular clustering algorithms from scratch in Python. Each implementation is designed to work on datasets with d >= 2 dimensions and k >= 2 clusters. The implementations are tested on 2D datasets, and their results are compared visually with scikit-learn's implementations to evaluate correctness and performance.

Implemented Clustering Algorithms

K-Means Clustering
Gaussian Mixture Model (GMM) using Expectation-Maximization (EM)
Mean-Shift Clustering
Agglomerative Clustering
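
As a rough illustration of what a from-scratch implementation of the first algorithm involves, here is a minimal sketch of K-Means (Lloyd's algorithm) using NumPy. It is not the code in KMeans.py: the function name kmeans_sketch, its parameters, the random initialization, and the empty-cluster handling are all assumptions made for this example.

```python
import numpy as np


def _assign(X, centers):
    """Return, for each row of X, the index of the nearest center."""
    # Squared Euclidean distance from every point to every center, shape (n, k).
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans_sketch(X, k, n_iter=100, seed=0):
    """Hypothetical helper: Lloyd's algorithm on X of shape (n, d).

    Returns (centers, labels). Works for any d >= 1 and 2 <= k <= n.
    """
    rng = np.random.default_rng(seed)
    # Assumption: initialize with k distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = _assign(X, centers)
        # Update step: move each center to the mean of its assigned points.
        # An empty cluster keeps its previous center.
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment so the labels match the returned centers.
    return centers, _assign(X, centers)


# Usage: two well-separated blobs in 3 dimensions.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (20, 3)), rng.normal(5, 0.1, (20, 3))])
centers, labels = kmeans_sketch(X, k=2)
```

Nothing in the sketch depends on the data being 2D, which is what makes the d >= 2 goal above straightforward for K-Means.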

Python Implementations

KMeans.py: K-Means clustering.
KMeans_Ver0.py: K-Means clustering (2nd version).
GaussianMM.py: EM-GMM.
GaussianMM_Ver0.py: EM-GMM with AIC, BIC, and predict functions (2nd version).
MeanShift.py: Mean-Shift clustering.
Agglomerative.py: Agglomerative clustering.
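
For readers unfamiliar with the AIC and BIC functions mentioned above, the sketch below shows the standard definitions for a Gaussian mixture. It assumes full covariance matrices and a total (summed) log-likelihood. The function names and signatures are hypothetical and may differ from those in GaussianMM_Ver0.py.

```python
import math


def gmm_num_params(k, d):
    """Free parameters of a k-component GMM in d dimensions.

    Assumption: every component has its own full covariance matrix.
    """
    means = k * d
    covariances = k * d * (d + 1) // 2  # one symmetric d x d matrix per component
    weights = k - 1                     # mixing weights sum to 1
    return means + covariances + weights


def aic(log_likelihood, k, d):
    """Akaike information criterion: 2p - 2 ln L (lower is better)."""
    return 2 * gmm_num_params(k, d) - 2 * log_likelihood


def bic(log_likelihood, k, d, n):
    """Bayesian information criterion: p ln n - 2 ln L (lower is better)."""
    return gmm_num_params(k, d) * math.log(n) - 2 * log_likelihood
```

Both criteria penalize the log-likelihood by model size, so they can be used to compare fits with different numbers of components k on the same data.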

Evaluations and Tests

test_2d_visualization.py: Tests each implementation on 2D datasets with visualization and compares the results to scikit-learn's equivalent algorithms.
data_2d_test/: Contains the datasets used for testing.
test_2d_visualization_results/: Stores the output images of the clustering results.
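
The comparison in this project is visual. If a numeric check is also wanted, note that cluster IDs are arbitrary, so two labelings cannot be compared element by element. The following is a minimal sketch of the Rand index, which is invariant to label permutation. It is an illustrative addition, not part of test_2d_visualization.py, and the function name rand_index is hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two labelings agree.

    A pair agrees when both labelings put the two points in the same
    cluster, or both put them in different clusters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agree += int(same_a == same_b)
        total += 1
    # Fewer than two points: there are no pairs to disagree on.
    return agree / total if total else 1.0


# Same partition with swapped cluster IDs: perfect agreement.
score_same = rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# A different partition of the same four points.
score_diff = rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

This loop is quadratic in the number of points, which is fine for small 2D test sets. scikit-learn's sklearn.metrics.adjusted_rand_score provides a chance-corrected variant of the same idea.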

Visualization Results

Blobs Dataset

Algorithm	My Implementation	Scikit-learn
Agglomerative
EM-GMM
K-Means
Mean-Shift

Moons and Stars Dataset

Algorithm	My Implementation	Scikit-learn
Agglomerative
EM-GMM
K-Means
Mean-Shift

Sticks Dataset

Algorithm	My Implementation	Scikit-learn
Agglomerative
EM-GMM
K-Means
Mean-Shift
