Clustering Algorithm Implementation and Visualization from Scratch with Python

Overview

This project implements four popular clustering algorithms from scratch in Python. The implementations are designed to work on datasets with d >= 2 dimensions and k >= 2 clusters. They are tested on 2D datasets and compared visually with scikit-learn's implementations to assess correctness and clustering quality.

Implemented Clustering Algorithms

  1. K-Means Clustering
  2. Gaussian Mixture Model (GMM) using Expectation-Maximization (EM)
  3. Mean-Shift Clustering
  4. Agglomerative Clustering
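For reference, the assignment/update loop behind K-Means (Lloyd's algorithm) can be sketched in NumPy as follows. This is a minimal illustration, not the code in KMeans.py. The function name `kmeans`, the initialization from k random data points, and the `tol`/`n_iter` stopping rule are assumptions made for the sketch. It works for any d >= 2 and k >= 2.

```python
import numpy as np


def kmeans(X, k, n_iter=100, tol=1e-6, seed=0):
    """Hypothetical sketch: Lloyd's algorithm on an (n, d) array X with k clusters."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centroids as k distinct points sampled from the data (an assumption).
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: squared Euclidean distance from every point to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points,
        # keeping the old centroid if its cluster ends up empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:
            break
    # Final assignment against the final centroids.
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1), centroids


# Usage: two well-separated 2D blobs centred near (0, 0) and (5, 5).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])
labels, centers = kmeans(X, k=2)
```

Cluster IDs returned by this sketch are arbitrary, so they need not match the IDs scikit-learn's `KMeans` assigns to the same clusters.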

Python Implementations

  • KMeans.py: K-Means clustering.
  • KMeans_Ver0.py: K-Means clustering (2nd version).
  • GaussianMM.py: EM-GMM.
  • GaussianMM_Ver0.py: EM-GMM with AIC, BIC, and predict functions (2nd version).
  • MeanShift.py: Mean-Shift clustering.
  • Agglomerative.py: Agglomerative clustering.
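The AIC, BIC, and predict functions listed for GaussianMM_Ver0.py can be illustrated with the sketch below, which takes already-fitted GMM parameters (weights, means, covariances). It is not the repository's code. It assumes full covariance matrices and the common conventions AIC = 2p - 2 log L and BIC = p ln n - 2 log L, where p counts free parameters. All function names here are hypothetical.

```python
import numpy as np


def gaussian_logpdf(X, mean, cov):
    """Log-density of N(mean, cov) evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mean
    _, logdet = np.linalg.slogdet(cov)
    # Mahalanobis term (x - mu)^T cov^{-1} (x - mu) for every row, via a linear solve.
    maha = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)


def weighted_log_probs(X, weights, means, covs):
    """(n, k) matrix of log(pi_j) + log N(x_i | mu_j, Sigma_j)."""
    return np.column_stack([
        np.log(w) + gaussian_logpdf(X, m, c) for w, m, c in zip(weights, means, covs)
    ])


def log_likelihood(X, weights, means, covs):
    lp = weighted_log_probs(X, weights, means, covs)
    # Numerically stable log-sum-exp over the k components.
    mx = lp.max(axis=1, keepdims=True)
    return float((mx[:, 0] + np.log(np.exp(lp - mx).sum(axis=1))).sum())


def n_parameters(k, d):
    # Assumes full covariances: (k - 1) free weights + k means + k symmetric d x d covariances.
    return (k - 1) + k * d + k * d * (d + 1) // 2


def aic(X, weights, means, covs):
    k, d = len(weights), X.shape[1]
    return 2 * n_parameters(k, d) - 2 * log_likelihood(X, weights, means, covs)


def bic(X, weights, means, covs):
    n, d = X.shape
    k = len(weights)
    return n_parameters(k, d) * np.log(n) - 2 * log_likelihood(X, weights, means, covs)


def predict(X, weights, means, covs):
    # Hard assignment: the component with the highest posterior responsibility.
    return weighted_log_probs(X, weights, means, covs).argmax(axis=1)


# Usage with hand-picked (not fitted) parameters for two 2D components.
weights = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0], [5.0, 5.0]])
covs = np.array([np.eye(2), np.eye(2)])
X = np.array([[0.0, 0.0], [5.0, 5.0], [0.2, -0.1]])
labels = predict(X, weights, means, covs)
```

When selecting k, a lower AIC or BIC is preferred. BIC penalizes extra components more heavily than AIC once ln n > 2.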

Evaluations and Tests

  • test_2d_visualization.py:
    Tests each implementation on 2D datasets with visualization, comparing the results to scikit-learn's equivalent algorithms.
  • data_2d_test/:
    Contains the datasets used for testing.
  • test_2d_visualization_results/:
    Stores the output images of the clustering results.
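When comparing an implementation's labels with scikit-learn's, cluster IDs are arbitrary, so two identical partitions can use different numbers. The hypothetical helper below is not part of test_2d_visualization.py. It is a sketch of one way to quantify agreement: the fraction of points on which two labelings match under the best one-to-one relabeling. Brute force over permutations is assumed acceptable for the small k of the 2D tests. scikit-learn's `adjusted_rand_score` is a standard alternative.

```python
from itertools import permutations

import numpy as np


def best_match_agreement(labels_a, labels_b):
    """Fraction of points where two labelings agree, maximized over relabelings."""
    a = np.asarray(labels_a)
    b = np.asarray(labels_b)
    ids_a, ids_b = np.unique(a), np.unique(b)
    # Square contingency table (zero-padded if the cluster counts differ).
    m = max(len(ids_a), len(ids_b))
    table = np.zeros((m, m), dtype=int)
    for i, u in enumerate(ids_a):
        for j, v in enumerate(ids_b):
            table[i, j] = np.sum((a == u) & (b == v))
    # Best one-to-one matching of cluster IDs; factorial cost, so only for small k.
    best = max(sum(table[i, p[i]] for i in range(m)) for p in permutations(range(m)))
    return best / len(a)


# Usage: the same partition with swapped IDs agrees perfectly.
score = best_match_agreement([0, 0, 1, 1], [1, 1, 0, 0])
```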

Visualization Results

Blobs Dataset

[Figure: Agglomerative, EM-GMM, K-Means, and Mean-Shift results on the Blobs dataset, My Implementation vs. Scikit-learn]

Moons and Stars Dataset

[Figure: Agglomerative, EM-GMM, K-Means, and Mean-Shift results on the Moons and Stars dataset, My Implementation vs. Scikit-learn]

Sticks Dataset

[Figure: Agglomerative, EM-GMM, K-Means, and Mean-Shift results on the Sticks dataset, My Implementation vs. Scikit-learn]