K-Means clustering aims to partition the data into k clusters, where each data point belongs to the cluster with the nearest mean. It is a centroid-based algorithm that iteratively updates cluster centers to minimize the variance within each cluster.
- Requires the number of clusters k to be specified.
- Sensitive to initial placement of centroids.
- Can converge to local minima.
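The assign-then-update loop described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the function name `kmeans`, the helper `sq_dist`, and the choice to pass the starting centroids explicitly (which makes the sensitivity to initialization easy to experiment with) are all assumptions made for this example.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, init_centroids, n_iter=100):
    """Illustrative K-Means loop; init_centroids is a hypothetical explicit start."""
    centroids = [tuple(c) for c in init_centroids]
    k = len(centroids)
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # assignments are stable, so the loop has converged
        centroids = new_centroids
    return labels, centroids

points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
labels, centroids = kmeans(points, init_centroids=[points[0], points[3]])
```

Rerunning with a different `init_centroids` is a simple way to see how the starting positions can change the result on less cleanly separated data.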
Hierarchical Clustering builds a hierarchy of clusters either agglomeratively (bottom-up) or divisively (top-down). Agglomerative clustering starts with each data point as its own cluster and repeatedly merges the closest pair of clusters until only one cluster remains.
- Does not require the number of clusters to be specified initially.
- Can produce a dendrogram, which is a tree-like diagram of clusters.
- Computationally intensive for large datasets.
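The bottom-up merging process can be sketched as follows. This is a deliberately naive version that assumes single linkage (the distance between two clusters is the distance between their closest members); other linkage rules exist, and the function name `agglomerative` and its `n_clusters` stopping parameter are assumptions for illustration. The recorded merge history is the information a dendrogram would draw.

```python
import math

def agglomerative(points, n_clusters=1):
    """Naive single-linkage agglomerative clustering (illustrative sketch)."""
    clusters = [[i] for i in range(len(points))]  # every point starts alone
    merges = []  # (members_a, members_b, distance), in merge order
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return clusters, merges

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5), (20.0, 20.0)]
clusters, merges = agglomerative(points, n_clusters=2)
```

The nested search over all cluster pairs on every merge is what makes this approach expensive on large datasets.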
DBSCAN is a density-based clustering algorithm that can find arbitrarily shaped clusters and identify outliers. It groups points that are closely packed and marks points that lie alone in low-density regions as outliers.
- Does not require the number of clusters to be specified.
- Requires two parameters: eps (the maximum distance between two points for them to count as neighbors) and min_samples (the minimum number of points within eps of a point for it to count as a core point).
- Can handle noise and outliers effectively.
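To make the roles of eps and min_samples concrete, here is a minimal sketch of the density-based expansion in plain Python. It uses a brute-force neighbor search and a hypothetical function name `dbscan`; the convention of labeling noise as -1 is an assumption of this example.

```python
import math

def dbscan(points, eps, min_samples):
    """Minimal DBSCAN sketch; returns one label per point, -1 marks noise."""
    NOISE, UNVISITED = -1, None
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Indices within eps of point i (the point itself included).
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point, keep expanding
        cluster += 1
    return labels

points = [(0.0, 0.0), (0.0, 0.5), (0.5, 0.0), (0.5, 0.5),
          (5.0, 5.0), (5.0, 5.5), (5.5, 5.0), (5.5, 5.5),
          (10.0, 0.0)]
labels = dbscan(points, eps=0.8, min_samples=3)
```

In this toy dataset the two tight squares become two clusters, and the isolated point at (10.0, 0.0) has too few neighbors to join either.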
Mean Shift is a centroid-based algorithm that updates candidates for centroids to be the mean of the points within a given region. It does not require the number of clusters to be specified in advance; the number is found from the data as the candidates converge.
- Automatically determines the number of clusters.
- Computationally intensive for large datasets.
- Sensitive to the bandwidth parameter.
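The shifting step can be sketched with a flat (uniform) kernel, where the "given region" is simply a ball of radius `bandwidth` around the current candidate. The function name `mean_shift`, the tolerance, and the rule for merging candidates that land within half a bandwidth of each other are assumptions chosen to keep the example short.

```python
import math

def mean_shift(points, bandwidth, n_iter=50, tol=1e-6):
    """Flat-kernel mean shift sketch; returns labels and the modes found."""
    modes = []
    labels = []
    for p in points:
        x = p
        for _ in range(n_iter):
            # Shift the candidate to the mean of all points inside its window.
            window = [q for q in points if math.dist(x, q) <= bandwidth]
            new_x = tuple(sum(c) / len(window) for c in zip(*window))
            moved = math.dist(x, new_x)
            x = new_x
            if moved < tol:
                break
        # Merge candidates that converged to (nearly) the same mode.
        for idx, m in enumerate(modes):
            if math.dist(x, m) < bandwidth / 2:
                labels.append(idx)
                break
        else:
            modes.append(x)
            labels.append(len(modes) - 1)
    return labels, modes

points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
labels, modes = mean_shift(points, bandwidth=2.0)
```

Note that the number of clusters is never passed in; it falls out of how many distinct modes the candidates reach. Every candidate rescans all points at every step, which is where the cost on large datasets comes from, and a much larger `bandwidth` would merge the two groups into one.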
Gaussian Mixture Model (GMM) is a probabilistic model that assumes all the data points are generated from a mixture of several Gaussian distributions with unknown parameters. It uses the Expectation-Maximization (EM) algorithm to estimate the parameters.
- Can handle clusters of different sizes and elongated (elliptical) shapes.
- Provides a probabilistic clustering.
- Requires the number of clusters k to be specified.
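The EM procedure can be illustrated on one-dimensional data, where each component has a weight, a mean, and a variance. This is a simplified sketch: the function name `gmm_em_1d`, the explicit initial means, the unit starting variances, and the small variance floor are all assumptions for this example, and a practical implementation would work with log-densities and a convergence check.

```python
import math

def gmm_em_1d(data, means, n_iter=50):
    """EM sketch for a 1-D Gaussian mixture, starting from the given means."""
    k = len(means)
    means = list(means)
    variances = [1.0] * k      # assumed starting variances
    weights = [1.0 / k] * k    # assumed equal starting weights
    resp = []

    def pdf(x, mu, var):
        return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            dens = [w * pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2
                      for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor keeps a component from collapsing
    return weights, means, variances, resp

data = [-0.5, 0.0, 0.3, 0.6, 9.4, 10.0, 10.2, 10.5]
weights, means, variances, resp = gmm_em_1d(data, means=(1.0, 9.0))
# Hard labels, if needed, come from the most responsible component.
hard_labels = [max(range(len(r)), key=lambda j: r[j]) for r in resp]
```

The `resp` rows are the probabilistic clustering itself: each row gives the probability that a point belongs to each component, and taking the largest entry is only one way to turn that into a hard assignment.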