Discriminatively Boosted Clustering (DBC) builds on DEC by replacing the feed-forward autoencoder with a convolutional autoencoder. It uses the same training scheme, reconstruction loss, and cluster assignment hardening loss as DEC. DBC achieves good results on image datasets largely because of its convolutional architecture.
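To make the cluster assignment hardening loss more concrete, here is a minimal NumPy sketch of how it is usually described for DEC: soft assignments from a Student's t kernel, a sharpened target distribution, and a KL divergence between the two. The function names, the random embeddings, and the choice to average the loss over samples are my own assumptions for illustration, not the code used in this project.

```python
import numpy as np

def soft_assignments(z, centroids, alpha=1.0):
    """Student's t similarity between embeddings z (n, d) and centroids (k, d)."""
    sq_dist = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + sq_dist / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)  # each row sums to 1

def target_distribution(q):
    """Sharpen q: square it, divide by soft cluster frequency, renormalise rows."""
    weight = q ** 2 / q.sum(axis=0)
    return weight / weight.sum(axis=1, keepdims=True)

def hardening_loss(q, p, eps=1e-12):
    """KL(P || Q), averaged over samples (averaging is an assumption here)."""
    return float(np.mean(np.sum(p * np.log((p + eps) / (q + eps)), axis=1)))

# Hypothetical data: 6 embeddings of dimension 4 and 3 cluster centroids.
rng = np.random.default_rng(0)
z = rng.normal(size=(6, 4))
centroids = rng.normal(size=(3, 4))

q = soft_assignments(z, centroids)
p = target_distribution(q)
loss = hardening_loss(q, p)
```

In a real training loop, `z` would come from the encoder, and the loss would be minimised with respect to both the encoder weights and the centroids.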
To visualize the clusters, I used Principal Component Analysis (PCA), which reduces the number of features in the data set by projecting it onto the subspace that preserves the most variance. The left plot shows PCA with 2 components and the right plot shows PCA with 3 components.
PCA - 2 components
PCA - 3 components
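For reference, projections like these can be produced with scikit-learn's `PCA`. This is only a sketch: `features` is a hypothetical random array standing in for the embeddings, and the sizes are arbitrary.

```python
import numpy as np
from sklearn.decomposition import PCA

# Placeholder for the learned features: 200 samples, 32 dimensions (assumed sizes).
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 32))

# Project onto the 2 and 3 directions of highest variance.
pca_2d = PCA(n_components=2)
coords_2d = pca_2d.fit_transform(features)

pca_3d = PCA(n_components=3)
coords_3d = pca_3d.fit_transform(features)

# Fraction of the total variance kept by each projection.
kept_2d = pca_2d.explained_variance_ratio_.sum()
kept_3d = pca_3d.explained_variance_ratio_.sum()
```

The resulting `coords_2d` and `coords_3d` can be passed to any 2D or 3D scatter plot and coloured by cluster label.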
t-SNE is mostly used to explore high-dimensional data by projecting it into a low-dimensional space (such as 2D or 3D). That makes it extremely useful for inspecting the features learned by CNNs.
t-SNE plot
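A comparable embedding can be computed with scikit-learn's `TSNE`. The sketch below uses random placeholder features and assumed settings (`perplexity=30.0`, PCA initialisation); the settings behind the plot above are not recorded here.

```python
import numpy as np
from sklearn.manifold import TSNE

# Placeholder for the learned features: 150 samples, 32 dimensions (assumed sizes).
rng = np.random.default_rng(0)
features = rng.normal(size=(150, 32))

# perplexity must be smaller than the number of samples.
tsne = TSNE(n_components=2, perplexity=30.0, init="pca", random_state=0)
coords = tsne.fit_transform(features)
```

Unlike PCA, t-SNE has no `transform` for new points, so the embedding has to be recomputed whenever the data changes.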
In the silhouette plot below, instances with silhouette scores close to 1 lie near the center of their cluster, while instances with silhouette scores close to 0 lie on the border between two clusters.
Silhouette plot
Confusion Matrix plot
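The values behind both plots can be computed with scikit-learn. In this sketch, synthetic blobs and `KMeans` stand in for the real data and the DBC cluster assignments, so the numbers it produces are unrelated to the results reported here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import confusion_matrix, silhouette_samples, silhouette_score

# Synthetic stand-in data with known labels (assumption for illustration).
X, y_true = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=0)

# KMeans replaces the DBC assignments in this sketch.
y_pred = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# One silhouette value per instance; the silhouette plot shows these per cluster.
per_sample = silhouette_samples(X, y_pred)
overall = silhouette_score(X, y_pred)  # mean of the per-sample values

# Rows are true labels, columns are predicted cluster ids.
cm = confusion_matrix(y_true, y_pred)
```

Cluster ids are arbitrary, so the confusion matrix is only diagonal after the clusters have been matched to the labels.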
With epochs = 100, batch size = 256, and validation size = 128, I obtained the following results.
- Accuracy = 59.102361 %
- Silhouette Score = 0.034447
- NMI = 0.002651
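For readers who want to compute the same kind of metrics, here is a sketch of one common approach: clustering accuracy with the best one-to-one matching between cluster ids and labels (Hungarian algorithm), plus NMI from scikit-learn. The helper `clustering_accuracy` and the toy labels are hypothetical; I am not claiming this is exactly how the numbers above were produced.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """Accuracy under the best one-to-one mapping of cluster ids to labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    n = int(max(y_true.max(), y_pred.max())) + 1
    counts = np.zeros((n, n), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        counts[p, t] += 1  # rows: predicted cluster, columns: true label
    rows, cols = linear_sum_assignment(counts, maximize=True)
    return counts[rows, cols].sum() / y_true.size

# Toy labels: the clusters are a permutation of the classes with one mistake.
y_true = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
y_pred = np.array([1, 1, 1, 2, 2, 0, 0, 0, 0])

acc = clustering_accuracy(y_true, y_pred)
nmi = normalized_mutual_info_score(y_true, y_pred)
```

The silhouette score, by contrast, needs only the features and the predicted clusters, not the true labels.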