Description: Clustering analysis of the Breast Cancer Wisconsin (Diagnostic) Dataset using hierarchical and k-means algorithms, with performance evaluated through R² and error rate calculations against the known breast cancer diagnoses.
INTRODUCTION
This project uses the "Breast Cancer Wisconsin (Diagnostic) Dataset" to explore clustering techniques and evaluate how well they separate breast cancer diagnoses. The dataset was first preprocessed by converting the diagnosis column into numeric values and removing unnecessary columns that contained NA values. The features were then normalized and visualized to examine their distributions.

The main focus of the analysis is applying two clustering algorithms, hierarchical clustering and k-means clustering, with both Euclidean and Manhattan distances. Clustering quality was assessed with a goodness-of-fit function (R²) and with error rate calculations against the known diagnoses. The results compare different numbers of clusters (K = 2, 3, and 4) to determine which configuration best classifies the breast cancer diagnoses.
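The workflow described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the project's own code: it assumes z-score normalization, average linkage for the hierarchical trees, R² defined as 1 − (within-cluster sum of squares)/(total sum of squares), and an error rate obtained by labelling each cluster with its majority diagnosis. All helper names (`encode_diagnosis`, `preprocess`, `hierarchical`, `kmeans`, `r_squared`, `error_rate`, `evaluate`) are hypothetical. Because standard k-means is defined for squared Euclidean distance, the Manhattan variant here swaps in coordinate-wise median centres, which makes it k-medians.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def encode_diagnosis(values):
    """Map diagnosis codes to numbers: 'M' (malignant) -> 1, 'B' (benign) -> 0 (assumed encoding)."""
    return np.array([1 if v == "M" else 0 for v in values])


def preprocess(X):
    """Drop feature columns that contain NaN, then z-score each remaining column.
    Z-scoring is an assumed normalization; the project may have used another."""
    X = np.asarray(X, dtype=float)
    X = X[:, ~np.isnan(X).any(axis=0)]
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns centred instead of dividing by zero
    return (X - X.mean(axis=0)) / std


def hierarchical(X, k, metric="euclidean", method="average"):
    """Build an agglomerative tree and cut it into at most k clusters.
    metric="cityblock" is Manhattan distance; average linkage is an assumption."""
    Z = linkage(X, method=method, metric=metric)
    return fcluster(Z, t=k, criterion="maxclust")


def kmeans(X, k, metric="euclidean", n_init=10, n_iter=100, seed=0):
    """Lloyd-style k-means with random restarts. With metric="cityblock" it assigns
    points by Manhattan distance and uses coordinate-wise medians, i.e. k-medians."""
    rng = np.random.default_rng(seed)
    manhattan = metric == "cityblock"
    center_of = np.median if manhattan else np.mean
    best_labels, best_cost = None, np.inf
    for _ in range(n_init):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            diff = X[:, None, :] - centers[None, :, :]
            dists = np.abs(diff).sum(axis=2) if manhattan else (diff ** 2).sum(axis=2)
            labels = dists.argmin(axis=1)
            # Recompute each centre; keep the old one if its cluster went empty.
            new_centers = np.array([center_of(X[labels == j], axis=0) if np.any(labels == j)
                                    else centers[j] for j in range(k)])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        cost = dists[np.arange(len(X)), labels].sum()
        if cost < best_cost:  # keep the restart with the lowest total distance
            best_labels, best_cost = labels, cost
    return best_labels


def r_squared(X, labels):
    """Assumed goodness of fit: 1 - within-cluster SS / total SS."""
    tss = ((X - X.mean(axis=0)) ** 2).sum()
    wss = sum(((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum()
              for c in np.unique(labels))
    return 1.0 - wss / tss


def error_rate(labels, diagnosis):
    """Label each cluster with its majority diagnosis; return the share of points that disagree."""
    errors = 0
    for c in np.unique(labels):
        members = diagnosis[labels == c]
        errors += len(members) - np.bincount(members).max()
    return errors / len(diagnosis)


def evaluate(X, diagnosis, ks=(2, 3, 4)):
    """Return {(algorithm, metric, k): (R², error rate)} for every configuration."""
    X = preprocess(X)
    diagnosis = np.asarray(diagnosis)
    results = {}
    for metric in ("euclidean", "cityblock"):
        for k in ks:
            for name, fit in (("hierarchical", hierarchical), ("kmeans", kmeans)):
                labels = fit(X, k, metric=metric)
                results[(name, metric, k)] = (r_squared(X, labels), error_rate(labels, diagnosis))
    return results
```

To try it on the real data, one option is scikit-learn's `load_breast_cancer()`, which returns the 569 × 30 feature matrix of this dataset; its `target` codes malignant as 0 and benign as 1, so `1 - target` gives the malignant = 1 encoding assumed here. When reading the results, note that R² cannot decrease as the same hierarchical tree is cut into more clusters, so the error rate against the known diagnoses is arguably the more direct criterion for choosing K.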