tayayounan/Breast-Cancer-Clustering-Analysis

Breast-Cancer-Clustering-Analysis

Description: Clustering analysis of the Breast Cancer Wisconsin Dataset using hierarchical and k-means algorithms, with performance evaluation through R² and error rate calculations to diagnose breast cancer.

INTRODUCTION

This project uses the "Breast Cancer Wisconsin (Diagnostic) Dataset" to explore clustering techniques and evaluate how well they separate breast cancer diagnoses. The dataset was first preprocessed by converting the diagnosis column to numeric values and removing unnecessary columns that contained NA values. The data was then normalized and visualized to show how the features are distributed. The analysis applies hierarchical clustering and k-means clustering, using both Euclidean and Manhattan distances. Clustering quality is assessed with a goodness-of-fit function (R²) and an error rate calculation. The results compare different numbers of clusters (K = 2, 3, and 4) to determine the best clustering configuration for classifying breast cancer diagnoses.
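As an illustration of the evaluation idea described above, here is a minimal Python sketch, not the project's own code. It makes several assumptions that the description does not confirm: normalization is min-max scaling; R² is taken as 1 minus the ratio of within-cluster to total sum of squares; the error rate maps each cluster to its majority diagnosis; and k-means uses a simple deterministic farthest-point initialization with Euclidean distance only. The six data points are made up for the example and are not taken from the Wisconsin dataset.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def normalize(rows):
    """Min-max scale every column to [0, 1] (assumed normalization)."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)] for row in rows]

def centroid(points):
    """Column-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]

def kmeans(rows, k, max_iter=100):
    """Basic k-means; centers start from a farthest-point (maximin) pick."""
    centers = [rows[0]]
    while len(centers) < k:
        centers.append(max(rows, key=lambda r: min(sq_dist(r, c) for c in centers)))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(r, centers[j]))
                      for r in rows]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = centroid(members)
    return labels

def r_squared(rows, labels):
    """Assumed definition: 1 - within-cluster SS / total SS."""
    grand = centroid(rows)
    total = sum(sq_dist(r, grand) for r in rows)
    within = 0.0
    for j in set(labels):
        members = [r for r, lab in zip(rows, labels) if lab == j]
        center = centroid(members)
        within += sum(sq_dist(r, center) for r in members)
    return 1.0 - within / total

def error_rate(labels, truth):
    """Share of points whose diagnosis differs from their cluster's majority."""
    wrong = 0
    for j in set(labels):
        member_truth = [t for lab, t in zip(labels, truth) if lab == j]
        majority = max(set(member_truth), key=member_truth.count)
        wrong += sum(t != majority for t in member_truth)
    return wrong / len(truth)

# Synthetic two-feature points (hypothetical values) with a numeric diagnosis.
raw = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]]
truth = [0, 0, 0, 1, 1, 1]

data = normalize(raw)
labels = kmeans(data, k=2)
r2 = r_squared(data, labels)
err = error_rate(labels, truth)

# Compare K = 2, 3, 4 by R², mirroring the comparison described above.
for k in (2, 3, 4):
    print(k, round(r_squared(data, kmeans(data, k)), 3))
```

Under this definition R² never decreases at the optimal clustering as K grows, so it is most useful read alongside the error rate rather than on its own.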
