Breast-Cancer-Clustering-Analysis

Description: Clustering analysis of the Breast Cancer Wisconsin (Diagnostic) Dataset using hierarchical and k-means algorithms, with performance evaluated through R² and error-rate calculations, for diagnosing breast cancer.

INTRODUCTION

This project uses the "Breast Cancer Wisconsin (Diagnostic) Dataset" to explore clustering techniques and evaluate how well they diagnose breast cancer. First, the dataset was preprocessed by converting the diagnosis column to numeric values and removing unnecessary columns containing NA values. The data was then normalized and visualized to show the distribution of the features. The core of the analysis applies two clustering algorithms, hierarchical clustering and k-means clustering, each with both Euclidean and Manhattan distances. Clustering quality was assessed with a goodness-of-fit function (R²) and an error-rate calculation. The results compare different numbers of clusters (K = 2, 3, and 4) to determine which configuration best classifies the breast cancer diagnoses.
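The project's own code is in R (Cancer clustering.R) and is not reproduced here. As an illustration only, the Python sketch below shows one way to normalize the features, run k-means, and compute the two evaluation measures for several values of K. It assumes R² is defined as 1 minus the within-cluster sum of squares divided by the total sum of squares, and that the error rate is the fraction of observations whose diagnosis differs from the majority diagnosis of their cluster; these are common conventions, not necessarily the exact formulas used in the project. Min-max scaling is also an assumption, since the README does not say which normalization was used. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def min_max_normalize(rows):
    """Assumed normalization: rescale each column to [0, 1]; a constant column becomes 0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(x - mn) / (mx - mn) if mx > mn else 0.0 for x, mn, mx in zip(row, lows, highs)]
        for row in rows
    ]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(rows, k, seed=0, max_iter=100):
    """Lloyd's k-means with Euclidean distance; returns one cluster label per row."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(rows, k)]  # k random rows as initial centers
    labels = None
    for _ in range(max_iter):
        # Assignment step: each row joins its nearest center.
        new = [min(range(k), key=lambda j: sq_dist(r, centers[j])) for r in rows]
        if new == labels:
            break  # assignments stopped changing
        labels = new
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = centroid(members)
    return labels


def r_squared(rows, labels):
    """Assumed goodness of fit: 1 - within-cluster SS / total SS (= between SS / total SS)."""
    grand = centroid(rows)
    total = sum(sq_dist(r, grand) for r in rows)
    within = 0.0
    for lab in set(labels):
        members = [r for r, m in zip(rows, labels) if m == lab]
        c = centroid(members)
        within += sum(sq_dist(r, c) for r in members)
    return 1.0 - within / total


def error_rate(labels, truth):
    """Assumed error rate: share of rows whose diagnosis differs from their cluster's majority."""
    wrong = 0
    for lab in set(labels):
        diagnoses = [t for m, t in zip(labels, truth) if m == lab]
        wrong += len(diagnoses) - Counter(diagnoses).most_common(1)[0][1]
    return wrong / len(truth)


# Toy stand-in for the real features (NOT the Wisconsin data): two well-separated groups.
data = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.1], [8.0, 9.0], [8.2, 8.8], [7.9, 9.1]]
truth = [0, 0, 0, 1, 1, 1]  # hypothetical encoding: 0 = benign, 1 = malignant
X = min_max_normalize(data)
for k in (2, 3, 4):
    labels = kmeans(X, k)
    print(f"K={k}: R2={r_squared(X, labels):.3f}, error rate={error_rate(labels, truth):.3f}")
```

R² never decreases much as K grows on real data, so it is usually read together with the error rate rather than on its own when choosing among K = 2, 3, and 4.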
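Hierarchical clustering with a swappable distance can be sketched in the same way. The naive agglomerative version below starts with every observation in its own cluster and repeatedly merges the two closest clusters until K remain; because the distance function is a parameter, Euclidean and Manhattan distances can be compared directly. Complete linkage is an assumption here (it is the default method of R's hclust), since the README does not state which linkage the project used. The function names and toy data are hypothetical, and the roughly cubic running time makes this suitable only for small inputs.

```python
import math


def euclidean(a, b):
    """Straight-line (L2) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """City-block (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def agglomerative(rows, k, dist=euclidean):
    """Complete-linkage agglomerative clustering; returns one label per row."""
    clusters = [[i] for i in range(len(rows))]  # start with singletons
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: cluster distance = largest pairwise distance.
                d = max(dist(rows[i], rows[j]) for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters.pop(b)  # b > a, so index a is unaffected
        clusters[a].extend(merged)
    labels = [0] * len(rows)
    for lab, members in enumerate(clusters):
        for i in members:
            labels[i] = lab
    return labels


# Toy data (NOT the Wisconsin features), already on a common [0, 1] scale.
X = [[0.00, 0.01], [0.04, 0.00], [0.01, 0.03], [0.97, 1.00], [1.00, 0.97], [0.96, 0.99]]
for name, metric in (("Euclidean", euclidean), ("Manhattan", manhattan)):
    print(name, [agglomerative(X, k, metric) for k in (2, 3, 4)])
```

The labels produced for each K can be passed to the same R² and error-rate calculations used for k-means, so both algorithms and both distances are compared on equal terms.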

FILES

Cancer clustering.R
LICENSE
README.md
cancer_data.csv

Repository: tayayounan/Breast-Cancer-Clustering-Analysis