
An algorithm to detect whether a tumour is malignant or benign, based on attribute data supplied as .CSV files.


pratyusha-garaye/Cancer-Detection-using-Machine-Learning


Breast Cancer Prediction using Scikit-Learn and Seaborn

1. Software used

  • Python with ML libraries installed (Pandas, Scikit-Learn, Seaborn)
  • VS Code environment

2. Required Dataset

Source - Kaggle

Attribute Information:

  1. ID number
  2. Diagnosis (M = malignant, B = benign)
  • Fields 3 to 32: ten real-valued features are computed for each cell nucleus:
  1. radius (mean of distances from center to points on the perimeter)
  2. texture (standard deviation of gray-scale values)
  3. perimeter
  4. area
  5. smoothness (local variation in radius lengths)
  6. compactness (perimeter^2 / area - 1.0)
  7. concavity (severity of concave portions of the contour)
  8. concave points (number of concave portions of the contour)
  9. symmetry
  10. fractal dimension ("coastline approximation" - 1)
  • The mean, standard error and "worst" or largest (mean of the three largest values) of these features were computed for each image, resulting in 30 features. For instance, field 3 is Mean Radius, field 13 is Radius SE, field 23 is Worst Radius.

  • All feature values are recoded with four significant digits.

  • Headers of the dataset:

[Screenshot: dataset headers]
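The field layout described above (fields 3 to 12 hold the means, 13 to 22 the standard errors, 23 to 32 the "worst" values) can be sketched as a small lookup. The helper `field_name` and the naming format are illustrative assumptions, not the actual column headers of the CSV.

```python
# Hypothetical helper: maps a 1-based field number to a descriptive name,
# following the layout described in the attribute information.
FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave points", "symmetry",
    "fractal dimension",
]
STATS = ["mean", "SE", "worst"]  # order of the three blocks of ten fields


def field_name(field: int) -> str:
    """Return a readable name for field numbers 1 to 32."""
    if field == 1:
        return "ID number"
    if field == 2:
        return "diagnosis"
    if not 3 <= field <= 32:
        raise ValueError(f"field must be between 1 and 32, got {field}")
    # Fields 3..32 are three consecutive blocks of ten features each.
    stat_idx, feature_idx = divmod(field - 3, 10)
    return f"{STATS[stat_idx]} {FEATURES[feature_idx]}"


for f in (3, 13, 23):
    print(f, field_name(f))
```

This reproduces the example in the text: field 3 is the mean radius, field 13 the radius SE, and field 23 the worst radius.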

3. EDA and Label Encoding

  • Exploratory data analysis was performed using Pandas. The column with missing values was dropped.
  • The categorical variable (Diagnosis) was converted to numerical values.
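A minimal sketch of these two steps, assuming a small hand-made DataFrame in place of the real CSV. The column names (including the empty `Unnamed: 32` column) and the mapping M = 1, B = 0 are assumptions for illustration; the repository may encode the labels differently.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the dataset (values and column names are illustrative only).
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "diagnosis": ["M", "B", "B", "M"],
    "radius_mean": [17.99, 11.42, 12.45, 20.57],
    "texture_mean": [10.38, 20.38, 15.70, 17.77],
    "Unnamed: 32": [np.nan] * 4,  # hypothetical column with missing values
})

# EDA: count missing values per column to find the column to drop.
missing = df.isna().sum()
print(missing)

# Drop every column that contains missing values.
df = df.dropna(axis=1)

# Label encoding: assumed mapping, malignant -> 1, benign -> 0.
df["diagnosis"] = df["diagnosis"].map({"M": 1, "B": 0})
print(df.head())
```

With the real file, the DataFrame would come from `pd.read_csv` on the downloaded CSV instead of being built by hand.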

4. Cancer Prediction using Logistic Regression

  • The dataset is split into a training set and a test set: 75% of the data is used for training and the remaining 25% for testing.

  • The LogisticRegression class is imported from Scikit-Learn and applied to predict the presence of cancer.

  • The confusion matrix of the predictions is plotted as a heatmap using the Seaborn library to determine the number of Type I and Type II errors.

  • Confusion Matrix: [Image: confusion matrix heatmap]

Calculated accuracy_score of the prediction: 97.89%
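The steps in this section can be sketched as follows. This is not the repository's code: it assumes the copy of the Wisconsin diagnostic data bundled with Scikit-Learn instead of the Kaggle CSV, an arbitrary `random_state=0`, and a `StandardScaler` step added so the solver converges. The resulting accuracy will therefore not necessarily match the figure reported above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X = data.data
# Scikit-Learn encodes malignant as 0 and benign as 1; flip so 1 = malignant.
y = 1 - data.target

# 75% training, 25% test (random_state is an arbitrary assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Scaling is an assumption added here; the source does not mention it.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
tn, fp, fn, tp = cm.ravel()
acc = accuracy_score(y_test, y_pred)
print(f"Type I errors (false positives): {fp}")
print(f"Type II errors (false negatives): {fn}")
print(f"Accuracy: {acc:.2%}")


def plot_confusion_matrix(cm):
    """Draw the confusion matrix as a Seaborn heatmap."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    labels = ["benign", "malignant"]
    ax = sns.heatmap(cm, annot=True, fmt="d", cmap="Blues",
                     xticklabels=labels, yticklabels=labels)
    ax.set_xlabel("Predicted")
    ax.set_ylabel("Actual")
    plt.show()
```

Calling `plot_confusion_matrix(cm)` displays the heatmap; the off-diagonal cells hold the Type I and Type II error counts.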
