Breast Cancer Prediction using Scikit-Learn and Seaborn

1. Software used

Python with the ML libraries Pandas, Scikit-Learn and Seaborn installed
VS Code environment

2. Required Dataset

Source - Kaggle

Attribute Information:

Field 1: ID number
Field 2: Diagnosis (M = malignant, B = benign)

Fields 3–32 are derived from ten real-valued features computed for each cell nucleus:

radius (mean of distances from center to points on the perimeter)
texture (standard deviation of gray-scale values)
perimeter
area
smoothness (local variation in radius lengths)
compactness (perimeter^2 / area - 1.0)
concavity (severity of concave portions of the contour)
concave points (number of concave portions of the contour)
symmetry
fractal dimension ("coastline approximation" - 1)

The mean, standard error and "worst" or largest (mean of the three largest values) of these features were computed for each image, resulting in 30 features. For instance, field 3 is Mean Radius, field 13 is Radius SE, field 23 is Worst Radius.
All feature values are recorded with four significant digits.
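
The field numbering above follows from grouping the 30 feature columns by statistic: all ten means, then all ten standard errors, then all ten "worst" values. The sketch below rebuilds that layout in Python. The column labels and the helper field_number are hypothetical and may not match the spelling used in data.csv.

```python
# Hypothetical labels for the ten per-nucleus features, in the order listed above.
features = ["radius", "texture", "perimeter", "area", "smoothness",
            "compactness", "concavity", "concave points", "symmetry",
            "fractal dimension"]
stats = ["mean", "SE", "worst"]

# Field 1 is the ID and field 2 the diagnosis; fields 3-32 hold the 30 features,
# grouped by statistic (ten means, then ten SEs, then ten "worst" values).
fields = ["ID", "Diagnosis"] + [f"{f} ({s})" for s in stats for f in features]


def field_number(feature, stat):
    """Return the 1-based field number of a feature/statistic pair (hypothetical helper)."""
    return 3 + 10 * stats.index(stat) + features.index(feature)


print(len(fields))                                   # 32
print(field_number("radius", "mean"))                # 3
print(field_number("radius", "SE"))                  # 13
print(field_number("radius", "worst"))               # 23
print(fields[field_number("radius", "worst") - 1])   # radius (worst)
```
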
Headers of the dataset: [image: Data Headers.png]

3. EDA and Label Encoding

Exploratory data analysis was performed using Pandas. The column containing missing values was dropped.
The categorical Diagnosis variable (M/B) was converted to numerical values.
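
A minimal sketch of these steps with Pandas and Scikit-Learn's LabelEncoder. It runs on a small made-up DataFrame instead of data.csv, and the all-missing column named notes is hypothetical. The README does not say which column was dropped or which encoder was used, so LabelEncoder is an assumption here; df["diagnosis"].map({"B": 0, "M": 1}) would produce the same encoding.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for data.csv with made-up values; the real file has 30 feature columns.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "diagnosis": ["M", "M", "B", "B"],
    "radius_mean": [17.9, 20.5, 13.5, 13.0],
    "texture_mean": [10.4, 17.7, 14.3, 15.7],
    "notes": [np.nan, np.nan, np.nan, np.nan],  # hypothetical column with missing values
})

# Quick EDA with Pandas: table size, missing values per column, class balance.
print(df.shape)
print(df.isnull().sum())
print(df["diagnosis"].value_counts())

# Drop every column that contains missing values.
df = df.dropna(axis=1)

# Encode the categorical diagnosis. LabelEncoder sorts the class names,
# so B (benign) -> 0 and M (malignant) -> 1.
encoder = LabelEncoder()
df["diagnosis"] = encoder.fit_transform(df["diagnosis"])
print(df.head())
```
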

4. Cancer Prediction using Logistic Regression

The dataset was split into training and test sets: 75% of the data was used for training and the remaining 25% for testing.
The LogisticRegression classifier from Scikit-Learn was fitted on the training set and used to predict the presence of cancer (malignant vs. benign).
The predictions were summarized in a Confusion Matrix, plotted as a heatmap with the Seaborn library, to determine the number of Type I (false positive) and Type II (false negative) errors.
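
The workflow above can be sketched as follows. This is not the project's notebook: it loads Scikit-Learn's bundled copy of the Breast Cancer Wisconsin (Diagnostic) data with load_breast_cancer instead of reading data.csv, and the StandardScaler step, random_state=42 and the plot labels are assumptions added for illustration. The bundled copy codes malignant as 0 and benign as 1, so the labels are flipped to match the M = 1, B = 0 encoding from the previous step.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the script also runs without a display
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Bundled copy of the dataset (569 samples, 30 features) standing in for data.csv.
data = load_breast_cancer()
X = data.data
y = 1 - data.target  # flip so that 1 = malignant (M) and 0 = benign (B)

# 75% training / 25% test split; random_state is an assumption for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Feature scaling is an assumption (not mentioned in the README); it helps the solver converge.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows = actual class, columns = predicted class, in label order [0 (B), 1 (M)].
cm = confusion_matrix(y_test, y_pred)
tn, fp, fn, tp = cm.ravel()
print(f"Type I errors (false positives): {fp}")
print(f"Type II errors (false negatives): {fn}")

# Heatmap of the confusion matrix with integer annotations in each cell.
plt.figure(figsize=(5, 4))
ax = sns.heatmap(cm, annot=True, fmt="d", cmap="Blues",
                 xticklabels=["B", "M"], yticklabels=["B", "M"])
ax.set_xlabel("Predicted label")
ax.set_ylabel("Actual label")
ax.set_title("Confusion Matrix")
# Use plt.savefig(...) or plt.show() to save or display the figure.
```

With malignant as the positive class, fp counts benign tumors predicted as malignant (Type I) and fn counts malignant tumors predicted as benign (Type II); the exact counts depend on the split.
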
Confusion Matrix: [image: Confusion Matrix.png]

Accuracy of the predictions, calculated with Scikit-Learn's accuracy_score: 97.89%
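
accuracy_score is the share of correct predictions, i.e. the diagonal of the confusion matrix divided by the total number of samples. A small check on made-up labels (not the project's data or results):

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels for illustration only (1 = malignant, 0 = benign).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

acc = accuracy_score(y_true, y_pred)

# Accuracy from the confusion matrix: (TN + TP) / (TN + FP + FN + TP).
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
manual = (tn + tp) / (tn + fp + fn + tp)

print(f"accuracy_score: {acc:.2%}")               # 75.00%
print(f"from the confusion matrix: {manual:.2%}")  # 75.00%
```
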

Repository: pratyusha-garaye/Cancer-Detection-using-Machine-Learning
