Analyzing Factors Affecting Cervical Cancer Risk: A Data-Driven Approach

This project explores how factors such as sexual health, smoking, and medical history relate to cervical cancer risk. Using data analysis and statistical modeling, it aims to identify significant predictors and build a predictive model that supports preventive healthcare efforts.

Introduction

Cervical cancer is a major health concern globally, yet it is among the most preventable cancers when it is diagnosed early and treated appropriately. This project investigates how lifestyle, sexual health, and medical history factors relate to cervical cancer risk, with the aim of deriving actionable insights for prevention.

Dataset

Source: Kaggle - Cervical Cancer Risk Classification Dataset
Size: Approximately 858 rows and 36 columns
Description: The dataset includes demographic, lifestyle, and medical history data, with a target variable (Biopsy) indicating whether cervical cancer was diagnosed.
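As a starting point, the CSV can be read with pandas. The following is a minimal sketch, not the project's actual loading code. It assumes that missing entries in the file are written as "?" (adjust `na_values` if your copy differs). `load_risk_data` is a hypothetical helper name, and the inline sample with its column names only stands in for the real file.

```python
import io

import pandas as pd

# Assumption: the CSV marks missing entries with "?".
MISSING_MARKER = "?"


def load_risk_data(source):
    """Read the risk-factor CSV from a path or a file-like object.

    Cells equal to MISSING_MARKER become NaN so numeric columns parse as numbers.
    """
    return pd.read_csv(source, na_values=[MISSING_MARKER])


# Tiny inline sample standing in for the real file (values are made up).
sample_csv = io.StringIO(
    "Age,Smokes,Smokes (years),Biopsy\n"
    "18,0,0,0\n"
    "34,1,?,1\n"
    "52,?,?,0\n"
)
sample = load_risk_data(sample_csv)

missing_per_column = sample.isna().sum()
print(sample.shape)                      # (rows, columns)
print(missing_per_column)                # missing values per column
print(sample["Biopsy"].value_counts())   # class balance of the target
```

With the real data, pass the path to kag_risk_factors_cervical_cancer.csv instead of the in-memory sample.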

Objectives

Perform data cleaning and preprocessing to handle missing or invalid values.
Conduct exploratory data analysis (EDA) to identify patterns and trends.
Evaluate correlations between variables and the target (Biopsy).
Develop a predictive model using logistic regression to assess cancer risk.
Provide actionable insights for healthcare professionals.
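The correlation objective above could be approached by ranking each numeric column by its Pearson correlation with Biopsy. This is only a sketch under assumptions: `correlations_with_target` is a hypothetical helper, and the toy frame contains made-up values, not project data.

```python
import pandas as pd


def correlations_with_target(df, target="Biopsy"):
    """Return each numeric column's Pearson correlation with the target,
    ordered from strongest to weakest absolute correlation."""
    numeric = df.select_dtypes(include="number")
    corr = numeric.corr()[target].drop(target)
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


# Made-up toy data for illustration only.
toy = pd.DataFrame(
    {
        "Age": [20, 25, 30, 35, 40, 45],
        "Smokes (years)": [0, 0, 1, 5, 8, 10],
        "Hormonal Contraceptives": [1, 0, 1, 0, 1, 0],
        "Biopsy": [0, 0, 0, 1, 1, 1],
    }
)
ranked = correlations_with_target(toy)
print(ranked)
```

The full matrix from `numeric.corr()` is what a seaborn heatmap would typically display. Correlation with a binary target is a coarse screening tool, so treat the ranking as a guide for further analysis rather than evidence of causation.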

Tools and Libraries

Python Libraries:
- pandas, numpy: Data manipulation and preprocessing
- matplotlib, seaborn: Data visualization
- scikit-learn: Statistical modeling and machine learning
Power BI: For advanced data visualization

Analysis Workflow

Data Cleaning
- Handle missing values and outliers.
- Create new features, such as Smoking Severity.
Exploratory Data Analysis
- Analyze distributions of key variables.
- Examine relationships between lifestyle factors, medical history, and cervical cancer.
Correlation Analysis
- Use correlation heatmaps to identify significant relationships between predictors and the target variable.
Predictive Modeling
- Train a logistic regression model to predict cervical cancer risk.
- Evaluate the model using metrics such as accuracy, precision, and recall.
Data Visualization in Power BI
- Create visualizations like scatter plots, bar charts, and heatmaps for insights.
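The data-cleaning step of the workflow above could look roughly like the sketch below. Two assumptions are made purely for illustration, since the README does not specify them: missing numeric values are filled with the column median, and Smoking Severity is defined as years smoked multiplied by packs per year. `clean_and_engineer` is a hypothetical helper, and the toy values are made up.

```python
import numpy as np
import pandas as pd


def clean_and_engineer(df):
    """Impute missing numeric values and add a smoking-severity feature.

    Both choices are illustrative assumptions, not the project's confirmed method.
    """
    out = df.copy()  # leave the caller's frame untouched

    # Assumption: median imputation for every numeric column.
    numeric_cols = out.select_dtypes(include="number").columns
    out[numeric_cols] = out[numeric_cols].fillna(out[numeric_cols].median())

    # Hypothetical definition: years smoked times packs per year.
    out["Smoking Severity"] = out["Smokes (years)"] * out["Smokes (packs/year)"]
    return out


# Made-up toy data for illustration only.
toy = pd.DataFrame(
    {
        "Age": [21, 35, 48, 29],
        "Smokes (years)": [0.0, 10.0, np.nan, 2.0],
        "Smokes (packs/year)": [0.0, 1.5, 2.0, np.nan],
        "Biopsy": [0, 1, 0, 0],
    }
)
cleaned = clean_and_engineer(toy)
print(cleaned)
```

Median imputation is simple and robust to skewed columns, but other strategies (dropping sparse columns, adding missing-value indicators) may suit this dataset better.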

Insights

- Demographics: Age and smoking habits are critical factors influencing cancer risk.
- STD Impact: Certain STDs (Genital Herpes, HIV, Condylomatosis, Vulvo-Perineal Condylomatosis), particularly with early and recurrent diagnosis, significantly increase risk.
- Contraceptive Use: Limited correlations observed between contraceptive use and cancer outcomes.
- High-Risk Profiles: Smoking severity and multiple STD diagnoses highlight individuals at higher risk.
- Correlations: Age, smoking, and sexual health variables are strongly linked to positive biopsy results.

Results

Key predictors of cervical cancer include smoking history, contraceptive use, and STDs.
Strong correlations between lifestyle and medical history factors were identified.
The logistic regression model demonstrated high predictive performance, highlighting its potential for healthcare applications.
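For readers who want to reproduce this kind of evaluation, a logistic regression model can be trained and scored with scikit-learn roughly as follows. This is a sketch on synthetic stand-in data, not the project's data, code, or results. The feature scaling, the `class_weight="balanced"` setting, and the 75/25 stratified split are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the project's data): three features,
# with positives deliberately in the minority.
rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=(n, 3))
logits = 1.5 * X[:, 0] + 1.0 * X[:, 1] - 2.0
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

# Stratify so both splits keep the same share of positive cases.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Assumption: scale features and reweight classes to offset the imbalance.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred, zero_division=0),
    "recall": recall_score(y_test, y_pred, zero_division=0),
}
print(metrics)
```

When positive cases are rare, accuracy alone can look high even if few positives are detected, so precision and recall are worth reporting alongside it.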

How to Run the Project

Clone this repository:

git clone https://github.com/yourusername/Analyzing-Factors-Affecting-Cervical-Cancer-Risk.git

Repository files:
- README.md
- cervical cancer analytics.pbix
- kag_risk_factors_cervical_cancer.csv
- risk_factors_cervical_cancer.py


Repository: alice-patrick/Analyzing-Factors-Affecting-Cervical-Cancer-Risk-A-Data-Driven-Approach
