
This article investigates the use of ML algorithms to forecast diabetes onset on a global scale. It addresses challenges such as imbalanced data, feature selection, and model evaluation while emphasizing the importance of personalized care and accessibility. The study evaluates different classification and ensemble method


kartikey-teotia/DIABETES-PREDICTION-USING-MACHINE-LEARNING

 
 


DIABETES-PREDICTION-USING-MACHINE-LEARNING

Welcome to the Diabetes Prediction Using Machine Learning repository. This project applies machine learning techniques to challenges and opportunities in the healthcare domain; within that domain, it focuses on diabetes prediction.


Project Description

Introduction

Diabetes stands as a formidable global health challenge, affecting millions worldwide and posing significant risks to individuals and healthcare systems alike. Despite advancements in medical knowledge and technology, the prevalence of diabetes continues to rise, with projections indicating a steep trajectory in the coming years. Addressing this crisis demands innovative approaches that leverage the power of data analytics and machine learning to enhance prediction, diagnosis, and management.

This GitHub repository serves as a cornerstone in the quest to combat diabetes through predictive modeling and advanced analytics. Our mission is to harness the potential of machine learning techniques to develop a robust predictive model for diabetes. By analyzing diverse datasets encompassing factors such as medical history, genetic predispositions, lifestyle habits, and demographic variables, our project aims to unravel the intricate patterns underlying diabetes onset and progression.

In this introductory section, we outline the scope and objectives of the project, explain the methods and techniques behind our approach, and describe its potential impact on healthcare delivery and patient outcomes. Through collaboration, transparency, and innovation, this repository aims to support a community-driven effort to reduce the burden of diabetes on a global scale.

Join us in this journey as we strive to transform diabetes management and prevention strategies, empowering healthcare professionals with actionable insights and personalized interventions to improve patient care and well-being. Together, we can usher in a new era of precision medicine and healthcare innovation, where data-driven approaches pave the way towards a healthier future for all.

Problem Description : The dataset contains information on approximately 700 individuals. The task is to predict, from the information given about each individual, whether that individual will develop diabetes.


Dataset :

The diabetes dataset contains patient records with eight input features and one outcome label. The features used here are:

  • Pregnancy : Number of times pregnant.
  • Glucose : Plasma glucose concentration at 2 hours in an oral glucose tolerance test.
  • Blood Pressure : Diastolic blood pressure (mm Hg).
  • Skin Thickness : Triceps skin fold thickness (mm).
  • Insulin : 2-Hour serum insulin (mu U/ml).
  • BMI (Body Mass Index) : weight in kg / (height in m)^2.
  • DPF : Diabetes pedigree function.
  • Age: Age (years).
  • Outcome: 1 = yes (diabetic), 0 = no (not diabetic).
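
As an illustration of the feature list above, the sketch below models one patient as a typed record and computes BMI from the formula given for that feature. The field names, the `PatientRecord` class, the `compute_bmi` helper, and the sample values are all assumptions made for this example; they are not taken from the project's code or data.

```python
from dataclasses import dataclass, astuple


def compute_bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kg divided by height in metres squared."""
    if height_m <= 0:
        raise ValueError("height_m must be positive")
    return weight_kg / height_m ** 2


@dataclass(frozen=True)
class PatientRecord:
    # Hypothetical field names; the dataset's real column names may differ.
    pregnancies: int
    glucose: float          # plasma glucose, 2-hour oral glucose tolerance test
    blood_pressure: float   # diastolic, mm Hg
    skin_thickness: float   # triceps skin fold, mm
    insulin: float          # 2-hour serum insulin, mu U/ml
    bmi: float              # kg/m^2
    dpf: float              # diabetes pedigree function
    age: int                # years

    def as_features(self) -> list:
        """Return the eight inputs as floats, in the order listed above."""
        return [float(value) for value in astuple(self)]


# Made-up values, used only to show the shape of one record.
record = PatientRecord(2, 120.0, 70.0, 20.0, 80.0, compute_bmi(70.0, 1.75), 0.5, 33)
print(record.as_features())
```

Keeping the feature order fixed in one place makes it harder to feed a model the inputs in the wrong order later on.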

Key Features:

  • Data Preprocessing: The dataset is carefully examined for missing values and duplicates, which are then handled appropriately.
  • Exploratory Data Analysis (EDA): A comprehensive analysis of the dataset is performed, including the visualization of feature distributions and correlation matrices.
  • Model Training: Several machine learning models such as Logistic Regression, Support Vector Machine and Random Forest are trained on the dataset.
  • Model Evaluation: The accuracy of each model is evaluated on both training and test datasets. Precision-Recall and ROC curves are also plotted for the Logistic Regression model to assess performance.
  • Graphical User Interface (GUI): A simple GUI application built using Tkinter allows users to input their medical data and receive predictions for diabetes.
  • GitHub Repository Structure: The repository is well-organized, containing code files, sample dataset, trained models, and a README file providing detailed instructions for usage and setup.
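
To make the evaluation step above more concrete, here is a minimal sketch of what the Precision-Recall and ROC curves plot: precision, recall, and false positive rate at each decision threshold. It is written in plain Python for clarity and is not the project's code; the function names and the toy labels and scores are assumptions. In practice scikit-learn's `precision_recall_curve` and `roc_curve` compute comparable values.

```python
def confusion_counts(y_true, scores, threshold):
    """Count (TP, FP, TN, FN) when scores >= threshold are predicted positive."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def curve_points(y_true, scores):
    """Return (threshold, precision, recall, fpr) for each distinct score."""
    points = []
    for threshold in sorted(set(scores), reverse=True):
        tp, fp, tn, fn = confusion_counts(y_true, scores, threshold)
        precision = tp / (tp + fp) if tp + fp else 1.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        fpr = fp / (fp + tn) if fp + tn else 0.0
        points.append((threshold, precision, recall, fpr))
    return points


# Toy labels and predicted probabilities, invented for illustration.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

for threshold, precision, recall, fpr in curve_points(y_true, scores):
    print(f"t={threshold:.1f} precision={precision:.2f} recall={recall:.2f} fpr={fpr:.2f}")
```

The Precision-Recall curve plots precision against recall across these thresholds, and the ROC curve plots recall (true positive rate) against the false positive rate.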

Technology Used :

  • Python Programming Language : The core programming language used for this project. Python is known for its simplicity and readability, making it an excellent choice for various applications.

  • Machine Learning : This project is built on machine learning concepts, specifically supervised classification. The models learn from labeled patient records to predict whether an individual has diabetes.

Platform Used:

  • Jupyter Notebook : The code is written and executed in Jupyter Notebook, a web-based interactive environment for Python.

Python Modules Used :

  • NumPy: Used for numerical computations and working with arrays or matrices efficiently.

  • Pandas: Used for data manipulation and analysis, particularly for loading datasets into DataFrame structures and performing operations like filtering, grouping, and merging.

  • Scikit-learn: Used for machine learning tasks such as model training, evaluation, and preprocessing.

  • Matplotlib: A plotting library for creating static, interactive, and animated visualizations in Python. It's used here for creating histograms, precision-recall curves, ROC curves, and bar charts.

  • Seaborn: Built on top of Matplotlib, Seaborn provides a high-level interface for drawing attractive and informative statistical graphics. It's used here for creating a correlation matrix heatmap.

  • Tkinter: A standard GUI toolkit in Python, used for creating simple desktop applications. It's employed here to create a basic user interface for inputting data and displaying prediction results.
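
As a rough illustration of how Pandas and NumPy fit into the preprocessing and EDA steps, the sketch below checks for missing values and duplicates and computes a correlation matrix. The tiny DataFrame is invented for this example, the column names are assumed, and filling missing values with the median is one possible choice rather than the project's confirmed method.

```python
import numpy as np
import pandas as pd

# Made-up frame standing in for the real dataset; column names are assumed.
df = pd.DataFrame({
    "Glucose": [150.0, 90.0, np.nan, 90.0, 130.0],
    "BMI": [33.0, 26.0, 23.0, 26.0, 41.0],
    "Age": [50, 31, 32, 31, 33],
    "Outcome": [1, 0, 1, 0, 1],
})

print(df.isna().sum())        # missing values per column
print(df.duplicated().sum())  # number of fully duplicated rows

# Drop exact duplicate rows, then fill missing glucose with the column median.
clean = df.drop_duplicates().copy()
clean["Glucose"] = clean["Glucose"].fillna(clean["Glucose"].median())

# Pearson correlation matrix, the input to a correlation heatmap.
corr = clean.corr()
print(corr.round(2))
```

With Seaborn installed, `seaborn.heatmap(corr, annot=True)` would draw this matrix as the kind of heatmap described above.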

Machine Learning Models:

  • Logistic Regression

  • Support Vector Machine

  • Random Forest
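
The three models listed above could be trained and compared along the following lines. This is a sketch under stated assumptions, not the project's notebook: it uses synthetic data from `make_classification` in place of the real dataset, an 80/20 split, feature scaling for the two scale-sensitive models, and default hyperparameters apart from those shown.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the real data: 700 rows, 8 numeric features.
X, y = make_classification(
    n_samples=700, n_features=8, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Logistic regression and SVM are scale-sensitive, so they get a scaler.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Support Vector Machine": make_pipeline(StandardScaler(), SVC()),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit each model and record (train accuracy, test accuracy).
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    train_acc = accuracy_score(y_train, model.predict(X_train))
    test_acc = accuracy_score(y_test, model.predict(X_test))
    results[name] = (train_acc, test_acc)
    print(f"{name}: train={train_acc:.3f} test={test_acc:.3f}")
```

Comparing training and test accuracy side by side helps spot overfitting; a large gap between the two suggests the model has memorized the training set.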

Graphical User Interface

image

Usage:

  • Clone the repository to your local machine.
  • Ensure you have the required Python libraries installed. You can install them using pip: pip install -r requirements.txt.
  • Run the main Python script (main.py) to execute the diabetes prediction system.
  • Follow the instructions provided by the GUI to input your medical data and obtain predictions.
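
Before the values typed into the GUI reach a model, they need to be converted from text to numbers and checked. The project's GUI code is not shown here, so the sketch below is only an assumption about how that step might look; the `FIELDS` list and the `parse_inputs` function are hypothetical names.

```python
import math

# Hypothetical field labels, in the order the features are listed in this README.
FIELDS = [
    "Pregnancies", "Glucose", "Blood Pressure", "Skin Thickness",
    "Insulin", "BMI", "DPF", "Age",
]


def parse_inputs(raw: dict) -> list:
    """Convert text from the form fields into a list of floats, in FIELDS order.

    Raises ValueError with a readable message on empty, non-numeric,
    non-finite, or negative input.
    """
    values = []
    for field in FIELDS:
        text = raw.get(field, "").strip()
        if not text:
            raise ValueError(f"{field} is required")
        try:
            value = float(text)
        except ValueError:
            raise ValueError(f"{field} must be a number, got {text!r}") from None
        if not math.isfinite(value) or value < 0:
            raise ValueError(f"{field} must be a finite, non-negative number")
        values.append(value)
    return values


# Made-up form contents, as strings the way a text entry widget returns them.
form = {
    "Pregnancies": "2", "Glucose": "120", "Blood Pressure": "70",
    "Skin Thickness": "20", "Insulin": "80", "BMI": "22.9",
    "DPF": "0.5", "Age": "33",
}
print(parse_inputs(form))
```

A Tkinter button callback could call a function like this and show the `ValueError` message to the user instead of passing bad input to the model.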

This Project has been Created by-

Contributions:

Contributions to this project are welcome! If you find any bugs, have feature requests, or want to contribute enhancements, feel free to open an issue or submit a pull request.

License

This project is licensed under the MIT and Apache 2.0 licenses; see the LICENSE file for details.

