Feature/explainability comparison #103

Open · wants to merge 4 commits into main
17 changes: 17 additions & 0 deletions .ipynb_checkpoints/requirements-checkpoint.txt
@@ -0,0 +1,17 @@
numpy
pandas
scikit-learn
shap
matplotlib
seaborn
plotly
ipywidgets
lime
reportlab
google-generativeai
python-dotenv
scipy
pillow
xgboost==1.5.1
colorama
dask
76,862 changes: 76,862 additions & 0 deletions examples/explainability_comparison/explaianble.ipynb

Large diffs are not rendered by default.

74 changes: 74 additions & 0 deletions examples/explainability_comparison/readme.md
@@ -0,0 +1,74 @@
Explainability Comparison Project
📝 Introduction

This project compares explainability techniques applied to machine learning models. The most recent updates add detailed SHAP and LIME comparisons, feature importance normalization, and side-by-side visualizations of model explanations.


🔄 Changes Implemented

The following key modifications and additions were made to enhance the project's explainability comparisons:

1. Dataset Splitting and Scaling:

train_test_split() splits the dataset into training and testing sets (80/20).
StandardScaler standardizes the feature data, which is essential for models such as Logistic Regression that are sensitive to feature scale.
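A minimal sketch of this step, assuming scikit-learn's breast cancer dataset (suggested by the feature names in the bundled report) and an illustrative random_state:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Dataset assumed from the feature names in model_comparison_report.html
X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# 80/20 train/test split; random_state chosen for illustration
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the scaler on the training split only, then apply it to both splits
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```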


2. Model Comparison:

Two models are compared in this project:
- Random Forest Classifier
- Logistic Regression
Cross-validation was performed for both models to evaluate their generalization performance on the scaled dataset.
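A sketch of the comparison loop, assuming 5-fold cross-validation and near-default hyperparameters (the notebook's exact settings may differ):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

models = {
    "Random Forest": RandomForestClassifier(random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

# Cross-validate each model on the scaled training data
for name, model in models.items():
    scores = cross_val_score(model, X_train_scaled, y_train, cv=5)
    print(f"{name}: accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")
```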


3. SHAP (SHapley Additive exPlanations) Analysis:

SHAP was used to compute feature importance for each model; the SHAP summary plots show which features contribute most to the model's predictions.
A custom function, compare_shap_values(), was created to:
- automatically select the appropriate SHAP explainer for the model type (TreeExplainer for tree-based models, LinearExplainer or KernelExplainer for other models);
- generate and save SHAP summary plots for each model, exported as PNG files for easy comparison across models.
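One possible shape for compare_shap_values(); the explainer selection follows the description above, but the actual signature and file naming in the notebook may differ:

```python
import matplotlib.pyplot as plt
import shap
from sklearn.ensemble import RandomForestClassifier

def compare_shap_values(model, X_background, X_sample, model_name):
    """Select a SHAP explainer by model type and save a summary plot."""
    if isinstance(model, RandomForestClassifier):
        explainer = shap.TreeExplainer(model)                  # tree-based
    elif hasattr(model, "coef_"):
        explainer = shap.LinearExplainer(model, X_background)  # linear
    else:
        explainer = shap.KernelExplainer(model.predict_proba, X_background)

    shap_values = explainer.shap_values(X_sample)
    shap.summary_plot(shap_values, X_sample, show=False)
    plt.savefig(f"shap_summary_{model_name}.png", bbox_inches="tight")
    plt.close()
```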



4. LIME (Local Interpretable Model-agnostic Explanations) Comparison:

A new function, compare_lime_explanations(), was added to compare LIME explanations across models.
LIME explains individual predictions by approximating the model's decision boundary in the local region around an instance.
Explanations were generated for a specific instance, showing which features contributed most to each model's prediction.
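A sketch of compare_lime_explanations() under the same assumptions (the class names match the assumed dataset; the notebook's actual signature may differ):

```python
from lime.lime_tabular import LimeTabularExplainer

def compare_lime_explanations(models, X_train, x_instance, feature_names):
    """Explain one instance with LIME for each model and print the weights."""
    explainer = LimeTabularExplainer(
        X_train,
        feature_names=feature_names,
        class_names=["malignant", "benign"],  # assumed from the dataset
        mode="classification",
    )
    for name, model in models.items():
        exp = explainer.explain_instance(x_instance, model.predict_proba)
        print(f"--- {name} ---")
        for feature, weight in exp.as_list():
            print(f"  {feature}: {weight:+.3f}")
```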



5. Feature Importance Normalization:

A function, extract_feature_importances(), was added to extract feature importances from the models and normalize them so they are comparable across models.
The normalized importances are displayed in a table, making it easy to compare feature contributions side by side.
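One plausible implementation of extract_feature_importances(), normalizing each model's importances to sum to 1. Tree models expose feature_importances_; taking absolute coefficients for linear models is an assumption (the bundled report in fact shows NaN for Logistic Regression):

```python
import numpy as np
import pandas as pd

def extract_feature_importances(models, feature_names):
    """Collect per-model feature importances, each normalized to sum to 1."""
    table = {}
    for name, model in models.items():
        if hasattr(model, "feature_importances_"):
            imp = model.feature_importances_   # e.g. Random Forest
        else:
            imp = np.abs(model.coef_).ravel()  # assumed for linear models
        table[f"{name}_Importance"] = imp / imp.sum()
    return pd.DataFrame(table, index=pd.Index(feature_names, name="Feature"))
```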


6. Side-by-Side SHAP Visualizations:

SHAP values are compared across models for the same instances, and side-by-side SHAP summary plots were created to show how different models interpret the same features.
To reduce memory usage, a sample of 100 data points was selected for the SHAP analysis.
Each SHAP plot is saved as a PNG image for easy reference and comparison between models.
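The sampling step might look like this, reusing compare_shap_values() from above (the sample size of 100 is from the description; shap.sample and the random_state are illustrative):

```python
import shap

# Draw 100 points to keep SHAP's memory and runtime manageable
X_shap_sample = shap.sample(X_test_scaled, 100, random_state=42)

# Produce one summary plot per model for the same sampled instances
for name, model in models.items():
    compare_shap_values(model, X_train_scaled, X_shap_sample, name)
```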




📊 Results

The project explores how the different explainability techniques explain model decisions. Key results:

- SHAP: summary plots indicate which features have the most impact on the predictions.
- LIME: provides local interpretability, showing how small perturbations of the input affect an individual prediction.
- Model Comparison: a report (model_comparison_report.html) compares the performance and interpretability of the two models.
Binary file added examples/explainability_comparison/roc_curve.png
1,648 changes: 1,648 additions & 0 deletions notebooks/.ipynb_checkpoints/Pytorch_Support_Explainable-checkpoint.ipynb

Large diffs are not rendered by default.

76,839 changes: 76,839 additions & 0 deletions notebooks/.ipynb_checkpoints/explaianble-checkpoint.ipynb

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions notebooks/Pytorch_Support_Explainable.ipynb
@@ -1626,7 +1626,7 @@
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -1644,5 +1644,5 @@
}
},
"nbformat": 4,
"nbformat_minor": 2
"nbformat_minor": 4
}
1,420 changes: 715 additions & 705 deletions notebooks/explaianble(0_1_6).ipynb

Large diffs are not rendered by default.

166 changes: 166 additions & 0 deletions notebooks/model_comparison_report.html
@@ -0,0 +1,166 @@
<html><head><title>Model Comparison Report</title></head><body><h1>Model Comparison Report</h1><h2>Feature Importance (Normalized)</h2><table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>Random Forest_Importance</th>
<th>Logistic Regression_Importance</th>
</tr>
<tr>
<th>Feature</th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<th>mean radius</th>
<td>0.048703</td>
<td>NaN</td>
</tr>
<tr>
<th>mean texture</th>
<td>0.013591</td>
<td>NaN</td>
</tr>
<tr>
<th>mean perimeter</th>
<td>0.053270</td>
<td>NaN</td>
</tr>
<tr>
<th>mean area</th>
<td>0.047555</td>
<td>NaN</td>
</tr>
<tr>
<th>mean smoothness</th>
<td>0.007285</td>
<td>NaN</td>
</tr>
<tr>
<th>mean compactness</th>
<td>0.013944</td>
<td>NaN</td>
</tr>
<tr>
<th>mean concavity</th>
<td>0.068001</td>
<td>NaN</td>
</tr>
<tr>
<th>mean concave points</th>
<td>0.106210</td>
<td>NaN</td>
</tr>
<tr>
<th>mean symmetry</th>
<td>0.003770</td>
<td>NaN</td>
</tr>
<tr>
<th>mean fractal dimension</th>
<td>0.003886</td>
<td>NaN</td>
</tr>
<tr>
<th>radius error</th>
<td>0.020139</td>
<td>NaN</td>
</tr>
<tr>
<th>texture error</th>
<td>0.004724</td>
<td>NaN</td>
</tr>
<tr>
<th>perimeter error</th>
<td>0.011303</td>
<td>NaN</td>
</tr>
<tr>
<th>area error</th>
<td>0.022407</td>
<td>NaN</td>
</tr>
<tr>
<th>smoothness error</th>
<td>0.004271</td>
<td>NaN</td>
</tr>
<tr>
<th>compactness error</th>
<td>0.005253</td>
<td>NaN</td>
</tr>
<tr>
<th>concavity error</th>
<td>0.009386</td>
<td>NaN</td>
</tr>
<tr>
<th>concave points error</th>
<td>0.003513</td>
<td>NaN</td>
</tr>
<tr>
<th>symmetry error</th>
<td>0.004018</td>
<td>NaN</td>
</tr>
<tr>
<th>fractal dimension error</th>
<td>0.005321</td>
<td>NaN</td>
</tr>
<tr>
<th>worst radius</th>
<td>0.077987</td>
<td>NaN</td>
</tr>
<tr>
<th>worst texture</th>
<td>0.021749</td>
<td>NaN</td>
</tr>
<tr>
<th>worst perimeter</th>
<td>0.067115</td>
<td>NaN</td>
</tr>
<tr>
<th>worst area</th>
<td>0.153892</td>
<td>NaN</td>
</tr>
<tr>
<th>worst smoothness</th>
<td>0.010644</td>
<td>NaN</td>
</tr>
<tr>
<th>worst compactness</th>
<td>0.020266</td>
<td>NaN</td>
</tr>
<tr>
<th>worst concavity</th>
<td>0.031802</td>
<td>NaN</td>
</tr>
<tr>
<th>worst concave points</th>
<td>0.144663</td>
<td>NaN</td>
</tr>
<tr>
<th>worst symmetry</th>
<td>0.010120</td>
<td>NaN</td>
</tr>
<tr>
<th>worst fractal dimension</th>
<td>0.005210</td>
<td>NaN</td>
</tr>
</tbody>
</table><h2>SHAP Summary Plots</h2><p>SHAP summary plots have been generated in the notebook.</p><h2>LIME Explanations</h2><p>LIME explanations have been generated in the notebook for a specific instance.</p></body></html>
2 changes: 1 addition & 1 deletion requirements.txt
@@ -12,6 +12,6 @@ google-generativeai
python-dotenv
scipy
pillow
- xgboost
+ xgboost==1.5.1
colorama
dask