Feature/explainability comparison #103

Open · wants to merge 4 commits into main
17 changes: 17 additions & 0 deletions .ipynb_checkpoints/requirements-checkpoint.txt
@@ -0,0 +1,17 @@
numpy
pandas
scikit-learn
shap
matplotlib
seaborn
plotly
ipywidgets
lime
reportlab
google-generativeai
python-dotenv
scipy
pillow
xgboost==1.5.1
colorama
dask
76,862 changes: 76,862 additions & 0 deletions examples/explainability_comparison/explaianble.ipynb

Large diffs are not rendered by default.

74 changes: 74 additions & 0 deletions examples/explainability_comparison/readme.md
@@ -0,0 +1,74 @@
Explainability Comparison Project
📝 Introduction

This project compares explainability techniques applied to machine learning models. The most recent updates add detailed SHAP and LIME comparisons, feature importance normalization, and side-by-side visualizations of model explanations.


🔄 Changes Implemented

The following key modifications and additions were made to enhance the project's explainability comparisons:

1. Dataset Splitting and Scaling:

train_test_split() splits the dataset into training and testing sets (80/20).
StandardScaler standardizes the feature data, which is essential for models such as Logistic Regression that are sensitive to feature scale.
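A minimal sketch of this step, assuming scikit-learn's breast cancer dataset (suggested by the feature names in the bundled report) and an illustrative random_state:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Dataset assumed from the feature names in model_comparison_report.html
X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# 80/20 train/test split; random_state chosen for illustration
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the scaler on the training split only, then apply it to both splits
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```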


2. Model Comparison:

Two models are compared in this project:
- Random Forest Classifier
- Logistic Regression
Cross-validation was performed for both models to evaluate their generalization performance on the scaled dataset.
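A sketch of the comparison loop, assuming 5-fold cross-validation and near-default hyperparameters (the notebook's exact settings may differ):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

models = {
    "Random Forest": RandomForestClassifier(random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

# Cross-validate each model on the scaled training data
for name, model in models.items():
    scores = cross_val_score(model, X_train_scaled, y_train, cv=5)
    print(f"{name}: accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")
```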


3. SHAP (SHapley Additive exPlanations) Analysis:

SHAP was used to compute feature importance for each model; the SHAP summary plots show which features contribute most to the model's predictions.
A custom function, compare_shap_values(), was created to:
- automatically select the appropriate SHAP explainer for the model type (TreeExplainer for tree-based models, LinearExplainer or KernelExplainer for other models);
- generate and save SHAP summary plots for each model, exported as PNG files for easy comparison across models.
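One possible shape for compare_shap_values(); the explainer selection follows the description above, but the actual signature and file naming in the notebook may differ:

```python
import matplotlib.pyplot as plt
import shap
from sklearn.ensemble import RandomForestClassifier

def compare_shap_values(model, X_background, X_sample, model_name):
    """Select a SHAP explainer by model type and save a summary plot."""
    if isinstance(model, RandomForestClassifier):
        explainer = shap.TreeExplainer(model)                  # tree-based
    elif hasattr(model, "coef_"):
        explainer = shap.LinearExplainer(model, X_background)  # linear
    else:
        explainer = shap.KernelExplainer(model.predict_proba, X_background)

    shap_values = explainer.shap_values(X_sample)
    shap.summary_plot(shap_values, X_sample, show=False)
    plt.savefig(f"shap_summary_{model_name}.png", bbox_inches="tight")
    plt.close()
```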



4. LIME (Local Interpretable Model-agnostic Explanations) Comparison:

A new function, compare_lime_explanations(), was added to compare LIME explanations across models.
LIME explains individual predictions by approximating the model's decision boundary in the local region around an instance.
Explanations were generated for a specific instance, showing which features contributed most to each model's prediction.
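A sketch of compare_lime_explanations() under the same assumptions (the class names match the assumed dataset; the notebook's actual signature may differ):

```python
from lime.lime_tabular import LimeTabularExplainer

def compare_lime_explanations(models, X_train, x_instance, feature_names):
    """Explain one instance with LIME for each model and print the weights."""
    explainer = LimeTabularExplainer(
        X_train,
        feature_names=feature_names,
        class_names=["malignant", "benign"],  # assumed from the dataset
        mode="classification",
    )
    for name, model in models.items():
        exp = explainer.explain_instance(x_instance, model.predict_proba)
        print(f"--- {name} ---")
        for feature, weight in exp.as_list():
            print(f"  {feature}: {weight:+.3f}")
```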



5. Feature Importance Normalization:

A function, extract_feature_importances(), was added to extract feature importances from the models and normalize them so they are comparable across models.
The normalized importances are displayed in a table, making it easy to compare feature contributions side by side.
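One plausible implementation of extract_feature_importances(), normalizing each model's importances to sum to 1. Tree models expose feature_importances_; taking absolute coefficients for linear models is an assumption (the bundled report in fact shows NaN for Logistic Regression):

```python
import numpy as np
import pandas as pd

def extract_feature_importances(models, feature_names):
    """Collect per-model feature importances, each normalized to sum to 1."""
    table = {}
    for name, model in models.items():
        if hasattr(model, "feature_importances_"):
            imp = model.feature_importances_   # e.g. Random Forest
        else:
            imp = np.abs(model.coef_).ravel()  # assumed for linear models
        table[f"{name}_Importance"] = imp / imp.sum()
    return pd.DataFrame(table, index=pd.Index(feature_names, name="Feature"))
```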


6. Side-by-Side SHAP Visualizations:

SHAP values are compared across models for the same instances, and side-by-side SHAP summary plots were created to show how different models interpret the same features.
To reduce memory usage, a sample of 100 data points was selected for the SHAP analysis.
Each SHAP plot is saved as a PNG image for easy reference and comparison between models.
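The sampling step might look like this, reusing compare_shap_values() from above (the sample size of 100 is from the description; shap.sample and the random_state are illustrative):

```python
import shap

# Draw 100 points to keep SHAP's memory and runtime manageable
X_shap_sample = shap.sample(X_test_scaled, 100, random_state=42)

# Produce one summary plot per model for the same sampled instances
for name, model in models.items():
    compare_shap_values(model, X_train_scaled, X_shap_sample, name)
```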




📊 Results

The project explores how the different explainability techniques explain model decisions. Key results:

- SHAP: summary plots indicate which features have the most impact on the predictions.
- LIME: provides local interpretability, showing how small perturbations of the input affect an individual prediction.
- Model Comparison: a report (model_comparison_report.html) compares the performance and interpretability of the two models.
Binary file added examples/explainability_comparison/roc_curve.png
1,648 changes: 1,648 additions & 0 deletions notebooks/.ipynb_checkpoints/Pytorch_Support_Explainable-checkpoint.ipynb

Large diffs are not rendered by default.

76,839 changes: 76,839 additions & 0 deletions notebooks/.ipynb_checkpoints/explaianble-checkpoint.ipynb

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions notebooks/Pytorch_Support_Explainable.ipynb
@@ -1626,7 +1626,7 @@
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -1644,5 +1644,5 @@
}
},
"nbformat": 4,
"nbformat_minor": 2
"nbformat_minor": 4
}
1,420 changes: 715 additions & 705 deletions notebooks/explaianble(0_1_6).ipynb

Large diffs are not rendered by default.

166 changes: 166 additions & 0 deletions notebooks/model_comparison_report.html
@@ -0,0 +1,166 @@
<html><head><title>Model Comparison Report</title></head><body><h1>Model Comparison Report</h1><h2>Feature Importance (Normalized)</h2><table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>Random Forest_Importance</th>
<th>Logistic Regression_Importance</th>
</tr>
<tr>
<th>Feature</th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<th>mean radius</th>
<td>0.048703</td>
<td>NaN</td>
</tr>
<tr>
<th>mean texture</th>
<td>0.013591</td>
<td>NaN</td>
</tr>
<tr>
<th>mean perimeter</th>
<td>0.053270</td>
<td>NaN</td>
</tr>
<tr>
<th>mean area</th>
<td>0.047555</td>
<td>NaN</td>
</tr>
<tr>
<th>mean smoothness</th>
<td>0.007285</td>
<td>NaN</td>
</tr>
<tr>
<th>mean compactness</th>
<td>0.013944</td>
<td>NaN</td>
</tr>
<tr>
<th>mean concavity</th>
<td>0.068001</td>
<td>NaN</td>
</tr>
<tr>
<th>mean concave points</th>
<td>0.106210</td>
<td>NaN</td>
</tr>
<tr>
<th>mean symmetry</th>
<td>0.003770</td>
<td>NaN</td>
</tr>
<tr>
<th>mean fractal dimension</th>
<td>0.003886</td>
<td>NaN</td>
</tr>
<tr>
<th>radius error</th>
<td>0.020139</td>
<td>NaN</td>
</tr>
<tr>
<th>texture error</th>
<td>0.004724</td>
<td>NaN</td>
</tr>
<tr>
<th>perimeter error</th>
<td>0.011303</td>
<td>NaN</td>
</tr>
<tr>
<th>area error</th>
<td>0.022407</td>
<td>NaN</td>
</tr>
<tr>
<th>smoothness error</th>
<td>0.004271</td>
<td>NaN</td>
</tr>
<tr>
<th>compactness error</th>
<td>0.005253</td>
<td>NaN</td>
</tr>
<tr>
<th>concavity error</th>
<td>0.009386</td>
<td>NaN</td>
</tr>
<tr>
<th>concave points error</th>
<td>0.003513</td>
<td>NaN</td>
</tr>
<tr>
<th>symmetry error</th>
<td>0.004018</td>
<td>NaN</td>
</tr>
<tr>
<th>fractal dimension error</th>
<td>0.005321</td>
<td>NaN</td>
</tr>
<tr>
<th>worst radius</th>
<td>0.077987</td>
<td>NaN</td>
</tr>
<tr>
<th>worst texture</th>
<td>0.021749</td>
<td>NaN</td>
</tr>
<tr>
<th>worst perimeter</th>
<td>0.067115</td>
<td>NaN</td>
</tr>
<tr>
<th>worst area</th>
<td>0.153892</td>
<td>NaN</td>
</tr>
<tr>
<th>worst smoothness</th>
<td>0.010644</td>
<td>NaN</td>
</tr>
<tr>
<th>worst compactness</th>
<td>0.020266</td>
<td>NaN</td>
</tr>
<tr>
<th>worst concavity</th>
<td>0.031802</td>
<td>NaN</td>
</tr>
<tr>
<th>worst concave points</th>
<td>0.144663</td>
<td>NaN</td>
</tr>
<tr>
<th>worst symmetry</th>
<td>0.010120</td>
<td>NaN</td>
</tr>
<tr>
<th>worst fractal dimension</th>
<td>0.005210</td>
<td>NaN</td>
</tr>
</tbody>
</table><h2>SHAP Summary Plots</h2><p>SHAP summary plots have been generated in the notebook.</p><h2>LIME Explanations</h2><p>LIME explanations have been generated in the notebook for a specific instance.</p></body></html>
2 changes: 1 addition & 1 deletion requirements.txt
@@ -12,6 +12,6 @@ google-generativeai
python-dotenv
scipy
pillow
- xgboost
+ xgboost==1.5.1
colorama
dask