Mackenzie Jorgensen, Hannah Richert, Elizabeth Black, Natalia Criado, & Jose Such
When bias mitigation methods are applied to make fairer machine learning models in fairness-related classification settings, there is an assumption that the disadvantaged group should be better off than if no mitigation method was applied. However, this is a potentially dangerous assumption because a “fair” model outcome does not automatically imply a positive impact for a disadvantaged individual—they could still be negatively impacted. Modeling and accounting for those impacts is key to ensure that mitigated models are not unintentionally harming individuals; we investigate if mitigated models can still negatively impact disadvantaged individuals and what conditions affect those impacts in a loan repayment example. Our results show that most mitigated models negatively impact disadvantaged group members in comparison to the unmitigated models. The domain-dependent impacts of model outcomes should help drive future bias mitigation method development.
Paper: Not So Fair: The Impact of Presumably Fair Machine Learning Models, in the Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society 2023
Datasets: Our simulated datasets are based on Hardt et al.'s 2016 dataset.
- Download the data folder from the GitHub repository for fairmlbook (Barocas, Hardt and Narayanan 2018)
- Save it to the root directory of this repository (the CSV files should be in the folder 'data/raw/')
- Then run the notebook Liu_paper_code/FICO-figures.ipynb
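
Before opening the notebook, it can help to confirm that the data landed in the right place. The following is a minimal sketch and not part of this repository: the helper name find_raw_csvs is hypothetical, and the only thing it assumes is the data/raw/ layout described above.

```python
from pathlib import Path


def find_raw_csvs(repo_root="."):
    """Return the sorted CSV paths under <repo_root>/data/raw/.

    Hypothetical helper for illustration; raises if the folder is missing
    or holds no CSV files.
    """
    raw_dir = Path(repo_root) / "data" / "raw"
    if not raw_dir.is_dir():
        raise FileNotFoundError(f"expected folder not found: {raw_dir}")
    csvs = sorted(raw_dir.glob("*.csv"))
    if not csvs:
        raise FileNotFoundError(f"no CSV files found in {raw_dir}")
    return csvs
```

Calling find_raw_csvs() from the repository root either lists the CSV files or fails early with a message naming the missing folder.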
- Files:
  - requirements.txt contains the required Python packages for the project
  - generate_data.py, classification.py: run from the command line
- Folders:
  - Liu_paper_code: contains the forked code from https://github.com/lydiatliu/delayedimpact (indirectly used for data collection)
  - configs: contains yaml files that hold the configurations for running data collection and classification from the command line
  - scripts: contains all functions used for data collection, classification, evaluation and visualizations (stored in separate py files)
  - notebooks: contains the notebooks to run the code (data collection, classification, evaluation/statistics and visualizations)
This project can be divided into three stages:
- Generating or collecting datasets
- Training and testing ML models
- Visualizing and performing statistical analyses on results
This section gives a high-level overview of the workflow of each stage and what is needed to run the code. Stages 1 and 2 of the pipeline can be run either via notebook or from the command line. The third stage can only be run via Jupyter notebooks.
This section prepares the simulated, synthetic dataset (or the German Credit dataset) that will be used for training and testing the unmitigated and mitigated models.
Key details:
- The original dataset according to Hardt et al. (2016) has group_size_ratio: [0.12;0.88] and black_label_ratio: [0.66;0.34]. Changing those parameters alters the demographic ratio and the repayment labels for the disadvantaged group when creating synthetic datasets.
- The scripts/data_creation_utils.py file contains the helper functions for data collection for the baseline and synthetic datasets.
- How to run:
  - Way 1: Run the notebook (/notebooks/simData_collection) and set parameters in the third cell.
  - Way 2: Set parameters in configs/data_creation, or create your own .yaml file in the folder, and run python generate_data.py -config data_creation from any command line (you can substitute the -config parameter with your own yaml-file name).
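
To make the effect of the two ratio parameters concrete, here is a minimal sketch of how they could translate into sample counts. It is not the repository's implementation: the function split_counts, the total of 10,000 samples, and the reading of each list position (first entry of group_size_ratio as the disadvantaged group, and the two entries of black_label_ratio as that group's two label classes) are assumptions made for illustration.

```python
def split_counts(total, ratios):
    """Split `total` into integer counts proportional to `ratios`."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    counts = [round(total * r) for r in ratios]
    # Let the last bucket absorb any rounding drift so counts sum to total.
    counts[-1] = total - sum(counts[:-1])
    return counts


# Values reported for the original dataset in this README.
group_size_ratio = [0.12, 0.88]
black_label_ratio = [0.66, 0.34]

n_samples = 10_000  # assumed dataset size, for illustration only
group_counts = split_counts(n_samples, group_size_ratio)
# Assumption: the first entry of group_size_ratio is the disadvantaged group.
label_counts = split_counts(group_counts[0], black_label_ratio)

print(group_counts)
print(label_counts)
```

Changing either list and rerunning shows how the group sizes and the label split of the disadvantaged group shift together, which is the kind of variation the synthetic datasets explore.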
This section describes training unmitigated and mitigated ML models on the baseline and synthetic data for comparison.
Key details:
- The /scripts/classification_utils.py and /scripts/evaluation_utils.py files contain the helper functions for classification. Note: the results_dir parameter in the config file sets the folder where your results are stored; if the folder doesn't already exist, it is created when you run the steps below.
- How to run:
  - Way 1: Run the notebook (/notebooks/classification) and set parameters in the second cell.
  - Way 2: Set parameters in configs/classification, or create your own .yaml file in the folder, and run python classification.py -config classification from any command line (you can substitute the -config parameter with your own yaml-file name).
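
The command-line route can be pictured with the sketch below. It is an illustration under stated assumptions rather than the code in generate_data.py or classification.py: the function names and the .yaml extension appended to the config name are hypothetical. It only mirrors two behaviours described above: a -config name that points at a file in the configs folder, and a results folder that is created when it is missing.

```python
import argparse
from pathlib import Path


def parse_config_name(argv=None):
    """Read the -config argument, e.g. `-config classification`."""
    parser = argparse.ArgumentParser(description="Illustrative -config parser")
    parser.add_argument("-config", required=True,
                        help="name of a yaml file in the configs folder")
    return parser.parse_args(argv).config


def config_path(name, config_dir="configs"):
    """Map a config name to <config_dir>/<name>.yaml (assumed layout)."""
    return Path(config_dir) / f"{name}.yaml"


def ensure_results_dir(results_dir):
    """Create the results folder if it does not exist yet and return it."""
    path = Path(results_dir)
    path.mkdir(parents=True, exist_ok=True)
    return path
```

For example, config_path(parse_config_name(["-config", "classification"])) resolves to configs/classification.yaml, and ensure_results_dir can be called repeatedly on the same folder without raising an error.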
In this section, we investigate the impact results, check the score distributions for normality, and then test their significance based on different aspects of the experiments. Please note that to run the following two notebooks, you should have model results for all four classifiers; otherwise, you'll need to adjust the notebook code a bit.
- How to run:
  - For stat testing: Run the notebook (/notebooks/data_eval_&_statistics) and add in parameters in the second cell.
  - For result visualizations: Run the notebook (/notebooks/data_visualization) and add in parameters in the second cell. Note: step 3 runs if you have the varying impact distributions results from the paper, and step 4 runs if you have all datasets and results from the paper.
- Fairness constraint options: DP refers to demographic parity, EO to equalized odds, TPRP to true positive rate parity, FPRP to false positive rate parity, ERP to error rate parity, and BGL to bounded group loss.
- The ML models available (the fit functions of these sklearn models accept sample weights, which Fairlearn requires): Gaussian naive Bayes, decision tree, logistic regression, and SVM. Currently, all samples are weighted equally (weight_index=1).
- The sklearn confusion matrix looks like: [[TN FP] [FN TP]]
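
To tie some of the abbreviations above to the confusion-matrix layout, the following sketch computes per-group rates in plain Python. It is illustrative only and does not reproduce how Fairlearn or this repository evaluates the constraints: the toy labels, predictions, and group names are made up, and the helper names are hypothetical. DP compares selection rates across groups, TPRP compares true positive rates, and FPRP compares false positive rates.

```python
def confusion(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]], the same layout sklearn uses."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1:
            fn += 1
        elif p == 1:
            fp += 1
        else:
            tn += 1
    return [[tn, fp], [fn, tp]]


def group_rates(y_true, y_pred):
    """Selection rate, TPR, and FPR from the confusion matrix.

    Assumes the group has at least one positive and one negative label.
    """
    (tn, fp), (fn, tp) = confusion(y_true, y_pred)
    total = tn + fp + fn + tp
    return {
        "selection_rate": (tp + fp) / total,
        "tpr": tp / (tp + fn),
        "fpr": fp / (fp + tn),
    }


# Toy data (made up): true labels and predictions for two groups.
data = {
    "group_a": ([1, 1, 0, 0], [1, 0, 0, 0]),
    "group_b": ([1, 1, 0, 0], [1, 1, 1, 0]),
}
rates = {g: group_rates(y, yhat) for g, (y, yhat) in data.items()}

# A gap of zero would mean the corresponding parity constraint holds exactly.
dp_gap = abs(rates["group_a"]["selection_rate"] - rates["group_b"]["selection_rate"])
tpr_gap = abs(rates["group_a"]["tpr"] - rates["group_b"]["tpr"])
fpr_gap = abs(rates["group_a"]["fpr"] - rates["group_b"]["fpr"])
print(dp_gap, tpr_gap, fpr_gap)
```

EO asks for both the true positive rate gap and the false positive rate gap to be small at the same time, so it can be read off the same two numbers.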
- Mackenzie Jorgensen - mackenzie.jorgensen@kcl.ac.uk
- Hannah Richert - hrichert@ous.de
Mackenzie Jorgensen, Hannah Richert, Elizabeth Black, Natalia Criado, and Jose Such. 2023. Not So Fair: The Impact of Presumably Fair Machine Learning Models. In AAAI/ACM Conference on AI, Ethics, and Society (AIES ’23), August 8–10, 2023, Montréal, QC, Canada. ACM, New York, NY, USA, 15 pages. https://doi.org/10.1145/3600211.3604699
We owe a great deal to Liu et al.'s work, Delayed Impact of Fair Machine Learning. We extended their code here and added to it to study a classification problem with multiple ML models, fairness metrics, and mitigation methods.
Lydia's repository is licensed under the BSD 3-Clause "New" or "Revised" License.