This repository contains the code and resources for our bias-aware query expansion method, which reduces gender bias in the retrieved set of documents. The main focus of this approach is on controlling bias in pseudo-relevance feedback. Our work shows that it is possible to effectively revise a user query so that it leads to a less biased ranked list of documents. Based on our experiments, we find that a less biased revised query can maintain utility while at the same time reducing bias. We believe this work lays the foundation for treating fairness and utility as two cooperating measures rather than as competing aspects.
In order to revise the initial query so that utility is maintained while bias is significantly reduced, we re-rank the list of documents retrieved by BM25 using the interpolation formula:
Rel_debiased(d) = (1 - λ) · Rel(d) - λ · Bias(d)
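The interpolation step can be sketched as follows. This is a minimal illustration of the formula above, not the repository's `interpolation.py`; the `rel` and `bias` dictionaries are hypothetical inputs standing in for the BM25 relevance scores and the per-document bias scores.

```python
def rerank_debiased(rel, bias, lam=0.5):
    """Re-rank documents by (1 - lam) * Rel(d) - lam * Bias(d).

    rel  -- dict mapping doc IDs to retrieval (e.g. BM25) scores
    bias -- dict mapping doc IDs to document bias scores
    lam  -- interpolation coefficient λ in [0, 1]
    """
    debiased = {d: (1 - lam) * rel[d] - lam * bias[d] for d in rel}
    # Higher interpolated score means more relevant and less biased.
    return sorted(debiased, key=debiased.get, reverse=True)

# Illustrative scores: d1 is most relevant but also most biased,
# so with λ = 0.5 it drops below d2 and d3 after re-ranking.
rel = {"d1": 0.9, "d2": 0.8, "d3": 0.7}
bias = {"d1": 0.6, "d2": 0.1, "d3": 0.2}
print(rerank_debiased(rel, bias, lam=0.5))  # → ['d2', 'd3', 'd1']
```

With λ = 0 this reduces to the original BM25 ranking; larger λ trades retrieval score against bias more aggressively.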
We evaluate our approach by measuring the gender bias in the retrieved lists of our bias-aware expansion method against the plain BM25 and RM3 expansion baselines. The run files for each method can be found in the `results/runs` directory. The run files for our bias-aware expansion method are available for different values of λ, which determines how sensitive the method is to document bias. In addition, the original queries and the bias-aware expanded queries for Robust04, GOV2, CW09, and CW12 are available in the `results/queries` directory.
We selected the interpolation coefficient λ from [0, 1] with 0.1 increments. To explore whether bias is actually reduced systematically in the retrieved list of documents, we measure the degree of bias using two different approaches: the ARaB methods and the LIWC toolkit. Our results for λ = 0.5 are provided in Table 1. The complete set of results for all λ values is available in Table 2 in the `results` directory. The results show that, regardless of the metric used to measure the bias of the retrieved ranked list, bias decreases significantly on all three bias metrics, and the percentage decrease is consistent across metrics and always statistically significant.
| Method | Robust04 ARaB-TF | Robust04 ARaB-Boolean | Robust04 LIWC | GOV2 ARaB-TF | GOV2 ARaB-Boolean | GOV2 LIWC | CW09 ARaB-TF | CW09 ARaB-Boolean | CW09 LIWC | CW12 ARaB-TF | CW12 ARaB-Boolean | CW12 LIWC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BM25 (Run) | 0.61 | 0.35 | 0.48 | 0.33 | 0.14 | 0.07 | 0.23 | 0.08 | 0.05 | 0.40 | 0.14 | 0.19 |
| PRF (Run) | 0.61 | 0.34 | 0.45 | 0.39 | 0.11 | 0.07 | 0.22 | 0.07 | 0.07 | 0.42 | 0.10 | 0.20 |
| Our Approach (Run) | 0.43 | 0.27 | 0.34 | 0.18 | 0.07 | 0.05 | 0.14 | 0.06 | 0.04 | 0.23 | 0.05 | 0.13 |
| Decrease in Bias (%) | 29.5 | 20.6 | 24.4 | 53.8 | 36.4 | 28.6 | 36.4 | 14.3 | 42.9 | 45.2 | 50.0 | 35.0 |
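The "Decrease in Bias (%)" row is the relative reduction of our approach with respect to the PRF baseline (the table's values are consistent with this reading in every column):

```python
def decrease_pct(prf_bias, our_bias):
    """Relative bias reduction (%) of our approach vs. the PRF baseline."""
    return round(100 * (prf_bias - our_bias) / prf_bias, 1)

# Robust04, LIWC column: PRF = 0.45, Our Approach = 0.34
print(decrease_pct(0.45, 0.34))  # → 24.4
# GOV2, ARaB-TF column: PRF = 0.39, Our Approach = 0.18
print(decrease_pct(0.39, 0.18))  # → 53.8
```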
In order to obtain a less biased reformulated set of queries, given the initial queries and their corresponding relevant documents, replicate the following steps:

1. Use the `documents_calculate_bias.py` script to calculate the bias level of the documents of the given collection.
2. Use the `interpolation.py` script to interpolate the retrieval score (given by BM25) with the bias score of each document and re-rank the documents. (In our experiments, we selected λ in the range [0, 1] with 0.1 increments.)
3. Use the Anserini toolkit to perform pseudo-relevance feedback and expand the queries based on the top 10 documents of each query. To obtain a less biased expansion, we added a function called `customised_RM3` to the `SimpleSearcher` class of Anserini, which expands each query based on the given initial query and the re-ranked list of documents that is less biased than the original run. The changes are made in the `SimpleSearcher` and `RM3ReRanker` classes of the Anserini fork included in this repository. Finally, the searcher returns a list of documents retrieved with the expanded queries, which can be found here.
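Step 1 above assigns each document a scalar bias score. A minimal sketch of what such a score could look like is given below; the word lists and the scoring formula are illustrative assumptions in the spirit of `documents_calculate_bias.py`, not the repository's actual implementation.

```python
# Hypothetical gendered-word lists; the real script may use a much
# larger lexicon (e.g. the one used by the ARaB metrics).
MALE_TERMS = {"he", "him", "his", "man", "men"}
FEMALE_TERMS = {"she", "her", "hers", "woman", "women"}

def document_bias(text):
    """Length-normalised magnitude of the gender imbalance in a document."""
    tokens = text.lower().split()
    male = sum(t in MALE_TERMS for t in tokens)
    female = sum(t in FEMALE_TERMS for t in tokens)
    # |male - female| / length: 0 for balanced or gender-neutral text.
    return abs(male - female) / len(tokens) if tokens else 0.0

print(document_bias("he said his plan was ready"))
```

Scores of this form can be fed directly into the interpolation formula as `Bias(d)`.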
In order to evaluate the bias-aware expanded queries and calculate the level of gender bias in the retrieved documents of each run file:

- You can use the following command inside the Anserini directory to evaluate the retrieval effectiveness of the bias-aware expanded queries (the paths contain spaces, so they must be quoted):

```
tools/eval/trec_eval.9.0.4/trec_eval -m map -m P.30 "results/queries/Original Queries/RB04/RB04_qrels.txt" "results/runs/Bias-aware PRF/RB04/retrieved_list_unbiased_lambda_0.5.txt"
```
- You may use the `runs_calculate_bias.py` and `retrieved_list_calculate_bias.py` scripts to calculate the TF ARaB and Boolean ARaB metrics introduced in "Do Neural Ranking Models Intensify Gender Bias?". In addition, the code for one other metric, namely LIWC, is included in the `src/LIWC` directory. The LIWC lexicon is proprietary, so it is not included in this repository; the lexicon data can be purchased from liwc.net.
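As a rough intuition for the two ARaB variants, the sketch below shows per-document gender magnitudes aggregated over a ranked list: the TF variant counts occurrences of gendered terms, while the Boolean variant only records their presence. The word lists, the signed male-minus-female convention, and the unweighted average are illustrative assumptions, not the exact definitions from the paper or from the repository's scripts.

```python
MALE_TERMS = {"he", "him", "his"}      # hypothetical, tiny lexicon
FEMALE_TERMS = {"she", "her", "hers"}  # hypothetical, tiny lexicon

def magnitude(text, boolean=False):
    """Signed gender magnitude of one document (positive = male-leaning)."""
    tokens = text.lower().split()
    male = sum(t in MALE_TERMS for t in tokens)
    female = sum(t in FEMALE_TERMS for t in tokens)
    if boolean:
        # Boolean variant: presence/absence instead of term frequency.
        male, female = min(male, 1), min(female, 1)
    return male - female

def avg_rank_bias(ranked_docs, cutoff=10, boolean=False):
    """Average the per-document magnitude over the top-`cutoff` documents."""
    top = ranked_docs[:cutoff]
    return sum(magnitude(d, boolean) for d in top) / len(top)

print(avg_rank_bias(["he said", "she said her"], cutoff=2))  # → -0.5
```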