cluster_based_relevance_feedback

In June 2024, a friend and I developed a cluster-based pseudo-relevance feedback method, built on k-NN clustering, that outperformed a BM25 baseline in our experiments. The improvement in MAP (mean average precision) was confirmed with both a paired t-test and the Wilcoxon signed-rank test.

This project was part of our Information Retrieval exam. Our approach was inspired by the paper by Lee, K. S., Croft, W. B., & Allan, J. (2008), "A deterministic resampling method using overlapping document clusters for pseudo-relevance feedback". The approach can be summarized in seven steps:

1. Initial retrieval on the whole collection.
2. Clustering.
3. Identification of the "dominant document" based on the clustering.
4. Aggregation of documents in the clusters of the "dominant document."
5. Retrieval on the aggregated clusters.
6. Query expansion based on the first result (pseudo-RF).
7. Second retrieval on the whole collection.
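
The clustering, dominant-document, and aggregation steps can be sketched as follows. This is a minimal illustration, not the code in cluster_pseudo_RF.py. It assumes that each retrieved document seeds one cluster made of itself and its k nearest neighbours (so clusters overlap), that similarity is cosine over raw term counts, and that the "dominant document" is the one belonging to the most clusters. All function names and the toy documents are hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_clusters(docs, k):
    """Build one overlapping cluster per document: the seed plus its k nearest neighbours."""
    vecs = {doc_id: Counter(text.lower().split()) for doc_id, text in docs.items()}
    clusters = {}
    for seed in vecs:
        others = [d for d in vecs if d != seed]
        # Most similar first; ties are broken by document id to stay deterministic.
        others.sort(key=lambda d: (-cosine(vecs[seed], vecs[d]), d))
        clusters[seed] = {seed, *others[:k]}
    return clusters


def dominant_document(clusters):
    """Assumption: the dominant document is the one that appears in the most clusters."""
    membership = Counter(d for members in clusters.values() for d in members)
    return min(membership, key=lambda d: (-membership[d], d))


def aggregate(clusters, dominant):
    """Pool every document from every cluster that contains the dominant document."""
    pooled = set()
    for members in clusters.values():
        if dominant in members:
            pooled |= members
    return pooled


# Toy stand-in for the top documents of the initial retrieval.
docs = {
    "d1": "apple banana fruit",
    "d2": "apple fruit salad",
    "d3": "banana fruit smoothie",
    "d4": "car engine repair",
    "d5": "car engine oil",
}
clusters = knn_clusters(docs, k=1)
dominant = dominant_document(clusters)
pooled = aggregate(clusters, dominant)
```

Here `d1` sits in three clusters, so it is picked as dominant, and the pooled set contains the three fruit documents while the two unrelated ones are left out. The retrieval on `pooled` and the query expansion would follow from there.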

We then applied this approach to an experimental collection to optimize its hyperparameters and test its effectiveness.
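
To make the evaluation concrete, here is a small sketch of how MAP and a paired t statistic can be computed from per-query results. It uses only the standard library and made-up toy numbers, not our actual results. The function names are hypothetical, and the project itself may rely on other tooling for these computations.

```python
from math import sqrt
from statistics import mean, stdev


def average_precision(ranking, relevant):
    """Average precision of one ranked list given the set of relevant document ids."""
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant document
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(per_query_ap):
    """MAP is the mean of the per-query average precision values."""
    return mean(per_query_ap)


def paired_t_statistic(baseline, system):
    """t statistic of the per-query differences (system minus baseline).

    Requires at least two queries and differences that are not all identical.
    """
    diffs = [s - b for b, s in zip(baseline, system)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Toy example: relevant documents are found at ranks 2 and 4.
ap = average_precision(["d3", "d1", "d7", "d2"], {"d1", "d2"})

# Made-up per-query AP values for a baseline and a feedback run.
baseline_ap = [0.2, 0.4, 0.5]
feedback_ap = [0.3, 0.6, 0.8]
t_stat = paired_t_statistic(baseline_ap, feedback_ap)
```

The t statistic is then compared against a t distribution with one fewer degree of freedom than the number of queries. The Wilcoxon signed-rank test works on the same per-query differences but uses their ranks instead of their values.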

We have uploaded the following files:

- `bulk.py` and `bulk_without_stopwords.py`: Used to index the TREC collection ROBUST 2004, both with and without stopwords.
- `cluster_pseudo_RF.py`: Contains the code for the pseudo-RF described above.
- `search.py`: Used to get the results of the queries on ROBUST 2004 and to perform a grid search to optimize some parameters.
- `Quartuccio_Varotto.pdf`: Contains the presentation my friend and I used to present our project.
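
A grid search like the one in search.py can be organized along the following lines. This is only a sketch: the parameter names `k` and `n_feedback_terms` are hypothetical, and `toy_map` is a stand-in for running all queries with a given configuration and computing MAP.

```python
from itertools import product


def grid_search(evaluate, grid):
    """Try every combination in `grid` and return the best parameters and score."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_map(k, n_feedback_terms):
    """Stand-in objective (not a real MAP): peaks at k=5 and 20 feedback terms."""
    return 0.30 - 0.001 * (k - 5) ** 2 - 0.0001 * (n_feedback_terms - 20) ** 2


best_params, best_score = grid_search(
    toy_map,
    {"k": [3, 5, 7], "n_feedback_terms": [10, 20, 30]},
)
```

In a real run, `evaluate` would execute the full retrieval pipeline per configuration, so the grid should stay small enough to finish in reasonable time.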
