
How to load a pretrained model? #44

Open
nirmal2k opened this issue Jan 11, 2022 · 8 comments

@nirmal2k

No description provided.

@seanmacavaney
Contributor

Hi -- can you clarify whether you're interested in using a different model initialisation (e.g., changing bert-base-uncased to something else) or using a model that's already been fully tuned for ranking?

@nirmal2k
Author

nirmal2k commented Jan 11, 2022

I want to load antique-vbert-pair.p, the already fine-tuned one.

@nirmal2k
Author

I want to validate a pretrained model (antique-vbert-pair.p). How do I do that?

@seanmacavaney
Contributor

Hi @nirmal2k -- sorry for the delay.

If you're looking to reproduce the results in Training Curricula for Open Domain Answer Re-Ranking, I recommend you train from scratch. Instructions are here. While it's possible to load the weight files into the CLI-based OpenNIR pipelines, it's a bit hacky and tricky to get working.

If, instead, you're looking to conduct further experiments with the models, inspect outputs, etc., by far the easiest way is to use the OpenNIR-PyTerrier integration. You can load the model like so:

import pyterrier as pt
if not pt.started():
  pt.init()
import onir_pt  # OpenNIR-PyTerrier integration -- part of OpenNIR
# Vanilla BERT reranker initialised from the fine-tuned antique-vbert-pair.p weights
reranker_pair = onir_pt.reranker('vanilla_transformer', 'bert', weights='antique-vbert-pair.p', ranker_config={'outputs': 2}, vocab_config={'train': True})

Then you can use the model in a variety of ways. E.g., if you wanted to conduct an experiment on ANTIQUE similar to the one in the paper, you could do:

import pyterrier as pt
if not pt.started():
  pt.init()
import onir_pt
from pyterrier.measures import *

# Dataset and indexing
dataset = pt.get_dataset('irds:antique/test')
indexer = pt.IterDictIndexer('./antique.terrier')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])

# Models
bm25 = pt.BatchRetrieve(index_ref, wmodel='BM25') % 100 # BM25 with cutoff of 100
reranker_pair = onir_pt.reranker('vanilla_transformer', 'bert', weights='antique-vbert-pair.p', ranker_config={'outputs': 2}, vocab_config={'train': True})
reranker_pair_recip = onir_pt.reranker('vanilla_transformer', 'bert', weights='antique-vbert-pair_recip.p', ranker_config={'outputs': 2}, vocab_config={'train': True})

# Experiment
pt.Experiment(
  [
    bm25,
    bm25 >> pt.text.get_text(dataset, 'text') >> reranker_pair,
    bm25 >> pt.text.get_text(dataset, 'text') >> reranker_pair_recip,
  ],
  dataset.get_topics(),
  dataset.get_qrels(),
  [MRR(rel=3), P(rel=3)@1]
)

Which gives the following results:

                  name  RR(rel=3)  P(rel=3)@1
0                 bm25   0.506052       0.345
1        reranker_pair   0.733746       0.630
2  reranker_pair_recip   0.761444       0.670

(Curiously, a bit better than what was reported in the paper. Probably due to using a different system for the first-stage retrieval.)

Hope this helps!

@nirmal2k
Author

Thanks for that @seanmacavaney!! I was able to reproduce those results.
Reranking 1000 documents gives an MRR@10 of 68.05. Is there a reason for the drop?
Also, I'd appreciate it if you could provide a code snippet showing how to do a forward pass with the loaded pretrained model, given a query and document text.

@seanmacavaney
Contributor

Reranking 1000 documents gives an MRR@10 of 68.05. Is there a reason for the drop?

I don't know definitively, but I suspect:

  • There could be a bias because the training documents were sampled from BM25's top 100, so going out to 1000 takes the model out of domain.
  • It could be pulling up relevant documents that do not have relevance assessments. It can sometimes be helpful to report judgment rates (e.g., by including a measure like Judged@10) to suss out such cases -- see the sketch after this list.
  • There's also some work suggesting that tuning the cutoff can be super helpful, so it may be that 100 isn't ideal either, and that the results could be further improved just by finding a better threshold.
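
For example, here's a minimal (untested) sketch of comparing the two cutoffs and reporting judgment rates in one experiment -- it reuses index_ref, dataset, and reranker_pair from the snippets above, and assumes Judged is exposed by pyterrier.measures:

import pyterrier as pt
if not pt.started():
  pt.init()
from pyterrier.measures import *

# Assumes index_ref, dataset, and reranker_pair are already defined as above
bm25_100 = pt.BatchRetrieve(index_ref, wmodel='BM25') % 100    # cutoff of 100
bm25_1000 = pt.BatchRetrieve(index_ref, wmodel='BM25') % 1000  # cutoff of 1000

pt.Experiment(
  [
    bm25_100 >> pt.text.get_text(dataset, 'text') >> reranker_pair,
    bm25_1000 >> pt.text.get_text(dataset, 'text') >> reranker_pair,
  ],
  dataset.get_topics(),
  dataset.get_qrels(),
  # Judged@10 reports the fraction of each top-10 that has any relevance assessment
  [MRR(rel=3)@10, P(rel=3)@1, Judged@10],
  names=['reranker_pair (cutoff 100)', 'reranker_pair (cutoff 1000)'],
)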

I'd be curious to hear what you find if you get to the bottom of this!

Also, I'd appreciate it if you could provide a code snippet showing how to do a forward pass with the loaded pretrained model, given a query and document text.

Here ya go!

import pandas as pd
# One query/document pair per row; the reranker adds a 'score' column
sample_df = pd.DataFrame([
  {'qid': '0', 'query': 'some query text', 'docno': '0', 'text': 'some document text'},
  {'qid': '1', 'query': 'some other query text', 'docno': '1', 'text': 'some other document text'},
])
reranker_pair(sample_df)

should give:

  qid                  query docno                      text     score
0   0        some query text     0        some document text  8.423386
1   1  some other query text     1  some other document text  7.000756

@nirmal2k
Author

Thanks for the code snippet @seanmacavaney!!
As for the reasons for the drop in MRR, the first two you mentioned were the ones at the top of my head too. The third point seems like a hack for the given dataset. I've worked with MS MARCO and there isn't a drop in MRR when reranking more documents. The SBERT results here rerank the entire corpus of 8.8M passages to get that MRR.
Maybe it's just that some relevant documents don't have assessments, as you mentioned.

Thanks for the insights!!

@clin366

clin366 commented Apr 4, 2024

Hi -- can you clarify whether you're interested in using a different model initialisation (e.g., changing bert-base-uncased to something else) or using a model that's already been fully tuned for ranking?

Yes, I want to change bert-base-uncased to my fine-tuned version of BERT, but I don't know how to achieve that.
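
A minimal, untested sketch of one possible way to do this with the OpenNIR-PyTerrier integration -- it assumes the 'bert' vocab accepts a bert_base option naming a HuggingFace model or local checkpoint path (the exact config key should be checked against OpenNIR's vocab config), and the path below is purely illustrative:

import pyterrier as pt
if not pt.started():
  pt.init()
import onir_pt

# Assumed option: 'bert_base' points the BERT encoder at a different pretrained model
reranker = onir_pt.reranker(
  'vanilla_transformer', 'bert',
  ranker_config={'outputs': 2},
  vocab_config={'train': True, 'bert_base': '/path/to/my-finetuned-bert'},  # illustrative path
)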
