How to load a pretrained model? #44
Hi, can you clarify whether you're interested in using a different model initialisation (e.g., changing |
I want to load antique-vbert-pair.p, the already fine-tuned one |
I want to validate a pretrained model (antique-vbert-pair.p). How do I do that? |
Hi @nirmal2k, sorry for the delay. If you're looking to reproduce the results in Training Curricula for Open Domain Answer Re-Ranking, I recommend training from scratch. Instructions are here. While it's possible to load the weight files into the CLI-based OpenNIR pipelines, it's a bit hacky and tricky to get working. If instead you're looking to conduct further experiments with the models, inspect outputs, etc., by far the easiest way is the OpenNIR-PyTerrier integration. You can load the model like so:
import pyterrier as pt
if not pt.started():
    pt.init()
import onir_pt  # OpenNIR-PyTerrier integration (part of OpenNIR)
reranker_pair = onir_pt.reranker('vanilla_transformer', 'bert', weights='antique-vbert-pair.p', ranker_config={'outputs': 2}, vocab_config={'train': True})
Then you can use the model in a variety of ways. For example, to run an experiment on ANTIQUE similar to the one in the paper, you could do:
import pyterrier as pt
if not pt.started():
    pt.init()
import onir_pt
from pyterrier.measures import *
# Dataset and indexing
dataset = pt.get_dataset('irds:antique/test')
indexer = pt.IterDictIndexer('./antique.terrier')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['text'])
# Models
bm25 = pt.BatchRetrieve(index_ref, wmodel='BM25') % 100 # BM25 with cutoff of 100
reranker_pair = onir_pt.reranker('vanilla_transformer', 'bert', weights='antique-vbert-pair.p', ranker_config={'outputs': 2}, vocab_config={'train': True})
reranker_pair_recip = onir_pt.reranker('vanilla_transformer', 'bert', weights='antique-vbert-pair_recip.p', ranker_config={'outputs': 2}, vocab_config={'train': True})
# Experiment
pt.Experiment(
    [
        bm25,
        bm25 >> pt.text.get_text(dataset, 'text') >> reranker_pair,
        bm25 >> pt.text.get_text(dataset, 'text') >> reranker_pair_recip,
    ],
    dataset.get_topics(),
    dataset.get_qrels(),
    [MRR(rel=3), P(rel=3)@1]
)
Which gives the following results:
(Curiously, these are a bit better than the results reported in the paper, probably because a different system was used for first-stage retrieval.) Hope this helps! |
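For reference on the measures in the experiment: ANTIQUE uses graded labels (1 to 4), and `MRR(rel=3)` and `P(rel=3)@1` count only documents labelled 3 or higher as relevant. Below is a minimal pure-Python sketch of both measures, assuming a run given as (qid, docno, score) tuples and qrels as a nested dict. The function names and toy data are hypothetical, and this is an approximation, not the `ir_measures` code PyTerrier actually calls (for example, it ignores score ties and averages over every query in the qrels).

```python
from collections import defaultdict

def rank_run(run):
    """Group (qid, docno, score) rows by query; order each query's docnos by descending score."""
    by_qid = defaultdict(list)
    for qid, docno, score in run:
        by_qid[qid].append((score, docno))
    return {
        qid: [docno for _, docno in sorted(rows, key=lambda r: -r[0])]
        for qid, rows in by_qid.items()
    }

def mrr(run, qrels, rel=3):
    """Mean reciprocal rank of the first document whose label is >= rel (0 if none is retrieved)."""
    ranked = rank_run(run)
    total = 0.0
    for qid, labels in qrels.items():
        for position, docno in enumerate(ranked.get(qid, []), start=1):
            if labels.get(docno, 0) >= rel:
                total += 1.0 / position
                break
    return total / len(qrels)

def precision_at_k(run, qrels, rel=3, k=1):
    """Fraction of the top-k documents whose label is >= rel, averaged over queries."""
    ranked = rank_run(run)
    total = 0.0
    for qid, labels in qrels.items():
        top_k = ranked.get(qid, [])[:k]
        total += sum(1 for docno in top_k if labels.get(docno, 0) >= rel) / k
    return total / len(qrels)

# Toy data (made up for illustration): graded labels on a 1-4 scale, as in ANTIQUE
qrels = {"q1": {"d1": 4, "d2": 2}, "q2": {"d3": 3}}
run = [("q1", "d2", 0.9), ("q1", "d1", 0.8), ("q2", "d3", 0.7), ("q2", "d4", 0.5)]

print(mrr(run, qrels))             # 0.75: q1's first hit is at rank 2 (1/2), q2's at rank 1 (1/1)
print(precision_at_k(run, qrels))  # 0.5: only q2's top document has a label >= 3
```

Because the labels are graded, lowering `rel` to 2 would count `d2` as relevant and raise both numbers, so the threshold has to match the one used in whatever evaluation you compare against.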
Thanks for that @seanmacavaney !! I was able to reproduce those results. |
I don't know definitively, but I suspect:
I'd be curious to hear what you find if you get to the bottom of this!
Here ya go!
import pandas as pd
sample_df = pd.DataFrame([
    {'qid': '0', 'query': 'some query text', 'docno': '0', 'text': 'some document text'},
    {'qid': '1', 'query': 'some other query text', 'docno': '1', 'text': 'some other document text'},
])
reranker_pair(sample_df)
This should give:
|
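As the sample shows, the re-ranker takes a DataFrame with `qid`, `query`, `docno` and `text` columns and is expected to return those rows with a score attached. To illustrate that DataFrame-in, DataFrame-out shape without downloading the BERT weights, here is a minimal pandas sketch. `toy_rerank`, its term-overlap scoring rule, and the 0-based per-query `rank` column are hypothetical choices for illustration, not what `onir_pt` computes.

```python
import pandas as pd

def toy_rerank(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for a neural re-ranker: scores each (query, text) pair by
    the fraction of query terms that also occur in the document text."""
    out = df.copy()

    def overlap(row) -> float:
        query_terms = set(row["query"].lower().split())
        doc_terms = set(row["text"].lower().split())
        return len(query_terms & doc_terms) / max(len(query_terms), 1)

    out["score"] = out.apply(overlap, axis=1)
    # Order each query's documents by descending score, then number them from 0
    out = out.sort_values(["qid", "score"], ascending=[True, False])
    out["rank"] = out.groupby("qid").cumcount()
    return out.reset_index(drop=True)

# Same input frame as in the comment above
sample_df = pd.DataFrame([
    {'qid': '0', 'query': 'some query text', 'docno': '0', 'text': 'some document text'},
    {'qid': '1', 'query': 'some other query text', 'docno': '1', 'text': 'some other document text'},
])
print(toy_rerank(sample_df)[["qid", "docno", "score", "rank"]])
```

Swapping in a real model would only change how `score` is computed; the input columns and the per-query ordering stay the same.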
Thanks for the code snippet @seanmacavaney !! Thanks for the insights!! |
Yes, I want to change |