Implementation of the AAAI-21 Workshop on Scientific Document Understanding (SDU@AAAI-21) paper *A Paragraph-level Multi-task Learning Model for Scientific Fact-Verification*. There is a short video available for this work! This work ranked in the top 2 of the SciFact leaderboard as of March 30th, 2021.
We recommend you create an anaconda environment:
```
conda create --name scifact python=3.6 conda-build
```
Then, from the `scifact` project root, run

```
conda develop .
```

which will add the scifact code to your `PYTHONPATH`.
Then, install Python requirements:
```
pip install -r requirements.txt
```
If you encounter any installation problems with `sent2vec`, please check their repo.
The BioSentVec model is available here.
The SciFact claim files and corpus file are available at the SciFact repo. The checkpoint of the Paragraph-Joint model used for the paper (trained on the training set) is available here. The checkpoint of the Paragraph-Joint model used for the leaderboard submission (trained on train+dev) is available here.
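For reference, both the corpus and claim files are JSONL, one record per line, following the schemas documented in the SciFact repo. Illustrative records with made-up IDs and text: a corpus entry

```
{"doc_id": 4983, "title": "...", "abstract": ["Sentence 1.", "Sentence 2."], "structured": false}
```

and a claim entry

```
{"id": 13, "claim": "...", "evidence": {"4983": [{"sentences": [0, 1], "label": "SUPPORT"}]}, "cited_doc_ids": [4983]}
```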
Compute BioSentVec embeddings for the claims and abstracts:

```
python ComputeBioSentVecAbstractEmbedding.py --claim_file /path/to/claims.jsonl --corpus_file /path/to/corpus.jsonl --sentvec_path /path/to/sentvec_model
```
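Internally, this step amounts to embedding every claim and abstract with BioSentVec. A minimal sketch of the idea using the `sent2vec` Python bindings (file paths are placeholders, and mean-pooling over abstract sentences is an assumption, not necessarily what the script does):

```python
import json

import sent2vec

# Load the pretrained BioSentVec model (the large .bin file linked above).
model = sent2vec.Sent2vecModel()
model.load_model("/path/to/BioSentVec_PubMed_MIMICIII-bigram_d700.bin")

# SciFact abstracts are stored as lists of sentences.
with open("/path/to/corpus.jsonl") as f:
    corpus = [json.loads(line) for line in f]

# One vector per abstract: embed each sentence, then mean-pool (an assumption).
doc_embeddings = {
    doc["doc_id"]: model.embed_sentences(doc["abstract"]).mean(axis=0)
    for doc in corpus
}
```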
Then retrieve the top abstracts for each claim:

```
python SentVecAbstractRetriaval.py --claim_file /path/to/claims.jsonl --corpus_file /path/to/corpus.jsonl --k_retrieval 30 --claim_retrieved_file /output/path/of/retrieval_file.jsonl --scifact_abstract_retrieval_file /output/path/of/retrieval_file_scifact_format.jsonl
```
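Retrieval itself is then a nearest-neighbor search over these embeddings. A sketch of top-k retrieval by cosine similarity (names are illustrative, not the script's actual internals):

```python
import numpy as np

def top_k_abstracts(claim_emb, doc_ids, doc_matrix, k=30):
    """Return the doc_ids of the k abstracts most similar to the claim."""
    # Cosine similarity = dot product of L2-normalized vectors.
    claim_norm = claim_emb / (np.linalg.norm(claim_emb) + 1e-8)
    doc_norms = doc_matrix / (np.linalg.norm(doc_matrix, axis=1, keepdims=True) + 1e-8)
    scores = doc_norms @ claim_norm
    top = np.argsort(-scores)[:k]
    return [doc_ids[i] for i in top]
```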
The retrieved abstracts are available here: train, dev, test.
You need to retrieve negative samples for FEVER pre-training. We used the retrieval code from here. Empirically, retrieving only 5 negative examples per claim is enough; retrieving more can be far too time-consuming. You also need to convert the output of the retrieval code into the SciFact input format.
For your convenience, the converted retrieved FEVER examples with `k_retrieval=15` are available: train, dev.
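If you want to run the conversion yourself, the gist is to rewrite each FEVER claim and its retrieved pages as a SciFact-style claim record. A rough sketch; the FEVER-side field names (`retrieved_pages`, per-page sentence ids) are assumptions about the retrieval output rather than a documented schema:

```python
import json

# FEVER labels -> SciFact labels (SciFact has no NOT ENOUGH INFO evidence sets).
LABEL_MAP = {"SUPPORTS": "SUPPORT", "REFUTES": "CONTRADICT"}

def to_scifact_record(fever_record):
    """Rewrite one retrieved-FEVER record as a SciFact-style claim record.

    'retrieved_pages' and the per-page sentence-id mapping are assumed
    field names, not a documented schema; adapt to your retrieval output.
    """
    out = {
        "id": fever_record["id"],
        "claim": fever_record["claim"],
        "evidence": {},
        "cited_doc_ids": fever_record["retrieved_pages"],
    }
    if fever_record["label"] in LABEL_MAP:
        label = LABEL_MAP[fever_record["label"]]
        for page, sentence_ids in fever_record.get("evidence", {}).items():
            out["evidence"][page] = [{"sentences": sentence_ids, "label": label}]
    return out
```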
The checkpoint of the Paragraph-Joint model pre-trained only on the retrieved FEVER examples shared above is available here.
Run `FEVER_joint_paragraph_dynamic.py` to pre-train the model on FEVER. Use `--checkpoint` to specify the checkpoint path. Run `scifact_joint_paragraph_dynamic.py` to fine-tune on the SciFact dataset. Use `--pre_trained_model` to load the pre-trained model. Please check the other options in the source files.
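For example, a typical pre-train-then-fine-tune sequence looks like the following (paths are placeholders; check each script's argparse options for the remaining flags):

```
python FEVER_joint_paragraph_dynamic.py --checkpoint /path/to/fever_checkpoint
python scifact_joint_paragraph_dynamic.py --pre_trained_model /path/to/fever_checkpoint
```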
To make predictions with a trained checkpoint, run:

```
python scifact_joint_paragraph_dynamic_prediction.py --corpus_file /path/to/corpus.jsonl --test_file /path/to/retrieval_file.jsonl --dataset /path/to/scifact/claims_test.jsonl --batch_size 25 --k 30 --prediction /path/to/output.jsonl --evaluate --checkpoint /path/to/checkpoint
```
The file and parameter names should be self-explanatory, and most parameters are set with sensible default values.
- File names with `rationale` and `stance` are the scripts for the rationale-selection and stance-prediction models.
- File names with `FEVER` are scripts for training on the FEVER dataset; the same goes for `domain_adaptation`.
- File names with `prediction` are scripts that take the pre-trained models and perform inference.
- File names with `kgat` denote models with KGAT as the stance predictor.
- You can use `--pre_trained_model path/to/pre_trained.model` to load a model trained on the FEVER dataset and fine-tune it on SciFact.
```
@inproceedings{li2021paragraph,
  title={A Paragraph-level Multi-task Learning Model for Scientific Fact-Verification},
  author={Li, Xiangci and Burns, Gully A and Peng, Nanyun},
  booktitle={SDU@AAAI},
  year={2021}
}
```