This code was used in the paper:
"CorefPrompt: Prompt-based Event Coreference Resolution by Measuring Event Type and Argument Compatibilities"
Sheng Xu, Peifeng Li and Qiaoming Zhu. EMNLP 2023.
A simple prompt-based model for predicting event pair coreferences. The model was trained and evaluated on the KBP corpus.
It first utilizes a prefix template to reformulate event coreference resolution as a masked language modeling task, and then predicts whether two event mentions corefer by measuring the compatibility of their event types and arguments.
Set up a Python virtual environment and install packages using the requirements file:
conda create -n corefprompt python=3.9
conda activate corefprompt
python3 -m pip install -r requirements.txt
It is easy to use our model to predict event coreferences.
For example, consider the following text, which contains seven event mentions (ev1-ev7) and ten entity mentions (arg1-arg10) that serve as arguments.
Among them,

- the death event mention $ev_1$ with the argument $arg_1$ and the death event mention $ev_5$ with the arguments $arg_5$, $arg_6$, and $arg_7$ are coreferential, as both describe the girl's suicide by jumping off a building;
- the injury event mention $ev_3$ with the arguments $arg_2$ and $arg_3$ and the injury event mention $ev_4$ with the argument $arg_4$ are coreferential, as both describe the girl's disfigurement;
- the other event mentions are singletons.

Let's apply our CorefPrompt model to predict the coreferences among these events. Note that for event arguments, we only consider participants and locations.
document = 'Former Pakistani dancing girl commits suicide 12 years after horrific acid attack which left her looking "not human". She had undergone 39 separate surgeries to repair damage. Leapt to her death from sixth floor Rome building earlier this month. Her ex-husband was charged with attempted murder in 2002 but has since been acquitted.'
ev1 = {
'offset': 38,  # character offset of the trigger within document
'trigger': 'suicide',
'args': [
{'mention': 'Former Pakistani dancing girl', 'role': 'participant'}
]
}
ev3 = {
'offset': 88,
'trigger': 'left',
'args': [
{'mention': 'acid', 'role': 'participant'},
{'mention': 'her', 'role': 'participant'}
]
}
ev4 = {
'offset': 168,
'trigger': 'damage',
'args': [
{'mention': 'She', 'role': 'participant'}
]
}
ev5 = {
'offset': 189,
'trigger': 'death',
'args': [
{'mention': 'her', 'role': 'participant'},
{'mention': 'sixth floor Rome building', 'role': 'place'}
]
}
First, we need to create a CorefPrompt model by loading our provided checkpoint coref-prompt-large (can be downloaded from Google Drive):
from corefprompt import CorefPrompt
model_checkpoint = './coref-prompt-large'
coref_model = CorefPrompt(model_checkpoint)
We provide two functions to predict event coreferences:

- predict_coref(event1, event2): suitable for processing multiple event pairs located in the same document. The corresponding document must first be loaded using the init_document(doc) function.
- predict_coref_in_doc(document, event1, event2): directly predicts coreference between an event pair located in a document.
Here are some usage examples:
# directly predict an event pair in a document
res = coref_model.predict_coref_in_doc(document, ev1, ev5)
print('[Prompt]:', res['prompt'])
print(f"ev1[{ev1['trigger']}] - ev5[{ev5['trigger']}]: {res['label']} ({res['probability']})")
[Prompt]: In the following text, the focus is on the events expressed by <e1_start> suicide <e1_end> and <e2_start> death <e2_end>, and it needs to judge whether they refer to the same or different events: Former Pakistani dancing girl commits <e1_start> suicide <e1_end> 12 years after horrific acid attack which left her looking "not human". Here <e1_start> suicide <e1_end> expresses a <mask> event with Former Pakistani dancing girl as participants. She had undergone 39 separate surgeries to repair damage. Leapt to her <e2_start> death <e2_end> from sixth floor Rome building earlier this month. Here <e2_start> death <e2_end> expresses a <mask> event with her as participants at sixth floor Rome building. Her ex-husband was charged with attempted murder in 2002 but has since been acquitted. In conclusion, the events expressed by <e1_start> suicide <e1_end> and <e2_start> death <e2_end> have <mask> event type and <mask> participants, so they refer to <mask> event.
ev1[suicide] - ev5[death]: coref (0.9997438788414001)
# predict event pairs in the same document
coref_model.init_document(document)
res = coref_model.predict_coref(ev1, ev5)
print(f"ev1[{ev1['trigger']}] - ev5[{ev5['trigger']}]: {res['label']} ({res['probability']})")
res = coref_model.predict_coref(ev1, ev3)
print(f"ev1[{ev1['trigger']}] - ev3[{ev5['trigger']}]: {res['label']} ({res['probability']})")
res = coref_model.predict_coref(ev1, ev4)
print(f"ev1[{ev1['trigger']}] - ev4[{ev5['trigger']}]: {res['label']} ({res['probability']})")
res = coref_model.predict_coref(ev3, ev4)
print(f"ev3[{ev1['trigger']}] - ev4[{ev5['trigger']}]: {res['label']} ({res['probability']})")
ev1[suicide] - ev5[death]: coref (0.9997438788414001)
ev1[suicide] - ev3[left]: non-coref (0.9977204203605652)
ev1[suicide] - ev4[damage]: non-coref (0.9989845156669617)
ev3[left] - ev4[damage]: coref (0.999984622001648)
You can modify the demo.py file to try it out!
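To score all pairs of annotated events in a document in one pass, you can loop over combinations with the same API. The sketch below reuses the coref_model, document, and event dictionaries defined above (a minimal illustration, not a copy of demo.py):

```python
from itertools import combinations

events = {'ev1': ev1, 'ev3': ev3, 'ev4': ev4, 'ev5': ev5}

coref_model.init_document(document)  # load the document once
for (name_a, ev_a), (name_b, ev_b) in combinations(events.items(), 2):
    res = coref_model.predict_coref(ev_a, ev_b)
    print(f"{name_a}[{ev_a['trigger']}] - {name_b}[{ev_b['trigger']}]: "
          f"{res['label']} ({res['probability']})")
```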
Download the pre-trained model weights used in our experiment from Huggingface Model Hub:
bash download_pt_models.sh
Note: this script will save all downloaded weights in ./PT_MODELS/.
Coreference results are obtained using the official Reference Coreference Scorer. This scorer reports results in terms of AVG-F, which is the unweighted average of the F-scores of four commonly used coreference evaluation metrics, namely MUC, B³, CEAF_e, and BLANC.
Run (from inside the repo):
cd ./
git clone git@github.com:conll/reference-coreference-scorers.git
This repo assumes access to the English corpora used in the TAC KBP Event Nugget Detection and Coreference tasks (i.e., KBP 2015, KBP 2016, and KBP 2017). In total, they contain 648 + 169 + 167 = 984 documents, which are either newswire articles or discussion forum threads.
'2015': [
'LDC_TAC_KBP/LDC2015E29/data/',
'LDC_TAC_KBP/LDC2015E68/data/',
'LDC_TAC_KBP/LDC2017E02/data/2015/training/',
'LDC_TAC_KBP/LDC2017E02/data/2015/eval/'
],
'2016': [
'LDC_TAC_KBP/LDC2017E02/data/2016/eval/eng/nw/',
'LDC_TAC_KBP/LDC2017E02/data/2016/eval/eng/df/'
],
'2017': [
'LDC_TAC_KBP/LDC2017E54/data/eng/nw/',
'LDC_TAC_KBP/LDC2017E54/data/eng/df/'
]
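If you want to sanity-check your local copies of these LDC packages, a rough file count per year can be computed as below. This assumes one source file per document under the listed directories, which may not hold exactly for every package, so treat the counts as approximate:

```python
import os

# directory layout copied from the mapping above; prepend your own LDC root path
KBP_DIRS = {
    '2015': ['LDC_TAC_KBP/LDC2015E29/data/',
             'LDC_TAC_KBP/LDC2015E68/data/',
             'LDC_TAC_KBP/LDC2017E02/data/2015/training/',
             'LDC_TAC_KBP/LDC2017E02/data/2015/eval/'],
    '2016': ['LDC_TAC_KBP/LDC2017E02/data/2016/eval/eng/nw/',
             'LDC_TAC_KBP/LDC2017E02/data/2016/eval/eng/df/'],
    '2017': ['LDC_TAC_KBP/LDC2017E54/data/eng/nw/',
             'LDC_TAC_KBP/LDC2017E54/data/eng/df/'],
}

for year, dirs in KBP_DIRS.items():
    n_files = sum(len(files)
                  for d in dirs if os.path.isdir(d)
                  for _, _, files in os.walk(d))
    print(year, n_files)  # expected roughly 648 / 169 / 167 documents
```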
| | KBP 2015 | KBP 2016 | KBP 2017 | All |
|---|---|---|---|---|
| #Documents | 648 | 169 | 167 | 984 |
| #Event mentions | 18739 | 4155 | 4375 | 27269 |
| #Event Clusters | 11603 | 3191 | 2963 | 17757 |
Following Lu & Ng (2021), we select LDC2015E29, E68, E73, E94 and LDC2016E64 as the train set (817 docs: 735 for training and the remaining 82 for parameter tuning), and report results on the KBP 2017 dataset.
Dataset Statistics:

| | Train | Dev | Test | All |
|---|---|---|---|---|
| #Documents | 735 | 82 | 167 | 984 |
| #Event mentions | 20512 | 2382 | 4375 | 27269 |
| #Event Clusters | 13292 | 1502 | 2963 | 17757 |
Then,

- Download kbp_sent.txt from the GitHub repository of previous work (Xu et al., 2022), which contains sentences split using Stanford CoreNLP, and place it in the ./data directory.
- Convert the original dataset into jsonlines format using:

  cd data/
  export DATA_DIR=<ldc_tac_kbp_data_dir>
  export SENT_DIR=./
  python3 convert.py --kbp_data_dir $DATA_DIR --sent_data_dir $SENT_DIR

  Note: this script will create train.json, dev.json, and test.json in the data folder, as well as train_filtered.json, dev_filtered.json, and test_filtered.json, which filter out same or overlapping event mentions (see the overlap sketch after this list).
- Use the trigger detector provided by Xu et al. (2022) to extract event triggers in the test set, and store the results in the ./data/epoch_3_test_pred_events.json file.
- Install the OmniEvent tool via pip install OmniEvent, download its model weights, and then recognize event arguments using:

  cd data/KnowledgeExtraction/
  python3 omni_argument_extraction.py

  Note: this script will create xxx_pred_args.json files in the data/KnowledgeExtraction/argument_files folder.
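For reference, "same or overlapping event mentions" here means mentions whose trigger spans overlap in the original text. Below is a minimal sketch of such a check, using the offset/trigger event format from the demo above; the actual filtering logic lives in convert.py and may differ:

```python
def spans_overlap(ev_a, ev_b):
    """Return True if two triggers occupy overlapping character spans."""
    a_start, a_end = ev_a['offset'], ev_a['offset'] + len(ev_a['trigger'])
    b_start, b_end = ev_b['offset'], ev_b['offset'] + len(ev_b['trigger'])
    return a_start < b_end and b_start < a_end

# toy example: the two trigger spans share the character range 12-15
print(spans_overlap({'offset': 10, 'trigger': 'shoots'},
                    {'offset': 12, 'trigger': 'shot'}))  # True
```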
To reduce the computational cost, we apply undersampling on the training set based on the event similarities.
Train an event similarity scorer, which produces event embeddings that are close for coreferential event mentions; the cosine similarity between two embeddings is then used as the event similarity (run with --do_train):
cd src/sample_selector/
export OUTPUT_DIR=./results/
python3 run_selector.py \
--output_dir=$OUTPUT_DIR \
--model_type=longformer \
--model_checkpoint=../../PT_MODELS/allenai/longformer-large-4096/ \
--train_file=../../data/train_filtered.json \
--dev_file=../../data/dev_filtered.json \
--test_file=../../data/test_filtered.json \
--pred_test_file=../../data/epoch_3_test_pred_events.json \
--max_seq_length=4096 \
--learning_rate=1e-5 \
--num_train_epochs=30 \
--batch_size=1 \
--do_train \
--warmup_proportion=0. \
--seed=42
After training, the model weights and the evaluation results on the Dev set will be saved in $OUTPUT_DIR. Then use the --do_predict parameter to predict event similarities. The predicted results, i.e., XXX_with_cos.json, will be saved in $OUTPUT_DIR.
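The similarity score itself is just the cosine between the two event embeddings produced by the scorer; a minimal sketch with toy vectors (the real embeddings come from the Longformer-based selector trained above):

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine similarity between two event embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# toy example: two embeddings pointing in nearly the same direction
emb_a = np.array([0.2, 0.9, 0.1])
emb_b = np.array([0.3, 0.8, 0.0])
print(cosine_similarity(emb_a, emb_b))  # close to 1.0 -> likely coreferential
```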
Finally, create event info files based on recognized arguments and event similarities:
cd data/KnowledgeExtraction/
python3 related_info_extraction.py
This will create xxx_related_info_{cosine_threshold}.json files in the data/KnowledgeExtraction/simi_files folder, which contain the arguments of each event together with information about highly similar related events.
Train our prompt-based model CorefPrompt (run with --do_train):
cd src/coref_prompt/
export OUTPUT_DIR=./roberta_m_hta_hn_512_with_mask_product_cosine_results/
python3 run_mix_prompt.py \
--output_dir=$OUTPUT_DIR \
--prompt_type=m_hta_hn \
--with_mask \
--select_arg_strategy=no_filter \
--matching_style=product_cosine \
--cosine_space_dim=64 \
--cosine_slices=128 \
--cosine_factor=4 \
--model_type=roberta \
--model_checkpoint=../../PT_MODELS/roberta-large/ \
--train_file=../../data/train_filtered.json \
--train_file_with_cos=../../data/train_filtered_with_cos.json \
--dev_file=../../data/dev_filtered.json \
--test_file=../../data/test_filtered.json \
--train_simi_file=../../data/KnowledgeExtraction/simi_files/simi_omni_train_related_info_0.75.json \
--dev_simi_file=../../data/KnowledgeExtraction/simi_files/simi_omni_dev_related_info_0.75.json \
--test_simi_file=../../data/KnowledgeExtraction/simi_files/simi_omni_gold_test_related_info_0.75.json \
--pred_test_simi_file=../../data/KnowledgeExtraction/simi_files/simi_omni_epoch_3_test_related_info_0.75.json \
--sample_strategy=corefnm \
--neg_top_k=3 \
--max_seq_length=512 \
--learning_rate=1e-5 \
--num_train_epochs=10 \
--batch_size=4 \
--do_train \
--warmup_proportion=0. \
--seed=42
After training, the model weights and the evaluation results on the Dev set will be saved in $OUTPUT_DIR. Then use the --do_predict parameter to predict coreferences for event mention pairs. The predicted results, i.e., XXX_test_pred_corefs.json, will be saved in $OUTPUT_DIR.
Create the final event clusters using predicted pairwise results:
cd src/clustering
export OUTPUT_DIR=./TEMP/
python3 run_cluster.py \
--output_dir=$OUTPUT_DIR \
--test_golden_filepath=../../data/test.json \
--test_pred_filepath=event-event/xxx_test_pred_corefs.json \
--golden_conll_filename=gold_test.conll \
--pred_conll_filename=pred_test.conll \
--do_evaluate
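run_cluster.py merges the pairwise predictions into final event clusters. One standard way to do this kind of merging is union-find over the pairs predicted as coreferential; the sketch below illustrates the idea and is not necessarily the exact procedure implemented in run_cluster.py:

```python
def build_clusters(event_ids, coref_pairs):
    """Merge events connected by predicted coreference links (union-find)."""
    parent = {e: e for e in event_ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in coref_pairs:
        parent[find(a)] = find(b)

    clusters = {}
    for e in event_ids:
        clusters.setdefault(find(e), []).append(e)
    return list(clusters.values())

# toy example with the demo events: (ev1, ev5) and (ev3, ev4) are predicted coreferential
print(build_clusters(['ev1', 'ev3', 'ev4', 'ev5'],
                     [('ev1', 'ev5'), ('ev3', 'ev4')]))
# [['ev1', 'ev5'], ['ev3', 'ev4']]
```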
You can download the final event similarity scorer and our best model weights from Google Drive.
Model | MUC | B³ | CEAF_e | BLANC | AVG |
---|---|---|---|---|---|
BERT | 35.8 | 54.4 | 55.6 | 36.0 | 45.5 |
RoBERTa | 37.9 | 55.9 | 57.3 | 38.3 | 47.3 |
(Lu & Ng, 2021) | 45.2 | 54.7 | 53.8 | 38.2 | 48.0 |
(Xu et al., 2022) | 46.2 | 57.4 | 59.0 | 42.0 | 51.2 |
CorefPrompt | 45.3 | 57.5 | 59.9 | 42.3 | 51.3 |
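As described in the evaluation section, the AVG column is the unweighted mean of the four metric F-scores; a quick arithmetic check against the CorefPrompt row:

```python
muc, b3, ceaf_e, blanc = 45.3, 57.5, 59.9, 42.3   # CorefPrompt row from the table
avg_f = (muc + b3 + ceaf_e + blanc) / 4           # unweighted mean of the four F-scores
print(round(avg_f, 2))  # 51.25, consistent with the reported AVG of 51.3
```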
Contact Sheng Xu at sxu@stu.suda.edu.cn for questions about this repository.
@misc{xu2023corefprompt,
title={CorefPrompt: Prompt-based Event Coreference Resolution by Measuring Event Type and Argument Compatibilities},
author={Sheng Xu and Peifeng Li and Qiaoming Zhu},
year={2023},
eprint={2310.14512},
archivePrefix={arXiv},
primaryClass={cs.CL}
}