Reweighting synthetic examples

This repository contains the code for our paper, "Reweighting Strategy based on Synthetic Data Identification for Sentence Similarity" (COLING 2022) [paper].

How to Begin

Install the required packages listed in requirements.txt.
Download the preprocessed benchmark datasets (STSb, QQP, and MRPC) from this drive link.
Prepare the PAWS-QQP dataset by following this repository, and place it in datasets/benchmarks/paws/.
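
Before running the scripts, it can help to confirm that the datasets landed where the code expects them. The sketch below only checks that a directory exists and is non-empty. The path datasets/benchmarks/paws comes from the step above; the helper name and the choice to check for any entry at all (rather than specific file names) are assumptions.

```python
from pathlib import Path


def missing_dataset_dirs(root, expected=("datasets/benchmarks/paws",)):
    """Return the expected dataset directories that are absent or empty.

    `expected` lists paths relative to `root`. Only the PAWS path is taken
    from this README; add the other benchmark folders once you know where
    the downloaded archive unpacks (that layout is not assumed here).
    """
    missing = []
    for rel in expected:
        path = Path(root) / rel
        # A directory counts as prepared if it exists and holds at least one entry.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    for rel in missing_dataset_dirs("."):
        print(f"missing or empty: {rel}")
```

Run from the repository root, the script prints nothing when every listed directory is in place.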

How to Reproduce

1. Data preparation

Run the scripts/0_preprocessing.sh script. It prepares the source sentences (C_src) used to build the synthetic dataset and splits the PAWS dataset into dev and test sets.
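
The dev/test split can be pictured as a seeded shuffle followed by a cut. This is a minimal sketch and not the logic of split_paws_dev_test.py: the function name, the 50/50 ratio, and the seed are all assumptions made for illustration.

```python
import random


def split_dev_test(examples, dev_ratio=0.5, seed=42):
    """Shuffle a copy of `examples` reproducibly and cut it into (dev, test)."""
    shuffled = list(examples)
    # A local RNG keeps the global random state untouched.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * dev_ratio)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    # Hypothetical (sentence1, sentence2, label) triples.
    pairs = [(f"sentence {i} a", f"sentence {i} b", i % 2) for i in range(10)]
    dev, test = split_dev_test(pairs)
    print(len(dev), len(test))
```

Fixing the seed means the same examples end up in the same split on every run, which keeps dev and test scores comparable across experiments.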

2. Synthetic dataset generation & Machine-written example identification

Run the scripts/1_generation.sh script to generate synthetic examples and train a discriminator model that identifies them.
The synthetic dataset is created in the same way as in the original DINO framework proposed by Schick et al. (2021).
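
To make the role of the discriminator concrete: once it assigns each synthetic example a probability of being machine-written, that probability can be turned into a per-example training weight. The mapping below (weight = 1 - p, rescaled so that the mean weight is 1) is only one plausible choice and is an assumption here. The paper defines the actual scheme, and the function name is hypothetical.

```python
def weights_from_discriminator(p_machine):
    """Map P(machine-written) per example to training weights with mean 1.

    Examples the discriminator finds more human-like (low probability)
    receive larger weights. This mapping is an illustrative assumption.
    """
    raw = [1.0 - p for p in p_machine]
    total = sum(raw)
    if total == 0.0:
        # Every example looks fully machine-written: fall back to uniform weights.
        return [1.0] * len(raw)
    scale = len(raw) / total
    return [r * scale for r in raw]
```

Rescaling to a mean of 1 keeps the overall loss magnitude comparable to unweighted training, so the learning rate does not need retuning just because weights were introduced.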

3. Training and evaluating STS models

Run scripts/2_run_sts.sh to train bi-encoder models for sentence similarity tasks.
This shell script reproduces all results in Table 2 (with and without reweighting, plus the ablation study).
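
A bi-encoder embeds each sentence independently and scores a pair by the cosine similarity of the two vectors. The sketch below shows how per-example weights would enter a squared-error objective on those scores; with all weights equal to 1 it reduces to the plain (non-reweighted) loss. The function names, the list-of-floats embeddings, and the choice of squared error are assumptions, not the repository's training code.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def weighted_mse(emb_a, emb_b, gold, weights=None):
    """Weighted mean squared error between cosine scores and gold scores."""
    if weights is None:
        weights = [1.0] * len(gold)  # no reweighting
    num = 0.0
    for u, v, y, w in zip(emb_a, emb_b, gold, weights):
        num += w * (cosine(u, v) - y) ** 2
    # Normalising by the weight sum keeps the loss scale independent of the weights.
    return num / sum(weights)
```

Down-weighting an example shrinks its contribution to the loss, so a noisy synthetic pair with a small weight pulls the encoder less than a pair with a large weight.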

4. Other baseline models

Run scripts/3_run_other_baselines.sh to reproduce the results of the other baseline models in Table 6, such as GloVe, BERT, and USE.
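
The four steps above can also be chained from Python. This sketch assumes the scripts are invoked with bash from the repository root and take no arguments, which this README does not state. The script paths are the ones listed above, and the helper name is hypothetical.

```python
import subprocess

SCRIPTS = [
    "scripts/0_preprocessing.sh",
    "scripts/1_generation.sh",
    "scripts/2_run_sts.sh",
    "scripts/3_run_other_baselines.sh",
]


def run_pipeline(scripts=SCRIPTS, dry_run=False):
    """Run each script in order, stopping at the first failure.

    Returns the list of commands that were (or, in a dry run, would be) issued.
    """
    commands = [["bash", script] for script in scripts]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    for cmd in run_pipeline(dry_run=True):
        print(" ".join(cmd))
```

The dry run only prints the commands, which is a cheap way to confirm the order before starting the longer generation and training steps.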

Acknowledgements

The code for generating the synthetic dataset is derived from the work of Schick et al. (2021). (GitHub)
