Introduction

This repository contains code for two tasks, Idiom NER and Idiom Cloze, built on ChID: A Large-scale Chinese IDiom Dataset for Cloze Test.
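In a cloze test, each blank in a passage is filled by choosing one idiom from a list of candidates. The sketch below illustrates that idea under simplifying assumptions: one blank per passage, a placeholder string `#idiom#` (an assumption; check the downloaded files for the real marker), and a caller-supplied `score` function, for example a language-model score. `ClozeItem`, `predict` and `accuracy` are hypothetical names, not this repository's code.

```python
from dataclasses import dataclass
from typing import Callable, List

BLANK = "#idiom#"  # assumed placeholder token; not confirmed by this README


@dataclass
class ClozeItem:
    # Hypothetical, simplified item: one blank, its candidates, gold index.
    content: str
    candidates: List[str]
    answer: int


def predict(item: ClozeItem, score: Callable[[str], float]) -> int:
    """Fill the blank with each candidate; return the index of the best-scoring one."""
    filled = [item.content.replace(BLANK, cand, 1) for cand in item.candidates]
    scores = [score(text) for text in filled]
    # Ties go to the earliest candidate.
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(items: List[ClozeItem], score: Callable[[str], float]) -> float:
    """Fraction of items whose predicted candidate equals the gold answer."""
    if not items:
        return 0.0
    correct = sum(predict(it, score) == it.answer for it in items)
    return correct / len(items)
```

The model's only job in this framing is to supply `score`; evaluation reduces to an argmax over candidates.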

Data Preparation

  1. Download the ChID dataset from here into the data/chid folder.

    The folder should contain train_data.txt, dev_data.txt, and test_data.txt.

  2. Download the bert-base-chinese model from here into the data/bert folder.

    The folder should contain config.json, vocab.txt, and pytorch_model.bin.
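Before training, it can help to confirm that both downloads landed where the steps above expect them. This is a small helper sketch; `missing_files` is a hypothetical name and is not part of the repository.

```python
from pathlib import Path
from typing import Dict, List

# Folders and files listed in the data-preparation steps.
REQUIRED: Dict[str, List[str]] = {
    "data/chid": ["train_data.txt", "dev_data.txt", "test_data.txt"],
    "data/bert": ["config.json", "vocab.txt", "pytorch_model.bin"],
}


def missing_files(root: str = ".") -> List[str]:
    """Return the relative paths of required files that are not present under root."""
    base = Path(root)
    missing: List[str] = []
    for folder, names in REQUIRED.items():
        for name in names:
            rel = Path(folder) / name
            if not (base / rel).is_file():
                missing.append(rel.as_posix())
    return missing
```

Running `print(missing_files())` from the repository root should print an empty list once both downloads are in place.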

Running the Tasks

  • Task One: Idiom NER

    python main1.py --name NER

  • Task Two: Idiom Cloze

    python main2.py --name Cloze
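One plausible reading of Idiom NER is tagging the character span of each idiom in a sentence. The sketch below assumes that reading and the same `#idiom#` placeholder as above, both unconfirmed by this README. It fills each blank with its gold idiom and emits character-level BIO tags. Character-level labels fit bert-base-chinese reasonably well, because BERT's tokenizer splits Chinese text into single characters. `to_bio` is a hypothetical helper, not the code in main1.py.

```python
from typing import List, Tuple

BLANK = "#idiom#"  # assumed placeholder; adjust to the actual ChID files


def to_bio(content: str, idioms: List[str], label: str = "IDIOM") -> Tuple[List[str], List[str]]:
    """Fill each blank with its gold idiom and return (characters, BIO tags)."""
    parts = content.split(BLANK)
    if len(parts) - 1 != len(idioms):
        raise ValueError("number of blanks does not match number of idioms")
    chars: List[str] = []
    tags: List[str] = []
    for i, part in enumerate(parts):
        # Context characters outside any idiom are tagged O.
        chars.extend(part)
        tags.extend(["O"] * len(part))
        if i < len(idioms):
            idiom = idioms[i]
            if not idiom:
                raise ValueError("idioms must be non-empty strings")
            # First character begins the span; the rest continue it.
            chars.extend(idiom)
            tags.append(f"B-{label}")
            tags.extend([f"I-{label}"] * (len(idiom) - 1))
    return chars, tags
```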

You can change the configuration either with command-line arguments or by editing the defaults in parser.py.
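The two ways of configuring a run work together when the parser is built with Python's standard `argparse`: defaults live in parser.py, and command-line flags override them. The sketch below assumes that setup. Only `--name` appears in this README; `--data_dir`, `--bert_dir` and `--batch_size` are hypothetical examples, not the repository's actual options.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Only --name is documented in this README; the other flags are hypothetical.
    parser = argparse.ArgumentParser(description="Idiom NER / Idiom Cloze")
    parser.add_argument("--name", type=str, default="NER",
                        help="run name, e.g. NER or Cloze")
    parser.add_argument("--data_dir", type=str, default="data/chid",
                        help="(hypothetical) folder holding the ChID files")
    parser.add_argument("--bert_dir", type=str, default="data/bert",
                        help="(hypothetical) folder holding bert-base-chinese")
    parser.add_argument("--batch_size", type=int, default=32,
                        help="(hypothetical) training batch size")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv; when argv is None, argparse reads sys.argv[1:]."""
    return build_parser().parse_args(argv)
```

With this layout, `python main2.py --name Cloze --batch_size 8` would override two defaults and keep the rest as written in parser.py.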