The objective of this research is to generate coherent and understandable text in Chinese.
We automatically extract commonsense knowledge from ConceptNet and select concepts with the Monte-Carlo Tree Search (MCTS) algorithm.
We combine the selected concepts into text using templates. A word embedding model that we constructed and a Deep Neural Network (DNN) discourse coherence model serve as the reward function in MCTS, scoring the coherence of the generated text.
We evaluate the generated text by human rating; the results show that the text is more coherent when the discourse coherence model is used.
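To make the search procedure more concrete, here is a minimal sketch of MCTS choosing a chain of concepts under a reward function. It is not the code in MCTS.py: the concept graph `GRAPH`, the function `toy_reward`, and all other names are hypothetical stand-ins. In the project itself the concepts come from ConceptNet and the reward comes from the word embedding and discourse coherence models.

```python
import math
import random

# Hypothetical toy concept graph (assumption); the project uses ConceptNet relations.
GRAPH = {
    "rain": ["umbrella", "cloud"],
    "umbrella": ["hand", "shop"],
    "cloud": ["sky", "shop"],
    "hand": [], "shop": [], "sky": [],
}

def toy_reward(path):
    """Stand-in for the coherence model: share of concepts in a fixed 'coherent' set."""
    coherent = {"rain", "cloud", "sky"}
    return sum(c in coherent for c in path) / len(path)

class Node:
    def __init__(self, concept, parent=None):
        self.concept = concept
        self.parent = parent
        self.children = []
        self.untried = list(GRAPH[concept])  # neighbours not yet expanded
        self.visits = 0
        self.total = 0.0  # sum of rewards backed up through this node

    def path(self):
        """Concepts from the root down to this node."""
        node, out = self, []
        while node is not None:
            out.append(node.concept)
            node = node.parent
        return out[::-1]

def ucb1(child, parent_visits, c=1.4):
    """Mean reward plus an exploration bonus for rarely visited children."""
    return child.total / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)

def rollout(path, rng):
    """Extend the path with random neighbours until a concept has none left."""
    path = list(path)
    while GRAPH[path[-1]]:
        path.append(rng.choice(GRAPH[path[-1]]))
    return path

def mcts(root_concept, reward_fn, iterations=300, seed=0):
    rng = random.Random(seed)
    root = Node(root_concept)
    for _ in range(iterations):
        node = root
        # 1. Selection: follow the best UCB1 child while the node is fully expanded.
        while not node.untried and node.children:
            parent = node
            node = max(parent.children, key=lambda ch: ucb1(ch, parent.visits))
        # 2. Expansion: add one untried neighbour as a new child.
        if node.untried:
            concept = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(concept, parent=node)
            node.children.append(child)
            node = child
        # 3. Simulation: complete the path at random and score it.
        reward = reward_fn(rollout(node.path(), rng))
        # 4. Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.total += reward
            node = node.parent
    # Read off the most visited chain of concepts.
    node = root
    while node.children:
        node = max(node.children, key=lambda ch: ch.visits)
    return node.path()
```

With this toy reward, `mcts("rain", toy_reward)` settles on the chain rain, cloud, sky, because that path scores highest. Swapping `toy_reward` for a function that calls a trained coherence model would keep the rest of the loop unchanged.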
Please refer to Chinese ConceptNet.
- Word embedding model
- Discourse coherence model (download link)
A DNN model that scores the coherence of generated text.
The positive samples are the original paragraphs. The negative samples are paragraphs in which concepts are replaced by other connected concepts that have the same part of speech (POS) in ConceptNet.
Words in a sentence or paragraph are closely related, so replacing concepts makes a paragraph less coherent, even when only a single concept is replaced.
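The negative-sample construction can be sketched roughly as follows. This is an illustration under assumptions, not the training code from the reports: `NEIGHBORS`, `POS`, and `make_negative` are hypothetical names, and the tiny English dictionaries stand in for lookups against Chinese ConceptNet.

```python
import random

# Hypothetical toy data (assumption): connected concepts and their POS tags.
NEIGHBORS = {"dog": ["cat", "bark", "bone"], "park": ["tree", "walk"]}
POS = {"dog": "n", "cat": "n", "bark": "v", "bone": "n",
       "park": "n", "tree": "n", "walk": "v"}

def make_negative(concepts, rng):
    """Return a copy of `concepts` with one concept replaced by a connected
    concept of the same POS, or None if no valid replacement exists."""
    candidates = []
    for i, concept in enumerate(concepts):
        for neighbor in NEIGHBORS.get(concept, []):
            # Keep only neighbours with the same POS that are not already present.
            if POS.get(neighbor) == POS.get(concept) and neighbor not in concepts:
                candidates.append((i, neighbor))
    if not candidates:
        return None
    index, replacement = rng.choice(candidates)
    negative = list(concepts)
    negative[index] = replacement
    return negative

def build_samples(paragraphs, seed=0):
    """Label original paragraphs 1 and their corrupted copies 0."""
    rng = random.Random(seed)
    samples = []
    for concepts in paragraphs:
        samples.append((concepts, 1))
        negative = make_negative(concepts, rng)
        if negative is not None:
            samples.append((negative, 0))
    return samples
```

For example, `build_samples([["dog", "park"]])` yields the original pair with label 1 and one corrupted copy with label 0, in which either "dog" or "park" has been swapped for a connected noun.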
The training process and experiments are described in the reports.
- Download the models above into the .\model folder
- Create a conda environment and install the required packages with the command
conda create --name <env_name> --file requirements.txt
- Run MCTS.py (the program takes several minutes to run)
- The generated texts are written to the .\output folder
Ying-Ren Chen (2021). Generate coherent text using semantic embedding, common sense templates and Monte-Carlo tree search methods (Master's thesis, National Tsing Hua University, Hsinchu, Taiwan).
BibTeX:
@mastersthesis{Chen:2021:generate_coherent_text,
author = "Ying-Ren Chen",
title = "Generate coherent text using semantic embedding, common sense templates and Monte-Carlo tree search methods",
school = "National Tsing Hua University",
pages = 136,
year = 2021
}
This work is licensed under the GNU General Public License.