play0137/Generate_coherent_text

Overview

The objective of this research is to generate coherent and understandable Chinese text.
We automatically extract commonsense knowledge from ConceptNet and select concepts with the Monte-Carlo Tree Search (MCTS) algorithm.
The selected concepts are combined into text with templates. A constructed word embedding model and a Deep Neural Network (DNN) discourse coherence model serve as the reward function in MCTS, scoring the coherence of the generated text.

Human ratings of the generated text show that it is more coherent when the discourse coherence model is used.
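
The concept selection described above can be illustrated with a minimal MCTS sketch. This is not the code in MCTS.py: the toy graph, the `toy_reward` function, and all names below are assumptions made for illustration. In the actual project the reward would come from the word embedding model and the discourse coherence model rather than from a hand-written score.

```python
import math
import random

# Hypothetical toy concept graph standing in for ConceptNet edges.
GRAPH = {
    "rain": ["umbrella", "cloud", "wet"],
    "umbrella": ["rain", "hand"],
    "cloud": ["sky", "rain"],
    "wet": ["towel", "rain"],
    "sky": ["cloud", "bird"],
    "hand": ["umbrella"],
    "towel": ["wet"],
    "bird": ["sky"],
}

def toy_reward(path):
    """Placeholder coherence score (assumption): fraction of distinct concepts."""
    return len(set(path)) / len(path)

class Node:
    def __init__(self, path, parent=None):
        self.path = path                      # concepts chosen so far
        self.parent = parent
        self.children = []
        self.visits = 0
        self.total = 0.0                      # sum of rewards seen through this node
        self.untried = list(GRAPH[path[-1]])  # neighbours not yet expanded

def ucb1(child, parent_visits, c=1.4):
    """UCB1: average reward plus an exploration bonus."""
    bonus = c * math.sqrt(math.log(parent_visits) / child.visits)
    return child.total / child.visits + bonus

def mcts(start, depth, iterations, reward_fn, rng):
    root = Node([start])
    for _ in range(iterations):
        node = root
        # Selection: descend through fully expanded nodes by UCB1.
        while not node.untried and node.children and len(node.path) < depth:
            node = max(node.children, key=lambda ch: ucb1(ch, node.visits))
        # Expansion: add one untried neighbour as a new child.
        if node.untried and len(node.path) < depth:
            child = Node(node.path + [node.untried.pop()], parent=node)
            node.children.append(child)
            node = child
        # Simulation: random walk until the path reaches the target length.
        path = list(node.path)
        while len(path) < depth:
            path.append(rng.choice(GRAPH[path[-1]]))
        reward = reward_fn(path)
        # Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.total += reward
            node = node.parent
    # Read out the most visited path.
    best, node = [start], root
    while node.children:
        node = max(node.children, key=lambda ch: ch.visits)
        best = node.path
    return best

if __name__ == "__main__":
    print(mcts("rain", depth=4, iterations=200,
               reward_fn=toy_reward, rng=random.Random(0)))
```

Passing the reward as a function keeps the search independent of how coherence is scored, so a model-based scorer could be swapped in without changing the tree search itself.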

Chinese ConceptNet

Please refer to Chinese ConceptNet.

Models

  • Word embedding model
  • Discourse coherence model (download link)
    A DNN model that scores the coherence of generated text.
    The positive samples are the original paragraphs. The negative samples are paragraphs in which a concept is replaced by another connected concept in ConceptNet that has the same part of speech (POS).
    Words in a sentence or a paragraph are closely related, so even a single concept replacement makes a paragraph less coherent.
    The training process and experiments are described in the reports.
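
The negative-sample construction described above can be sketched as follows. This is an illustrative assumption, not the training code from the reports: the `NEIGHBORS` and `POS` tables are hypothetical toy data (in English for readability), and `make_negative` is a name introduced here.

```python
import random

# Hypothetical toy data: concepts connected in the knowledge graph, and POS tags.
NEIGHBORS = {"dog": ["cat", "bark", "bone"], "park": ["tree", "walk"]}
POS = {"dog": "N", "cat": "N", "bone": "N", "bark": "V",
       "park": "N", "tree": "N", "walk": "V"}

def make_negative(tokens, rng):
    """Return a copy of tokens with one concept swapped for a connected
    concept of the same POS, or None if no token can be replaced."""
    candidates = []
    for i, tok in enumerate(tokens):
        same_pos = [n for n in NEIGHBORS.get(tok, [])
                    if n != tok and POS.get(n) == POS.get(tok)]
        if same_pos:
            candidates.append((i, same_pos))
    if not candidates:
        return None
    i, options = rng.choice(candidates)
    negative = list(tokens)          # leave the positive sample untouched
    negative[i] = rng.choice(options)
    return negative

if __name__ == "__main__":
    positive = ["the", "dog", "runs", "in", "the", "park"]
    print(make_negative(positive, random.Random(0)))
```

Each positive paragraph paired with such a single-replacement copy gives the classifier one coherent and one slightly less coherent example.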

Usage

  • Download the models above into the .\model folder
  • Create a conda environment and install the dependencies with the command
    conda create --name <env_name> --file requirements.txt
  • Run MCTS.py (the program takes minutes to run)
  • The generated texts are written to the .\output folder

Reference

Ying-Ren Chen (2021). Generate coherent text using semantic embedding, common sense templates and Monte-Carlo tree search methods (Master's thesis, National Tsing Hua University, Hsinchu, Taiwan).

BibTeX:

@mastersthesis{Chen:2021:generate_coherent_text,
     author = "Ying-Ren Chen",
     title = "Generate coherent text using semantic embedding, common sense templates and Monte-Carlo tree search methods",
     school = "National Tsing Hua University",
     pages = 136,
     year = 2021
}

License

This work is licensed under the GNU General Public License.
