PORL-HG (The paper has been published in AAAI 2020 as a long paper.)

Code implementation of Attractive or Faithful? Popularity-Reinforced Learning for Inspired Headline Generation.

“A good basic selling idea, involvement and relevancy, of course, are as important as ever,
but in the advertising din of today, unless you make yourself noticed and believed, you ain’t got nothing”

— Leo Burnett (1891-1971)

Generation Examples

Dataset

CNNDM-DH, DM-DHC Datasets download link: PORLHG

You can follow the instructions to download and preprocess the CNN/DailyMail dataset to obtain the articles.

The dataset was collected from the URL links provided by Nallapati et al. 2016 and Hermann et al. 2015.

The DH and DHC datasets can be linked to the CNNDM articles by id.
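
The id-based association above can be sketched as follows. This is only an illustration under assumed data layouts: the record fields (`id`, `headline`), the `articles_by_id` mapping, and the function name `attach_articles` are hypothetical and are not taken from the released files.

```python
from typing import Dict, List, Tuple


def attach_articles(
    headline_records: List[dict],
    articles_by_id: Dict[str, str],
) -> Tuple[List[dict], List[str]]:
    """Pair each headline record with its CNN/DailyMail article via a shared id.

    Assumes (hypothetically) that every headline record carries an "id" key
    matching the id of a preprocessed CNNDM article.
    """
    joined, missing = [], []
    for record in headline_records:
        article = articles_by_id.get(record["id"])
        if article is None:
            # Keep track of ids with no matching article instead of failing.
            missing.append(record["id"])
            continue
        joined.append({**record, "article": article})
    return joined, missing


# Toy data for illustration only; not real dataset content.
records = [
    {"id": "a1", "headline": "Example headline one"},
    {"id": "b2", "headline": "Example headline two"},
]
articles = {"a1": "Body text of article a1."}

pairs, unmatched = attach_articles(records, articles)
```

Returning the unmatched ids separately makes it easy to check how many headlines could not be aligned with a downloaded article.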

The dataset information:

| | train | val | test |
|---|---|---|---|
| DH | 281208 | 12727 | 10577 |
| DHC | 138787 | 11862 | 10130 |

More Experiment Results

Table 1. Correlation analysis of CTR, comments, and shares. List of hypotheses and the corresponding p-values of the significance test, where bold text indicates a significant hypothesis (p-value < 0.05). Note that the p-values for CTR are taken from Kuiken et al. 2017.

| | Hypothesis | CTR | Comment | Share |
|---|---|---|---|---|
| H1 | Longer headlines (> 50 characters) are preferred over shorter headlines | 0.297 | **0** | **0** |
| H2 | Headlines with short words (< 8 characters per word) are preferred | **0.024** | **0** | **0** |
| H3 | Headlines containing a question are preferred | **0.019** | **0** | **0** |
| H4 | Headlines containing a partial quote are preferred over headlines not containing any quote | 0.239 | 0.996 | 0.971 |
| H5 | Headlines not containing any quote are preferred over headlines containing a full quote | **0.03** | 0.848 | 0.111 |
| H6 | Headlines that contain one or more signal words are preferred | **0.002** | **0** | **0.001** |
| H7 | Headlines that contain one or more personal or possessive pronouns are preferred | **0** | **0** | **0** |
| H8 | Headlines that contain one or more sentimental words are preferred | **0.018** | **0** | **0** |
| H9 | Headlines that contain one or more negative sentimental words are preferred | **0.001** | **0.001** | **0.015** |
| H10 | Headlines that contain a number are preferred over headlines that do not | 0.202 | **0** | 0.06 |
| H11 | Headlines that start with a personal or possessive pronoun are preferred | **0.002** | **0** | 0.429 |

Table 2. The popularity features. The following 11 features are derived from the hypotheses stated in Table 1. GT abbreviates ground-truth headlines, and Chen et al. is one of our baselines (Chen et al. 2018).

| | Hypothesis | Significance | GT | PORL | Chen et al. |
|---|---|---|---|---|---|
| H1 | The average character length of a headline | False | 70.55 | 96.21 | 73.92 |
| H2 | The average token length in a headline (lower is better) | True | 4.97 | 4.78 | 4.89 |
| H3 | The percentage of headlines containing a question mark | True | 2.52% | 0.90% | 1.19% |
| H4 | The percentage of headlines containing a partial quote | True | 11.81% | 15.80% | 13.85% |
| H5 | The percentage of headlines containing a full quote (lower is better) | False | 0.01% | 0.00% | 0.00% |
| H6 | The percentage of headlines containing signal words | True | 9.90% | 19.83% | 15.00% |
| H7 | The percentage of headlines containing a personal or possessive pronoun | True | 28.82% | 48.67% | 40.35% |
| H8 | The percentage of headlines containing sentimental words | True | 68.82% | 77.40% | 69.37% |
| H9 | The percentage of headlines containing negative words | True | 45.09% | 52.29% | 44.83% |
| H10 | The percentage of headlines containing numbers | False | 20.58% | 25.22% | 21.06% |
| H11 | The percentage of headlines starting with a personal or possessive pronoun | True | 0.64% | 1.07% | 0.38% |
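
As a rough illustration of how some of these features could be measured on a list of headlines, here is a minimal Python sketch. It is not the evaluation script used for the paper: the tokenizer, the `PRONOUNS` word list, and the function names are assumptions made for this example, and H2 is computed here by pooling all tokens across headlines. Features that need external lexicons (signal words, sentiment words) are left out.

```python
import re
from typing import Dict, List

# Assumed pronoun list for illustration; the list used in the paper may differ.
PRONOUNS = {
    "i", "me", "my", "mine", "we", "us", "our", "ours",
    "you", "your", "yours", "he", "him", "his", "she", "her", "hers",
    "it", "its", "they", "them", "their", "theirs",
}


def tokenize(headline: str) -> List[str]:
    """Very simple tokenizer: runs of letters, digits, and apostrophes."""
    return re.findall(r"[A-Za-z0-9']+", headline)


def popularity_features(headlines: List[str]) -> Dict[str, float]:
    """Compute a subset of the Table 2 features (H1, H2, H3, H7, H10, H11)."""
    n = len(headlines)
    if n == 0:
        raise ValueError("need at least one headline")
    tokenized = [tokenize(h) for h in headlines]
    all_tokens = [t for tokens in tokenized for t in tokens]

    def pct(flags) -> float:
        # Percentage of headlines for which the flag is true.
        return 100.0 * sum(flags) / n

    return {
        "H1_avg_char_length": sum(len(h) for h in headlines) / n,
        "H2_avg_token_length": sum(len(t) for t in all_tokens) / max(len(all_tokens), 1),
        "H3_pct_question_mark": pct("?" in h for h in headlines),
        "H7_pct_pronoun": pct(
            any(t.lower() in PRONOUNS for t in tokens) for tokens in tokenized
        ),
        "H10_pct_number": pct(any(c.isdigit() for c in h) for h in headlines),
        "H11_pct_starts_with_pronoun": pct(
            bool(tokens) and tokens[0].lower() in PRONOUNS for tokens in tokenized
        ),
    }


# Toy headlines for illustration only.
features = popularity_features(["Why did he leave?", "10 ways to save money"])
```

Running the same function over ground-truth and generated headlines gives directly comparable numbers, in the spirit of the GT, PORL, and Chen et al. columns.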

Cite

@article{Song_Shuai_Yeh_Wu_Ku_Peng_2020,
  title={Attractive or Faithful? Popularity-Reinforced Learning for Inspired Headline Generation},
  volume={34},
  url={https://ojs.aaai.org/index.php/AAAI/article/view/6421},
  DOI={10.1609/aaai.v34i05.6421},
  author={Song, Yun-Zhu and Shuai, Hong-Han and Yeh, Sung-Lin and Wu, Yi-Lun and Ku, Lun-Wei and Peng, Wen-Chih},
  year={2020},
  month={Apr.},
  pages={8910-8917}
}