ROUGE-N added #69
base: master
Conversation
@@ -118,4 +118,4 @@ def test_compute_metrics(self):
        self.assertAlmostEqual(0.88469, scores['EmbeddingAverageCosineSimilairty'], places=5)
        self.assertAlmostEqual(0.568696, scores['VectorExtremaCosineSimilarity'], places=5)
        self.assertAlmostEqual(0.784205, scores['GreedyMatchingScore'], places=5)
Thanks for the contribution!
Would you add some tests for the values of the ROUGE metrics?
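For example, something along these lines, mirroring the existing assertions (the `ROUGE_1`/`ROUGE_2` key names and the expected values are placeholders, not taken from this PR):

```python
# Placeholder sketch; replace the values with the actual expected scores.
self.assertAlmostEqual(0.5, scores['ROUGE_1'], places=5)
self.assertAlmostEqual(0.4, scores['ROUGE_2'], places=5)
```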
Sorry, but I will not:
- The added code is from another repository, not mine.
- The scores are slightly different from pyrouge's.
- The goal of this PR is to give a quick, approximate way to get ROUGE-N scores. It should not be merged into the main branch, but kept open here.
- For a real ROUGE-N score, someone needs to wrap the official Perl script, ROUGE-1.5.5... I don't have time for this now :/
The other repo's code seems to be Apache-licensed; I'm not sure we can merge it, particularly without including their license. I'm not too worried about slightly different values as long as we're clear in the docs about the method used. Do you know where the differences might come from?
We could at least test that the values are within some reasonable bounds.
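Something like this, inside the existing test (a sketch that reuses the same `scores` dict as the assertions above; the `ROUGE_1`/`ROUGE_2` key names are assumptions about what this PR exposes):

```python
# Bounds check rather than exact values: every ROUGE score must lie in [0, 1].
# Key names are assumed; adjust to whatever the PR actually adds.
for key in ('ROUGE_1', 'ROUGE_2', 'ROUGE_L'):
    self.assertGreaterEqual(scores[key], 0.0)
    self.assertLessEqual(scores[key], 1.0)
```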
I tried comparing the results against the `rouge` package, but the scores are different (and not just in the last few digits...). Not only are the ROUGE-N values different, but so is the existing ROUGE-L.
That package is also Apache-licensed, so I'm not sure we can just use it without including their license (even without modifying their code).
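For reference, this is roughly how I compared them (a sketch using the `rouge` package's public API; the example sentences are arbitrary):

```python
# Sketch: scoring a single hypothesis/reference pair with the pip `rouge` package.
from rouge import Rouge

hypothesis = "the cat sat on the mat"        # arbitrary example
reference = "the cat was sitting on the mat"  # arbitrary example

scores = Rouge().get_scores(hypothesis, reference)[0]
print(scores['rouge-1']['f'], scores['rouge-2']['f'], scores['rouge-l']['f'])
```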
To fix #68, this PR adds the ROUGE-N metric to Rouge.
Taken from: https://github.com/pltrdy/seq2seq/blob/master/seq2seq/metrics/rouge.py
Results might differ slightly from the official ROUGE-1.5.5 script, but at least the code is very simple.
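The approach boils down to n-gram overlap. A minimal sketch of the idea (simplified from the linked file: whitespace tokenization, no smoothing; not the exact code in this PR):

```python
from collections import Counter

def rouge_n(hypothesis, reference, n=2):
    """Approximate ROUGE-N F1 via n-gram overlap (simplified sketch)."""
    def ngrams(text):
        tokens = text.split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    hyp, ref = ngrams(hypothesis), ngrams(reference)
    # Clipped overlap: each n-gram counts at most as often as it appears in both.
    overlap = sum((hyp & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```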