
Memory leak in spm tokenizers #264

Open
nkrasner opened this issue May 31, 2024 · 0 comments

Using the flores101 or flores200 tokenizer results in a memory leak.
I am using version 2.4.2 on Windows 11, but the same also occurred on version 2.4.0.

Running the following causes memory usage to increase linearly until the process crashes:
```python
import sacrebleu

while True:
    sacrebleu.sentence_bleu("Hello world.", ["Hello world."], tokenize="flores101")
```
The same happens with corpus_bleu.
I do not think this is caused by caching, since I am running it repeatedly on the same sentence.
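For a more concrete demonstration, here is a minimal sketch that prints resident memory as the loop runs. It assumes the third-party psutil package (not part of sacrebleu), and the iteration count and print interval are arbitrary:

```python
import os

import psutil  # third-party; assumed available for measuring process memory
import sacrebleu

proc = psutil.Process(os.getpid())

for i in range(10_000):
    sacrebleu.sentence_bleu("Hello world.", ["Hello world."], tokenize="flores101")
    if i % 1_000 == 0:
        # Resident set size should stay roughly flat across iterations
        # if the tokenizer is not leaking.
        print(f"iteration {i}: RSS = {proc.memory_info().rss / 1e6:.1f} MB")
```

If the tokenizer were merely caching its model, RSS would plateau after the first iteration instead of growing without bound.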
