Training args with COCO dataset #20

Open

hwan6615 opened this issue Dec 10, 2024 · 6 comments

Comments

@hwan6615

I'm currently trying to reproduce the results using the COCO dataset, but I'm unable to achieve the performance reported in the paper.

Would it be possible to know the arguments used for training?

@DavidHuji
Owner

Thank you for your interest in this repository! I have set the best hyperparameters as the default values in argparse. Which arguments are missing that you need assistance with? What performance did you achieve, and how does it compare to the results reported in the paper?
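
For example, a minimal sketch of logging the resolved arguments at the start of a run and diffing them against the argparse defaults (the flag names and default values below are illustrative, not necessarily the ones in this repo's train script):

```python
import argparse
import json

# Illustrative parser; the real train script defines its own flags and defaults.
parser = argparse.ArgumentParser()
parser.add_argument('--bs', type=int, default=40, help='batch size (illustrative)')
parser.add_argument('--lr', type=float, default=2e-5, help='learning rate (illustrative)')
parser.add_argument('--epochs', type=int, default=10, help='number of epochs (illustrative)')
args = parser.parse_args()

# Log every resolved hyperparameter and flag anything overridden from the defaults.
print(json.dumps(vars(args), indent=2))
overridden = {k: v for k, v in vars(args).items() if parser.get_default(k) != v}
print('overridden from defaults:', overridden)
```

Comparing that dump against the paper's setup usually makes it clear which setting differs.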

@hwan6615
Author

Thank you for the great paper and for your response.

The experiments were conducted using the COCO dataset.

First, the arguments and performance before normalization are as follows.
[Screenshots: training arguments and per-epoch metrics without normalization]

Without normalization, the performance was very low: it peaks at epoch 1 and declines as training progresses.

In contrast, with normalization the performance improved, but it is still considerably lower than the results reported in the paper, and BLEU-4 in particular is low.
[Screenshots: training arguments and per-epoch metrics with normalization]

Do you see any possible issues, and is there any additional information you would need?

@DavidHuji
Owner

DavidHuji commented Dec 14, 2024

Looks like the batch size is a bit bigger, but I do not think this is the issue. Did you evaluate on the last epoch? Maybe try earlier ones where the loss is already nearly as low as in the last epoch (e.g., the 7th rather than the 10th).
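
For reference, a minimal sketch of scoring every saved checkpoint rather than only the last one (the checkpoint path pattern and the `evaluate` helper are hypothetical placeholders for the repo's actual eval script):

```python
import glob

def evaluate(checkpoint_path):
    """Hypothetical helper: run the repo's evaluation on one checkpoint and
    return its captioning metrics, e.g. {'BLEU-4': ..., 'CIDEr': ...}."""
    raise NotImplementedError

# Score every epoch's checkpoint instead of assuming the last one is best.
results = {}
for ckpt in sorted(glob.glob('checkpoints/*.pt')):  # illustrative path pattern
    results[ckpt] = evaluate(ckpt)

best = max(results, key=lambda c: results[c]['BLEU-4'])
print('best checkpoint:', best, results[best])
```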

@hwan6615
Author

I conducted evaluations for all 10 epochs, and this represents the best performance among them. As mentioned above, without normalization, performance decreases as epochs progress. When normalization is applied, performance improves with each epoch, but falls significantly short of the performance reported in the original paper.

@DavidHuji
Owner

Oh I see, I missed that earlier. I think trying a smaller batch size, as in the default parameters, might be helpful (or at least increasing the learning rate or the number of epochs instead), because it sounds like the training was just not long enough (when the batch size is 4x bigger, there are 4x fewer weight updates).
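
As a rough back-of-the-envelope check of that effect (the caption count, batch sizes, and learning rate below are illustrative, not the repo's actual defaults):

```python
# With a fixed number of epochs, the number of gradient updates scales
# inversely with the batch size, so a 4x larger batch gives 4x fewer updates.
num_captions = 590_000           # order of magnitude of COCO training captions (illustrative)
default_bs, larger_bs = 32, 128  # illustrative: a batch size ~4x the default
epochs = 10

updates_default = epochs * (num_captions // default_bs)
updates_larger = epochs * (num_captions // larger_bs)
print(updates_default, updates_larger)   # ~184k vs ~46k updates

# Two common ways to compensate: train for ~4x more epochs at the larger
# batch size, or scale the learning rate (linear scaling rule).
lr_default = 2e-5
lr_scaled = lr_default * (larger_bs / default_bs)
print(lr_scaled)                         # 8e-5 under linear scaling
```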

@hwan6615
Author

Thank you for your response. I will proceed with an experiment.
