Training args with COCO dataset #20

Open

hwan6615 opened this issue Dec 10, 2024 · 6 comments

Comments

@hwan6615

I'm currently trying to reproduce the results using the COCO dataset, but I'm unable to achieve the performance reported in the paper.

Would it be possible to know the arguments used for training?

@DavidHuji
Owner

Thank you for your interest in this repository! I have set the best hyperparameters as the default values in argparse. Which arguments are missing that you need assistance with? What performance did you achieve, and how does it compare to the results reported in the paper?
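
For example, a minimal sketch of logging the resolved arguments at the start of a run and diffing them against the argparse defaults (the flag names and default values below are illustrative, not necessarily the ones in this repo's train script):

```python
import argparse
import json

# Illustrative parser; the real train script defines its own flags and defaults.
parser = argparse.ArgumentParser()
parser.add_argument('--bs', type=int, default=40, help='batch size (illustrative)')
parser.add_argument('--lr', type=float, default=2e-5, help='learning rate (illustrative)')
parser.add_argument('--epochs', type=int, default=10, help='number of epochs (illustrative)')
args = parser.parse_args()

# Log every resolved hyperparameter and flag anything overridden from the defaults.
print(json.dumps(vars(args), indent=2))
overridden = {k: v for k, v in vars(args).items() if parser.get_default(k) != v}
print('overridden from defaults:', overridden)
```

Comparing that dump against the paper's setup usually makes it clear which setting differs.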

@hwan6615
Author

Thank you for the great paper and for your response.

The experiments were conducted using the COCO dataset.

First, the arguments and performance before normalization are as follows.
[Screenshots: training arguments and per-epoch metrics without normalization]

Without normalization, the performance was very low: it peaks at epoch 1 and declines as training progresses.

In contrast, with normalization the performance improved, but it is still considerably lower than the results reported in the paper, and BLEU-4 in particular is low.
[Screenshots: training arguments and per-epoch metrics with normalization]

Do you see any possible issues, and is there any additional information you would need?

@DavidHuji
Owner

DavidHuji commented Dec 14, 2024

Looks like the batch size is a bit bigger, but I do not think this is the issue. Did you evaluate on the last epoch? Maybe try earlier ones where the loss is already nearly as low as in the last epoch (e.g., the 7th rather than the 10th).
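
For reference, a minimal sketch of scoring every saved checkpoint rather than only the last one (the checkpoint path pattern and the `evaluate` helper are hypothetical placeholders for the repo's actual eval script):

```python
import glob

def evaluate(checkpoint_path):
    """Hypothetical helper: run the repo's evaluation on one checkpoint and
    return its captioning metrics, e.g. {'BLEU-4': ..., 'CIDEr': ...}."""
    raise NotImplementedError

# Score every epoch's checkpoint instead of assuming the last one is best.
results = {}
for ckpt in sorted(glob.glob('checkpoints/*.pt')):  # illustrative path pattern
    results[ckpt] = evaluate(ckpt)

best = max(results, key=lambda c: results[c]['BLEU-4'])
print('best checkpoint:', best, results[best])
```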

@hwan6615
Author

I conducted evaluations for all 10 epochs, and this represents the best performance among them. As mentioned above, without normalization, performance decreases as epochs progress. When normalization is applied, performance improves with each epoch, but falls significantly short of the performance reported in the original paper.

@DavidHuji
Owner

Oh I see, I missed that earlier. I think trying a smaller batch size, as in the default parameters, might be helpful (or at least increasing the learning rate or the number of epochs instead), because it sounds like the training was just not long enough (when the batch size is 4x bigger, there are 4x fewer weight updates).
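
As a rough back-of-the-envelope check of that effect (the caption count, batch sizes, and learning rate below are illustrative, not the repo's actual defaults):

```python
# With a fixed number of epochs, the number of gradient updates scales
# inversely with the batch size, so a 4x larger batch gives 4x fewer updates.
num_captions = 590_000           # order of magnitude of COCO training captions (illustrative)
default_bs, larger_bs = 32, 128  # illustrative: a batch size ~4x the default
epochs = 10

updates_default = epochs * (num_captions // default_bs)
updates_larger = epochs * (num_captions // larger_bs)
print(updates_default, updates_larger)   # ~184k vs ~46k updates

# Two common ways to compensate: train for ~4x more epochs at the larger
# batch size, or scale the learning rate (linear scaling rule).
lr_default = 2e-5
lr_scaled = lr_default * (larger_bs / default_bs)
print(lr_scaled)                         # 8e-5 under linear scaling
```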

@hwan6615
Author

Thank you for your response. I will proceed with an experiment.
