performance decrease during training #12

Open
ramdhan1989 opened this issue Mar 27, 2023 · 1 comment

Comments

@ramdhan1989

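Hi, do you have any suggestions for overcoming this problem during training? In the log below, the box and obj losses become nan from epoch 7 onward, and precision, recall, and mAP fall to essentially zero.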

     Epoch   gpu_mem       box       obj       cls       dgi     total   targets  img_size
     0/199       11G    0.1279   0.01601         0  0.008378     2.849         6       512: 100%|█| 1800/1800 [14:48<00
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|█| 344/344 [01:07<00
30.39782691001892
                 all    2.75e+03    4.51e+03           0           0    5.13e-06    9.81e-07

     Epoch   gpu_mem       box       obj       cls       dgi     total   targets  img_size
     1/199       11G    0.1261   0.01524         0  0.005636     2.846         6       512: 100%|█| 1800/1800 [14:03<00
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|█| 344/344 [01:01<00
31.465840816497803
                 all    2.75e+03    4.51e+03           0           0    3.55e-06    6.64e-07

     Epoch   gpu_mem       box       obj       cls       dgi     total   targets  img_size
     2/199       11G    0.1214   0.01546         0  0.005382     2.844        14       512: 100%|█| 1800/1800 [13:46<00
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|█| 344/344 [01:03<00
32.228920221328735
                 all    2.75e+03    4.51e+03       0.321       0.297       0.194      0.0497

     Epoch   gpu_mem       box       obj       cls       dgi     total   targets  img_size
     3/199       11G    0.1142   0.01436         0  0.005227     2.839        20       512: 100%|█| 1800/1800 [13:39<00
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|█| 344/344 [00:58<00
28.6451997756958
                 all    2.75e+03    4.51e+03       0.316       0.485       0.345      0.0999

     Epoch   gpu_mem       box       obj       cls       dgi     total   targets  img_size
     4/199       11G   0.09978   0.01415         0  0.005147     2.832         7       512: 100%|█| 1800/1800 [13:23<00
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|█| 344/344 [00:57<00
28.444270849227905
                 all    2.75e+03    4.51e+03       0.408       0.578       0.472       0.167

     Epoch   gpu_mem       box       obj       cls       dgi     total   targets  img_size
     5/199       11G   0.09265   0.01457         0  0.005125     2.829         5       512: 100%|█| 1800/1800 [13:32<00
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|█| 344/344 [01:02<00
30.84639859199524
                 all    2.75e+03    4.51e+03       0.399       0.623       0.507       0.161

     Epoch   gpu_mem       box       obj       cls       dgi     total   targets  img_size
     6/199       11G   0.08306   0.01727         0  0.005281     2.825        10       512: 100%|█| 1800/1800 [13:44<00
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|█| 344/344 [01:01<00
30.013824462890625
                 all    2.75e+03    4.51e+03       0.285       0.589       0.453       0.145

     Epoch   gpu_mem       box       obj       cls       dgi     total   targets  img_size
     7/199       11G       nan       nan         0  0.005711       nan         6       512: 100%|█| 1800/1800 [13:36<00
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|█| 344/344 [00:51<00
31.282738208770752
                 all    2.75e+03    4.51e+03           0           0    1.57e-06    1.74e-07

     Epoch   gpu_mem       box       obj       cls       dgi     total   targets  img_size
     8/199       11G       nan       nan         0       nan       nan        10       512: 100%|█| 1800/1800 [13:31<00
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|█| 344/344 [00:49<00
32.83151125907898
                 all    2.75e+03           0           0           0           0           0

     Epoch   gpu_mem       box       obj       cls       dgi     total   targets  img_size
     9/199       11G       nan       nan         0       nan       nan         9       512: 100%|█| 1800/1800 [13:20<00
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|█| 344/344 [00:45<00
29.580291509628296
                 all    2.75e+03           0           0           0           0           0

     Epoch   gpu_mem       box       obj       cls       dgi     total   targets  img_size
    10/199       11G       nan       nan         0       nan       nan         4       512: 100%|█| 1800/1800 [13:25<00
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|█| 344/344 [00:48<00
32.03327965736389
                 all    2.75e+03           0           0           0           0           0

     Epoch   gpu_mem       box       obj       cls       dgi     total   targets  img_size
    11/199       11G       nan       nan         0       nan       nan         9       512: 100%|█| 1800/1800 [13:28<00
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|█| 344/344 [00:47<00
30.341226816177368
                 all    2.75e+03           0           0           0           0           0

     Epoch   gpu_mem       box       obj       cls       dgi     total   targets  img_size
    12/199       11G       nan       nan         0       nan       nan         2       512: 100%|█| 1800/1800 [13:11<00
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|█| 344/344 [00:45<00
29.359901189804077
                 all    2.75e+03           0           0           0           0           0

     Epoch   gpu_mem       box       obj       cls       dgi     total   targets  img_size
    13/199       11G       nan       nan         0       nan       nan        13       512: 100%|█| 1800/1800 [13:05<00
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|█| 344/344 [00:45<00
29.436581134796143
                 all    2.75e+03           0           0           0           0           0

     Epoch   gpu_mem       box       obj       cls       dgi     total   targets  img_size
    14/199       11G       nan       nan         0       nan       nan         7       512: 100%|█| 1800/1800 [13:04<00
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|█| 344/344 [00:45<00
29.631073713302612
                 all    2.75e+03           0           0           0           0           0

     Epoch   gpu_mem       box       obj       cls       dgi     total   targets  img_size
    15/199       11G       nan       nan         0       nan       nan         6       512: 100%|█| 1800/1800 [13:08<00
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|█| 344/344 [00:45<00
29.1485652923584
                 all    2.75e+03           0           0           0           0           0

     Epoch   gpu_mem       box       obj       cls       dgi     total   targets  img_size
    16/199       11G       nan       nan         0       nan       nan        18       512: 100%|█| 1800/1800 [13:14<00
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|█| 344/344 [00:46<00
29.673731088638306
                 all    2.75e+03           0           0           0           0           0
@WindVChen
Owner

This looks like a gradient explosion (or something else) that leads to a NaN loss value. What about lowering the learning rate, or clipping the gradients before optimizer.step()?
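
To make the clipping suggestion concrete, here is a minimal sketch in plain Python of what clipping by global norm does before a parameter update. It is not this repository's training code: the names clip_by_global_norm and sgd_step are hypothetical, and the gradients are plain lists of floats rather than tensors. In a PyTorch training loop the equivalent built-in is torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm), called after loss.backward() and before optimizer.step(); the max_norm value is something to tune, not a recommendation.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Return (clipped_grads, total_norm) with the global L2 norm capped at max_norm.

    Returns (None, total_norm) when the norm is NaN or inf, because rescaling
    cannot repair non-finite gradients.
    """
    # g * g overflows to inf instead of raising, so huge gradients are caught below.
    total_norm = math.sqrt(sum(g * g for g in grads))
    if not math.isfinite(total_norm):
        return None, total_norm
    # Only ever shrink the gradients; the small epsilon avoids division by zero.
    scale = min(1.0, max_norm / (total_norm + 1e-6))
    return [g * scale for g in grads], total_norm


def sgd_step(weights, grads, lr, max_norm):
    """One plain SGD update that clips first and skips non-finite gradients."""
    clipped, _ = clip_by_global_norm(grads, max_norm)
    if clipped is None:
        # Skipping the step keeps the weights from being overwritten with NaN.
        return list(weights)
    return [w - lr * g for w, g in zip(weights, clipped)]


if __name__ == "__main__":
    # A gradient with norm 5 is rescaled to norm ~1 before the update.
    print(sgd_step([1.0, 1.0], [3.0, 4.0], lr=0.1, max_norm=1.0))
    # A NaN gradient leaves the weights untouched.
    print(sgd_step([1.0, 1.0], [float("nan"), 4.0], lr=0.1, max_norm=1.0))
```

Skipping the update on a non-finite norm is a design choice in this sketch: once a NaN reaches the weights, every later epoch reports nan, which matches the pattern in the log from epoch 7 onward. Clipping alone does not fix a learning rate that is too high, so the two suggestions can be tried together.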
