Question about implementation details. #15
Comments
Hello, what do you need to prepare to run this project? I downloaded it several days ago, but still couldn't run it successfully. How can I run this project?
Hello, may I ask what version of PyTorch you are using? Have you encountered any issues when using batch_first=True?
I just downloaded the code and data. It looks like 8 GPUs with a batch size of 256 are essential for reproducing the project. @shams2023
The code is based on CLIP4Clip. The torch version is 1.11.0 and CUDA is 11.6. @Tiiivoo
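A quick way to check that your environment matches these reported versions (a minimal sketch assuming a standard PyTorch install; the expected values in the comments come from the reply above):

```python
import torch

# Confirm the PyTorch / CUDA versions match the ones reported above.
print("torch:", torch.__version__)           # expect 1.11.0
print("cuda available:", torch.cuda.is_available())
print("cuda version:", torch.version.cuda)   # expect 11.6
print("gpu count:", torch.cuda.device_count())
```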
@sweet132 Have you noticed how much GPU memory you use when batch_size=128? Even when I turn down batch_size_val, I still get the error "CUDA out of memory when evaluating. Testing model at the end!".
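For what it's worth, eval-time OOM often comes from encoding the whole split with gradients still enabled. A minimal sketch of chunked, no-grad feature extraction (`encode_fn` and `inputs` are placeholders, not names from this repo):

```python
import torch

@torch.no_grad()  # gradients are never needed at evaluation time
def encode_in_chunks(encode_fn, inputs, chunk_size=16):
    """Encode `inputs` in small chunks to bound peak GPU memory.
    `encode_fn` and `inputs` stand in for this repo's actual
    text/video encoders and batched tensors."""
    outputs = []
    for i in range(0, inputs.size(0), chunk_size):
        out = encode_fn(inputs[i:i + chunk_size])
        outputs.append(out.cpu())  # move features off-GPU right away
    return torch.cat(outputs, dim=0)
```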
The modeling section is in modeling.py, where you can find what you want. @shams2023
With 8 GPUs and batch_size=256, GPU memory usage will be around 20GB per card; you can use that setting as a reference. I am not sure why it takes up so much memory, since CLIP4Clip only needs around 11GB. @shallowdream66
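If memory is the bottleneck, one generic PyTorch option (not something this repo necessarily ships; `model`, `optimizer`, `batch`, and `compute_loss` are placeholders) is automatic mixed precision, which roughly halves activation memory:

```python
import torch

# Generic PyTorch mixed-precision recipe, available in torch 1.11.
scaler = torch.cuda.amp.GradScaler()

def train_step(model, optimizer, batch, compute_loss):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():   # fp16 forward pass cuts activation memory
        loss = compute_loss(model, batch)
    scaler.scale(loss).backward()     # scale the loss to avoid fp16 underflow
    scaler.step(optimizer)            # unscales gradients, then steps
    scaler.update()
    return loss.item()
```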
I am also confused. Compared to CLIP4Clip, it takes up much more memory and training time.
Hello, regarding the 'msrvtt_train_with_vitb32_max1_title_titles.json' file, I don't understand where the 'titles' data comes from. It seems that the MSR-VTT dataset doesn't have this part. If the 'titles' section was obtained through web crawling, why are there 30 of them?
I'm glad to hear that you've successfully reproduced our results. Regarding the batch size issue, we apologize for any confusion, and it may indeed be an oversight in the paper. Please consider our code as the practical reference.
Thank you for your reply. Although I achieved results similar to the paper on MSRVTT, I got poor results on MSVD (46.1), where I trained directly on the raw data, while for VATEX (62.0) I used the extracted frames you uploaded. I'm not sure why that is.
Hello, I suggest you refer to the paper; the titles are generated by a model (GPT-2 or CLIP). @Tiiivoo
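For reference, the generation side might look roughly like this with Hugging Face transformers (a sketch only; the paper's pipeline additionally conditions GPT-2 on CLIP visual features, which this snippet does not do, and the prompt here is hypothetical). Sampling 30 candidates would also explain why there are 30 titles per video:

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Illustrative only: plain GPT-2 sampling of title candidates from a
# text prompt, without the CLIP conditioning described in the paper.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "Video title:"  # hypothetical prompt, not from the repo
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    do_sample=True,
    max_new_tokens=15,
    num_return_sequences=30,  # matches the 30 titles per video
    pad_token_id=tokenizer.eos_token_id,
)
titles = [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]
```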
How can this be done on a single 3090 card?
+1
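One generic workaround on a single card is gradient accumulation; a minimal sketch with placeholder names (`model`, `loader`, `compute_loss` are not from this repo), with the caveat noted in the docstring:

```python
def train_with_accumulation(model, optimizer, loader, compute_loss,
                            accum_steps=8):
    """Simulate a large batch on one GPU: e.g. 8 steps x batch 32 = 256.
    Caveat: with a contrastive loss the in-batch negatives still come
    from the small physical batch, so this does not fully reproduce a
    true batch of 256, and results may still differ."""
    optimizer.zero_grad()
    for step, batch in enumerate(loader):
        loss = compute_loss(model, batch) / accum_steps  # average over steps
        loss.backward()  # gradients accumulate across iterations
        if (step + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```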
Can you share this for me to refer to? I don't know how to define the storage locations of the variables here, as shown in the following figure (it could also be co_train_msrvtt.sh).
Hello, I admit that this is good work.
However, in the code you set batch_size=256, while the paper states it is 128 (maybe the version of the paper I downloaded is wrong? I downloaded it from arXiv).
I reproduced the code and found that with batch_size=256 the accuracy on MSRVTT-1kA matches the paper, but with batch_size=128 it is only about 47%.