I plugged in a new dataset. After 1800 epochs, the generated motion appears to follow the text conditioning semantically, but the poses are too jittery (please see the attachment). Could you point out what might be wrong?
Hi, thanks for your interest in our work. Could you elaborate on how large your dataset is, what your text encoder is, and how large the network is? Also, what are your sampler and the number of sampling steps?
Thank you for your response. My dataset contains 600 sequences with a total of 34000 frames. My step size is 1, and if a sequence is longer than the maximum number of frames, I randomly select a start index. I am using the framework's default text encoder, i.e. CLIP. Do you think I have too little data? I don't observe this problem when I train using the diffusion model.
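For reference, the cropping described above (step size 1, random start index when a sequence exceeds the maximum length) could look roughly like the following. This is a minimal sketch, not the repository's actual data loader: `crop_sequence`, `max_frames`, and the value 196 are hypothetical names and numbers chosen for illustration.

```python
import random


def crop_sequence(frames, max_frames, rng=random):
    """Return at most `max_frames` consecutive frames (stride 1).

    Hypothetical helper: short sequences are returned unchanged,
    longer ones are cropped at a uniformly random start index.
    """
    if len(frames) <= max_frames:
        return frames
    # Valid start indices are 0 .. len(frames) - max_frames, inclusive.
    start = rng.randrange(len(frames) - max_frames + 1)
    return frames[start:start + max_frames]


# Usage: a 300-frame sequence cropped to an assumed maximum of 196 frames.
sequence = list(range(300))
clip = crop_sequence(sequence, 196, random.Random(0))
```

Because the crop keeps consecutive frames, the training clips themselves should be as smooth as the source motion.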
text: straightening up
https://github.com/dongzhuoyao/motionfm/assets/169649811/b4fa0242-97d5-4080-827f-3578c3cd0d84
thanks!