Training video size #661

Open
cdfan0627 opened this issue Jan 13, 2025 · 5 comments

cdfan0627 commented Jan 13, 2025

Do the training videos' height and width have to be 480×720 or 768×1360, or is any size acceptable as long as it is a multiple of some number?

OleehyO self-assigned this Jan 13, 2025
Collaborator

OleehyO commented Jan 13, 2025

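480x720 is the resolution of the 1.0 models and must stay fixed.

768x1360 is the resolution of the 1.5 models. Only the i2v resolution can be user-defined (t2v must also stay fixed), but we still recommend fine-tuning at 768x1360; otherwise the results may not be very good.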
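The rules in the reply above can be written down as a small check. This is only a sketch: the function name `check_training_resolution`, the version strings `"1.0"` and `"1.5"`, and the return labels are hypothetical, and it assumes the quoted sizes are given as height x width.

```python
from typing import Dict, Tuple

# Resolutions quoted in the reply, assumed to be (height, width).
FIXED_RESOLUTIONS: Dict[str, Tuple[int, int]] = {
    "1.0": (480, 720),
    "1.5": (768, 1360),
}


def check_training_resolution(version: str, task: str, height: int, width: int) -> str:
    """Classify a fine-tuning resolution according to the rules stated in the reply.

    Returns "ok" for the model's own resolution, "custom" for a user-defined
    size where the reply allows one (1.5 i2v only), and "invalid" otherwise.
    """
    expected = FIXED_RESOLUTIONS[version]
    if (height, width) == expected:
        return "ok"
    # Per the reply, only the 1.5 i2v model accepts a user-defined resolution,
    # and even then the native 768x1360 is recommended.
    if version == "1.5" and task == "i2v":
        return "custom"
    return "invalid"
```

For example, `check_training_resolution("1.5", "i2v", 512, 512)` returns `"custom"`, while the same size with `task="t2v"` returns `"invalid"`.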

Author

cdfan0627 commented Jan 13, 2025

May I ask why t2v always has to stay fixed?

Collaborator

OleehyO commented Jan 14, 2025

Because it was trained at a fixed resolution.

@eightmusic

"Because it was trained at a fixed resolution."
How much GPU memory does training require?

Author

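If I use the code below with CogVideoX 5B I2V, will I be able to train and run inference at resolutions other than 720 * 480? And does the resolution still need to be a multiple of 8, or are there any other restrictions?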

del transformer.patch_embed.pos_embedding
transformer.patch_embed.use_learned_positional_embeddings = False
transformer.config.use_learned_positional_embeddings = False
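The thread does not answer whether the snippet above is sufficient, nor which multiple is required. As a rough sketch of how such a divisibility constraint could be checked, the helper below assumes that the VAE downsamples each spatial dimension by 8 and that the transformer uses a spatial patch size of 2; both values are assumptions not confirmed in this thread, and the function names are hypothetical.

```python
def required_multiple(vae_scale_factor: int = 8, patch_size: int = 2) -> int:
    """Pixel-space multiple implied by the assumed VAE scale factor and patch size."""
    # Each latent pixel covers `vae_scale_factor` image pixels, and each
    # transformer patch covers `patch_size` latent pixels per dimension.
    return vae_scale_factor * patch_size


def is_compatible(height: int, width: int,
                  vae_scale_factor: int = 8, patch_size: int = 2) -> bool:
    """True if both dimensions divide evenly into whole patches (under the assumptions)."""
    multiple = required_multiple(vae_scale_factor, patch_size)
    return height % multiple == 0 and width % multiple == 0
```

Under these assumed values the multiple would be 16 rather than 8, and both resolutions mentioned in the thread (480x720 and 768x1360) pass the check. Passing it says nothing about output quality at a resolution the model was not trained on.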
