Stable Video Diffusion Fine-Tuning Code for Moon Phases Video w/ LoRA
As part of the AITAP lecture at Hanyang University, we fine-tune Stable Video Diffusion (SVD) with reference to the GitHub repository SVD_Xtend.
The primary objective of this project was to fine-tune the Stable Video Diffusion model. Before fine-tuning, the model could not render the phases of the moon even when given a moon image as input. After fine-tuning, it generates videos that show the moon's phases changing over time from the same kind of input.
The fine-tuning process was conducted using Google Colab. You can access the notebook via the following link:
Colab Notebook
GPU Used: NVIDIA A100
The fine-tuning dataset was prepared using the following NASA SVS videos: https://svs.gsfc.nasa.gov/5415/
Specifically, we need a moon phase video like this one: https://svs.gsfc.nasa.gov/5415/#media_group_376356
- To prepare the dataset, create an `original` folder containing one or more `.mp4` video files in the same directory as `videoframe.py`. (It's okay if there is only one video.)
```
videoframe.py
original
├── video_name1.mp4
├── video_name2.mp4
├── ...
```
- When you run `videoframe.py`, frames from each video are extracted and stored in the `finetuning` folder. (By default, frames are extracted at an interval of 10, which can be adjusted via the `FRAME_INTERVAL` variable in `videoframe.py`; a minimal sketch of the script is shown after the directory tree below.)
```
videoframe.py
original
├── video_name1.mp4
├── video_name2.mp4
├── ...
finetuning
├── video_name1
│   ├── video_frame1
│   ├── video_frame2
│   ...
├── video_name2
│   ├── video_frame1
│   ├── ...
```
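A minimal sketch of what `videoframe.py` might look like, assuming OpenCV (`cv2`) is used for decoding. The `original`/`finetuning` folder names and the `FRAME_INTERVAL` variable follow the description above; the frame file naming and image format are illustrative:

```python
import os
import cv2  # pip install opencv-python

FRAME_INTERVAL = 10  # keep every 10th frame, as described above

# Walk the `original` folder and dump sampled frames into finetuning/<video_name>/.
os.makedirs("finetuning", exist_ok=True)
for filename in os.listdir("original"):
    if not filename.endswith(".mp4"):
        continue
    video_name = os.path.splitext(filename)[0]
    out_dir = os.path.join("finetuning", video_name)
    os.makedirs(out_dir, exist_ok=True)

    cap = cv2.VideoCapture(os.path.join("original", filename))
    index, saved = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % FRAME_INTERVAL == 0:
            cv2.imwrite(os.path.join(out_dir, f"video_frame{saved}.png"), frame)
            saved += 1
        index += 1
    cap.release()
```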
- Create a `finetuning.zip` file with the layout shown below and save it to Google Drive (one way to create the archive is sketched after the tree).
```
finetuning.zip
├── finetuning
│   ├── video_name1
│   │   ├── video_frame1
│   │   ├── video_frame2
│   │   ├── ...
│   ├── video_name2
│   │   ├── video_frame1
│   │   ├── ...
```
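One way to produce an archive with this layout, using only the Python standard library (a sketch that assumes the `finetuning` folder sits in the current directory):

```python
import shutil

# Creates finetuning.zip containing the top-level `finetuning` folder,
# matching the layout shown above.
shutil.make_archive("finetuning", "zip", root_dir=".", base_dir="finetuning")
```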
- Share the zip file to generate a shareable link, and modify the link as shown below to include it in the `.ipynb` file for downloading in Colab.
```
# share link: https://drive.google.com/file/d/{file_id}/view?usp=sharing
!gdown https://drive.google.com/uc?id={file_id}
!unzip /content/finetuning.zip -d /content
```
Results: comparison of the init image, the output before fine-tuning, and the output after fine-tuning; sample outputs at 1,000, 2,000, and 3,000 training steps.
Due to usage limitations in Colab, the training video size was set to `width=512`, `height=320`, and `step=5000`. Increasing the size and step values could result in higher resolution and better performance.
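For reference, a sketch of how the fine-tuned checkpoint might be run for inference with diffusers' `StableVideoDiffusionPipeline` at the same 512×320 resolution used for training. The checkpoint path, input image, and generation parameters below are illustrative assumptions, not part of this repository:

```python
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

# Hypothetical path to the fine-tuned weights saved during training.
pipe = StableVideoDiffusionPipeline.from_pretrained(
    "path/to/finetuned-svd", torch_dtype=torch.float16
)
pipe.to("cuda")

# Any moon image can serve as the conditioning frame; this path is illustrative.
image = load_image("moon_init_image.png").resize((512, 320))

frames = pipe(image, width=512, height=320, num_frames=25, decode_chunk_size=8).frames[0]
export_to_video(frames, "moon_phases.mp4", fps=7)
```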
Only two moon videos were used as the dataset. Expanding the dataset with more diverse videos could improve the model's generalization performance.
- Update README.md
- Upload the fine-tuned model to the Hugging Face Hub
This project builds on Diffusers, Stability AI's Stable Video Diffusion, and SVD_Xtend. Thanks for their great work.