This is the official repository for ScaleDreamer: Scalable Text-to-3D Synthesis with Asynchronous Score Distillation [ECCV2024]

theEricMa/ScaleDreamer

ScaleDreamer: Scalable Text-to-3D Synthesis with Asynchronous Score Distillation

[Paper] | [Project Page]


🔥 News

  • 2024.07.02 ScaleDreamer is accepted by ECCV 2024
  • 2024.06.23 Create this repo.

⚙️ Dependencies and Installation

Follow threestudio to set up the conda environment, or use the instructions below.
  • Create a virtual environment:
conda create -n scaledreamer python=3.10
conda activate scaledreamer
  • Install PyTorch
# Prefer using the latest version of CUDA and PyTorch 
conda install pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 pytorch-cuda=12.1 -c pytorch -c nvidia
  • (Optional, Recommended) Install xFormers for attention acceleration.
conda install xformers -c xformers
  • (Optional, Recommended) Install ninja to speed up the compilation of CUDA extensions:
pip install ninja
  • Install major dependencies:
pip install -r requirements.txt
export PATH="/usr/local/cuda/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda/lib64:$LD_LIBRARY_PATH"
pip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
pip install git+https://github.com/KAIR-BAIR/nerfacc.git@v0.5.2

If you encounter errors while installing iNGP (tiny-cuda-nn), check your gcc version first. The commands below change the gcc version inside your conda environment. Afterwards, return to the repository directory and run the tiny-cuda-nn and NerfAcc install commands above ⬆️ again.

conda install -c conda-forge gxx=9.5.0
cd  $CONDA_PREFIX/lib
ln -s  /usr/lib/x86_64-linux-gnu/libcuda.so ./
cd <your repo directory>
  • Download the 2D diffusion priors:
python scripts/download_pretrained_models.py
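
Before moving on, it can help to confirm that the packages installed above are importable. The following is a minimal sketch, not part of the repository: the import names (`torch`, `tinycudann`, `nerfacc`, `xformers`) are assumed to match the packages installed in the previous steps, and the helper names are hypothetical.

```python
import importlib.util

# Import names assumed for the packages installed above; adjust if yours differ.
REQUIRED = ["torch", "tinycudann", "nerfacc"]
OPTIONAL = ["xformers"]


def missing_modules(names):
    """Return the subset of `names` that the import system cannot find."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def report():
    """Print a short summary of what is and is not importable."""
    for name in missing_modules(REQUIRED):
        print(f"missing required package: {name}")
    for name in missing_modules(OPTIONAL):
        print(f"missing optional package: {name}")
    if importlib.util.find_spec("torch") is not None:
        import torch
        print("CUDA available:", torch.cuda.is_available())


if __name__ == "__main__":
    report()
```

If a required package is reported missing, rerun the corresponding install command inside the activated `scaledreamer` environment.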

🌈 Prompt-Specific 3D Generation

  • ASD with SD (Stable Diffusion). Feel free to change the prompt accordingly.
sh scripts/single-prompt-benchmark/asd_sd_nerf.sh
  • ASD with MV (MVDream). Feel free to change the prompt accordingly.
sh scripts/single-prompt-benchmark/asd_mv_nerf.sh

🚀 Prompt-Amortized 3D Generator Training

The following 3D generator architectures are available:

| Network | Description | File |
| --- | --- | --- |
| Hyper-iNGP | iNGP with text-conditioned linear layers, adopted from ATT3D | geometry, background |
| 3DConv-net | A StyleGAN generator that outputs voxels with 3D convolution, code adopted from CC3D | geometry, architecture |
| Triplane-Transformer | Transformer-based 3D generator with a triplane as the output structure, adopted from LRM | geometry, architecture |

The following corpus datasets are available:

| Corpus | Description | File |
| --- | --- | --- |
| MG15 | 15 text prompts from the Magic3D project page | json |
| DF415 | 415 text prompts from the DreamFusion project page | json |
| AT2520 | 2520 text prompts from the ATT3D experiments | json |
| DL17k | 17k text prompts from the Instant3D release | json |
| CP100k | 110k text prompts from the Cap3D dataset | json |

Run one of the following scripts to start training:

  • Hyper-iNGP with SD on MG15
sh scripts/multi-prompt-benchmark/asd_sd_hyper_iNGP_MG15.sh
  • 3DConv-net with SD on DF415
sh scripts/multi-prompt-benchmark/asd_sd_3dconv_net_DF415.sh
  • 3DConv-net with SD on AT2520
sh scripts/multi-prompt-benchmark/asd_sd_3dconv_net_AT2520.sh
  • Triplane-Transformer with MV on DL17k
sh scripts/multi-prompt-benchmark/asd_mv_triplane_transformer_DL17k.sh
  • 3DConv-net with SD on CP100k
sh scripts/multi-prompt-benchmark/asd_sd_3dconv_net_CP100k.sh

📷 Prompt-Amortized 3D Generator Evaluation

Create a directory to save the checkpoints

mkdir pretrained/3d_checkpoints

Checkpoints for the experiments above ⬆️ are available. Save the corresponding .pth file to pretrained/3d_checkpoints, then run the matching script below.

  • Hyper-iNGP with SD on MG15
sh scripts/multi_prompts_benchmark_evaluation/asd_sd_hyper_iNGP_MG15.sh
  • 3DConv-net with SD on DF415
sh scripts/multi_prompts_benchmark_evaluation/asd_sd_3dconv_net_DF415.sh
  • 3DConv-net with SD on AT2520
sh scripts/multi_prompts_benchmark_evaluation/asd_sd_3dconv_net_AT2520.sh
  • Triplane-Transformer with MV on DL17k. The checkpoint is hosted on Google Drive.
sh scripts/multi_prompts_benchmark_evaluation/asd_mv_triplane_transformer_DL17k.sh
  • 3DConv-net with SD on CP100k
sh scripts/multi_prompts_benchmark_evaluation/asd_sd_3dconv_net_CP100k.sh

The rendered images and videos are saved in the outputs/<experiment_name>/save/<num_iter> directory. Compute the CLIP-based metrics with:

python evaluation/CLIP/evaluation_amortized.py --result_dir <video_dir>
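
The internals of the evaluation script are not described here. As a rough illustration of what CLIP-based metrics usually measure, the sketch below computes two common ones from precomputed embeddings: the mean cosine similarity between each render and its own prompt, and a top-1 retrieval precision. Treat it as an assumption about the kind of quantity being reported, not as the script's actual logic; the function names and the toy 2-D vectors are hypothetical stand-ins for real CLIP features.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def clip_metrics(image_embs, text_embs):
    """image_embs[i] is the render for prompt i; text_embs[i] is that prompt.

    Returns (mean similarity of matched pairs, top-1 retrieval precision).
    """
    sims, hits = [], 0
    for i, img in enumerate(image_embs):
        row = [cosine(img, txt) for txt in text_embs]
        sims.append(row[i])
        # A hit means the best-matching prompt is the one the render came from.
        hits += int(max(range(len(row)), key=row.__getitem__) == i)
    return sum(sims) / len(sims), hits / len(image_embs)


# Toy 2-D embeddings standing in for real CLIP features.
images = [[1.0, 0.0], [0.0, 1.0]]
texts = [[1.0, 0.0], [0.0, 1.0]]
mean_sim, precision = clip_metrics(images, texts)
```

Higher values on both numbers indicate that renders stay closer to their own prompts than to the other prompts in the corpus.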

🕹️ Create Your Own Modules

3D Generator

  1. Place the code in custom/amortized/models/geometry. See the other files in that directory for reference.
  2. Add your <name_of_file> to custom/amortized/models/geometry/__init__.py.
  3. Create your own config file and set the system.geometry_type argument to your registered module name. See the configs in the configs/multi-prompt_benchmark directory for reference.
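
The steps above rely on a name-based registry: a class is registered under a string, and the config selects it by that string. The sketch below is a stand-alone imitation of that pattern and imports nothing from the repository. In practice you would presumably use threestudio's own `register` decorator, and step 2 makes sure your file is imported so the decorator runs. The module name `my-generator`, the class `MyGenerator`, and the config option are hypothetical.

```python
# Stand-alone sketch of a name -> class registry, mirroring the pattern of
# threestudio's register/find helpers. Nothing here imports threestudio.
_MODULES = {}


def register(name):
    """Decorator that stores a class under `name`."""
    def decorator(cls):
        _MODULES[name] = cls
        return cls
    return decorator


def find(name):
    """Look up a registered class by the name used in the config."""
    return _MODULES[name]


@register("my-generator")  # hypothetical name; the value for system.geometry_type
class MyGenerator:
    def __init__(self, cfg):
        self.cfg = cfg


geometry_type = "my-generator"  # what the config file would carry
geometry = find(geometry_type)({"n_feature_dims": 3})  # hypothetical option
```

A lookup with a name that was never registered raises `KeyError` in this sketch, which is the usual symptom of a module file that was not imported in `__init__.py`.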

2D Diffusion Guidance

  1. Place the code in threestudio/models/guidance. See the other guidance modules in that directory for reference.
  2. Add your <name_of_file> to threestudio/models/guidance/__init__.py.
  3. Create your own config file and set the system.guidance_type argument to your registered module name. See the configs in the configs/multi-prompt_benchmark directory for reference.

Text corpus

  1. Create a JSON file in the load directory that lists the training, validation, and test text prompts.
  2. Set the system.prompt_processor.prompt_library argument to the name of this JSON file to select the corpus. See the commands in the scripts directory for reference.
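
A corpus file of this kind can be generated with a few lines of Python. This is a minimal sketch under assumptions: the key names `train`, `val`, and `test`, the split fractions, and the helper names are all hypothetical, so compare against an existing JSON file in the load directory and match its actual schema before training.

```python
import json
import random
from pathlib import Path


def split_prompts(prompts, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle prompts deterministically and split into train/val/test."""
    prompts = list(prompts)
    random.Random(seed).shuffle(prompts)
    n_val = int(len(prompts) * val_fraction)
    n_test = int(len(prompts) * test_fraction)
    return {
        "val": prompts[:n_val],
        "test": prompts[n_val:n_val + n_test],
        "train": prompts[n_val + n_test:],
    }


def write_corpus(prompts, path):
    """Write the split to `path` as JSON and return the split."""
    corpus = split_prompts(prompts)
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(corpus, indent=2))
    return corpus


# Example (hypothetical file name):
# write_corpus(["a red chair", "a wooden boat"], "load/my_corpus.json")
```

The fixed seed keeps the split reproducible, so the same prompts land in the same subset across runs.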

You can also add your modules for data, renderer, prompt_processor, etc.

📖 Citation

If you find this paper helpful, please cite

@article{ma2024scaledreamer,
  title={ScaleDreamer: Scalable Text-to-3D Synthesis with Asynchronous Score Distillation},
  author={Ma, Zhiyuan and Wei, Yuxiang and Zhang, Yabin and Zhu, Xiangyu and Lei, Zhen and Zhang, Lei},
  journal={arXiv preprint arXiv:2407.02040},
  year={2024}
}

🙏 Acknowledgement

  • threestudio, a clean and extensible codebase for text-to-3D.
  • MVDream-threestudio, the implementation of MVDream for text-to-3D.
  • OpenLRM, the implementation of LRM. We develop the 3D generator of Triplane-Transformer on top of it.
  • Cap3D, which provides text captions for Objaverse. We develop the corpus of CP100k on top of it.
