partial models spec in model_index.json #10413

Open
sayakpaul opened this issue Dec 31, 2024 · 7 comments

@sayakpaul
Member

          one slightly off-topic note - if we're coming up with a new format, let's make sure it covers the pain points of existing ones, otherwise it's just another standard.

as it is, dduf does have benefits, but it would be great if it could cover one of the most common use-cases: partial models.
e.g.

  • a typical sdxl model in single-file format includes the unet and vae, but te1 and te2 are up to the user to load.
  • the same sdxl model in diffusers folder-style format includes all components, but that creates significant storage duplication.

why not have a model config that can point each model component not just to a subfolder, but to another repo as well?
so, for example, an sdxl model could ship with only the unet, with links to the vae, te1, and te2 in a reference repo if they are unchanged.

for example:

{
  "_class_name": "StableDiffusionXLPipeline",
  "scheduler": ["diffusers", "EulerDiscreteScheduler"],
  "unet": ["diffusers", "UNet2DConditionModel"],
  "vae": ["diffusers", "AutoencoderKL"],
  "text_encoder": ["transformers", "CLIPTextModel",
  "text_encoder_2": ["transformers", "CLIPTextModelWithProjection"],
  "tokenizer": ["transformers", "CLIPTokenizer"],
  "tokenizer_2": ["transformers", "CLIPTokenizer"]
  "_locations": {
      "text_encoder": "stabilityai/stable-diffusion-xl-base-1.0/text_encoder",
      "tokenizer": "stabilityai/stable-diffusion-xl-base-1.0/tokenizer",
      "text_encoder_2": "stabilityai/stable-diffusion-xl-base-1.0/text_encoder_2",
      "tokenizer_2": "stabilityai/stable-diffusion-xl-base-1.0/tokenizer_2"
  }
}
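To make the proposal concrete, here is a minimal sketch of how a loader could resolve such a "_locations" key. The key, the function name, and the return shape are all assumptions for illustration, not an existing diffusers API:

```python
def resolve_component_sources(model_index: dict, this_repo: str) -> dict:
    """Hypothetical resolver for the proposed "_locations" key.

    Returns {component_name: (repo_id, subfolder)}. Components without a
    "_locations" entry are assumed to live in this repo, under a subfolder
    named after the component (the diffusers folder-layout convention).
    """
    locations = model_index.get("_locations", {})
    sources = {}
    for name in model_index:
        if name.startswith("_"):
            continue  # skip metadata keys such as "_class_name"
        if name in locations:
            # "org/repo/subfolder" -> repo id is the first two path segments
            parts = locations[name].split("/")
            sources[name] = ("/".join(parts[:2]), "/".join(parts[2:]) or None)
        else:
            sources[name] = (this_repo, name)
    return sources
```

An app could then fetch each component from its resolved repo (e.g. via the hub) instead of requiring every finetune to redistribute frozen weights.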

Originally posted by @vladmandic in #10037 (comment)

@sayakpaul
Member Author

This sounds like a power-user feature and IMO is already possible. Users can load a pipeline partially from a pre-trained ckpt and use different components.

from diffusers import StableDiffusionXLPipeline, AutoencoderKL
import torch

pipeline = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    vae=AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16),
    torch_dtype=torch.float16,
).to("cuda")

This way, all components from "stabilityai/stable-diffusion-xl-base-1.0" will be downloaded except for the vae. This works with local files, too.

@vladmandic
Contributor

This sounds like a power-user feature and IMO is already possible. Users can load a pipeline partially from a pre-trained ckpt and use different components.

it's possible if the user knows exactly where to load components from - that is what i'm talking about: standardize a way for model publishers so each component has a well-known, defined "default" location, and state explicitly which components are present in the DDUF vs which are NOT (but would still be required).

even if the proposed change is a no-op as far as the diffusers implementation goes (although it could be handled automatically in from_pretrained), it provides information that is currently completely missing - it gives app developers the option to implement individual component loading automatically for any partial model, and thus have a complete model loader.

@sayakpaul
Member Author

state explicitly which component is present in the DDUF

DDUFs are packaged with all the pipeline components, not partial ones. This is what will be shipped in v0 and will be refined in subsequent iterations. Cc: @SunMarc

it gives app developers option to implement individual component loading automatically for any partial model and thus have complete model loader.

We will be shipping a lot of "auto" classes at the model level, and I believe that should make it quite a bit easier to implement what you're envisioning. Cc: @yiyixuxu

@Nerogar
Contributor

Nerogar commented Dec 31, 2024

DDUFs are packaged with all the pipeline components, not partial. This is what will be shipped in the v0 and will be refined in the subsequent iterations.

Maybe it makes sense to add some background information for this feature request. There was a discussion a while back on Discord about a proposed new model format that could solve a few pain points we currently have. Let's take Sana as an example of a modern diffusion model: its DiT has 1.6B parameters, while its text encoder is roughly twice as large.

Models are commonly fine-tuned while leaving the text encoder (and vae) frozen. But to distribute that fine-tuned model, we still need to distribute the text encoder and vae, even if their weights haven't changed from the base model. This proposal would allow the distribution of a partial model file that just points to already existing models. Without this information, any user of that model would need to guess where they can get the missing parts.

The plan would be to start switching to DDUF as a new standardized file format for most tools (model output of fine-tuning tools, and model input for inference tools). Right now there is absolutely no standard, and everyone just does their own thing. With this addition, we would have most of those issues solved and could start standardizing the ecosystem a bit more.

We will be shipping a lot of "auto" classes at the model-level and I believe that should make it quite easier to implement what you're envisioning

What exactly is your plan for this new file format? Is it only supposed to be used inside the diffusers library, or do you think it could be used outside of that library? Those classes would only be useful within diffusers. Some of the more popular tools don't use this library, but would still benefit from a standardized model format. (similar to how the hf hub can be used outside of the transformers/diffusers libraries)

As for the specific format proposal, I think this needs a bit more thought. The proposed format would only allow models to be downloaded automatically from the hf hub. But what happens if the user already has a version of that model downloaded manually? In that case it would be useful to have some kind of type identifier in there as well. Also, can this only point to a subfolder in a hf repo, or would it be useful if it could point to a different DDUF file that's hosted somewhere else?
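One possible shape for a richer "_locations" entry that addresses both points: a source identifier so tools can match already-downloaded local copies, and an explicit component type. Every field name below is hypothetical, purely to illustrate the idea:

```json
{
  "_locations": {
    "text_encoder": {
      "source": "hf-hub",
      "repo": "stabilityai/stable-diffusion-xl-base-1.0",
      "subfolder": "text_encoder",
      "type": "transformers.CLIPTextModel"
    },
    "vae": {
      "source": "url",
      "url": "https://example.com/models/sdxl-vae.dduf",
      "type": "diffusers.AutoencoderKL"
    }
  }
}
```

With a "type" identifier, a tool that already has a CLIPTextModel with matching weights on disk could reuse it instead of re-downloading; a "source" discriminator would let entries point beyond the hf hub, e.g. to another DDUF file.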

@vladmandic
Contributor

DDUFs are packaged with all the pipeline components, not partial.

sorry to hear that, as it makes DDUF almost dead-on-arrival: new models are just huge, with massive frozen components like LLMs that do NOT need to be redistributed in each and every finetune of the model.

@hipsterusername

@sayakpaul - To be very clear, this isn’t a “power user” feature request. This is an attempt to help turn the DDUF format into “the standard” that has been actively discussed/debated in the OMI discord.

Nerogar and the Onetrainer team have done some of the deepest thinking about this, but the aim is for the standard to serve:

  • Model Hubs (HF)
  • Trainers
  • Application Developers

Partial model distribution and composability is a problem faced often during both training and in deployment.

@sayakpaul
Member Author

sayakpaul commented Dec 31, 2024

I mentioned "power-user" because:

  • Currently, it's already possible to do partial loading in diffusers.
  • But it's probably not very straightforward, as we require the user to know certain details beforehand (e.g., which components to override and where to load them from).

The original issue was about model_index.json specs, which are inherently related to the diffusers format. So, the "auto" model class thingy is different from DDUF.

Thanks for providing the DDUF-related comments. As also mentioned in the docs, the v0 will be extremely minimal, and we will improve it as we go. I will let @julien-c, @LysandreJik, and @Wauplin add more details if I am missing something.
