Optimizations
Proposed in refactor: v1.9.0 + scheduler, idle RAM management, Observation rewrite (#88)
The current setup for most Stable Diffusion clients is as follows: load the selected model into RAM, use that model to generate the current prompt, and leave the model (along with its weights, prompt dependencies, etc.) in memory until it is either overridden by another model or the process is shut down. This is beneficial for lower-end hardware, as it removes the need to reload the model for each new prompt, saving anywhere from 30-90s between prompts.
However, on higher-end hardware (especially the M3 Pro/Max), loading an SDXL model into RAM takes at most 2-3s. As a result, these clients reserve 30-50GB of active memory for as long as the process is running, all to save this particular user a second or two (2-3s in the worst case). Furthermore, you can restart the Python process and reload the previous model into RAM, which results in only ~5GB of idle memory usage and adds a measly 1-2s to each generation queue.
As such, I propose two separate strategies that I plan to implement (as options) within SwiftDiffusion:
| Setup | Idle RAM usage (SDXL) | Added time (per queue) |
|---|---|---|
| `default` (current) | 30-50GB+ | 0s |
| `restartWithLoad` | 5-6GB | 1-2s |
| `startOnQueue` | 1-2GB | 2-3s |
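
For illustration, these setups could be modeled as a single setting. A minimal sketch, assuming a hypothetical `IdleRAMStrategy` enum (case names mirror the table above; this is not SwiftDiffusion's actual API):

```swift
/// Hypothetical setting; case names mirror the table above.
enum IdleRAMStrategy: String, CaseIterable {
    case `default`        // keep model resident: 30-50GB+ idle, 0s added per queue
    case restartWithLoad  // restart Python, reload last model: ~5-6GB idle, 1-2s added
    case startOnQueue     // restart Python, defer model load: ~1-2GB idle, 2-3s added
}
```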
After a generation queue has finished successfully:

- `restartWithLoad`: end Py process, start Py process, load last model into RAM
- `startOnQueue`: end Py process, start Py process with no loaded model
On new generation queue:

- `restartWithLoad`: make generation request
- `startOnQueue`: load model into RAM, make generation request
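
And the corresponding pre-generation step, continuing the sketch above (`makeGenerationRequest` is likewise a placeholder for the real request path):

```swift
extension PythonProcessManager {
    /// Called when a new generation queue starts.
    func queueWillStart(modelPath: String, prompt: String) {
        switch strategy {
        case .default, .restartWithLoad:
            makeGenerationRequest(prompt)  // model already resident in RAM
        case .startOnQueue:
            loadModel(at: modelPath)       // pay the 2-3s model load here
            makeGenerationRequest(prompt)
        }
    }

    func makeGenerationRequest(_ prompt: String) {
        // Placeholder: forward the prompt to the Python backend.
        print("generate:", prompt)
    }
}
```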