# Optimizations

## Idle RAM management

Proposed in refactor: v1.9.0 + scheduler, idle RAM management, Observation rewrite (#88)

The current setup for all Stable Diffusion clients is as follows: load the requested model into RAM, use that model to generate the current prompt, and leave the model (weights, prompt dependencies, etc.) in memory until it is either overridden by another model or the process is shut down. This is beneficial for lower-end hardware, as it removes the need to reload the model on each new prompt generation, saving anywhere from 30-90s between prompts.

However, on higher-end hardware (especially the M3 Pro/Max), loading an SDXL model into RAM takes at most 2-3s. As a result, these clients reserve 30-50GB of active memory for as long as the process is running, all to save this particular user a second or two (2-3s in the worst case). Furthermore, you can restart the Python process and load the previous model back into RAM, which results in only ~5GB of idle memory usage and adds a measly 1-2s to each generation queue.

As such, I propose two separate strategies that I plan to implement (as options) within SwiftDiffusion:

| Setup | Idle RAM usage (SDXL) | Added time (per queue) |
| --- | --- | --- |
| default (current) | 30-50GB+ | 0s |
| restartWithLoad | 5-6GB | 1-2s |
| startOnQueue | 1-2GB | 2-3s |
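
As a rough sketch, these options could be modeled as a simple setting enum in SwiftDiffusion; `IdleRamStrategy` and its case documentation are illustrative only, not existing code:

```swift
/// Hypothetical setting mirroring the table above (the name and cases
/// are assumptions, not SwiftDiffusion's actual API).
enum IdleRamStrategy: String, Codable, CaseIterable {
    /// Current behavior: keep the process and model resident (30-50GB+ idle, 0s added).
    case `default`
    /// Restart the Python process after each queue, then reload the last model (5-6GB idle, 1-2s added).
    case restartWithLoad
    /// Restart the Python process with no model loaded; load on the next queue (1-2GB idle, 2-3s added).
    case startOnQueue
}
```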

After a generation queue has finished successfully:

- `restartWithLoad`: end the Python process, start a new Python process, load the last model into RAM
- `startOnQueue`: end the Python process, start a new Python process with no loaded model

On a new generation queue (see the combined sketch after this list):

- `restartWithLoad`: make the generation request
- `startOnQueue`: load the model into RAM, then make the generation request
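
Combining the two lists, here is a minimal sketch of both lifecycle hooks, building on the `IdleRamStrategy` enum above. `PythonProcess`, `ModelCheckpoint`, `didFinishQueue`, and `startQueue` are placeholder names standing in for whatever process and model wrappers SwiftDiffusion actually uses:

```swift
// Placeholder types; all names below are assumptions, not the project's real API.
struct ModelCheckpoint { let name: String }

final class PythonProcess {
    func loadModel(_ model: ModelCheckpoint) { /* send load request to the backend */ }
    func generate(_ prompt: String) { /* send generation request to the backend */ }
    func terminate() { /* kill the underlying Python process */ }
}

final class GenerationController {
    var strategy: IdleRamStrategy = .default
    var process = PythonProcess()
    var lastModel: ModelCheckpoint?

    /// After a generation queue has finished successfully.
    func didFinishQueue() {
        switch strategy {
        case .default:
            break                        // leave the model resident in RAM
        case .restartWithLoad:
            process.terminate()
            process = PythonProcess()    // end Py process, start Py process
            if let model = lastModel {
                process.loadModel(model) // load last model back into RAM (~5GB idle)
            }
        case .startOnQueue:
            process.terminate()
            process = PythonProcess()    // start Py process with no loaded model
        }
    }

    /// On a new generation queue.
    func startQueue(model: ModelCheckpoint, prompt: String) {
        lastModel = model
        if strategy == .startOnQueue {
            process.loadModel(model)     // pay the 2-3s load cost here
        }
        process.generate(prompt)         // make the generation request
    }
}
```

One way to read the trade-off: `restartWithLoad` reloads the model right after a queue finishes, so the next request starts with the model already in RAM, while `startOnQueue` defers the load to the next request; this matches the 1-2s versus 2-3s added-time figures in the table above.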