[Home]

Third generation AI image models on a low-end PC

Another new year and another set of new-generation AI text-to-image models has arrived, so it's time to tinker. While the early SD1.5 was cool (and still is) for its hackability, FLUX.1 was mostly boring despite its impressive out-of-the-box quality. FLUX.2 and Z-Image are again bringing in a fresh kick, and the development on the text encoder side in particular is interesting.

Even though all the AI stuff is full of bloat and brute force, quantization nowadays helps keep GPU and VRAM requirements somewhat reasonable. However, the combined size of the denoiser and text encoder models keeps growing, so CPU RAM is now becoming the bigger bottleneck on my low-end machine.

To get reasonable throughput, models must stay resident in RAM. With a ``small'' 16 GB of RAM this is getting tough, so one needs to carefully select the right combinations. While it's still possible to use the bigger models, re-loading them is so painfully slow that it's not really worth it. Especially with distilled models, where inference itself is fast, constant re-loading would multiply the wall time per image.

With InvokeAI, assuming you have a dedicated headless machine, you can pin the model cache size to 14 GB; based on my experiments the remaining 2 GB is enough for InvokeAI itself and the underlying OS/SW stack to keep running smoothly. That leaves 14 GB for models, and the table below shows what can actually fit there at maximum:
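As a concrete sketch, the cache size is pinned in invokeai.yaml. Note that the exact key name has changed between InvokeAI releases (older versions used a plain `ram:` key), so treat this fragment as an assumption and check the config reference for your version:

```yaml
# invokeai.yaml -- sketch, key name may differ per InvokeAI version
# Pin the model RAM cache to 14 GB, leaving ~2 GB for the app and OS
max_cache_ram_gb: 14
```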

Model                  Variant  Size (GB)  Text encoder  Size (GB)  Total (GB)  Usage (GB)
FLUX.1 Fill Dev        Q4_K_M   6.94       BNB INT8      4.90       11.84       11.89
FLUX.1 Fill Dev        Q5_K_M   8.43       BNB INT8      4.90       13.33       13.27
FLUX.2 Klein 4B        BF16     7.75       Q8            4.28       12.03       12.44
FLUX.2 Klein 4B        Q8       4.30       BF16          8.06       12.36       12.64
FLUX.2 Klein 4B Base   BF16     7.75       Q8            4.28       12.03       12.44
FLUX.2 Klein 9B        Q4_K_M   5.91       Q5_K_M        5.85       11.76       13.57
FLUX.2 Klein 9B        Q5_K_M   7.02       Q4_K_M        5.03       12.05       13.81
Z-Image Base           Q8       7.22       Q8            4.28       11.50       13.09
Z-Image Turbo          Q9       6.58       Q8            4.28       10.86       12.64

The ``usage'' column refers to InvokeAI's reported ``cache high water mark'' after repeated real usage. It is sometimes clearly larger than the calculated total; the exact reason is unclear, but at least the VAE and other components also need to fit in the cache.
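The denoiser sizes in the table line up with a simple bits-per-weight estimate; a quick sanity check (the ~8.5 bpw figure for Q8-style GGUF quantization is a rough approximation including scale overhead, not an exact spec):

```python
def model_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Rough on-disk size of a (quantized) model in decimal GB."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

# FLUX.2 Klein 4B at BF16 (16 bpw): ~8 GB, close to the 7.75 GB above
print(round(model_size_gb(4, 16), 2))   # 8.0
# Same model at roughly Q8 (~8.5 bpw incl. scales): ~4.25 GB vs 4.30 GB
print(round(model_size_gb(4, 8.5), 2))  # 4.25
```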

Inference performance

Just some performance numbers for my personal use and reference.

Inference performance with NVIDIA RTX 3060 12 GB VRAM:

Model                  Variant  Inference steps  Total wall time
FLUX.1 Fill Dev        Q4_K_M   30 (Euler)       140 s
FLUX.1 Fill Dev        Q5_K_M   30 (Euler)       150 s
FLUX.2 Klein 4B        BF16     6                15 s
FLUX.2 Klein 4B        Q8       6                15 s
FLUX.2 Klein 4B Base   BF16     30 (Euler)       55 s
FLUX.2 Klein 9B        Q4_K_M   6                35 s
FLUX.2 Klein 9B        Q5_K_M   6                35 s
Z-Image Base           Q8       30 (Euler)       195 s
Z-Image Turbo          Q8       8 (Euler)        30 s

These times include the full text encoder pass, i.e. as if a new prompt were used every time.
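From these wall times the per-step cost can be backed out, which makes the distillation advantage obvious. Since the totals include the text encoder pass, these are upper bounds per step:

```python
# (total wall time in s, inference steps) from the table above
runs = {
    "FLUX.2 Klein 4B (Q8, 6 steps)": (15, 6),
    "Z-Image Turbo (8 steps)": (30, 8),
    "Z-Image Base (30 steps)": (195, 30),
}
for name, (wall_s, steps) in runs.items():
    print(f"{name}: <= {wall_s / steps:.2f} s/step")
# Turbo: <= 3.75 s/step vs Base: <= 6.50 s/step, on top of far fewer steps
```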

Training

With musubi-tuner, both FLUX.2 Klein 4B and Z-Image LoRA training work with 16 GB RAM and 12 GB VRAM. During training CPU RAM is not an issue, as the latents and text encoder outputs can be cached beforehand and only the denoiser is needed during the training itself.

Model                 LoRA rank  Memory save        Resolution  VRAM usage  Speed
FLUX.2 Klein 4B Base  16         (not needed)       512         8456 MiB    2.0 s/it
FLUX.2 Klein 4B Base  16         (not needed)       1024        10156 MiB   6.4 s/it
Z-Image Base          16         blocks_to_swap=8   512         10826 MiB   3.7 s/it
Z-Image Base          16         blocks_to_swap=10  1024        11578 MiB   14 s/it

Last updated: 2026-04-07 22:51 (EEST)