Local Image Generation

I came across a YouTube video showing that one can generate images locally, even from a USB drive (video linked at the end).

As I might have mentioned earlier, I have a decent machine. It is an Intel i9 with 32GB of memory. So I thought I might give it a try.

The project claims that it works on Windows, Linux and macOS. But when I tried it on my Mac, I found out that only Apple Silicon is supported. ¹

I went ahead anyway, just to see whether it would work or not. But mac.sh eventually failed, saying only Apple Silicon is supported.

It works on Windows and Linux, and most of those machines are Intel machines, so why does it NOT work on an Intel Mac? 🤔

So I went ahead and updated the code to remove this restriction.

I used this new (to me) agent called Freebuff ²

It mentioned (as I had suspected) that blocking Intel Macs was an artificial restriction.

There is no technical reason why it shouldn’t work.

It figured out that the underlying Stable Diffusion component does not ship pre-built binaries for Intel Macs.

But, as on Linux, the build-from-source option is always available.
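The decision described above can be sketched roughly as follows. This is only an illustration in Python, not the project's actual code (the real setup script is a shell script, mac.sh). The function name `install_strategy` and the contents of `PREBUILT` are my own assumptions; the only thing taken from this post is that Intel Macs have no pre-built binary and can build from source instead.

```python
import platform

# Assumption for illustration only: the (system, machine) pairs that have a
# pre-built binary. The real list lives in the project, not here.
PREBUILT = {("Darwin", "arm64")}


def install_strategy(system, machine, prebuilt=PREBUILT):
    """Pick how to obtain the binary instead of refusing to run."""
    if (system, machine) in prebuilt:
        return "download-prebuilt"
    # No pre-built binary (e.g. an Intel Mac reports Darwin / x86_64):
    # fall back to compiling locally rather than exiting with an error.
    return "build-from-source"


# What this machine would do under the assumptions above.
print(install_strategy(platform.system(), platform.machine()))
```

The point is that an unsupported architecture becomes a slower install path, not a hard stop.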

I did not have cmake, which it identified as a potential blocker.

So we fixed the setup script to check for this dependency and install it if it is not available.
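A minimal sketch of that kind of check, again in Python and not the actual change made to the script. The helper name `ensure_tool` is hypothetical, and installing through Homebrew (`brew install cmake`) is my assumption about how one would typically get cmake on a Mac.

```python
import shutil
import subprocess


def ensure_tool(name, install_cmd, which=shutil.which, run=subprocess.run):
    """Return True if `name` was already on PATH, False if it had to be installed."""
    if which(name) is not None:
        return True
    # Tool is missing: run the install command and fail loudly if it errors.
    run(install_cmd, check=True)
    return False
```

It could be called as `ensure_tool("cmake", ["brew", "install", "cmake"])` before the build step. The `which` and `run` parameters exist only so the check can be tried out without actually installing anything.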

And I was off to the races.

It installed cmake first, then went ahead and downloaded and built Stable Diffusion locally.

After a few minutes I was greeted with the local app shown in the YouTube video.

I still needed to download the model.

I went ahead with the same model recommended in the YouTube video: cyberrealistic.

It was a bit confusing (to me at least) that I had to explicitly load the model. But in hindsight it makes sense. One can download multiple models, but load only one at a time.

Once I loaded the model, I tried a simple prompt, “cute elephant” (not that it matters 😄).

The video shows it generating images in under 4 or 5 seconds, a reasonable time.

But in my case it seemed to go on and on and on.

After some time, the web dashboard indicated that it was using the CPU ³

Maybe that is why it was slow.

The UI indicated that there were 20 steps. When I gave up waiting, it had reached only the fourth of those 20 steps. 😢

The UI is considerate enough to show CPU usage and memory usage.

While the CPU usage did not go up, the memory usage was very high. I think it was 27GB out of the 32GB available on my machine.

Conclusion

Even in the past, I have tried running local LLMs via Ollama, and reached the conclusion that:

While it is possible, it is not practical

i.e. as a proof of concept, we can run an LLM locally, but:

Either you need a very high-end machine with a beefy GPU and/or 64GB+ of memory

Or your machine becomes slow and/or the output is not good (when compared to the frontier models we tend to use otherwise, even the free ones)

This experiment proved the same once again ⁴

One exception to this conclusion is local audio transcription. I wrote about it here.

Here the models (at least the default ones and the ones I used) were small (<500MB).

Maybe that is why I see no performance degradation. It is possible that both Handy and Spokenly are smart enough to load (and unload) the models in memory, so that they are not occupying memory all the time. But even otherwise, I see absolutely no performance issues while using these tools for dictation.

Resources

YouTube Video

The git repo

It is clearly mentioned in the project README. ↩︎
It is so good that it deserves its own blog post. Someday! ↩︎
Another confusing thing: the UI showed these: Loaded on Metal GPU, MTL0 (AMD Radeon Pro 5500M), sd. It showed “CPU Fallback” only after some time, so I assume it starts with the AMD graphics card and then falls back to the CPU? 🤷‍♂️ ↩︎
I tried the same once again, this time with a “negative prompt” (as the UI calls it) of “sketch like”, hoping that if I did not need a high-quality image, it might work. This time it “finished” in 516.9s (about 9 minutes) but produced a completely empty/white 512x512 image 🤣 ↩︎