Ollama - latest small LLMs that work well on modern CPUs
Daniel Nashed – 1 May 2025 16:29:14
Domino IQ now supports external LLMs. One of the best ways to run LLMs is Ollama.
Running on an NVIDIA GPU is the best-performing option.
Running on a Mac with Apple Silicon is probably the second-best option.
But if you choose a small model, performance is quite OK on a modern CPU, such as a 12th or 13th generation Intel Core.
The following recent models will work well.
For external LLMs you will need TLS, which can be implemented with a simple NGINX reverse proxy setup, for example.
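A minimal sketch of such a proxy configuration could look like the following. The server name and certificate paths are placeholders you have to adjust for your environment; Ollama listens on 127.0.0.1:11434 by default.

server {
    listen 443 ssl;
    server_name llm.example.com;                    # placeholder hostname

    ssl_certificate     /etc/nginx/certs/llm.pem;   # placeholder certificate
    ssl_certificate_key /etc/nginx/certs/llm.key;   # placeholder key

    location / {
        proxy_pass http://127.0.0.1:11434;          # default Ollama address
        proxy_set_header Host $host;
        proxy_read_timeout 300s;                    # LLM responses can take a while
    }
}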
qwen3
ollama run qwen3:0.6b
ollama run qwen3:1.7b
The 0.6b model is really small but quite OK, and very fast. The next bigger one is still fine from a performance point of view.
https://ollama.com/library/qwen3
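To quickly check how a model responds, you can query the local Ollama REST API directly. For example, for the 0.6b model:

curl http://localhost:11434/api/generate -d '{"model": "qwen3:0.6b", "prompt": "Why is the sky blue?", "stream": false}'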
granite3.3
ollama run granite3.3:2b
https://ollama.com/library/granite3.3
gemma3
ollama run gemma3:1b
https://ollama.com/library/gemma3
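Once the NGINX reverse proxy is in place, you can verify the TLS endpoint end to end. The /api/tags endpoint lists the locally available models (llm.example.com is the placeholder hostname from the proxy sketch above):

curl https://llm.example.com/api/tags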