Docker - a new player in the LLM business
Daniel Nashed – 5 May 2025 09:30:28
Docker has a new feature in beta: running LLM models on Docker.
There is not much information about the underlying technology used.
But during installation you can see that it installs a llama-server (which Ollama and also Domino IQ use).
Here is a link to the official documentation --> https://docs.docker.com/model-runner/
Docker provides a registry for models. For example: https://hub.docker.com/r/ai/qwen3
To pull a model you just use the new "docker model" command. The following is a good small model to test with.
docker model pull ai/qwen3:0.6B-Q4_0
Once downloaded, you can list the available models:
docker model list
MODEL NAME          PARAMETERS  QUANTIZATION    ARCHITECTURE  MODEL ID      CREATED     SIZE
ai/qwen3            8.19 B      IQ2_XXS/Q4_K_M  qwen3         79fa56c07429  4 days ago  4.68 GiB
ai/qwen3:0.6B-Q4_0  751.63 M    Q4_0            qwen3         df9f2a333a63  4 days ago  441.67 MiB
There are multiple ways to access the AI components.
1. Command Line
From the command line you can just start a model, very similar to what Ollama does:
docker model run ai/qwen3:0.6B-Q4_0
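If I read the documentation correctly, you can also pass a prompt as an argument to get a single one-shot answer instead of the interactive chat (the prompt text is just an example):
docker model run ai/qwen3:0.6B-Q4_0 "Write one sentence about Docker"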
2. Within containers
From within containers you can just use the API endpoints against http://model-runner.docker.internal/
For example, the OpenAI-compatible chat completions endpoint:
POST /engines/llama.cpp/v1/chat/completions
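Here is a minimal sketch of a complete request against that endpoint from inside a container. The payload follows the usual OpenAI chat completions format; the model name and prompt are just examples:
curl http://model-runner.docker.internal/engines/llama.cpp/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ai/qwen3:0.6B-Q4_0",
        "messages": [
          {"role": "user", "content": "Say hello in one sentence."}
        ]
      }'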
3. Docker Socket
curl --unix-socket $HOME/.docker/run/docker.sock \
localhost/exp/vDD4.40/engines/llama.cpp/v1/chat/completions
4. Expose a TCP socket on the Docker host loopback interface
curl http://localhost:12434/engines/llama.cpp/v1/chat/completions
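As far as I can tell, the TCP endpoint is not enabled by default in the current beta. The documentation describes a Docker Desktop command along these lines to enable it (12434 is the default port used above):
docker desktop enable model-runner --tcp 12434
After that, the same chat completion request shown in the container example should work by just swapping the base URL to http://localhost:12434.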
First look results
This looks like a great new option to run LLM models.
For my first test it looked like it was not using my GPU.
But even on my very old ThinkPad (I will test with the new GPU machine later) the performance with this small model was OK.
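A simple way to check whether the GPU is actually used (assuming an NVIDIA card with the driver installed) is to watch nvidia-smi in a second terminal while a prompt is running:
watch -n 1 nvidia-smi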
This is just the beginning and there is more to discover. I just took a quick peek into it.
The integration into the registry alone, and having everything from one vendor, is interesting.
In addition, it is part of the Docker stack, so companies would not need to use an open-source project like Ollama directly.
This sounds like a smart Docker move to me.
Below are some screenshots from my test this morning.