Ollama: keep models loaded for longer than 5 minutes of idle time
Daniel Nashed – 20 April 2025 11:24:15
By default, Ollama keeps an idle model in memory for 5 minutes and then unloads it.
When you use the API to load a model, you can specify per request, via the keep_alive parameter, how long that model stays in memory.
But it is probably a good idea to change the default time as well.
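As a minimal sketch of the per-request variant: the snippet below sends a generate request without a prompt, which only loads the model, and sets keep_alive for it. The address http://localhost:11434 and the helper names build_preload_body and preload are assumptions for illustration, not part of Ollama itself.

```python
import json
import urllib.request

# Assumption: Ollama listens on its usual local address.
OLLAMA_URL = "http://localhost:11434"


def build_preload_body(model: str, keep_alive: str) -> dict:
    """Request body that loads `model` and keeps it in memory for `keep_alive`."""
    # No prompt is included, so the request only loads the model.
    return {"model": model, "keep_alive": keep_alive, "stream": False}


def preload(model: str, keep_alive: str = "30m") -> dict:
    """POST the body to /api/generate and return the decoded JSON reply."""
    request = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(build_preload_body(model, keep_alive)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling preload("gemma3:1b", "30m") against a running server should leave that model loaded for 30 minutes after its last use. The value applies only to the model named in the request.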
The setting is controlled via an environment variable.
For a container, you just pass it as an environment parameter when you start it:
-e "OLLAMA_KEEP_ALIVE=30m"
For Ollama running as a Linux service, you have to add it to the [Service] section of the systemd unit like this (and restart the service afterwards):
/etc/systemd/system/ollama.service
Environment="OLLAMA_KEEP_ALIVE=30m"
A value without a unit is interpreted as seconds. You can append a unit to the idle shutdown delay (like m for minutes or h for hours).
I just looked it up and set it on my servers.
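To make the unit rule concrete, here is a small illustration of how simple values map to seconds. It is a simplified sketch with a hypothetical helper name. It is not Ollama's own parser and it only handles a single number with an optional s, m or h suffix.

```python
# Seconds per supported unit suffix (simplified: one suffix per value).
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def keep_alive_seconds(value: str) -> int:
    """Convert a simple keep-alive value such as "300" or "30m" to seconds."""
    value = value.strip()
    if value and value[-1] in UNIT_SECONDS:
        # Suffix present: multiply the number by the unit length.
        return int(value[:-1]) * UNIT_SECONDS[value[-1]]
    # No suffix: the number is taken as seconds.
    return int(value)
```

With this sketch, "30m" and "1800" describe the same delay.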
Ollama loads downloaded LLMs automatically when a request needs them.
Dynamic loading of LLMs is quite useful. But having to load a model again after 5 minutes of idle time delays the next request if the LLM has been unloaded in the meantime.
You can also configure how many models can be loaded at the same time via OLLAMA_MAX_LOADED_MODELS (the default is 3 per GPU, or 3 for CPU inference, provided the models fit into available memory).
Usually you can't run too many models at the same time. But if you are testing with smaller models, this setting might still be important.
-- Daniel
Example for some smaller models all loaded into the GPU in parallel:
ollama ps
NAME             ID              SIZE      PROCESSOR    UNTIL
qwen2.5:0.5b     a8b0c5157701    1.3 GB    100% GPU     29 minutes from now
granite3.2:2b    9d79a41f2f75    3.2 GB    100% GPU     29 minutes from now
gemma3:1b        8648f39daa8f    1.9 GB    100% GPU     27 minutes from now
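The same information is available over the API, which can be handy for checking from a script whether the longer keep-alive is in effect. The following is a rough sketch that reads /api/ps and keeps the name, size and expiry time of each loaded model. The address and the helper names summarize and loaded_models are assumptions for illustration.

```python
import json
import urllib.request

# Assumption: Ollama listens on its usual local address.
OLLAMA_URL = "http://localhost:11434"


def summarize(ps_reply: dict) -> list:
    """Return (name, size in GB, expires_at) for each model in a /api/ps reply."""
    rows = []
    for model in ps_reply.get("models", []):
        size_gb = round(model["size"] / 1e9, 1)
        rows.append((model["name"], size_gb, model["expires_at"]))
    return rows


def loaded_models() -> list:
    """Fetch the currently loaded models from a running Ollama server."""
    with urllib.request.urlopen(f"{OLLAMA_URL}/api/ps") as response:
        return summarize(json.load(response))
```

The expires_at value corresponds to the UNTIL column of ollama ps, so a model loaded with a 30 minute keep-alive should show an expiry roughly 30 minutes after its last request.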