Daniel Nashed's Blog

Domino on Linux/Unix, Troubleshooting, Best Practices, Tips and more ...

Daniel Nashed

Ollama keep models loaded for longer than 5 minutes idle time

Daniel Nashed – 20 April 2025 11:24:15

Ollama keeps loaded models in memory for 5 minutes if they are idle.

When you use the API to load a model, you can specify per model how long it stays in memory.
But it is probably a good idea to change the default time as well.
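
For the per-model setting, a request could look roughly like the sketch below. It assumes Ollama listens on its default address http://localhost:11434 and uses the keep_alive field of the /api/generate endpoint. The helper names keep_alive_payload and load_model and the OLLAMA_URL variable are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: build the JSON body for a load-only request.
# $1 = model name, $2 = keep-alive value (for example 30m)
keep_alive_payload() {
  printf '{"model": "%s", "keep_alive": "%s"}' "$1" "$2"
}

# Assumption: Ollama listens on the default port 11434 on localhost.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# A request without a prompt only loads the model into memory.
load_model() {
  curl -s "$OLLAMA_URL/api/generate" -d "$(keep_alive_payload "$1" "$2")"
}

# Usage (needs a running Ollama server):
#   load_model qwen2.5:0.5b 30m
```

The keep_alive value in a request applies to that model only, so the server-wide default still matters for everything else.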

The setting is controlled via an environment variable.
For a container image you would just pass it as a parameter: -e "OLLAMA_KEEP_ALIVE=30m"

For Ollama running as a Linux service, you have to add it to the [Service] section of the systemd unit file like this:

/etc/systemd/system/ollama.service

Environment="OLLAMA_KEEP_ALIVE=30m"
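
An alternative to editing the unit file directly is a systemd drop-in, which keeps the setting in a separate file. The sketch below assumes the service is named ollama.service as in the path above. The function name write_keep_alive_dropin and the file name keep-alive.conf are my own choices for illustration.

```shell
#!/bin/sh
# Hypothetical helper: write a systemd drop-in that sets OLLAMA_KEEP_ALIVE.
# $1 = keep-alive value (for example 30m)
# $2 = drop-in directory, defaults to the one for ollama.service
write_keep_alive_dropin() {
  dropin_dir="${2:-/etc/systemd/system/ollama.service.d}"
  mkdir -p "$dropin_dir" || return 1
  cat > "$dropin_dir/keep-alive.conf" <<EOF
[Service]
Environment="OLLAMA_KEEP_ALIVE=$1"
EOF
}

# Usage as root, followed by a reload and restart so the setting takes effect:
#   write_keep_alive_dropin 30m
#   systemctl daemon-reload
#   systemctl restart ollama
```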

A value without a unit is interpreted as seconds. You can append a unit to the idle shutdown delay (like m for minutes).
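
To show how values with and without a unit relate, here is a small converter. It is only a simplified sketch that handles whole numbers with an optional s, m, or h suffix. It is not how Ollama parses the value, and the function name keep_alive_seconds is made up.

```shell
#!/bin/sh
# Hypothetical helper: convert a simple keep-alive value to seconds.
# Handles only a whole number with an optional s, m, or h suffix.
keep_alive_seconds() {
  value="$1"
  case "$value" in
    *h) echo $(( ${value%h} * 3600 )) ;;
    *m) echo $(( ${value%m} * 60 )) ;;
    *s) echo $(( ${value%s} )) ;;
    *)  echo "$value" ;;   # no unit: already seconds
  esac
}

keep_alive_seconds 30m    # prints 1800
keep_alive_seconds 1800   # prints 1800
keep_alive_seconds 1h     # prints 3600
```

So OLLAMA_KEEP_ALIVE=30m and OLLAMA_KEEP_ALIVE=1800 describe the same idle time.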

I just looked it up and set it on my servers.

Ollama loads LLMs automatically if they have been downloaded.
Dynamic loading of LLMs is quite useful. But loading them again after 5 minutes of idle time would delay requests if the LLM has been unloaded.

You can also configure how many models should be loaded at most via OLLAMA_MAX_LOADED_MODELS (the default is 3).
Usually you can't run too many models at the same time. But if you are testing with smaller models, this still might be important:

-- Daniel

Example for some smaller models all loaded into the GPU in parallel:

ollama ps

NAME             ID              SIZE      PROCESSOR    UNTIL
qwen2.5:0.5b     a8b0c5157701    1.3 GB    100% GPU     29 minutes from now
granite3.2:2b    9d79a41f2f75    3.2 GB    100% GPU     29 minutes from now
gemma3:1b        8648f39daa8f    1.9 GB    100% GPU     27 minutes from now
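
To check how close you are to the OLLAMA_MAX_LOADED_MODELS limit, you could count the models listed by ollama ps. The sketch below assumes the output format shown above: one header line followed by one line per loaded model. The function name count_loaded_models is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: count loaded models from `ollama ps` output on stdin.
# Assumes one header line followed by one non-empty line per loaded model.
count_loaded_models() {
  awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }'
}

# Usage (needs the ollama CLI and a running server).
# The fallback of 3 is the default mentioned in this post.
#   loaded=$(ollama ps | count_loaded_models)
#   echo "$loaded of ${OLLAMA_MAX_LOADED_MODELS:-3} models loaded"
```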
