Domino on Linux/Unix, Troubleshooting, Best Practices, Tips and more ...


Daniel Nashed

Ollama: keeping models loaded for longer than 5 minutes of idle time

Daniel Nashed – 20 April 2025 11:24:15

By default, Ollama keeps an idle model loaded in memory for 5 minutes and then unloads it.

When you load a model through the API, you can specify per request how long that model stays in memory (the keep_alive parameter).

But probably it would be a good idea to change the default time as well.
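For the per-request option, here is a minimal sketch, assuming a local Ollama on the default port 11434. As far as I know, a request to /api/generate without a prompt only loads the model, and the keep_alive field sets its idle timeout. The helper names build_preload_request and preload_model are made up for this example.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: local Ollama on the default port


def build_preload_request(model, keep_alive="30m", base_url=OLLAMA_URL):
    """Build a POST to /api/generate that loads `model` and sets its idle timeout."""
    # No "prompt" field is sent, so the server only loads the model.
    body = json.dumps(
        {"model": model, "keep_alive": keep_alive, "stream": False}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def preload_model(model, keep_alive="30m"):
    """Send the request; needs a running Ollama server with the model downloaded."""
    with urllib.request.urlopen(build_preload_request(model, keep_alive)) as resp:
        return json.loads(resp.read())
```

Calling preload_model("qwen2.5:0.5b", "1h") would keep that one model around for an hour, independent of the server default.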


The default is controlled via the OLLAMA_KEEP_ALIVE environment variable.
For a container image you would just pass it as a parameter:


-e "OLLAMA_KEEP_ALIVE=30m"


For Ollama running as a Linux service, you have to add it to the systemd service like this:


/etc/systemd/system/ollama.service


Environment="OLLAMA_KEEP_ALIVE=30m"
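Instead of editing the unit file directly, the same line can live in a systemd drop-in, which survives package updates of the unit file. The sketch below only renders and writes that file. The path is the one "systemctl edit ollama.service" would create; the function names are hypothetical. Note that Environment= has to be in the [Service] section.

```python
from pathlib import Path

# Drop-in location that systemd reads for ollama.service overrides
# (the same file "systemctl edit ollama.service" would create).
DROPIN_PATH = Path("/etc/systemd/system/ollama.service.d/override.conf")


def render_dropin(keep_alive="30m"):
    """Return the drop-in text; Environment= must sit in the [Service] section."""
    return f'[Service]\nEnvironment="OLLAMA_KEEP_ALIVE={keep_alive}"\n'


def write_dropin(keep_alive="30m", path=DROPIN_PATH):
    """Write the drop-in file (needs root on a real system)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_dropin(keep_alive))
```

After changing the file, systemd needs "systemctl daemon-reload" and the service needs a restart before the new value is used.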


The default unit is seconds. You can append a unit to the idle timeout (for example m for minutes or h for hours).
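To make the unit rule concrete, here is a small sketch that converts such a value to seconds. This is not Ollama's own parser (Ollama is written in Go and accepts Go-style duration strings); the function keep_alive_seconds is hypothetical and only handles whole numbers with an optional s, m or h suffix.

```python
import re

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def keep_alive_seconds(value):
    """Convert a keep-alive value such as 300, "300", "30m" or "1h" to seconds."""
    text = str(value).strip()
    match = re.fullmatch(r"(-?\d+)([smh]?)", text)
    if match is None:
        raise ValueError(f"unsupported keep-alive value: {value!r}")
    number, unit = match.groups()
    # A bare number has no unit suffix and is taken as seconds.
    return int(number) * _UNIT_SECONDS.get(unit, 1)
```

So "30m" and 1800 describe the same timeout. According to the Ollama FAQ, a negative value keeps the model loaded indefinitely and 0 unloads it right after the request.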


I just looked it up and set it on my servers.


Ollama loads LLMs automatically if they have been downloaded.

Dynamic loading of LLMs is quite useful. But if an LLM has been unloaded after 5 minutes of idle time, the next request is delayed until it has been loaded again.


You can also configure how many models should be loaded at most via OLLAMA_MAX_LOADED_MODELS (the default is 3).

Usually you can't run too many models at the same time. But if you are testing with smaller models, this still might be important:



-- Daniel



Example for some smaller models all loaded into the GPU in parallel:

ollama ps

NAME             ID              SIZE      PROCESSOR    UNTIL
qwen2.5:0.5b     a8b0c5157701    1.3 GB    100% GPU     29 minutes from now
granite3.2:2b    9d79a41f2f75    3.2 GB    100% GPU     27 minutes from now
gemma3:1b        8648f39daa8f    1.9 GB    100% GPU     27 minutes from now
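The same information is available over the REST API, which is handy for monitoring scripts. A rough sketch, assuming the /api/ps endpoint returns a "models" list with name, size, size_vram and expires_at fields (that is my reading of the API docs); summarize_loaded and fetch_loaded are names invented for this example.

```python
import json
import urllib.request


def summarize_loaded(ps_response):
    """Return (name, size in GB, percent in GPU memory, expiry) per loaded model."""
    rows = []
    for m in ps_response.get("models", []):
        size = m.get("size", 0)
        vram = m.get("size_vram", 0)
        # Share of the model that sits in GPU memory; 0 if the size is unknown.
        gpu_pct = round(100 * vram / size) if size else 0
        rows.append((m["name"], round(size / 1e9, 1), gpu_pct, m.get("expires_at")))
    return rows


def fetch_loaded(base_url="http://localhost:11434"):
    """Query a running Ollama server for its currently loaded models."""
    with urllib.request.urlopen(f"{base_url}/api/ps") as resp:
        return summarize_loaded(json.loads(resp.read()))
```

The expiry timestamp is what moves further out when the keep-alive time is raised.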

