Ollama local model setup

1. Download and install Ollama

Ollama is a local model runner. Once it is installed and running, Pemo can call local Ollama models for Q&A.

For platform-specific installation details, see the Ollama documentation on GitHub.

After installing Ollama, download at least one model. Use the exact model name shown in the Ollama Library.

In a terminal, run:

ollama run <model-name>

For example, Ollama's official docs show commands like:

ollama run gemma4

On first run, Ollama downloads the model and opens a local chat session. Type /bye to leave the terminal chat.
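
If you would rather download a model without opening a chat session, the ollama pull command fetches it, and ollama list shows the models already on disk. The sketch below wraps both in a small shell function. The name fetch_model is a hypothetical helper for illustration and is not part of Ollama.

```shell
# fetch_model is a hypothetical helper name used only for this sketch.
# Usage: fetch_model <exact model name from the Ollama Library>
fetch_model() {
  model="$1"
  # Stop early if the ollama command is not installed or not on PATH.
  if ! command -v ollama >/dev/null 2>&1; then
    echo "ollama not found on PATH"
    return 1
  fi
  ollama pull "$model"   # download only; no chat session is opened
  ollama list            # print the models available locally
}
```

After defining the function, call it with the model name you chose, for example fetch_model gemma4.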

Start with a smaller model if you are unsure what your machine can handle. Larger models need more memory and more GPU/CPU resources.

2. Use an Ollama model in Pemo

Make sure Ollama is started and running in the background.
Open Pemo settings.
Go to AI service management.
Pemo detects the local Ollama service automatically, so you do not need to add a service URL manually.
Click Refresh to check whether Ollama is available and load downloaded models.
Select a local model that has already been downloaded by Ollama.
Return to document Q&A or general Q&A and choose the Ollama model.
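
If Refresh does not find Ollama, you can check the service from a terminal, independently of Pemo. The sketch below assumes Ollama is listening on its default local address, http://localhost:11434, and queries the /api/tags endpoint, which lists the downloaded models. The names BASE_URL and check_ollama are hypothetical and used only for this example. How Pemo itself detects the service is not covered here.

```shell
# Assumption: Ollama listens on its default local address.
# Change BASE_URL if your setup uses a different host or port.
BASE_URL="http://localhost:11434"

# check_ollama is a hypothetical helper name used only for this sketch.
check_ollama() {
  url="${1:-$BASE_URL}"
  # /api/tags returns the locally downloaded models as JSON.
  # The response body is discarded; only success or failure matters here.
  if curl --silent --fail --max-time 3 "$url/api/tags" >/dev/null 2>&1; then
    echo "Ollama is reachable at $url"
  else
    echo "Ollama is not reachable at $url"
  fi
}

check_ollama
```

If the check reports that Ollama is not reachable, start the Ollama app or run ollama serve in a terminal, then click Refresh in Pemo again.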