
Ollama local model setup

Download and start Ollama, pull a local model, then select Ollama in Pemo for local Q&A.

[Figure: Selecting an Ollama local model in Pemo]
Ollama runs models on your own computer. After you start Ollama and pull a model, Pemo detects the local Ollama service automatically; click Refresh to check availability and load the model list.

1. Download and install Ollama

Ollama is a local model runner. Once it is installed and running, Pemo can call local Ollama models for Q&A.

  1. Open the Ollama download page.
  2. Choose macOS, Windows, or Linux.
  3. On macOS and Windows, download the installer and follow the prompts.
  4. On Linux, follow the terminal command shown on the download page.
  5. Start Ollama and keep it running in the background.

For platform details, also check the Ollama GitHub documentation.
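
To confirm from a terminal that Ollama is installed and running, something like the sketch below may help. It assumes Ollama listens on its default local address, http://localhost:11434, and that curl is available. The function names and the OLLAMA_URL variable are made up for this example.

```shell
#!/bin/sh
# Assumption: Ollama uses its default local address. OLLAMA_URL is a
# variable invented for this sketch; override it if your setup differs.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# Returns 0 if the `ollama` command is on PATH.
ollama_installed() {
  command -v ollama >/dev/null 2>&1
}

# Returns 0 if the local Ollama service answers an HTTP request within 3 seconds.
ollama_running() {
  curl --silent --fail --max-time 3 "$OLLAMA_URL/" >/dev/null 2>&1
}

if ollama_installed; then
  echo "ollama found: $(ollama --version)"
else
  echo "ollama not found on PATH"
fi

if ollama_running; then
  echo "Ollama service is reachable at $OLLAMA_URL"
else
  echo "Ollama service is not reachable at $OLLAMA_URL"
fi
```

If the second check fails, start Ollama again and rerun the script before opening Pemo.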

2. Download a local model

After installing Ollama, download at least one model. Use the exact model name shown in the Ollama Library.

In a terminal, run the following command, replacing <model-name> with that name:

ollama run <model-name>

For example, Ollama's official docs show commands like:

ollama run gemma4

On first run, Ollama downloads the model and opens a local chat session. Type /bye to leave the terminal chat.

Start with a smaller model if you are unsure. Larger models need more memory and GPU/CPU resources.
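
If you only want to download a model for Pemo and skip the terminal chat, `ollama pull` fetches it without opening a session. The wrapper below is a minimal sketch: download_model is a hypothetical helper name, and the model name you pass must come from the Ollama Library.

```shell
#!/bin/sh
# Download a model without opening a chat session, then show what is installed.
# $1 is a model name from the Ollama Library.
download_model() {
  if [ -z "$1" ]; then
    echo "usage: download_model <model-name>" >&2
    return 2
  fi
  if ! command -v ollama >/dev/null 2>&1; then
    echo "ollama is not installed or not on PATH" >&2
    return 127
  fi
  # `ollama pull` downloads the model; `ollama list` prints local models.
  ollama pull "$1" && ollama list
}
```

Call it as `download_model <model-name>`; the model should then appear in the list printed at the end.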

3. Use Ollama in Pemo

  1. Make sure Ollama is started and running in the background.
  2. Open Pemo settings.
  3. Go to AI service management.
  4. Pemo detects the local Ollama service automatically, so you do not need to add a service URL manually.
  5. Click Refresh to check whether Ollama is available and load downloaded models.
  6. Select a local model that has already been downloaded by Ollama.
  7. Return to document Q&A or general Q&A and choose the Ollama model.

[Figure: Pemo LLM service management with Ollama local models]
After Ollama starts, Pemo detects the local Ollama service automatically. Click Refresh to check whether it is available and load the models already downloaded on your computer.
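
If Pemo shows no models after a refresh, it can help to ask the local service directly which models it has. The sketch below queries Ollama's /api/tags endpoint, which lists locally available models, at the default address http://localhost:11434. Whether Pemo uses this same endpoint is not stated in this guide, so treat this as an independent check. The model_names helper is hypothetical, and its text matching is a rough approximation that assumes model names contain no escaped quotes.

```shell
#!/bin/sh
# Assumption: default local Ollama address; OLLAMA_URL is invented for this sketch.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# Reads an /api/tags JSON response on stdin and prints one model name per line.
# This is a rough text match on "name" fields, not a real JSON parser.
model_names() {
  grep -o '"name"[[:space:]]*:[[:space:]]*"[^"]*"' |
    sed 's/.*"\([^"]*\)"$/\1/'
}

# Fetch the list of local models; leave the variable empty on any failure.
tags_json=$(curl --silent --fail --max-time 3 "$OLLAMA_URL/api/tags" 2>/dev/null) || tags_json=""

if [ -n "$tags_json" ]; then
  printf '%s\n' "$tags_json" | model_names
else
  echo "No response from $OLLAMA_URL; is Ollama running?"
fi
```

If the script prints model names but Pemo still shows none, click Refresh again in AI service management.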

4. Checklist

  • Ollama is running in the background.
  • Running ollama list in a terminal shows the model you downloaded.
  • Pemo shows Ollama and the downloaded models after refreshing.
  • Your computer has enough memory and compute for the selected model.
  • For Q&A over sensitive documents, the selected model is an Ollama model, not a cloud model.
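
The "ollama list shows the model" item can be checked with a small script. This is a sketch under two assumptions: that `ollama list` prints a header row followed by one model per line with the name in the first column, and that gemma4 (the example name used earlier) stands in for your own model. has_model and the MODEL variable are hypothetical names.

```shell
#!/bin/sh
# Returns 0 if the model named in $1 appears in `ollama list` output read from
# stdin. A name without a tag matches any tag of that model, so "gemma4"
# matches "gemma4:latest".
has_model() {
  awk -v want="$1" '
    NR == 1 { next }              # skip the header row (assumed present)
    {
      name = $1
      base = name
      sub(/:.*/, "", base)        # drop the ":tag" suffix
      if (name == want || base == want) found = 1
    }
    END { exit (found ? 0 : 1) }
  '
}

# Placeholder model name; set MODEL in the environment to check another one.
MODEL="${MODEL:-gemma4}"

if command -v ollama >/dev/null 2>&1; then
  if ollama list | has_model "$MODEL"; then
    echo "OK: $MODEL is listed by ollama"
  else
    echo "NOT LISTED: $MODEL (check that Ollama is running, or download the model)"
  fi
else
  echo "ollama is not on PATH"
fi
```

The remaining checklist items, such as whether Pemo shows the models and whether Q&A uses the Ollama model, still need to be confirmed in Pemo itself.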

Good fit for

  • Local Q&A over sensitive materials.
  • Reducing cloud model calls when your computer is capable enough.
  • Testing open models on your own machine.
  • Using Pemo document Q&A in a local-first workflow.