← All articles
Developer API, MCP and webhooks·

Set up Getwello with Ollama

Run a local LLM with the Getwello MCP server. Care data never leaves the house.

This guide runs the Getwello MCP server alongside a local Ollama model and an MCP-capable frontend. Your prompts and the model's responses stay on your machine; the only thing crossing the network is the Getwello API call itself.

What you need

  • A machine with at least 8GB unified memory (Apple Silicon) or a recent GPU (Windows/Linux).
  • Ollama installed (ollama.com).
  • An MCP-capable frontend: Open WebUI is the most polished as of 2026.
  • A Getwello Coordinator account with active subscription.

Step 1: Install and pick a model

brew install ollama  # macOS
ollama pull llama3.1
ollama run llama3.1

Llama 3.1 8B is a solid baseline for tool-calling. Qwen 2.5 7B and Mistral Nemo work too. Smaller (3B-class) models hallucinate tool calls; avoid for production.

Step 2: Install Open WebUI

Easiest is Docker:

docker run -d -p 3000:8080 \
  -v open-webui:/app/backend/data \
  --name open-webui ghcr.io/open-webui/open-webui:main

Open http://localhost:3000, finish the first-run setup.

Step 3: Add the Getwello MCP server to Open WebUI

Settings → Tools → Add MCP server. Paste the same npx config as Claude Desktop:

{
  "command": "npx",
  "args": ["-y", "@getwello-app/mcp-server"],
  "env": {
    "GETWELLO_API_KEY": "gw_live_your_real_key"
  }
}

Step 4: Try a query

Pick the Llama model, start a chat. Sixteen Getwello tools are available; the model picks the right one based on your question:

  • “How is Mum doing this week?” — pulls stats + check-ins, summarises
  • “Has she checked in today?” — calls get_check_in_today
  • “Are there gap days next week?” — calls list_gap_days
  • “Schedule me to visit Tuesday at 2pm.” — calls schedule_visit

All locally: the model never sends prompts off your machine, and the only network call is the Getwello API itself.

Troubleshooting

Model doesn't call the tool: use a model trained for tool-calling. Llama 3.1, Qwen 2.5, Mistral Nemo. Smaller models lack the training.

Slow responses: 7-8B models are slow on integrated GPUs. Try a quantised variant (Q4_K_M) or a smaller model.

Data privacy concern: the only network call is the Getwello API itself. The model never sees your prompts on any server but yours.

More on developer api, mcp and webhooks

Didn't answer your question?

Email hello@getwello.co.uk