
Fredy Acuna / December 8, 2025 / 8 min read
This guide shows you how to properly self-host Google's Gemma AI model on Dokploy using Ollama. I've corrected several issues from an existing tutorial to make this production-ready with proper networking, persistent storage, and concurrency handling.
Before starting, make sure you have a server with Dokploy installed and running, and, optionally, a domain name pointed at it (an auto-generated traefik.me URL also works).
Gemma is Google's open-source AI model family. Ollama is a tool that makes running AI models locally simple—it handles downloading, serving, and API endpoints automatically.
When you run `ollama serve`, it starts an HTTP server on port 11434 that accepts requests and returns AI-generated responses. This is what we'll deploy.
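If you already have Ollama on your laptop, you can see this in action before touching a server. This is just a local sketch of what the deployed service will do, using the same model we deploy later:

```bash
# Start the Ollama API server (listens on localhost:11434 by default)
ollama serve &

# The root endpoint is a simple health check
curl http://localhost:11434
# -> Ollama is running

# Pull the model and ask it something through the HTTP API
ollama pull gemma3:270m
curl http://localhost:11434/api/generate -d '{
  "model": "gemma3:270m",
  "prompt": "Say hello in five words.",
  "stream": false
}'
```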
The gemma3:270m model is lightweight (~270MB), so it runs on minimal hardware. Choose your setup based on your use case:
Use this for personal projects or cheap VPS instances:
| Resource | Specification |
|---|---|
| CPU | 1 vCPU |
| RAM | 1 GB |
| Storage | 5 GB |
| GPU | Not required |
Note: This handles 1 user quickly. If 2 people query at the same time, the second waits a few seconds.
Use this if you expect 5-10 concurrent users or automated bots querying frequently:
| Resource | Specification |
|---|---|
| CPU | 2 vCPUs |
| RAM | 2-4 GB |
| Storage | 5-10 GB |
| GPU | Not required |
Why more RAM? Long conversations grow the context window (memory of previous messages), which can spike memory usage. 2GB is the safety zone.
Why 2 vCPUs? The HTTP server handling JSON requests and the inference engine compete for CPU. 2 cores keep the API responsive while the model thinks.
If you want better quality responses, consider larger models like gemma:2b (1.7GB) or gemma:7b (requires more RAM/GPU).
In Dokploy, create a new Docker Compose service and name it `gemma-service`. Go to the General tab, then click Raw. Paste the following configuration:
```yaml
version: '3.8'

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    environment:
      - OLLAMA_HOST=0.0.0.0
      - OLLAMA_ORIGINS=*
      - OLLAMA_NUM_PARALLEL=4
      - OLLAMA_MAX_LOADED_MODELS=1
    volumes:
      - ollama_storage:/root/.ollama
    # Uncomment if you have a GPU available:
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - driver: nvidia
    #           count: 1
    #           capabilities: [gpu]

  # Optional: ChatGPT-like web interface
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    volumes:
      - open-webui:/app/backend/data
    depends_on:
      - ollama
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_SECRET_KEY=your-secret-key-here
    restart: unless-stopped

volumes:
  ollama_storage:
  open-webui:
```
Important: We don't set `OLLAMA_MODELS` as an environment variable. Setting it changes the storage path and breaks persistence. Instead, we download models manually after deployment (Step 4).
Click Save.
Let's break down what makes this configuration production-ready:
```yaml
volumes:
  - ollama_storage:/root/.ollama
```
Without this, you'd lose downloaded models every time the container restarts. The original tutorial missed this—meaning you'd have to re-download the model after every deployment.
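If you want to convince yourself the volume is doing its job, here's a quick sketch; it assumes you have shell access to the Dokploy host and that the container is named `ollama`, as in the compose file above:

```bash
# Pull a model, restart the container, and check the model is still there
docker exec ollama ollama pull gemma3:270m
docker restart ollama

# gemma3:270m should still be listed after the restart,
# because /root/.ollama lives on the named volume
docker exec ollama ollama list
```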
```yaml
- OLLAMA_NUM_PARALLEL=4
- OLLAMA_MAX_LOADED_MODELS=1
```
| Variable | Purpose |
|---|---|
| `OLLAMA_NUM_PARALLEL=4` | Allows 4 concurrent requests (4 users at the same time; see the quick check below) |
| `OLLAMA_MAX_LOADED_MODELS=1` | Keeps only 1 model in memory (saves RAM) |
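Once you've added a domain (next section), you can sanity-check the parallelism by firing a few requests at the same time. A rough sketch, using your own domain:

```bash
# Send 4 generations in parallel; with OLLAMA_NUM_PARALLEL=4 they should
# overlap instead of queuing one behind the other
for i in 1 2 3 4; do
  curl -s https://ollama.yourdomain.com/api/generate -d '{
    "model": "gemma3:270m",
    "prompt": "Count to five.",
    "stream": false
  }' > /dev/null &
done
time wait   # wall-clock time should be close to a single request, not 4x
```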
```yaml
- OLLAMA_ORIGINS=*
```
Allows requests from any origin. Useful if you're calling the API from a frontend application.
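If you don't need wide-open access, `OLLAMA_ORIGINS` also accepts a comma-separated list of allowed origins, so you can swap the wildcard for your real frontend domains. A sketch of the tightened environment block (the domains here are placeholders):

```yaml
    environment:
      - OLLAMA_HOST=0.0.0.0
      # Placeholder domains: only allow browser requests from your own frontends
      - OLLAMA_ORIGINS=https://app.yourdomain.com,https://chat.yourdomain.com
      - OLLAMA_NUM_PARALLEL=4
      - OLLAMA_MAX_LOADED_MODELS=1
```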
You need to add domains for the services you want to expose. Go to the Domains tab in your service.
Start with the `ollama` service.

Option A: Generate a traefik.me URL (Recommended for Testing)
Click the Generate button in Dokploy. It will automatically create a URL like:
```
main-ollama-wv9tts-9dc2f9-209-112-91-61.traefik.me
```
This gives you instant HTTPS without any DNS configuration.
Option B: Use Your Own Domain
Enter your subdomain: ollama.yourdomain.com
Make sure you have a DNS A record pointing to your Dokploy server's IP.
Whichever option you choose, set the port to `11434` (this is the port Ollama exposes internally) and leave the path as `/`.

If you included the Open WebUI service, add another domain for it:
Select the `open-webui` service, enter a host such as chat.yourdomain.com (or generate one), and set the port to `8080`.

Now, here's the critical step: you must download the model manually.
Open a terminal inside the `ollama` container from Dokploy and run:

```bash
ollama pull gemma3:270m
```
Wait for the download to complete. You can verify it worked with:
```bash
ollama list
```
You should see:
```
NAME          ID          SIZE     MODIFIED
gemma3:270m   abc123...   270MB    2 minutes ago
```
Visit your domain in a browser. You should see:
```
Ollama is running
```
Now test the API with curl:
```bash
curl https://ollama.yourdomain.com/api/generate -d '{
  "model": "gemma3:270m",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```
You should receive a JSON response with the AI-generated answer.
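The generated text is in the `response` field of that JSON. If you have `jq` on your machine, a small sketch to print only the answer:

```bash
curl -s https://ollama.yourdomain.com/api/generate -d '{
  "model": "gemma3:270m",
  "prompt": "Why is the sky blue?",
  "stream": false
}' | jq -r '.response'   # prints just the model's answer
```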
```bash
curl -X POST https://ollama.yourdomain.com/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma3:270m",
    "prompt": "Explain Docker in one sentence.",
    "stream": false
  }'
```
```bash
curl -X POST https://ollama.yourdomain.com/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma3:270m",
    "prompt": "Write a haiku about programming.",
    "stream": false,
    "options": {
      "temperature": 0.7,
      "num_predict": 50
    }
  }'
```
```bash
curl -X POST https://ollama.yourdomain.com/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma3:270m",
    "messages": [
      {"role": "user", "content": "What is machine learning?"}
    ],
    "stream": false
  }'
```
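The chat endpoint is stateless, so to continue a conversation you resend the earlier messages yourself; this is also why long chats grow the context window mentioned in the hardware section. A sketch of a follow-up turn (the assistant message below is an example reply, not real output):

```bash
curl -X POST https://ollama.yourdomain.com/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma3:270m",
    "messages": [
      {"role": "user", "content": "What is machine learning?"},
      {"role": "assistant", "content": "Machine learning lets computers learn patterns from data instead of following hand-written rules."},
      {"role": "user", "content": "Give me a concrete example."}
    ],
    "stream": false
  }'
```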
If you included Open WebUI in your Docker Compose, you now have a ChatGPT-like interface for interacting with your models.
Visit your Open WebUI domain (e.g. https://chat.yourdomain.com), create an account, and select gemma3:270m from the model dropdown. Open WebUI gives you a familiar chat interface with conversation history, so you don't have to hand-craft API calls to talk to your model.
Tip: You can download additional models directly from Open WebUI's settings, or via the Ollama container terminal.
If you get a `model not found` error, you forgot to download it. Open a terminal in the `ollama` container and run:
```bash
ollama pull gemma3:270m
```
Check the logs in Dokploy. Common causes:
- Not enough RAM: lower `OLLAMA_NUM_PARALLEL` if RAM is limited (try `OLLAMA_NUM_PARALLEL=1` or `2`)
- Wrong port in the domain configuration (use `11434` for Ollama, `8080` for Open WebUI)

Once your setup is working, you can easily switch models:
```bash
# Inside the container terminal
ollama pull gemma:2b      # 1.7 GB, better quality
ollama pull gemma:7b      # 4.2 GB, requires more RAM
ollama pull llama3.2:3b   # Alternative model
```
Update your API calls to use the new model name.
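For example, after pulling `gemma:2b`, the only client-side change is the `model` field:

```bash
curl -s https://ollama.yourdomain.com/api/generate -d '{
  "model": "gemma:2b",
  "prompt": "Explain Docker in one sentence.",
  "stream": false
}'
```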
For production deployments:
- Replace `your-secret-key-here` with a strong, random string for Open WebUI
- Restrict `OLLAMA_ORIGINS=*` to specific domains if not using Open WebUI (as shown in the configuration breakdown earlier)

You now have a production-ready Gemma AI service running on Dokploy, with proper networking, persistent model storage, and concurrency handling.
This setup is significantly more robust than exposing ports directly and handles real-world usage patterns.