How to Run AI Models Locally with Ollama
In this article, you’ll learn how to set up your own local generative AI using existing models such as Gemma 4 and Meta’s LLaMA 3.
The official [Ollama Docker image](https://hub.docker.com/r/ollama/ollama) ollama/ollama is available on Docker Hub.
To run the Gemma 4 model locally using Ollama, we will:
1. Install an LLM serving engine, in this case Ollama, a framework for running large language models locally
2. Pull the LLM models via Ollama
3. Load the LLM models via Ollama
4. Test the LLM models via the Ollama CLI
## Install Ollama with Docker
There are several ways to install Ollama on your machine; here we will run Ollama via Docker.
Download the image:
~~~
docker pull ollama/ollama:0.21.0
~~~

Versioning note: Google officially released the Gemma 4 family on April 2, 2026, and the latest Ollama release has stabilized support for its architectures.
## Run the Container
You need to decide whether you want CPU-only or GPU acceleration, and you should pin a specific image version.
### CPU Only
~~~
docker run -d \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama:0.21.0
~~~
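If you have an NVIDIA GPU and the NVIDIA Container Toolkit installed, the same container can be started with GPU acceleration; a minimal sketch, reusing the volume and port mapping from the CPU-only command above:

~~~
docker run -d \
  --gpus=all \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama:0.21.0
~~~

The `--gpus=all` flag exposes all host GPUs to the container; everything else is unchanged.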
The container starts an API server, but it doesn't come with any LLMs pre-installed.
## Pull and Run a Model via Ollama
You need to `exec` into the container to pull a model (like Llama 4).
~~~
# Exec into the container
docker exec -it ollama sh

# Download a model without running it
ollama pull [model-name]

# Run a model: a single command (ollama run gemma4:e4b) handles
# downloading, memory management, and API serving
ollama run gemma4:e4b
~~~
Ollama will automatically download the model; while the llm server is loading it, you can follow progress in the logs.
Output:
~~~
# ollama run gemma4:e2b
>>>
~~~
~~~
docker logs -f ollama
~~~
Output:
~~~
time=2026-04-18T09:18:54.786Z level=INFO source=server.go:1398 msg="waiting for server to become available" status="llm server loading model"
...
time=2026-04-18T09:20:29.899Z level=INFO source=server.go:1402 msg="llama runner started in 97.04 seconds"
[GIN] 2026/04/18 - 09:20:32 | 200 | 1m42s | 127.0.0.1 | POST "/api/generate"
[GIN] 2026/04/18 - 09:22:32 | 200 | 39.184533484s | 127.0.0.1 | POST "/api/chat"
~~~
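The load time reported in the logs can be extracted programmatically, which is handy when comparing startup times across models. A minimal sketch, assuming the `llama runner started in N seconds` message format shown above:

~~~python
import re

def runner_start_seconds(log_line: str):
    """Return the model load time in seconds from a 'llama runner started' log line, or None."""
    m = re.search(r'llama runner started in ([\d.]+) seconds', log_line)
    return float(m.group(1)) if m else None

line = 'time=2026-04-18T09:20:29.899Z level=INFO source=server.go:1402 msg="llama runner started in 97.04 seconds"'
print(runner_start_seconds(line))  # 97.04
~~~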
## Test the LLM Models via the Ollama CLI
Start a chat to test the model.
### Verify via CLI
~~~
# ollama run gemma4:e2b
>>> how
Thinking...
Thinking Process:
1. **Analyze the Input:** The input is "how".
...
~~~
### Verify via API
~~~
curl http://localhost:11434/api/chat -d '{
  "model": "gemma4:e2b",
  "messages": [{
    "role": "user",
    "content": "Why is the sky blue?"
  }],
  "stream": false
}'
~~~
Output:
~~~
root@debian:~# curl http://localhost:11434/api/chat -d '{
  "model": "gemma4:e2b",
  "messages": [{
    "role": "user",
    "content": "Why is the sky blue?"
  }],
  "stream": false
}'
{"model":"gemma4:e2b","created_at":"2026-04-18T09:24:45.242321396Z","message":{"role":"assistant","content":"The reason the sky appears blue is due to a phenomenon called **Rayleigh Scattering**. It is a result of how sunlight interacts with the small molecules of the Earth's atmosphere.\n\nHere is a detailed breakdown of the process:\n\n---\n\n### 1. The Ingredients: Sunlight and Atmosphere\n\n**A. Sunlight is White Light:**\nSunlight, which appears white to us, is actu
...
~~~
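The same request can be made from Python using only the standard library. A minimal sketch; the helper names are my own, and it assumes the container above is listening on localhost:11434:

~~~python
import json
import urllib.request

def chat_payload(model: str, prompt: str) -> dict:
    """Build a non-streaming /api/chat request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def chat(model: str, prompt: str, host: str = "http://localhost:11434") -> str:
    """POST the payload to Ollama's /api/chat endpoint and return the reply text."""
    body = json.dumps(chat_payload(model, prompt)).encode()
    req = urllib.request.Request(f"{host}/api/chat", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]

# chat("gemma4:e2b", "Why is the sky blue?")  # requires the running container
~~~

With `"stream": false`, the server returns one complete JSON object, which is simpler to parse than the line-delimited streaming response.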
### Verify via Web UI
You can also connect a web front end such as Open WebUI to the Ollama API at http://localhost:11434. The models available locally are listed with `ollama list`:
~~~
# ollama list
NAME             ID              SIZE      MODIFIED
gemma4:e2b       7fbdbf8f5e45    7.2 GB    15 minutes ago
gemma4:e4b       c6eb396dbd59    9.6 GB    About an hour ago
gemma3:4b        a2af6cc3eb7f    3.3 GB    2 hours ago
llama3:latest    365c0bd3c000    4.7 GB    3 hours ago
~~~
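For scripting, the `ollama list` output above can be parsed into structured records. A minimal sketch, assuming the whitespace-separated layout shown (name, ID, size value plus unit, then the modified timestamp):

~~~python
def parse_ollama_list(text: str) -> list[dict]:
    """Parse `ollama list` output into one dict per model."""
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        name, model_id, size, unit, *modified = line.split()
        rows.append({"name": name, "id": model_id,
                     "size": f"{size} {unit}", "modified": " ".join(modified)})
    return rows
~~~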
## Quick Diagnostic Steps
If a model fails to load or respond, start by tailing the container logs:
~~~
docker logs -f ollama
~~~
## Configuration Checklist
- OS: 64-bit Debian 12
- RAM: 16 GB
- CPU: Intel 10400
- Largest model run: Gemma 4 E2B (4-bit)
## Useful Links
- Recommended Ollama models: https://ollama.com/library
- Gemma 4 model information: https://ollama.com/library/gemma4:e4b
If a model does not fit in RAM, Ollama reports an error like `model requires more system memory (9.8 GiB) than is available (4.7 GiB)`; pick a smaller model or quantization, or add memory.
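Errors like the one above can often be anticipated with a back-of-the-envelope estimate: the weights take roughly parameter count × bits per weight / 8 bytes, plus runtime overhead. A minimal sketch (the 1.2 overhead factor for KV cache and runtime is my assumption, not an Ollama figure):

~~~python
def est_mem_gib(params_billions: float, bits_per_weight: int, overhead: float = 1.2) -> float:
    """Rough RAM estimate in GiB for a quantized model."""
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return round(weight_bytes * overhead / 2**30, 1)

# e.g. an 8B-parameter model at 4-bit quantization:
print(est_mem_gib(8, 4))  # 4.5
~~~

Comparing the estimate against free RAM before pulling a multi-gigabyte model saves a failed download.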