In this article, you’ll learn how to set up your own local generative AI stack using open models such as Google’s Gemma 4 and Meta’s LLaMA 3.

The official [Ollama Docker image](https://hub.docker.com/r/ollama/ollama), `ollama/ollama`, is available on Docker Hub.

To run the Gemma 4 model locally, we first need an LLM serving engine. We will use Ollama, a framework for running large language models locally. The overall workflow is:

1. Install Ollama with Docker
2. Pull the LLM models via Ollama
3. Load and run the models via Ollama
4. Test the models via the Ollama CLI

## Install Ollama with Docker

There are several ways to install it on your machine; here we will run Ollama via Docker.

Download the image:

docker pull ollama/ollama:0.21.0


Version note: Google officially released the Gemma 4 family on April 2, 2026, and the latest Ollama release has stabilized support for its unique architecture.

 

## Run the Container

 

You need to decide whether to use CPU-only or GPU acceleration, and it is good practice to pin a specific image version.

 

CPU Only

~~~
docker run -d \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama:0.21.0
~~~
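If you have an NVIDIA GPU and the NVIDIA Container Toolkit installed on the host, the same image can run with GPU acceleration. This is a sketch based on the flags documented on the Ollama Docker Hub page:

```shell
# Start Ollama with all NVIDIA GPUs passed through to the container
# (requires the NVIDIA Container Toolkit on the host)
docker run -d --gpus=all \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama:0.21.0
```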

The container starts an API server, but it doesn't come with any LLMs pre-installed. 

 

## Pull and Run a Model via Ollama

You need to `exec` into the container to pull a model (such as Llama 4).

 

# Open a shell inside the container
docker exec -it ollama sh

# Download a model without running it
ollama pull [model-name]

# Run a model: a single command (ollama run gemma4:e4b) handles
# downloading, memory management, and API serving
ollama run gemma4:e4b

 

Ollama will automatically download the model; while it loads, the server reports the status "llm server loading model".

 

output

# ollama run gemma4:e2b
>>>

 

docker logs -f ollama

output

time=2026-04-18T09:18:54.786Z level=INFO source=server.go:1398 msg="waiting for server to become available" status="llm server loading model"
...
time=2026-04-18T09:20:29.899Z level=INFO source=server.go:1402 msg="llama runner started in 97.04 seconds"
[GIN] 2026/04/18 - 09:20:32 | 200 |         1m42s |       127.0.0.1 | POST     "/api/generate"
[GIN] 2026/04/18 - 09:22:32 | 200 | 39.184533484s |       127.0.0.1 | POST     "/api/chat"
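Once the runner has started, you can list the models available inside the container from the host. `/api/tags` is the Ollama endpoint that returns the locally pulled models:

```shell
# List the models that have been pulled into the container
# (python3 is used here only to pretty-print the JSON response)
curl -s http://localhost:11434/api/tags | python3 -m json.tool
```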

 

## Test the LLM Models via the Ollama CLI

Start a chat to verify the model from the CLI:

 

# ollama run gemma4:e2b
>>> how
Thinking...
Thinking Process:

1.  **Analyze the Input:** The input is "how".
...

 

## Verify via the API

curl http://localhost:11434/api/chat -d '{
  "model": "gemma4:e2b",
  "messages": [{
    "role": "user",
    "content": "Why is the sky blue?"
  }],
  "stream": false
}'

output

root@debian:~# curl http://localhost:11434/api/chat -d '{
  "model": "gemma4:e2b",
  "messages": [{
    "role": "user",
    "content": "Why is the sky blue?"
  }],
  "stream": false
}'
{"model":"gemma4:e2b","created_at":"2026-04-18T09:24:45.242321396Z","message":{"role":"assistant","content":"The reason the sky appears blue is due to a phenomenon called **Rayleigh Scattering**. It is a result of how sunlight interacts with the small molecules of the Earth's atmosphere.\n\nHere is a detailed breakdown of the process:\n\n---\n\n### 1. The Ingredients: Sunlight and Atmosphere\n\n**A. Sunlight is White Light:**\nSunlight, which appears white to us, is actu
...
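The raw JSON above is hard to read. If `jq` is installed on the host (an assumption), the assistant's reply can be extracted directly; the `-r` flag prints the string without quotes:

```shell
# Send a chat request and print only the assistant's message text
curl -s http://localhost:11434/api/chat -d '{
  "model": "gemma4:e2b",
  "messages": [{"role": "user", "content": "Why is the sky blue?"}],
  "stream": false
}' | jq -r '.message.content'
```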

 

## Verify via a Web UI

For a browser-based chat interface, you can use Open WebUI.
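A minimal sketch for running Open WebUI in a second container and pointing it at the Ollama API. The image name, ports, and `OLLAMA_BASE_URL` variable follow the Open WebUI documentation; verify them against the current release:

```shell
# Run Open WebUI and connect it to the Ollama server on the host
docker run -d \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main
```

Then open http://localhost:3000 in your browser and select one of the pulled models.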

 

# ollama list
NAME             ID              SIZE      MODIFIED          
gemma4:e2b       7fbdbf8f5e45    7.2 GB    15 minutes ago       
gemma4:e4b       c6eb396dbd59    9.6 GB    About an hour ago    
gemma3:4b        a2af6cc3eb7f    3.3 GB    2 hours ago          
llama3:latest    365c0bd3c000    4.7 GB    3 hours ago 

 

 

## Quick Diagnostic Steps

Follow the container logs:

docker logs -f ollama
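Beyond the logs, a few quick checks narrow down most problems. The root endpoint and `/api/version` are part of the standard Ollama HTTP API:

```shell
# Is the container actually running?
docker ps --filter name=ollama

# Is the API reachable? A healthy server replies "Ollama is running"
curl http://localhost:11434

# Which Ollama version is the container serving?
curl -s http://localhost:11434/api/version
```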

 

## Configuration Checklist

- OS: 64-bit Debian 12
- RAM: 16 GB
- CPU: Intel 10400
- Largest model tested: Gemma 4 E2B (4-bit)

 

## Useful Links

Recommended models on Ollama:

https://ollama.com/library

Gemma 4 model information:

https://ollama.com/library/gemma4:e4b

Note: if you see an error such as `model requires more system memory (9.8 GiB) than is available (4.7 GiB)`, the model is too large for your machine; pick a smaller variant.
