In this article, you’ll learn how to set up your own local generative AI using existing models such as Gemma 4 and Meta’s LLaMA 3.

The official [Ollama Docker image](https://hub.docker.com/r/ollama/ollama), `ollama/ollama`, is available on Docker Hub.

To run the Gemma 4 model locally, you need two things: an LLM serving engine — here, Ollama — and the model weights, which you load through Ollama.

 

## Download the Image

~~~
docker pull ollama/ollama:0.21.0
~~~


## Run the Container

 

Decide whether to run inference on the CPU only or with GPU acceleration; GPU acceleration additionally requires a supported GPU and the matching container toolkit on the host.

### CPU Only

~~~
docker run -d \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama:0.21.0
~~~
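For NVIDIA GPUs, the official Ollama Docker instructions use the same command with the `--gpus=all` flag added. This sketch assumes the NVIDIA Container Toolkit is already installed on the host and reuses the image tag from this article:

```shell
# GPU-accelerated variant (assumes the NVIDIA Container Toolkit is installed)
docker run -d \
  --gpus=all \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama:0.21.0
```

The named volume `ollama` persists downloaded models across container restarts, so switching between the CPU and GPU variants does not require re-downloading anything.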

The container starts an API server on port 11434, but it doesn't come with any models pre-installed.
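Before pulling any models, you can confirm the server is reachable — Ollama's root endpoint answers with a short plain-text status:

```shell
# Quick liveness check against the API server;
# a healthy server should reply with "Ollama is running"
curl http://localhost:11434/
```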

 

## Pull and Run a Model

You need to `exec` into the container to pull and run a model:

~~~
# Open a shell inside the container
docker exec -it ollama sh

# Download a model without running it
ollama pull [model-name]

# Run a model
ollama run gemma4:e4b
~~~

 

If the model isn't present locally, `ollama run` downloads it automatically before starting.
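You don't have to open an interactive shell first — `docker exec` can invoke the ollama CLI directly, which is handy in scripts:

```shell
# Pull (if needed) and run a model in one step, without entering the container
docker exec -it ollama ollama run gemma4:e4b
```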

 

## Verify

Send a test request to the chat API:

~~~
curl http://localhost:11434/api/chat -d '{
  "model": "gemma3",
  "messages": [{
    "role": "user",
    "content": "Why is the sky blue?"
  }],
  "stream": false
}'
~~~
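With `"stream": false`, the API returns a single JSON object whose reply text sits under `message.content`. A sketch of extracting it with `jq` — the response below is a shortened, hand-written stand-in; in practice you would pipe the `curl` output into `jq`:

```shell
# Extract the assistant's reply from a non-streaming /api/chat response.
# The JSON here is an inlined example, not real model output.
response='{"model":"gemma3","message":{"role":"assistant","content":"Rayleigh scattering."},"done":true}'
printf '%s' "$response" | jq -r '.message.content'
```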

 

 

You can check which models are installed locally with `ollama list`:

~~~
# ollama list
NAME             ID              SIZE      MODIFIED
gemma4:e2b       7fbdbf8f5e45    7.2 GB    15 minutes ago
gemma4:e4b       c6eb396dbd59    9.6 GB    About an hour ago
gemma3:4b        a2af6cc3eb7f    3.3 GB    2 hours ago
llama3:latest    365c0bd3c000    4.7 GB    3 hours ago
~~~
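The tabular output of `ollama list` is easy to post-process. A small sketch that prints just the name and size of each model — one sample line is inlined here so the snippet runs without a live Ollama install:

```shell
# Print model name and size from `ollama list`-style output, skipping the header
sample='NAME             ID              SIZE      MODIFIED
gemma4:e4b       c6eb396dbd59    9.6 GB    About an hour ago'
printf '%s\n' "$sample" | awk 'NR > 1 {print $1, $3 $4}'
# Against a running install: ollama list | awk 'NR > 1 {print $1, $3 $4}'
```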

 

 

## Quick Diagnostic Steps

If something doesn't work, start by following the container logs:

~~~
docker logs -f ollama
~~~

 

 

## Useful Links

- Recommended models in the Ollama library: https://ollama.com/library
- gemma4:e4b model information: https://ollama.com/library/gemma4:e4b

Note that a model's weights must fit in system memory. On a machine with too little RAM, loading gemma4:e4b fails with an error like:

~~~
model requires more system memory (9.8 GiB) than is available (4.7 GiB)
~~~
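To avoid this error, check how much memory is actually free before loading a large model. On Linux, `/proc/meminfo` reports it directly (a sketch for Linux hosts only; macOS and Windows expose memory information differently):

```shell
# Report available system memory in GiB (Linux only)
awk '/MemAvailable/ {printf "%.1f GiB available\n", $2 / 1024 / 1024}' /proc/meminfo
```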
