In this article, you’ll learn how to set up your own local generative AI using existing models such as Gemma 4 and Meta’s LLaMA 3.

The official [Ollama Docker image](https://hub.docker.com/r/ollama/ollama), `ollama/ollama`, is available on Docker Hub.

To run the Gemma 4 model locally, you need two things: an LLM serving engine — here, Ollama — and the model weights, which you load through Ollama.

 

## Download the Image

~~~
docker pull ollama/ollama:0.21.0
~~~


## Run the Container

 

Decide whether to run inference on the CPU only or with GPU acceleration; GPU acceleration additionally requires a supported GPU and the matching container toolkit on the host.

### CPU Only

~~~
docker run -d \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama:0.21.0
~~~
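For NVIDIA GPUs, the official Ollama Docker instructions use the same command with the `--gpus=all` flag added. This sketch assumes the NVIDIA Container Toolkit is already installed on the host and reuses the image tag from this article:

```shell
# GPU-accelerated variant (assumes the NVIDIA Container Toolkit is installed)
docker run -d \
  --gpus=all \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama:0.21.0
```

The named volume `ollama` persists downloaded models across container restarts, so switching between the CPU and GPU variants does not require re-downloading anything.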

The container starts an API server on port 11434, but it doesn't come with any models pre-installed.
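Before pulling any models, you can confirm the server is reachable — Ollama's root endpoint answers with a short plain-text status:

```shell
# Quick liveness check against the API server;
# a healthy server should reply with "Ollama is running"
curl http://localhost:11434/
```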

 

## Pull and Run a Model

You need to `exec` into the container to pull and run a model:

~~~
# Open a shell inside the container
docker exec -it ollama sh

# Download a model without running it
ollama pull [model-name]

# Run a model
ollama run gemma4:e4b
~~~

 

If the model isn't present locally, `ollama run` downloads it automatically before starting.
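You don't have to open an interactive shell first — `docker exec` can invoke the ollama CLI directly, which is handy in scripts:

```shell
# Pull (if needed) and run a model in one step, without entering the container
docker exec -it ollama ollama run gemma4:e4b
```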

 

## Verify

Send a test request to the chat API:

~~~
curl http://localhost:11434/api/chat -d '{
  "model": "gemma3",
  "messages": [{
    "role": "user",
    "content": "Why is the sky blue?"
  }],
  "stream": false
}'
~~~
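With `"stream": false`, the API returns a single JSON object whose reply text sits under `message.content`. A sketch of extracting it with `jq` — the response below is a shortened, hand-written stand-in; in practice you would pipe the `curl` output into `jq`:

```shell
# Extract the assistant's reply from a non-streaming /api/chat response.
# The JSON here is an inlined example, not real model output.
response='{"model":"gemma3","message":{"role":"assistant","content":"Rayleigh scattering."},"done":true}'
printf '%s' "$response" | jq -r '.message.content'
```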

 

 

You can check which models are installed locally with `ollama list`:

~~~
# ollama list
NAME             ID              SIZE      MODIFIED
gemma4:e2b       7fbdbf8f5e45    7.2 GB    15 minutes ago
gemma4:e4b       c6eb396dbd59    9.6 GB    About an hour ago
gemma3:4b        a2af6cc3eb7f    3.3 GB    2 hours ago
llama3:latest    365c0bd3c000    4.7 GB    3 hours ago
~~~
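The tabular output of `ollama list` is easy to post-process. A small sketch that prints just the name and size of each model — one sample line is inlined here so the snippet runs without a live Ollama install:

```shell
# Print model name and size from `ollama list`-style output, skipping the header
sample='NAME             ID              SIZE      MODIFIED
gemma4:e4b       c6eb396dbd59    9.6 GB    About an hour ago'
printf '%s\n' "$sample" | awk 'NR > 1 {print $1, $3 $4}'
# Against a running install: ollama list | awk 'NR > 1 {print $1, $3 $4}'
```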

 

 

## Quick Diagnostic Steps

If something doesn't work, start by following the container logs:

~~~
docker logs -f ollama
~~~

 

 

## Useful Links

- Recommended models in the Ollama library: https://ollama.com/library
- gemma4:e4b model information: https://ollama.com/library/gemma4:e4b

Note that a model's weights must fit in system memory. On a machine with too little RAM, loading gemma4:e4b fails with an error like:

~~~
model requires more system memory (9.8 GiB) than is available (4.7 GiB)
~~~
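To avoid this error, check how much memory is actually free before loading a large model. On Linux, `/proc/meminfo` reports it directly (a sketch for Linux hosts only; macOS and Windows expose memory information differently):

```shell
# Report available system memory in GiB (Linux only)
awk '/MemAvailable/ {printf "%.1f GiB available\n", $2 / 1024 / 1024}' /proc/meminfo
```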
