How to Run AI Models Locally with Ollama
In this article, you’ll learn how to set up your own local generative AI using existing models such as Gemma 4 and Meta’s LLaMA 3.
The official [Ollama Docker image](https://hub.docker.com/r/ollama/ollama) ollama/ollama is available on Docker Hub.
To run the Gemma 4 model locally using Ollama:
First of all, we need an LLM Serving Engine, such as Ollama
load the LLM models via Ollama
Download
docker pull ollama/ollama:0.21.0

## Run the Container
You need to decide if you want to use CPU-only or GPU acceleration.
CPU Only
~~~
docker run -d \
-v ollama:/root/.ollama \
-p 11434:11434 \
--name ollama \
ollama/ollama:0.21.0
~~~
The container starts an API server, but it doesn't come with any LLMs pre-installed.
Pull and Run a Model
You need to "exec" into the container to pull a model (like Llama 4 ).
# into the container
docker exec -it ollama sh
# Download a model without running it
ollama pull [model-name]
# Run a Model
ollama run gemma4:e4b
Ollama will automatically download the model.
Verify
curl http://localhost:11434/api/chat -d '{
"model": "gemma3",
"messages": [{
"role": "user",
"content": "Why is the sky blue?"
}],
"stream": false
}'
# ollama list
NAME ID SIZE MODIFIED
gemma4:e2b 7fbdbf8f5e45 7.2 GB 15 minutes ago
gemma4:e4b c6eb396dbd59 9.6 GB About an hour ago
gemma3:4b a2af6cc3eb7f 3.3 GB 2 hours ago
llama3:latest 365c0bd3c000 4.7 GB 3 hours ago
Quick Diagnostic Steps
docker logs -f ollama
Useful links
Recommended Models of ollama
https://ollama.com/library
gemma4 Model information
https://ollama.com/library/gemma4:e4b
model requires more system memory (9.8 GiB) than is available (4.7 GiB)
Comments
Be the first to post a comment