In this article, you’ll learn how to set up your own local generative AI stack using open models such as Google’s Gemma 4 and Meta’s LLaMA 3.

The official [Ollama Docker image](https://hub.docker.com/r/ollama/ollama), `ollama/ollama`, is available on Docker Hub.

To run the Gemma 4 model locally, we first need an LLM serving engine. We will use Ollama, a framework for running large language models locally. The overall workflow is:

1. Install Ollama with Docker
2. Pull the LLM models via Ollama
3. Load and run the models via Ollama
4. Test the models via the Ollama CLI

## Install Ollama with Docker

There are several ways to install it on your machine; here we will run Ollama via Docker.

Download the image:

docker pull ollama/ollama:0.21.0


Version note: Google officially released the Gemma 4 family on April 2, 2026, and the latest Ollama release has stabilized support for its unique architecture.

 

## Run the Container

 

You need to decide whether to use CPU-only or GPU acceleration, and it is good practice to pin a specific image version.

 

CPU Only

~~~
docker run -d \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama:0.21.0
~~~
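If you have an NVIDIA GPU and the NVIDIA Container Toolkit installed on the host, the same image can run with GPU acceleration. This is a sketch based on the flags documented on the Ollama Docker Hub page:

```shell
# Start Ollama with all NVIDIA GPUs passed through to the container
# (requires the NVIDIA Container Toolkit on the host)
docker run -d --gpus=all \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama:0.21.0
```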

The container starts an API server, but it doesn't come with any LLMs pre-installed. 

 

## Pull and Run a Model via Ollama

You need to `exec` into the container to pull a model (such as Llama 4).

 

# Open a shell inside the container
docker exec -it ollama sh

# Download a model without running it
ollama pull [model-name]

# Run a model: a single command (ollama run gemma4:e4b) handles
# downloading, memory management, and API serving
ollama run gemma4:e4b

 

Ollama will automatically download the model; while it loads, the server reports the status "llm server loading model".

 

output

# ollama run gemma4:e2b
>>>

 

docker logs -f ollama

output

time=2026-04-18T09:18:54.786Z level=INFO source=server.go:1398 msg="waiting for server to become available" status="llm server loading model"
...
time=2026-04-18T09:20:29.899Z level=INFO source=server.go:1402 msg="llama runner started in 97.04 seconds"
[GIN] 2026/04/18 - 09:20:32 | 200 |         1m42s |       127.0.0.1 | POST     "/api/generate"
[GIN] 2026/04/18 - 09:22:32 | 200 | 39.184533484s |       127.0.0.1 | POST     "/api/chat"
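Once the runner has started, you can list the models available inside the container from the host. `/api/tags` is the Ollama endpoint that returns the locally pulled models:

```shell
# List the models that have been pulled into the container
# (python3 is used here only to pretty-print the JSON response)
curl -s http://localhost:11434/api/tags | python3 -m json.tool
```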

 

## Test the LLM Models via the Ollama CLI

Start a chat to verify the model from the CLI:

 

# ollama run gemma4:e2b
>>> how
Thinking...
Thinking Process:

1.  **Analyze the Input:** The input is "how".
...

 

## Verify via the API

curl http://localhost:11434/api/chat -d '{
  "model": "gemma4:e2b",
  "messages": [{
    "role": "user",
    "content": "Why is the sky blue?"
  }],
  "stream": false
}'

output

root@debian:~# curl http://localhost:11434/api/chat -d '{
  "model": "gemma4:e2b",
  "messages": [{
    "role": "user",
    "content": "Why is the sky blue?"
  }],
  "stream": false
}'
{"model":"gemma4:e2b","created_at":"2026-04-18T09:24:45.242321396Z","message":{"role":"assistant","content":"The reason the sky appears blue is due to a phenomenon called **Rayleigh Scattering**. It is a result of how sunlight interacts with the small molecules of the Earth's atmosphere.\n\nHere is a detailed breakdown of the process:\n\n---\n\n### 1. The Ingredients: Sunlight and Atmosphere\n\n**A. Sunlight is White Light:**\nSunlight, which appears white to us, is actu
...
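The raw JSON above is hard to read. If `jq` is installed on the host (an assumption), the assistant's reply can be extracted directly; the `-r` flag prints the string without quotes:

```shell
# Send a chat request and print only the assistant's message text
curl -s http://localhost:11434/api/chat -d '{
  "model": "gemma4:e2b",
  "messages": [{"role": "user", "content": "Why is the sky blue?"}],
  "stream": false
}' | jq -r '.message.content'
```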

 

## Verify via a Web UI

For a browser-based chat interface, you can use Open WebUI.
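A minimal sketch for running Open WebUI in a second container and pointing it at the Ollama API. The image name, ports, and `OLLAMA_BASE_URL` variable follow the Open WebUI documentation; verify them against the current release:

```shell
# Run Open WebUI and connect it to the Ollama server on the host
docker run -d \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main
```

Then open http://localhost:3000 in your browser and select one of the pulled models.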

 

# ollama list
NAME             ID              SIZE      MODIFIED          
gemma4:e2b       7fbdbf8f5e45    7.2 GB    15 minutes ago       
gemma4:e4b       c6eb396dbd59    9.6 GB    About an hour ago    
gemma3:4b        a2af6cc3eb7f    3.3 GB    2 hours ago          
llama3:latest    365c0bd3c000    4.7 GB    3 hours ago 

 

 

## Quick Diagnostic Steps

Follow the container logs:

docker logs -f ollama
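Beyond the logs, a few quick checks narrow down most problems. The root endpoint and `/api/version` are part of the standard Ollama HTTP API:

```shell
# Is the container actually running?
docker ps --filter name=ollama

# Is the API reachable? A healthy server replies "Ollama is running"
curl http://localhost:11434

# Which Ollama version is the container serving?
curl -s http://localhost:11434/api/version
```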

 

## Configuration Checklist

- OS: 64-bit Debian 12
- RAM: 16 GB
- CPU: Intel 10400
- Largest model tested: Gemma 4 E2B (4-bit)

 

## Useful Links

Recommended models on Ollama:

https://ollama.com/library

Gemma 4 model information:

https://ollama.com/library/gemma4:e4b

Note: if you see an error such as `model requires more system memory (9.8 GiB) than is available (4.7 GiB)`, the model is too large for your machine; pick a smaller variant.
