Running Gemma 4 within a Dockerized llama.cpp environment for Home Assistant.

Use the multi-arch `light` image, which includes an ARM64 build suited to the Pi's CPU.

Prerequisites

Hardware: Raspberry Pi 5 (8GB RAM highly recommended).

OS: Raspberry Pi OS (64-bit) or Ubuntu (64-bit).

Storage: At least 5GB free space (preferably on an SSD/NVMe for speed).
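
A quick sanity check for the prerequisites above (the `/` filesystem is used here for the disk check; adjust the path if your models will live on a separate SSD/NVMe mount):

```shell
# 64-bit OS check: should print aarch64 on a Pi 5 running a 64-bit OS
uname -m

# Total memory: 8GB is strongly recommended for the E2B model
free -h

# Free disk space where the model will live
df -h /
```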


Setup used in this guide

Device: Raspberry Pi 5 (8GB)

OS: Debian 12

Runtime: Docker

Engine: llama.cpp

Model: Gemma 4 E2B (GGUF, quantized)

 

Install Docker (Debian 12)
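
A minimal install on Debian 12 uses Docker's official convenience script (the apt-repository route described in Docker's docs works just as well):

```shell
# Download and run Docker's convenience install script
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh

# Let the current user run docker without sudo (takes effect after re-login)
sudo usermod -aG docker "$USER"

# Verify the install
docker --version
```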


Pull llama.cpp (light)

# This is the correct lightweight image for Pi (ARM)
docker pull ghcr.io/ggml-org/llama.cpp:light

Pick a model - Download Gemma (GGUF, quantized)

# llama.cpp only works with GGUF

# Create the model directory
mkdir -p /datadocker/llama-cpp/models
cd /datadocker/llama-cpp/models


https://huggingface.co/unsloth/gemma-4-E2B-it-GGUF/tree/main

https://huggingface.co/bartowski/google_gemma-4-E2B-it-GGUF/tree/main

google_gemma-4-E2B-it-Q4_0.gguf
https://huggingface.co/bartowski/google_gemma-4-E2B-it-GGUF/resolve/main/google_gemma-4-E2B-it-Q4_0.gguf?download=true
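
Either community repo above works; the Q4_0 file linked above can be fetched directly with wget into the model directory created earlier (this is a multi-gigabyte download):

```shell
cd /datadocker/llama-cpp/models

# -c resumes an interrupted transfer
wget -c -O google_gemma-4-E2B-it-Q4_0.gguf \
  "https://huggingface.co/bartowski/google_gemma-4-E2B-it-GGUF/resolve/main/google_gemma-4-E2B-it-Q4_0.gguf?download=true"
```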

Note

There is no official “Gemma 4 E2B GGUF direct URL” from Google.

GGUF files are community-converted and hosted on Hugging Face.


Run llama.cpp (Docker)

# General form, from the llama.cpp Docker docs:
docker run -v /path/to/models:/models --entrypoint /app/llama-cli ghcr.io/ggml-org/llama.cpp:light -m /models/7B/ggml-model-q4_0.gguf

This invocation works on the Pi:

# llama-cli is the interactive chat CLI; no port mapping is needed here
docker run -it --rm \
  -v /datadocker/llama-cpp/models:/models \
  --entrypoint /app/llama-cli \
  ghcr.io/ggml-org/llama.cpp:light \
  -m /models/google_gemma-4-E2B-it-Q4_0.gguf
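
To expose an HTTP API instead of an interactive chat (e.g. for Home Assistant integrations that speak the OpenAI-style API), the `server` image variant runs `llama-server`, which serves an OpenAI-compatible endpoint. A sketch, assuming the same model path; check the llama.cpp server docs for the full flag list:

```shell
docker run -d --name llama-server \
  -v /datadocker/llama-cpp/models:/models \
  -p 8091:8080 \
  ghcr.io/ggml-org/llama.cpp:server \
  -m /models/google_gemma-4-E2B-it-Q4_0.gguf \
  --host 0.0.0.0 --port 8080

# Query the OpenAI-compatible endpoint (replace <pi-ip> with the Pi's address)
curl http://<pi-ip>:8091/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Hello"}]}'
```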

 

Output:

load_backend: loaded CPU backend from /app/libggml-cpu-armv8.2_2.so

Loading model...

▄▄ ▄▄
██ ██
██ ██  ▀▀█▄ ███▄███▄  ▀▀█▄    ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██    ██    ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
                                    ██    ██
                                    ▀▀    ▀▀

build      : b8833-45cac7ca7
model      : google_gemma-4-E2B-it-Q4_0.gguf
modalities : text

available commands:
  /exit or Ctrl+C     stop or exit
  /regen              regenerate the last response
  /clear              clear the chat history
  /read <file>        add a text file
  /glob <pattern>     add text files using globbing pattern


> 

 

Test:

> how

[Start thinking]
Thinking Process:

1.  **Analyze the Request:** The user provided a single word: "how". This is an extremely open-ended prompt. It implies a question, a request for a process, or a search for a method.
2.  **Determine the Context:** Since there is no context, I must provide a response that addresses the ambiguity while offering ways to get a useful answer.
3.  **Identify the Goal:** The goal is to prompt the user to specify what they want to know "how" to do.
4.  **Brainstorm Potential Interpretations of "how":**
    *   *How does X work?* (Mechanism, process)
    *   *How do I do Y?* (Instructions, tutorial)
    *   *How to achieve Z?* (Strategy, steps)
    *   *How are we connected?* (Relationship, system)
5.  **Formulate the Response Strategy:**
    *   Acknowledge the brevity.
    *   State that more information is needed.
    *   Provide examples of the types of questions I can answer.
    *   Invite the user to ask the specific question.
6.  **Draft the Response (Self-Correction/Refinement):** Start with a friendly, open invitation. Ensure the tone is helpful and encouraging. (The resulting response should be a clear call to action.)
[End thinking]

Please tell me what you would like to know **how** to do! 😊

I can help you with instructions, explanations, processes, recipes, coding, concepts, and much more.

**For example, you could ask:**

* "How do I bake a cake?"
* "How does photosynthesis work?"
* "How do I change the font in Microsoft Word?"
* "How do I start learning Spanish?"

**Just tell me your question!**

[ Prompt: 8.6 t/s | Generation: 5.6 t/s ]

> 
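
For scripted, non-interactive use (e.g. from a Home Assistant automation) the same `light` image can generate a single completion; `-p` (prompt) and `-n` (max tokens to generate) are standard llama-cli flags:

```shell
docker run --rm \
  -v /datadocker/llama-cpp/models:/models \
  --entrypoint /app/llama-cli \
  ghcr.io/ggml-org/llama.cpp:light \
  -m /models/google_gemma-4-E2B-it-Q4_0.gguf \
  -p "Say hello in one short sentence." -n 64
```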


Useful links

llama.cpp Docker documentation on GitHub

https://github.com/ggml-org/llama.cpp/blob/master/docs/docker.md

 

Models

https://huggingface.co/bartowski/google_gemma-4-E2B-it-GGUF/tree/main

https://huggingface.co/bartowski/google_gemma-4-E2B-it-GGUF/blob/main/google_gemma-4-E2B-it-Q4_0.gguf

https://huggingface.co/ggml-org/gemma-4-E2B-it-GGUF
