<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>blog.matterxiaomi.com</title>
  <id>http://blog.matterxiaomi.com/</id>
  <subtitle>This site is all about blog.matterxiaomi.com.</subtitle>
  <generator uri="https://github.com/madskristensen/Miniblog.Core" version="1.0">Miniblog.Core</generator>
  <updated>2026-04-21T21:33:31Z</updated>
  <entry>
    <id>http://blog.matterxiaomi.com/blog/run-local-LLM-server-part5/</id>
    <title>How to Run llama.cpp Locally for Home Assistant on a Raspberry Pi 5</title>
    <updated>2026-04-25T14:22:45Z</updated>
    <published>2026-04-21T21:33:31Z</published>
    <link href="http://blog.matterxiaomi.com/blog/run-local-LLM-server-part5/" />
    <author>
      <name>test@example.com</name>
      <email>blog.matterxiaomi.com</email>
    </author>
    <category term="ai" />
    <category term="llm" />
    <content type="html">&lt;p&gt;To run llama.cpp locally for Home Assistant, you must host a llama.cpp server that provides an API， that Home Assistant can communicate with via an API.&amp;nbsp;&lt;/p&gt;
&lt;p&gt;Home Assistant does not have a "llama.cpp" brand integration by default.&lt;/p&gt;
&lt;p&gt;connect Home Assistant to it using a compatible integration. such as&amp;nbsp;https://github.com/skye-harris/hass_local_openai_llm.&lt;/p&gt;
&lt;div class="mce-toc"&gt;
&lt;h2&gt;Table of Contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#mcetoc_1jmtliqsh1"&gt;Run llama-server&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#mcetoc_1jmtllr513"&gt;Connect to Home Assistant&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#mcetoc_1jmro8d8sf"&gt;Integration&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#mcetoc_1jmro8d8sg"&gt;Voice assistant&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;Device: Raspberry Pi 5 (8GB)&lt;/p&gt;
&lt;p&gt;OS: Debian 12&lt;/p&gt;
&lt;p&gt;Runtime: Docker&lt;/p&gt;
&lt;p&gt;Engine: &lt;span style="white-space: normal;"&gt;llama-server&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;Model: Gemma 4 E2B (GGUF, quantized)&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;In the ghcr.io/ggml-org/llama.cpp repository, the images are split by purpose:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tag&lt;/th&gt;
&lt;th&gt;Primary Contents&lt;/th&gt;
&lt;th&gt;Best Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;&lt;code&gt;:light&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;llama-cli&lt;/code&gt;, &lt;code&gt;llama-completion&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Testing/CLI:&lt;/strong&gt; Best for running models in the terminal or one-off completions without overhead.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;&lt;code&gt;:server&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;llama-server&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Production/API:&lt;/strong&gt; Ideal for a Home Assistant setup. It provides the OpenAI-compatible endpoint.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;&lt;code&gt;:full&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;CLI, Server, &lt;strong&gt;and&lt;/strong&gt; Python conversion/quantization tools.&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Development:&lt;/strong&gt; Use this if you need to convert &lt;code&gt;.safetensors&lt;/code&gt; to &lt;code&gt;.gguf&lt;/code&gt; or quantize a model yourself.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;:light: Contains only llama-cli and llama-completion. It does not contain the API server.&lt;/p&gt;
&lt;p&gt;:server: Contains only llama-server, a lightweight, OpenAI-API-compatible HTTP server for serving LLMs. It does not contain llama-cli.&lt;/p&gt;
&lt;p&gt;:full: Contains everything.&lt;/p&gt;
&lt;p&gt;You should use the :server tag (or better yet, the :server-arm64 tag since you are on a Raspberry Pi 5).&lt;/p&gt;
&lt;h2 id="mcetoc_1jmtliqsh1"&gt;Run llama-server&lt;/h2&gt;
&lt;p&gt;The llama-server executable acts as an OpenAI-compatible API that Home Assistant can use.&amp;nbsp;&lt;/p&gt;
&lt;pre class="language-python"&gt;&lt;code&gt;docker run -it --rm \
  --name llama \
  -v /datadocker/llama-cpp/models:/models \
  -p 8091:8080 \
  ghcr.io/ggml-org/llama.cpp:server \
  -m /models/google_gemma-4-E2B-it-Q4_0.gguf \
  --host 0.0.0.0 \
  --port 8080 \
  --threads 4 \
  --jinja
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note&lt;/p&gt;
&lt;p&gt;--entrypoint /app/llama-cli: processes a single prompt (or waits for one) and then exits. It does not listen for network requests on a port.&lt;/p&gt;
&lt;p&gt;--entrypoint /app/llama-server: required to handle API calls such as curl. llama-cli is only for one-off prompts in the terminal.&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;--host 0.0.0.0: Inside a Docker container, the server must listen on 0.0.0.0 to accept connections from your Raspberry Pi's IP or localhost.&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;--port 8080: This tells the software inside the container to listen on port 8080 (which you mapped to 8091 on your host).&lt;/p&gt;
&lt;p&gt;--jinja: enables Jinja chat templates, which are needed for OpenAI-style function calling. Tool calling must also be supported by the inference engine. &lt;a href="https://github.com/ggml-org/llama.cpp/blob/master/docs/function-calling.md"&gt;Details&lt;/a&gt;&lt;/p&gt;
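&lt;p&gt;To illustrate what --jinja unlocks, here is a hedged sketch of an OpenAI-style function-calling request body that could be POSTed to /v1/chat/completions. The tool name and schema ("get_temperature") are invented for illustration; the linked function-calling doc is the authoritative reference.&lt;/p&gt;

```python
# Hypothetical example: an OpenAI-style function-calling request body.
# The tool "get_temperature" is made up; llama-server needs --jinja (and a
# model whose chat template supports tools) to honor the "tools" field.
import json

payload = {
    "messages": [
        {"role": "user", "content": "What is the living room temperature?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_temperature",
                "description": "Read a room temperature sensor",
                "parameters": {
                    "type": "object",
                    "properties": {"room": {"type": "string"}},
                    "required": ["room"],
                },
            },
        }
    ],
}

print(json.dumps(payload, indent=2))
```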
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;output&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre class="language-python"&gt;&lt;code&gt;...
srv          init: init: chat template, thinking = 1
main: model loaded
main: server is listening on http://0.0.0.0:8080
main: starting the main loop...
srv  update_slots: all slots are idle
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now llama.cpp = local LLM &amp;rarr; HTTP API server&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;Test&lt;/p&gt;
&lt;p&gt;Once the server logs show "HTTP server listening", run your curl command. Make sure to include a JSON body, otherwise the server might reject the request:&lt;/p&gt;
&lt;pre class="language-python"&gt;&lt;code&gt;curl http://localhost:8091/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Hello Gemma!"}]
  }'&lt;/code&gt;&lt;/pre&gt;
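&lt;p&gt;The same test can be scripted. A minimal Python sketch using only the standard library; it assumes the server above is reachable on localhost:8091 and that the response follows the standard OpenAI chat-completions shape:&lt;/p&gt;

```python
# Minimal Python version of the curl test above (stdlib only).
# Assumes llama-server is running and mapped to host port 8091.
import json
import urllib.request

def build_chat_payload(prompt):
    """Build an OpenAI-compatible chat-completions request body."""
    return {"messages": [{"role": "user", "content": prompt}]}

def chat(prompt, base_url="http://localhost:8091"):
    data = json.dumps(build_chat_payload(prompt)).encode("utf-8")
    req = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI-compatible responses put the text in choices[0].message.content
    return body["choices"][0]["message"]["content"]

# Usage (with the server running): print(chat("Hello Gemma!"))
```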
&lt;h2 id="mcetoc_1jmtllr513"&gt;Connect to Home Assistant&lt;/h2&gt;
&lt;h3 id="mcetoc_1jmro8d8sf"&gt;Integration -&amp;nbsp;Add Integration&lt;/h3&gt;
&lt;p&gt;Custom Integration - Local OpenAI LLM Integration&lt;/p&gt;
&lt;p&gt;https://github.com/skye-harris/hass_local_openai_llm&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;Custom Integration - Configure Integration&lt;/p&gt;
&lt;p&gt;Add the server URL to the initial server configuration:&lt;/p&gt;
&lt;p&gt;http://192.168.2.125:8091&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 id="mcetoc_1jmro8d8sg"&gt;Voice assistant&amp;nbsp;&lt;span style="font-size: 14px;"&gt;- Create&amp;nbsp; conversation agent&lt;/span&gt;&lt;/h3&gt;
&lt;p&gt;Add assistant&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;</content>
  </entry>
  <entry>
    <id>http://blog.matterxiaomi.com/blog/run-local-LLM-server-part3/</id>
    <title>How to Run AI Models Locally with llama.cpp on rpi5</title>
    <updated>2026-04-23T17:42:13Z</updated>
    <published>2026-04-18T22:52:25Z</published>
    <link href="http://blog.matterxiaomi.com/blog/run-local-LLM-server-part3/" />
    <author>
      <name>test@example.com</name>
      <email>blog.matterxiaomi.com</email>
    </author>
    <category term="ai" />
    <category term="llm" />
    <content type="html">&lt;p&gt;Most people access generative AI tools like ChatGPT or Gemini through a web interface or API &amp;mdash; but what if you could run them locally?&lt;/p&gt;
&lt;p&gt;In this article, you&amp;rsquo;ll learn how to set up your own local generative AI using existing models such as llama.cpp.&lt;/p&gt;
&lt;p&gt;The final result will look like the GIF shown below (note: it is hosted on localhost).&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;div class="mce-toc"&gt;
&lt;h2&gt;Table of Contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#mcetoc_1jmhd01hc1"&gt;Prerequisites&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#mcetoc_1jmoqdr6vc"&gt;step 1. Pull llama.cpp (light)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#mcetoc_1jmoqdr6vd"&gt;step 2.&amp;nbsp;Pick a model - Download Gemma (GGUF, quantized)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#mcetoc_1jmoqdr6ve"&gt;step 3.&amp;nbsp; Docker run and load model&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#mcetoc_1jmoqfbh0g"&gt;step 4.&amp;nbsp;test&amp;nbsp;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h2 id="mcetoc_1jmhd01hc1"&gt;Prerequisites&lt;/h2&gt;
&lt;p&gt;Hardware: Raspberry Pi 5 (8GB RAM highly recommended).&lt;/p&gt;
&lt;p&gt;OS: Raspberry Pi OS (64-bit) or Ubuntu (64-bit).&lt;/p&gt;
&lt;p&gt;Storage: At least 5GB free space (preferably on an SSD/NVMe for speed).&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;Device: Raspberry Pi 5 (8GB)&lt;/p&gt;
&lt;p&gt;OS: Debian 12&lt;/p&gt;
&lt;p&gt;Runtime: Docker&lt;/p&gt;
&lt;p&gt;Engine: llama.cpp&lt;/p&gt;
&lt;p&gt;Model: Gemma 4 E2B (GGUF, quantized)&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;Install Docker (Debian 12)&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;Here are the commands I used to get Gemma 4 E2B running on a Raspberry Pi 5 8GB:&lt;/p&gt;
&lt;h3 id="mcetoc_1jmoqdr6vc"&gt;step 1. Pull llama.cpp (light)&lt;/h3&gt;
&lt;p&gt;First of all, we need an LLM Serving Engine, such as llama.cpp.&lt;/p&gt;
&lt;pre class="language-markup"&gt;&lt;code&gt;# This is the correct lightweight image for Pi (ARM)
docker pull ghcr.io/ggml-org/llama.cpp:light&lt;/code&gt;&lt;/pre&gt;
&lt;h3 id="mcetoc_1jmoqdr6vd"&gt;step 2.&amp;nbsp;Pick a model - Download Gemma (GGUF, quantized)&lt;/h3&gt;
&lt;pre class="language-markup"&gt;&lt;code&gt;# llama.cpp only works with GGUF

# Create model directory
mkdir -p /datadocker/llama-cpp
cd /datadocker/llama-cpp/models


https://huggingface.co/unsloth/gemma-4-E2B-it-GGUF/tree/main

https://huggingface.co/bartowski/google_gemma-4-E2B-it-GGUF/tree/main

google_gemma-4-E2B-it-Q4_0.gguf
https://huggingface.co/bartowski/google_gemma-4-E2B-it-GGUF/resolve/main/google_gemma-4-E2B-it-Q4_0.gguf?download=true&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note&lt;/p&gt;
&lt;p&gt;There is no official &amp;ldquo;Gemma 4 E2B GGUF direct URL&amp;rdquo; from Google.&lt;/p&gt;
&lt;p&gt;GGUF files are community-converted and hosted on Hugging Face.&lt;/p&gt;
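&lt;p&gt;As a sketch of how those download links are formed: Hugging Face serves raw files from a model repo through its standard /resolve/main/ URL scheme, so the direct URL can be built from the repo id and filename (repo and file names are taken from the links above):&lt;/p&gt;

```python
# Sketch: build the direct-download URL for a GGUF file in a Hugging Face
# model repo, using HF's standard /resolve/main/ scheme.
REPO = "bartowski/google_gemma-4-E2B-it-GGUF"
FILENAME = "google_gemma-4-E2B-it-Q4_0.gguf"

def hf_resolve_url(repo, filename):
    """Return the direct 'resolve' URL for a file in an HF model repo."""
    return "https://huggingface.co/{}/resolve/main/{}".format(repo, filename)

print(hf_resolve_url(REPO, FILENAME))
```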
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 id="mcetoc_1jmoqdr6ve"&gt;step 3.&amp;nbsp; Docker run and load model&lt;/h3&gt;
&lt;p&gt;Run llama.cpp in Docker (upstream example):&lt;/p&gt;
&lt;pre class="language-markup"&gt;&lt;code&gt;docker run -v /path/to/models:/models --entrypoint /app/llama-cli ghcr.io/ggml-org/llama.cpp:light -m /models/7B/ggml-model-q4_0.gguf&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The command that ran successfully on my setup:&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre class="language-csharp"&gt;&lt;code&gt;docker run -it --rm \
  -v /datadocker/llama-cpp/models:/models \
  --entrypoint /app/llama-cli \
  -p 8091:8080 \
  ghcr.io/ggml-org/llama.cpp:light \
  -m /models/gemma-4-e2b-it-Q4_K_M.gguf \&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note&lt;/p&gt;
&lt;p&gt;1. Pick the model file you downloaded earlier.&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;output&lt;/p&gt;
&lt;pre class="language-markup"&gt;&lt;code&gt;load_backend: loaded CPU backend from /app/libggml-cpu-armv8.2_2.so

Loading model...

▄▄ ▄▄
██ ██
██ ██  ▀▀█▄ ███▄███▄  ▀▀█▄    ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██    ██    ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
                                    ██    ██
                                    ▀▀    ▀▀

build      : b8833-45cac7ca7
model      : google_gemma-4-E2B-it-Q4_0.gguf
modalities : text

available commands:
  /exit or Ctrl+C     stop or exit
  /regen              regenerate the last response
  /clear              clear the chat history
  /read &amp;lt;file&amp;gt;        add a text file
  /glob &amp;lt;pattern&amp;gt;     add text files using globbing pattern


&amp;gt; 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 id="mcetoc_1jmoqfbh0g"&gt;step 4.&amp;nbsp;test&amp;nbsp;&lt;/h3&gt;
&lt;pre class="language-python"&gt;&lt;code&gt;&amp;gt; how

[Start thinking]
Thinking Process:

1.  **Analyze the Request:** The user provided a single word: "how". This is an extremely open-ended prompt. It implies a question, a request for a process, or a search for a method.
2.  **Determine the Context:** Since there is no context, I must provide a response that addresses the ambiguity while offering ways to get a useful answer.
3.  **Identify the Goal:** The goal is to prompt the user to specify what they want to know "how" to do.
4.  **Brainstorm Potential Interpretations of "how":**
    *   *How does X work?* (Mechanism, process)
    *   *How do I do Y?* (Instructions, tutorial)
    *   *How to achieve Z?* (Strategy, steps)
    *   *How are we connected?* (Relationship, system)
5.  **Formulate the Response Strategy:**
    *   Acknowledge the brevity.
    *   State that more information is needed.
    *   Provide examples of the types of questions I can answer.
    *   Invite the user to ask the specific question.
6.  **Draft the Response (Self-Correction/Refinement):** Start with a friendly, open invitation. Ensure the tone is helpful and encouraging. (The resulting response should be a clear call to action.)
[End thinking]

Please tell me what you would like to know **how** to do! 😊

I can help you with instructions, explanations, processes, recipes, coding, concepts, and much more.

**For example, you could ask:**

* "How do I bake a cake?"
* "How does photosynthesis work?"
* "How do I change the font in Microsoft Word?"
* "How do I start learning Spanish?"

**Just tell me your question!**

[ Prompt: 8.6 t/s | Generation: 5.6 t/s ]

&amp;gt; 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;You can now access a generative AI tool like llama.cpp through a web interface or API.&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;Useful links&lt;/p&gt;
&lt;p&gt;llama.cpp on GitHub with Docker&amp;nbsp;image&lt;/p&gt;
&lt;p&gt;https://github.com/ggml-org/llama.cpp/blob/master/docs/docker.md&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;Model download URLs&lt;/p&gt;
&lt;p&gt;https://huggingface.co/bartowski/google_gemma-4-E2B-it-GGUF/tree/main&lt;/p&gt;
&lt;p&gt;https://huggingface.co/bartowski/google_gemma-4-E2B-it-GGUF/blob/main/google_gemma-4-E2B-it-Q4_0.gguf&lt;/p&gt;
&lt;p&gt;https://huggingface.co/ggml-org/gemma-4-E2B-it-GGUF&lt;/p&gt;</content>
  </entry>
  <entry>
    <id>http://blog.matterxiaomi.com/blog/run-local-LLM-server-part2/</id>
    <title>How to Run AI Models Locally with Ollama</title>
    <updated>2026-04-21T20:42:51Z</updated>
    <published>2026-04-17T19:11:15Z</published>
    <link href="http://blog.matterxiaomi.com/blog/run-local-LLM-server-part2/" />
    <author>
      <name>test@example.com</name>
      <email>blog.matterxiaomi.com</email>
    </author>
    <category term="ai" />
    <category term="llm" />
    <content type="html">&lt;p&gt;In this article, you&amp;rsquo;ll learn how to set up your own local generative AI using existing models such as Gemma 4 and Meta&amp;rsquo;s LLaMA 3.&lt;/p&gt;
&lt;p&gt;The official &lt;a href="https://hub.docker.com/r/ollama/ollama"&gt;Ollama Docker image&lt;/a&gt; ollama/ollama is available on Docker Hub.&lt;/p&gt;
&lt;p&gt;To run the Gemma 4 model locally using Ollama:&lt;/p&gt;
&lt;p&gt;First of all, we need an LLM Serving Engine, such as Ollama: A framework for running large language models locally.&lt;/p&gt;
&lt;p&gt;Pull the LLM models via&amp;nbsp;Ollama&lt;/p&gt;
&lt;p&gt;Load the LLM models via&amp;nbsp;Ollama&lt;/p&gt;
&lt;p&gt;Test&amp;nbsp; the LLM models via&amp;nbsp;Ollama CLI&lt;/p&gt;
&lt;div class="mce-toc"&gt;
&lt;h2&gt;Table of Contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#mcetoc_1jmgfi3k41"&gt;Install Ollama with Docker&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#mcetoc_1jmgfjbqc3"&gt;Pull and Run a Model via ollama&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#mcetoc_1jmghk0ah1"&gt;Test&amp;nbsp; the LLM models via Ollama CLI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#mcetoc_1jmgltup51"&gt;Configuration Checklist&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h2 id="mcetoc_1jmgfi3k41"&gt;Install Ollama with Docker&lt;/h2&gt;
&lt;p&gt;There are several ways to install it on your machine; we will run Ollama via Docker.&lt;/p&gt;
&lt;p&gt;Download&lt;/p&gt;
&lt;p&gt;docker pull ollama/ollama:0.21.0&lt;/p&gt;
&lt;p&gt;&lt;img src="/Posts/files/ollama-1_639120498754095416.jpg" alt="ollama-1.jpg" width="789" height="342" /&gt;&lt;/p&gt;
&lt;p&gt;Version control:&lt;/p&gt;
&lt;p&gt;Google officially released the Gemma 4 family on April 2, 2026, and ollama latest version&amp;nbsp;has stabilized support for its unique architectures.&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;Run the Container&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;You need to decide whether to use CPU-only or GPU acceleration, and pin a specific version.&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;CPU Only&lt;/p&gt;
&lt;pre class="language-csharp"&gt;&lt;code&gt;docker run -d \

  -v ollama:/root/.ollama \

  -p 11434:11434 \

  --name ollama \

  ollama/ollama:0.21.0&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The container starts an API server, but it doesn't come with any LLMs pre-installed.&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 id="mcetoc_1jmgfjbqc3"&gt;Pull and Run a Model via ollama&lt;/h2&gt;
&lt;p&gt;You need to "exec" into the container to pull a model (like Llama 4 ).&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre class="language-csharp"&gt;&lt;code&gt;# into the container
docker exec -it ollama sh

# Download a model without running it
ollama pull [model-name]

# Run a Model
# A single command (ollama run gemma4:e4b) handles downloading, memory management, and API serving.
ollama run gemma4:e4b&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;Ollama will automatically download the model; while it loads, the log reports "llm server loading model".&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;Once the model is loaded, you get an interactive prompt:&lt;/p&gt;
&lt;p&gt;output&lt;/p&gt;
&lt;pre class="language-markup"&gt;&lt;code&gt;# ollama run gemma4:e2b
&amp;gt;&amp;gt;&amp;gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;docker logs -f ollama&lt;/p&gt;
&lt;p&gt;output&lt;/p&gt;
&lt;pre class="language-markup"&gt;&lt;code&gt;time=2026-04-18T09:18:54.786Z level=INFO source=server.go:1398 msg="waiting for server to become available" status="llm server loading model"
...
time=2026-04-18T09:20:29.899Z level=INFO source=server.go:1402 msg="llama runner started in 97.04 seconds"
[GIN] 2026/04/18 - 09:20:32 | 200 |         1m42s |       127.0.0.1 | POST     "/api/generate"
[GIN] 2026/04/18 - 09:22:32 | 200 | 39.184533484s |       127.0.0.1 | POST     "/api/chat"
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 id="mcetoc_1jmghk0ah1"&gt;Test&amp;nbsp; the LLM models via Ollama CLI&lt;/h2&gt;
&lt;p&gt;Start a chat to test the LLM.&lt;/p&gt;
&lt;p&gt;Verify via CLI:&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre class="language-markup"&gt;&lt;code&gt;# ollama run gemma4:e2b
&amp;gt;&amp;gt;&amp;gt; how
Thinking...
Thinking Process:

1.  **Analyze the Input:** The input is "how".
...&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;Verify - api test&lt;/p&gt;
&lt;pre class="language-markup"&gt;&lt;code&gt;curl http://localhost:11434/api/chat -d '{
  "model": "gemma3",
  "messages": [{
    "role": "user",
    "content": "Why is the sky blue?"
  }],
  "stream": false
}'
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;output&lt;/p&gt;
&lt;pre class="language-markup"&gt;&lt;code&gt;root@debian:~# curl http://localhost:11434/api/chat -d '{
  "model": "gemma4:e2b",
  "messages": [{
    "role": "user",
    "content": "Why is the sky blue?"
  }],
  "stream": false
}'
{"model":"gemma4:e2b","created_at":"2026-04-18T09:24:45.242321396Z","message":{"role":"assistant","content":"The reason the sky appears blue is due to a phenomenon called **Rayleigh Scattering**. It is a result of how sunlight interacts with the small molecules of the Earth's atmosphere.\n\nHere is a detailed breakdown of the process:\n\n---\n\n### 1. The Ingredients: Sunlight and Atmosphere\n\n**A. Sunlight is White Light:**\nSunlight, which appears white to us, is actu
...&lt;/code&gt;&lt;/pre&gt;
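&lt;p&gt;A small sketch of consuming that reply programmatically: with "stream": false, the /api/chat endpoint returns a single JSON object and the text lives under message.content. The sample string below is an abbreviated, hand-written stand-in for the full reply shown above:&lt;/p&gt;

```python
# Sketch: pull the assistant text out of a non-streaming /api/chat reply.
# The sample below mirrors the key layout of the real response printed
# above, with the content shortened for readability.
import json

sample = (
    '{"model":"gemma4:e2b",'
    '"message":{"role":"assistant","content":"Rayleigh scattering."},'
    '"done":true}'
)

def assistant_text(raw):
    """Extract message.content from a non-streaming Ollama chat reply."""
    return json.loads(raw)["message"]["content"]

print(assistant_text(sample))   # Rayleigh scattering.
```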
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;Verify via Web UI (browser test):&lt;/p&gt;
&lt;p&gt;open-webui&lt;/p&gt;
&lt;p&gt;https://github.com/open-webui/open-webui&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre class="language-csharp"&gt;&lt;code&gt;# ollama list
NAME             ID              SIZE      MODIFIED          
gemma4:e2b       7fbdbf8f5e45    7.2 GB    15 minutes ago       
gemma4:e4b       c6eb396dbd59    9.6 GB    About an hour ago    
gemma3:4b        a2af6cc3eb7f    3.3 GB    2 hours ago          
llama3:latest    365c0bd3c000    4.7 GB    3 hours ago &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;Quick Diagnostic Steps&lt;/p&gt;
&lt;p&gt;docker logs -f ollama&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 id="mcetoc_1jmgltup51"&gt;Configuration Checklist&lt;/h2&gt;
&lt;p&gt;OS: 64-bit Debian 12&lt;/p&gt;
&lt;p&gt;RAM: 16 GB&lt;/p&gt;
&lt;p&gt;CPU: Intel 10400&lt;/p&gt;
&lt;p&gt;Model: Gemma 4 E2B (4-bit)&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;Useful links&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;Recommended Ollama models&lt;/p&gt;
&lt;p&gt;https://ollama.com/library&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;gemma4 Model information&lt;/p&gt;
&lt;p&gt;https://ollama.com/library/gemma4:e4b&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;Gemma 4 Inference Memory Requirements&lt;/p&gt;
&lt;p&gt;https://ai.google.dev/gemma/docs/core#gemma-4-inference-memory-requirements&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Parameters&lt;/th&gt;
&lt;th&gt;BF16 (16-bit)&lt;/th&gt;
&lt;th&gt;SFP8 (8-bit)&lt;/th&gt;
&lt;th&gt;Q4_0 (4-bit)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Gemma 4 E2B&lt;/td&gt;
&lt;td&gt;9.6 GB&lt;/td&gt;
&lt;td&gt;4.6 GB&lt;/td&gt;
&lt;td&gt;3.2 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemma 4 E4B&lt;/td&gt;
&lt;td&gt;15 GB&lt;/td&gt;
&lt;td&gt;7.5 GB&lt;/td&gt;
&lt;td&gt;5 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemma 4 31B&lt;/td&gt;
&lt;td&gt;58.3 GB&lt;/td&gt;
&lt;td&gt;30.4 GB&lt;/td&gt;
&lt;td&gt;17.4 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemma 4 26B A4B&lt;/td&gt;
&lt;td&gt;48 GB&lt;/td&gt;
&lt;td&gt;25 GB&lt;/td&gt;
&lt;td&gt;15.6 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;If a model does not fit, Ollama fails with an error like: "model requires more system memory (9.8 GiB) than is available (4.7 GiB)".&lt;/p&gt;
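&lt;p&gt;As a rough back-of-the-envelope check (a heuristic of mine, not Google's official table above): weight memory is roughly parameter count times bits per weight, so you can sanity-check whether a quantization might fit before pulling it:&lt;/p&gt;

```python
# Heuristic sketch: estimate weight memory from parameter count and bits per
# weight. Real GGUF files add metadata, and inference adds KV-cache overhead,
# so treat this as a lower bound, not an official figure.
def est_gib(params_billion, bits_per_weight):
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return round(bytes_total / 2**30, 1)

# A hypothetical 4B-parameter model at ~4.5 bits/weight (Q4_0 incl. scales):
print(est_gib(4, 4.5))   # 2.1 (GiB of weights alone)
```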
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;blog&lt;/p&gt;
&lt;p&gt;https://medium.com/tech-ai-chat/running-llm-on-a-local-mac-machine-0dae23d8320b&lt;/p&gt;</content>
  </entry>
  <entry>
    <id>http://blog.matterxiaomi.com/blog/run-local-LLM-server-part1/</id>
    <title>Ollama vs LiteLLM vs llama.cpp vs vLLM vs LM Studio</title>
    <updated>2026-04-19T18:27:39Z</updated>
    <published>2026-04-05T03:15:00Z</published>
    <link href="http://blog.matterxiaomi.com/blog/run-local-LLM-server-part1/" />
    <author>
      <name>test@example.com</name>
      <email>blog.matterxiaomi.com</email>
    </author>
    <category term="ai" />
    <category term="llm" />
    <content type="html">&lt;p&gt;How to run a local LLM server step by step&lt;/p&gt;
&lt;p&gt;Ollama vs LiteLLM vs llama.cpp vs vLLM vs LM Studio&lt;/p&gt;
&lt;p&gt;These tools represent different layers of the AI stack. While they overlap, they generally serve distinct purposes:&lt;/p&gt;
&lt;p&gt;Serving (Llama.cpp, vLLM),&lt;/p&gt;
&lt;p&gt;Managing (Ollama, LM Studio),&amp;nbsp;&lt;/p&gt;
&lt;p&gt;Routing (LiteLLM).&lt;/p&gt;
&lt;div class="mce-toc"&gt;
&lt;h2&gt;Table of Contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#mcetoc_1jmgik5tof"&gt;Managing (Ollama, LM Studio)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#mcetoc_1jmgik5tog"&gt;Serving (Llama.cpp, vLLM)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#mcetoc_1jmgik5toh"&gt;Routing (LiteLLM)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 id="mcetoc_1jmgik5tof"&gt;Managing (Ollama, LM Studio)&lt;/h2&gt;
&lt;p&gt;Ollama&lt;/p&gt;
&lt;p&gt;A local LLM inference/runtime platform. It handles model downloads, storage, and execution with a simple CLI/API. Think of it as a &amp;ldquo;local LLM server&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;Run AI models locally and integrate via API.&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;LM Studio&lt;/p&gt;
&lt;p&gt;A desktop application.&lt;/p&gt;
&lt;p&gt;Run AI models locally with a chat UI.&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 id="mcetoc_1jmgik5tog"&gt;Serving (Llama.cpp, vLLM)&lt;/h2&gt;
&lt;p&gt;llama.cpp - run AI models on edge devices.&lt;/p&gt;
&lt;p&gt;For example, run a model on a Raspberry Pi.&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;vLLM&lt;/p&gt;
&lt;p&gt;Built for high-traffic production APIs, e.g. an AI startup backend.&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 id="mcetoc_1jmgik5toh"&gt;Routing (LiteLLM)&lt;/h2&gt;
&lt;p&gt;LiteLLM&lt;/p&gt;
&lt;p&gt;LiteLLM is not an inference engine; it is a proxy/router: a gateway layer that provides a unified, OpenAI-compatible API for calling many LLM providers (cloud and local).&lt;/p&gt;</content>
  </entry>
  <entry>
    <id>http://blog.matterxiaomi.com/blog/matter-bridge-part2/</id>
    <title>Matter Bridge in Home Assistant Part2 - Install MatterBridge Connect to Home assistant</title>
    <updated>2026-02-16T23:14:37Z</updated>
    <published>2026-02-15T00:02:19Z</published>
    <link href="http://blog.matterxiaomi.com/blog/matter-bridge-part2/" />
    <author>
      <name>test@example.com</name>
      <email>blog.matterxiaomi.com</email>
    </author>
    <category term="matter bridge" />
    <content type="html">&lt;p&gt;Matter Bridge in Home Assistant Part2 - Install MatterBridge Connect to Home assistant&lt;/p&gt;
&lt;div class="mce-toc"&gt;
&lt;h2&gt;Table of Contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#mcetoc_1jhfa2nbb8"&gt;Quick start&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#mcetoc_1jhfadikr2"&gt;Install and configure&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#mcetoc_1jhfa2nbb9"&gt;How to Use&amp;nbsp;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h2 id="mcetoc_1jhfa2nbb8"&gt;Quick start&lt;/h2&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;I set up the matterbridge as follows:&lt;/p&gt;
&lt;p&gt;Install the Matterbridge docker&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;Create long-lived access tokens to allow home-assistant-matter-hub docker to interact with your Home Assistant instance.&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;Configure communication between Matter Hub and Home Assistant: Matterbridge connects to Home Assistant with a URL and token.&lt;/p&gt;
&lt;p&gt;Expose a Home Assistant device as a Matter bridge.&lt;/p&gt;
&lt;pre class="language-markup"&gt;&lt;code&gt;open http://192.168.2.125:8482/ via chrome browser

Create a new bridge,

Add device "pattern: switch.air_con" in new bridge

start it to generate a pairing QR code

Connect accessory to Apple Home&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 id="mcetoc_1jhfadikr2"&gt;Install and configure&lt;/h2&gt;
&lt;p&gt;docker-compose.yml&lt;/p&gt;
&lt;p&gt;You need to create an access token in your Home Assistant instance and set it like this:&lt;/p&gt;
&lt;pre class="language-python"&gt;&lt;code&gt;services:
  matter-hub:
    image: ghcr.io/t0bst4r/home-assistant-matter-hub:3.0.1
    restart: unless-stopped
    network_mode: host
    environment: # more options can be found in the configuration section
      - HAMH_HOME_ASSISTANT_URL=http://192.168.2.125:8123/
      - HAMH_HOME_ASSISTANT_ACCESS_TOKEN=your-long-lived-access-token
      - HAMH_LOG_LEVEL=info
      - HAMH_HTTP_PORT=8482
    volumes:
      - /datadocker/home-assistant-matter-hub:/data&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;Now you can visit it via web ui&lt;/p&gt;
&lt;p&gt;http://192.168.2.125:8482/&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 id="mcetoc_1jhfa2nbb9"&gt;How to Use&amp;nbsp;&lt;/h2&gt;
&lt;p&gt;Expose a Home Assistant device as a Matter bridge.&lt;/p&gt;
&lt;p&gt;Open http://192.168.2.125:8482/&amp;nbsp;&lt;/p&gt;
&lt;p&gt;Create a new bridge for the device; the home-assistant-matter-hub container fetches it from Home Assistant via the API.&lt;/p&gt;
&lt;p&gt;type:pattern&lt;/p&gt;
&lt;p&gt;value:light.yeelink_cn_ceiling21_s_2_light&lt;/p&gt;
&lt;pre class="language-python"&gt;&lt;code&gt;{
  "name": "matterbridgeceilling21v2",
  "port": 5543,
  "filter": {
    "include": [
      {
        "type": "pattern",
        "value": "light.yeelink_cn_476690814_ceiling21_s_2_light"
      }
    ],
    "exclude": []
  },
  "featureFlags": {
    "coverDoNotInvertPercentage": false,
    "includeHiddenEntities": false
  }
}&lt;/code&gt;&lt;/pre&gt;
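&lt;p&gt;A quick sketch for sanity-checking such a bridge config before pasting it into the web UI. The required-key list and the per-rule checks are my assumptions inferred from the example above, not an official schema:&lt;/p&gt;

```python
# Sketch: validate a home-assistant-matter-hub bridge config like the one
# above. The "required" set is inferred from the example, not an official
# schema.
import json

config = {
    "name": "matterbridgeceilling21v2",
    "port": 5543,
    "filter": {
        "include": [
            {
                "type": "pattern",
                "value": "light.yeelink_cn_476690814_ceiling21_s_2_light",
            }
        ],
        "exclude": [],
    },
    "featureFlags": {
        "coverDoNotInvertPercentage": False,
        "includeHiddenEntities": False,
    },
}

required = {"name", "port", "filter"}
missing = required - config.keys()
assert not missing, "missing keys: {}".format(missing)

# Every include rule needs a type and a value.
for rule in config["filter"]["include"]:
    assert "type" in rule and "value" in rule, rule

print(json.dumps(config))
```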
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;</content>
  </entry>
  <entry>
    <id>http://blog.matterxiaomi.com/blog/ecovacs-part6/</id>
    <title>Ecovacs in Home Assistant Part6 - Robot Vacuum Control MCP Server</title>
    <updated>2026-04-25T14:10:22Z</updated>
    <published>2026-02-13T13:17:29Z</published>
    <link href="http://blog.matterxiaomi.com/blog/ecovacs-part6/" />
    <author>
      <name>test@example.com</name>
      <email>blog.matterxiaomi.com</email>
    </author>
    <category term="vacuum" />
    <content type="html">&lt;p&gt;Official Ecovacs Deebot MCP Server&lt;/p&gt;
&lt;p&gt;MCP protocol&lt;/p&gt;
&lt;p&gt;https://github.com/ecovacs-ai/ecovacs-mcp/blob/main/ecovacs_mcp/robot_mcp_stdio.py&lt;/p&gt;
&lt;p&gt;Created:2025.04.24&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;Official doc:&lt;/p&gt;
&lt;p&gt;https://open.ecovacs.com/#/serviceOverview&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;way 1. custom integration&lt;/p&gt;
&lt;p&gt;https://github.com/hoangminh1109/ecovacs_cn&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;way 2. MCP client integration in HA&lt;/p&gt;
&lt;p&gt;mcp client integration&lt;/p&gt;
&lt;p&gt;https://www.home-assistant.io/integrations/mcp&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;MCP client integration configuration&lt;/p&gt;
&lt;p&gt;The remote MCP server URL for the SSE endpoint, for example http://example/mcp&lt;/p&gt;
&lt;p&gt;Ecovacs SSE Server URL:&lt;/p&gt;
&lt;p&gt;https://mcp-open.ecovacs.cn/sse?ak=your ak&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;useful links&lt;/p&gt;
&lt;p&gt;https://open.ecovacs.com/#/serviceOverview&lt;/p&gt;</content>
  </entry>
  <entry>
    <id>http://blog.matterxiaomi.com/blog/create-wyoming-server-home-assistant-part5/</id>
    <title>ModelScope vs Hugging Face vs k2-fsa.github.io vs Kaldi vs Sherpa</title>
    <updated>2026-02-19T16:28:27Z</updated>
    <published>2026-02-11T19:46:43Z</published>
    <link href="http://blog.matterxiaomi.com/blog/create-wyoming-server-home-assistant-part5/" />
    <author>
      <name>test@example.com</name>
      <email>blog.matterxiaomi.com</email>
    </author>
    <content type="html">&lt;p&gt;Hugging Face, ModelScope, and k2-fsa.github.io (specifically the k2-fsa/sherpa-onnx project) represent different approaches to the machine learning ecosystem。&lt;/p&gt;
&lt;p&gt;Hugging Face and ModelScope host models of every kind; k2-fsa and Sherpa deal only with speech.&lt;/p&gt;
&lt;p&gt;k2-fsa and Sherpa are highly specialized tools focused on speech recognition (ASR) and speech synthesis (TTS).&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;Sherpa (often referred to as sherpa-onnx or sherpa-ncnn) is a lightweight speech-to-text (ASR) and text-to-speech (TTS) engine.&amp;nbsp;Best for: Deploying speech models on edge devices (Android, iOS, WebAssembly, ARM boards) or high-performance servers, prioritizing low latency and CPU efficiency.&lt;/p&gt;
&lt;div class="mce-toc"&gt;
&lt;h2&gt;Table of Contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#mcetoc_1jhchklu53"&gt;Hugging Face（Global AI&amp;nbsp;model platform）&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#mcetoc_1jhchiqmm1"&gt;ModelScope(The Alibaba/Chinese AI Industrial model platform)&amp;nbsp;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#mcetoc_1jhchocht5"&gt;specialized tools focused on speech recognition (ASR) and synthesis (TTS)&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#mcetoc_1jh8e1hud2"&gt;k2-fsa (Next-Gen Kaldi)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#mcetoc_1jhcht3fl7"&gt;Sherpa(The Real-Time Speech Deployment Tool)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;p&gt;These four entities represent two different categories: Model Ecosystems (Hugging Face &amp;amp; ModelScope) and Speech Recognition Frameworks (Kaldi &amp;amp; k2-fsa).&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 id="mcetoc_1jhchklu53"&gt;&lt;strong&gt;Hugging Face（&lt;/strong&gt;Global AI&amp;nbsp;model platform&lt;strong&gt;）&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;The Global Industry Standard。It supports NLP, computer vision, audio, and multimodal models via transformers and diffusers libraries.&lt;/p&gt;
&lt;p&gt;download models&lt;/p&gt;
&lt;p&gt;https://huggingface.co/models&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;https://huggingface.co/FunAudioLLM/SenseVoiceSmall&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;https://huggingface.co/funasr/paraformer-zh/blame/7904416f6cb6290ee7dc0b2ddb2993a9fe4f421a/README.md&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 id="mcetoc_1jhchiqmm1"&gt;&lt;strong&gt;ModelScope&lt;/strong&gt;&lt;strong&gt;(The Alibaba/Chinese AI Industrial model platform)&amp;nbsp;&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;ModelScope is an AI model hub led by Alibaba DAMO Academy. It provides pre-trained models, pipelines, and deployment tools, and is especially strong in Chinese language and speech technologies. ModelScope is often described as the "Chinese Hugging Face."&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;https://modelscope.cn/search?search=sherpa%20onnx&lt;/p&gt;
&lt;p&gt;Inference frameworks:&lt;/p&gt;
&lt;p&gt;1. funasr&lt;/p&gt;
&lt;p&gt;2. funasr-onnx&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;see:https://github.com/modelscope/FunASR?tab=readme-ov-file#sensevoice&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 id="mcetoc_1jhchocht5"&gt;specialized tools focused on speech recognition (ASR) and synthesis (TTS)&lt;/h2&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Kaldi: Speech Toolkit&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The "grandfather" of modern speech recognition. It is a C++ based toolkit developed primarily by Dan Povey.&lt;/p&gt;
&lt;p&gt;demo&lt;/p&gt;
&lt;p&gt;KaldiRecognizer&lt;/p&gt;
&lt;p&gt;https://github.com/rhasspy/wyoming-faster-whisper/blob/main/wyoming_faster_whisper/__main__.py&lt;/p&gt;
&lt;h3 id="mcetoc_1jh8e1hud2"&gt;k2-fsa (Next-Gen Kaldi)&lt;/h3&gt;
&lt;p&gt;Speech toolkit; the modern successor to Kaldi.&lt;/p&gt;
&lt;p&gt;What it is: Often called "Next-gen Kaldi." It is a complete rewrite of Kaldi&amp;rsquo;s core concepts to make them natively compatible with &lt;strong&gt;PyTorch&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Key Repositories:&lt;/p&gt;
&lt;p&gt;Icefall: Where the actual training recipes for speech models (like Zipformer) live.&lt;/p&gt;
&lt;p&gt;k2: The core library for differentiable FSTs. (Classic Kaldi is the older toolkit that k2-fsa supersedes.)&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;download sherpa-onnx&lt;/p&gt;
&lt;p&gt;https://github.com/k2-fsa/sherpa-onnx/releases&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;download sherpa-onnx&amp;nbsp;asr models&lt;/p&gt;
&lt;p&gt;https://k2-fsa.github.io/sherpa/onnx/pretrained_models/index.html&lt;/p&gt;
&lt;p&gt;https://github.com/k2-fsa/sherpa-onnx/releases/tag/asr-models&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;h4 id="mcetoc_1jh9dcjgo1"&gt;&amp;nbsp;download&amp;nbsp;Silero VAD ONNX model&lt;/h4&gt;
&lt;pre class="language-markup"&gt;&lt;code&gt;# https://k2-fsa.github.io/sherpa/onnx/sense-voice/pretrained.html#sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;download url:https://k2-fsa.github.io/sherpa/onnx/vad/silero-vad.html#download-models-files&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 id="mcetoc_1jhcht3fl7"&gt;&lt;strong&gt;Sherpa&lt;/strong&gt;&lt;strong&gt;(The Real-Time Speech Deployment Tool)&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;The deployment engine (CPU/GPU, Android, iOS, WebAssembly).It uses models trained in the k2-fsa ecosystem.&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;How they all fit together&lt;/p&gt;
&lt;p&gt;1.k2-fsa is the tool you use to build a high-performance speech model.&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;Once you've trained that model using k2-fsa, you might upload it to Hugging Face or ModelScope so others can download it easily.&lt;/p&gt;
&lt;p&gt;Hugging Face hosts models from both ModelScope and k2-fsa/Sherpa, serving as a distribution point for them.&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;Layer Relationship&lt;/p&gt;
&lt;pre class="language-markup"&gt;&lt;code&gt;Model Hosting &amp;amp; Distribution
   ├── ModelScope
   └── Hugging Face

Inference / Runtime Framework
   └── k2-fsa / sherpa&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;</content>
  </entry>
  <entry>
    <id>http://blog.matterxiaomi.com/blog/create-wyoming-server-home-assistant-part2/</id>
    <title>Create Wyoming server for Home assistant Part2 - stt -  wyoming-funasr arm64</title>
    <updated>2026-02-24T21:59:19Z</updated>
    <published>2026-02-05T17:55:26Z</published>
    <link href="http://blog.matterxiaomi.com/blog/create-wyoming-server-home-assistant-part2/" />
    <author>
      <name>test@example.com</name>
      <email>blog.matterxiaomi.com</email>
    </author>
    <content type="html">&lt;p&gt;Wyoming protocol server for the funasr speech to text system.stt -&amp;nbsp; wyoming-funasr arm64&lt;/p&gt;
&lt;p&gt;FunASR: A Fundamental End-to-End Speech Recognition Toolkit.&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;To make an STT server work with Home Assistant, the industry standard is using the Wyoming Protocol.&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;div class="mce-toc"&gt;
&lt;h2&gt;Table of Contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#mcetoc_1jfpb2v493q"&gt;Step 1.Development Environment Setup &lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#mcetoc_1ji8qhp3m2"&gt;Create Python virtual environment&amp;nbsp;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#mcetoc_1jfpb4g3942"&gt;Step 2. Install&amp;nbsp;&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#mcetoc_1jgnflo1m9"&gt;Install torch&amp;nbsp;&amp;nbsp;via PyPI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#mcetoc_1jfpb2v493s"&gt;Install FunASR 1.3.0&amp;nbsp; via PyPI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#mcetoc_1jfpb2v493u"&gt;Verify installation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#mcetoc_1jfpb2v493v"&gt;Step3.Download and test a model (example: paraformer-zh)&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#mcetoc_1jhpjdu7o2"&gt;SenseVoice -&amp;nbsp;Speech Recognition (Non-streaming)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#mcetoc_1jfpqvmgh2"&gt;Step 4.FunASR + Wyoming STT full server&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#mcetoc_1jgnflo1ma"&gt;Install wyoming&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#mcetoc_1jh1j919c1"&gt;Strategies to reduce latency&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h2 id="mcetoc_1jfpb2v493q"&gt;Step 1.Development Environment Setup&lt;/h2&gt;
&lt;h3 id="mcetoc_1ji8qhp3m2"&gt;Create Python virtual environment&amp;nbsp;&lt;/h3&gt;
&lt;p&gt;mkdir -p /funasr-wyoming&lt;/p&gt;
&lt;p&gt;cd /funasr-wyoming&lt;/p&gt;
&lt;p&gt;python3 -m venv venv&lt;/p&gt;
&lt;p&gt;source venv/bin/activate&lt;/p&gt;
&lt;p&gt;python --version&lt;/p&gt;
&lt;pre class="language-csharp"&gt;&lt;code&gt;Python 3.11.2
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;pip3 install wyoming==1.8.0&lt;/p&gt;
&lt;p&gt;pip3 install funasr==1.3.0&lt;/p&gt;
&lt;p&gt;pip3 install torch&lt;/p&gt;
&lt;p&gt;pip3 install torchaudio&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;pip3 show funasr&lt;/p&gt;
&lt;pre class="language-csharp"&gt;&lt;code&gt;(venv) root@raspberrypi:/funasr-wyoming# pip3 show funasr
Name: funasr
Version: 1.3.0
Summary: FunASR: A Fundamental End-to-End Speech Recognition Toolkit
Home-page: https://github.com/alibaba-damo-academy/FunASR.git
Author: Speech Lab of Alibaba Group
Author-email: funasr@list.alibaba-inc.com
License: The MIT License
Location: /funasr-wyoming/venv/lib/python3.11/site-packages
Requires: editdistance, hydra-core, jaconv, jamo, jieba, kaldiio, librosa, modelscope, oss2, pytorch_wpe, PyYAML, requests, scipy, sentencepiece, soundfile, tensorboardX, torch_complex, tqdm, umap_learn
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;Requirements&lt;/p&gt;
&lt;pre class="language-csharp"&gt;&lt;code&gt;python&amp;gt;=3.8
torch&amp;gt;=1.13
torchaudio&lt;/code&gt;&lt;/pre&gt;
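A quick guard before creating the venv can verify the interpreter meets that floor (a trivial sketch; torch's own version pin is enforced by pip at install time, not by this check):

```python
import sys

def meets_requirements(version=None):
    """FunASR's floor is Python 3.8; torch/torchaudio pins are handled by pip."""
    if version is None:
        version = sys.version_info
    return (version[0], version[1]) >= (3, 8)

print(meets_requirements())          # True on the Python 3.11.2 venv above
print(meets_requirements((3, 7, 0))) # an interpreter below the floor
```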
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 id="mcetoc_1jfpb4g3942"&gt;Step 2. Install&amp;nbsp;&lt;/h2&gt;
&lt;p&gt;(venv) root@raspberrypi:/funasr-wyoming# pip3 --version&lt;/p&gt;
&lt;p&gt;pip 23.0.1 from /funasr-wyoming/venv/lib/python3.11/site-packages/pip (python 3.11)&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 id="mcetoc_1jgnflo1m9"&gt;Install torch&amp;nbsp;&amp;nbsp;via PyPI&lt;/h3&gt;
&lt;p&gt;pip3 install torch==2.1.0&amp;nbsp; &amp;nbsp;(CPU-only)&lt;/p&gt;
&lt;p&gt;output&lt;/p&gt;
&lt;pre class="language-csharp"&gt;&lt;code&gt;Installing collected packages: mpmath, sympy, networkx, MarkupSafe, fsspec, jinja2, torch
Successfully installed MarkupSafe-3.0.3 fsspec-2026.1.0 jinja2-3.1.6 mpmath-1.3.0 networkx-3.6.1 sympy-1.14.0 torch-2.1.0
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If ffmpeg is not installed, torchaudio is used to load audio.&lt;/p&gt;
&lt;p&gt;pip3 install torchaudio==2.1.0&amp;nbsp; &amp;nbsp;(CPU-only)&lt;/p&gt;
&lt;p&gt;output&lt;/p&gt;
&lt;pre class="language-csharp"&gt;&lt;code&gt;Successfully installed torchaudio-2.1.0
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You will need the wyoming and funasr libraries.&lt;/p&gt;
&lt;h3 id="mcetoc_1jfpb2v493s"&gt;Install FunASR 1.3.0&amp;nbsp; via PyPI&lt;/h3&gt;
&lt;p&gt;pip3 install -U funasr==1.3.0&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;This will pull:&lt;/p&gt;
&lt;p&gt;Downloading https://www.piwheels.org/simple/threadpoolctl/threadpoolctl-3.6.0-py3-none-any.whl (18 kB)&lt;/p&gt;
&lt;p&gt;Installing collected packages: jieba, jamo, jaconv, crcmod, antlr4-python3-runtime, urllib3, typing_extensions, tqdm, threadpoolctl, six, sentencepiece, PyYAML, pycryptodome, pycparser, protobuf, platformdirs, packaging, numpy, msgpack, llvmlite, joblib, jmespath, idna, filelock, editdistance, decorator, charset_normalizer, certifi, audioread, torch_complex, tensorboardX, soxr, scipy, requests, pytorch_wpe, omegaconf, numba, lazy_loader, kaldiio, cffi, soundfile, scikit-learn, pooch, modelscope, hydra-core, cryptography, pynndescent, librosa, aliyun-python-sdk-core, umap_learn, aliyun-python-sdk-kms, oss2, funasr&lt;/p&gt;
&lt;p&gt;output&lt;/p&gt;
&lt;pre class="language-csharp"&gt;&lt;code&gt;Successfully installed PyYAML-6.0.3 aliyun-python-sdk-core-2.16.0 aliyun-python-sdk-kms-2.16.5 antlr4-python3-runtime-4.9.3 audioread-3.1.0 certifi-2026.1.4 cffi-2.0.0 charset_normalizer-3.4.4 crcmod-1.7 cryptography-46.0.3 decorator-5.2.1 editdistance-0.8.1 filelock-3.20.3 funasr-1.3.0 hydra-core-1.3.2 idna-3.11 jaconv-0.4.1 jamo-0.4.1 jieba-0.42.1 jmespath-0.10.0 joblib-1.5.3 kaldiio-2.18.1 lazy_loader-0.4 librosa-0.11.0 llvmlite-0.46.0 modelscope-1.34.0 msgpack-1.1.2 numba-0.63.1 numpy-2.3.5 omegaconf-2.3.0 oss2-2.19.1 packaging-26.0 platformdirs-4.5.1 pooch-1.8.2 protobuf-6.33.4 pycparser-3.0 pycryptodome-3.23.0 pynndescent-0.6.0 pytorch_wpe-0.0.1 requests-2.32.5 scikit-learn-1.8.0 scipy-1.17.0 sentencepiece-0.2.1 six-1.17.0 soundfile-0.13.1 soxr-1.0.0 tensorboardX-2.6.4 threadpoolctl-3.6.0 torch_complex-0.4.4 tqdm-4.67.1 typing_extensions-4.15.0 umap_learn-0.5.11 urllib3-2.6.3
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;detail:https://pypi.org/project/funasr&lt;br /&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;sudo apt install ffmpeg&lt;/p&gt;
&lt;p&gt;output&lt;/p&gt;
&lt;p&gt;ffmpeg is already the newest version (8:5.1.8-0+deb12u1+rpt1).&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 id="mcetoc_1jfpb2v493t"&gt;&amp;nbsp;&lt;/h3&gt;
&lt;h3 id="mcetoc_1jfpb2v493u"&gt;Verify installation&lt;/h3&gt;
&lt;pre class="language-csharp"&gt;&lt;code&gt;python - &amp;lt;&amp;lt; 'EOF'
from funasr import AutoModel
print("FunASR imported OK")
EOF
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;output&lt;/p&gt;
&lt;pre class="language-csharp"&gt;&lt;code&gt;FunASR imported OK&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 id="mcetoc_1jfpb2v493v"&gt;Step3.Download and test a model (example: paraformer-zh)&lt;/h2&gt;
&lt;p&gt;test.py&lt;/p&gt;
&lt;pre class="language-python"&gt;&lt;code&gt;from funasr import AutoModel

model = AutoModel(
    model="paraformer-zh",
    model_revision="v2.0.4",
    vad_model="fsmn-vad",
    vad_model_revision="v2.0.4",
    punc_model="ct-punc",
    punc_model_revision="v2.0.4",
)

res = model.generate(input="test.wav")
print(res)

res = model.generate(input="https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/vad_example.wav")
print(res)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 id="mcetoc_1jhpjdu7o2"&gt;SenseVoice -&amp;nbsp;Speech Recognition (Non-streaming)&lt;/h3&gt;
&lt;pre class="language-csharp"&gt;&lt;code&gt;from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

model_dir = "iic/SenseVoiceSmall"

model = AutoModel(
    model=model_dir,
    vad_model="fsmn-vad",
    vad_kwargs={"max_single_segment_time": 30000},
    device="cpu",  # the upstream example uses "cuda:0"; there is no CUDA GPU on a Raspberry Pi 5
)

# en
res = model.generate(
    input=f"{model.model_path}/example/en.mp3",
    cache={},
    language="auto",  # "zh", "en", "yue", "ja", "ko", "nospeech"
    use_itn=True,
    batch_size_s=60,
    merge_vad=True,  #
    merge_length_s=15,
)
text = rich_transcription_postprocess(res[0]["text"])
print(text)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;see:https://github.com/modelscope/FunASR?tab=readme-ov-file#sensevoice&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;python3 test.py&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;Models are cached in:&lt;/p&gt;
&lt;p&gt;/root/.cache/modelscope/hub/models/iic/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch&lt;/p&gt;
&lt;p&gt;output&lt;/p&gt;
&lt;pre class="language-csharp"&gt;&lt;code&gt;Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/iic/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch


2026-01-25 08:58:28,492 - modelscope - INFO - Use user-specified model revision: v2.0.4
2026-01-25 08:58:28,595 - modelscope - INFO - Got 11 files, start to download ...
Downloading [fig/res.png]: 100%|███████████████████████████████████████████████████| 192k/192k [00:00&amp;lt;00:00, 386kB/s]
Downloading [am.mvn]: 100%|█████████████████████████████████████████████████████| 10.9k/10.9k [00:00&amp;lt;00:00, 21.7kB/s]
Downloading [example/hotword.txt]: 100%|███████████████████████████████████████████| 7.00/7.00 [00:00&amp;lt;00:00, 11.9B/s]
Downloading [config.yaml]: 100%|████████████████████████████████████████████████| 3.34k/3.34k [00:00&amp;lt;00:00, 5.66kB/s]
Downloading [configuration.json]: 100%|███████████████████████████████████████████████| 478/478 [00:00&amp;lt;00:00, 766B/s]
Downloading [README.md]: 100%|██████████████████████████████████████████████████| 11.3k/11.3k [00:00&amp;lt;00:00, 18.2kB/s]
Downloading [example/asr_example.wav]: 100%|███████████████████████████████████████| 141k/141k [00:00&amp;lt;00:00, 208kB/s]
Downloading [fig/seaco.png]: 100%|█████████████████████████████████████████████████| 167k/167k [00:00&amp;lt;00:00, 296kB/s]
Downloading [tokens.json]: 100%|█████████████████████████████████████████████████| 91.5k/91.5k [00:00&amp;lt;00:00, 165kB/s]
Downloading [seg_dict]: 100%|███████████████████████████████████████████████████| 7.90M/7.90M [00:03&amp;lt;00:00, 2.76MB/s]
Downloading [model.pt]: 100%|█████████████████████████████████████████████████████| 944M/944M [01:31&amp;lt;00:00, 10.8MB/s]
Processing 11 items: 100%|████████████████████████████████████████████████████████| 11.0/11.0 [01:31&amp;lt;00:00, 8.34s/it]
2026-01-25 09:00:00,347 - modelscope - INFO - Download model 'iic/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch' successfully.█████████████████████████████████████████████| 91.5k/91.5k [00:00&amp;lt;00:00, 165kB/s]
WARNING:root:trust_remote_code: False                                            | 5.00M/944M [00:01&amp;lt;02:50, 5.77MB/s]
Downloading [model.pt]:   2%|█▎                                                  | 23.0M/944M [00:03&amp;lt;02:15, 7.12MB/s]
Downloading [model.pt]: 100%|████████████████████████████████████████████████████▉| 942M/944M [01:31&amp;lt;00:00, 6.28MB/s]
Downloading [seg_dict]: 100%|███████████████████████████████████████████████████| 7.90M/7.90M [00:03&amp;lt;00:00, 4.16MB/s&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre class="language-markup"&gt;&lt;code&gt;python3 -c "from funasr import AutoModel; AutoModel(model='paraformer-zh', device='cpu')"&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 id="mcetoc_1jfpqvmgh2"&gt;Step 4.FunASR + Wyoming STT full server&lt;/h2&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 id="mcetoc_1jgnflo1ma"&gt;Install wyoming&lt;/h3&gt;
&lt;p&gt;pip3 install&amp;nbsp;wyoming==1.8.0&lt;/p&gt;
&lt;p&gt;output&lt;/p&gt;
&lt;pre class="language-csharp"&gt;&lt;code&gt;Looking in indexes: https://pypi.org/simple, https://www.piwheels.org/simple
Collecting wyoming==1.5.0
  Downloading wyoming-1.5.0-py3-none-any.whl (23 kB)
Installing collected packages: wyoming
Successfully installed wyoming-1.5.0
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;A Wyoming server consists of an AsyncServer and an AsyncEventHandler. The handler processes incoming events such as Describe, AudioStart, AudioChunk, and AudioStop.&lt;/p&gt;
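To make the event flow concrete, here is a toy session object that mirrors the Describe / AudioStart / AudioChunk / AudioStop sequence. It is a plain-Python illustration of the protocol flow only, not the wyoming library's AsyncEventHandler API:

```python
# Toy event loop mirroring the Wyoming STT flow (illustration, not the wyoming
# library): describe advertises the service, audio-start resets the buffer,
# audio-chunk appends PCM bytes, audio-stop would trigger inference.

class SttSession:
    def __init__(self):
        self.audio = bytearray()
        self.rate = 16000

    def handle(self, event):
        etype = event["type"]
        if etype == "describe":
            # tell the client this endpoint offers speech-to-text
            return {"type": "info", "services": ["asr"]}
        if etype == "audio-start":
            self.rate = event.get("rate", 16000)
            self.audio.clear()
        elif etype == "audio-chunk":
            self.audio.extend(event["payload"])  # buffer raw PCM bytes
        elif etype == "audio-stop":
            # hand bytes(self.audio) to the ASR model here; we just report size
            return {"type": "transcript", "buffered_bytes": len(self.audio)}
        return None

session = SttSession()
session.handle({"type": "audio-start", "rate": 16000})
session.handle({"type": "audio-chunk", "payload": b"\x00\x01" * 50})
result = session.handle({"type": "audio-stop"})
print(result["buffered_bytes"])
```

In the real server, the audio-stop branch is where the SenseVoice model runs and the transcript text is written back to the client.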
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;python3 server.py&lt;/p&gt;
&lt;p&gt;You will need the wyoming and funasr libraries.&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;Describe event&lt;/p&gt;
&lt;p&gt;Listen for a Describe event (to tell HA it's an STT service)&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;AudioStart&amp;nbsp; event&lt;/p&gt;
&lt;p&gt;The HA client sends:&lt;/p&gt;
&lt;pre class="language-csharp"&gt;&lt;code&gt;{
  "type": "audio.start",
  "rate": 16000,
  "width": 2,
  "channels": 1
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;AudioStart.is_type(event.type)&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;AudioChunk event&lt;/p&gt;
&lt;p&gt;The AudioChunk event is where you collect the raw PCM data: receive AudioChunk events and buffer them.&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;AudioStop&amp;nbsp;event&lt;/p&gt;
&lt;p&gt;The AudioStop event is where you trigger inference: run the SenseVoice model on the buffered audio.&lt;/p&gt;
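The byte counts in the server log below convert to durations by simple arithmetic: 16-bit mono PCM at 16 kHz means 2 bytes per sample. A quick sketch:

```python
def pcm_duration(num_bytes, rate=16000, width=2, channels=1):
    """Samples and duration in seconds of raw PCM audio (matches the audio-start metadata)."""
    samples = num_bytes // (width * channels)
    return samples, samples / rate

samples, seconds = pcm_duration(95040)
print(samples)            # 47520
print(round(seconds, 2))  # 2.97
```

This reproduces the "95040 bytes of audio ... 47520 samples, 2.97 s" lines in the log.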
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre class="language-python"&gt;&lt;code&gt;funasr-wyoming# python3 server.py
funasr version: 1.3.0.
Check update of funasr, and it would cost few times. You may disable it by set `disable_update=True` in AutoModel
New version is available: 1.3.1.
Please use the command "pip install -U funasr" to upgrade.
2026-02-19 14:15:51.321 - INFO - root - download models from model hub: ms
Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/iic/SenseVoiceSmall
2026-02-19 14:15:52.591 - WARNING - root - trust_remote_code: False
2026-02-19 14:15:54.679 - INFO - root - Loading pretrained params from /root/.cache/modelscope/hub/models/iic/SenseVoiceSmall/model.pt
2026-02-19 14:15:54.688 - INFO - root - ckpt: /root/.cache/modelscope/hub/models/iic/SenseVoiceSmall/model.pt
2026-02-19 14:16:10.752 - INFO - root - scope_map: ['module.', 'None']
2026-02-19 14:16:10.871 - INFO - root - excludes: None
2026-02-19 14:16:11.383 - INFO - root - Loading ckpt: /root/.cache/modelscope/hub/models/iic/SenseVoiceSmall/model.pt, status: &amp;lt;All keys matched successfully&amp;gt;
2026-02-19 14:16:11.461 - INFO - root - Building VAD model.
2026-02-19 14:16:11.461 - INFO - root - download models from model hub: ms
Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/iic/speech_fsmn_vad_zh-cn-16k-common-pytorch
2026-02-19 14:16:13.308 - WARNING - root - trust_remote_code: False
2026-02-19 14:16:13.652 - INFO - root - Loading pretrained params from /root/.cache/modelscope/hub/models/iic/speech_fsmn_vad_zh-cn-16k-common-pytorch/model.pt
2026-02-19 14:16:13.653 - INFO - root - ckpt: /root/.cache/modelscope/hub/models/iic/speech_fsmn_vad_zh-cn-16k-common-pytorch/model.pt
2026-02-19 14:16:13.959 - INFO - root - scope_map: ['module.', 'None']
2026-02-19 14:16:13.959 - INFO - root - excludes: None
2026-02-19 14:16:13.962 - INFO - root - Loading ckpt: /root/.cache/modelscope/hub/models/iic/speech_fsmn_vad_zh-cn-16k-common-pytorch/model.pt, status: &amp;lt;All keys matched successfully&amp;gt;
2026-02-19 14:16:14.020 - INFO - wyoming-funasr-stt-server - Wyoming STT Server started on port 10850
2026-02-19 14:16:19.596 - INFO - wyoming-funasr-stt-server - Transcribe received
2026-02-19 14:16:19.597 - INFO - wyoming-funasr-stt-server - AudioStart received
2026-02-19 14:16:22.530 - INFO - wyoming-funasr-stt-server - AudioStop received. Processing...  95040 bytes of audio...
2026-02-19 14:16:22.557 - 音频长度: 47520 samples, 2.97 秒
2026-02-19 14:16:22.559 - INFO - wyoming-funasr-stt-server - Audio length: 47520 samples, 2.97 s
rtf_avg: 0.578: 100%|█████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01&amp;lt;00:00,  1.71s/it]
rtf_avg: 1.793: 100%|█████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01&amp;lt;00:00,  1.72s/it]
rtf_avg: 0.581, time_speech:  2.970, time_escape: 1.727: 100%|████████████████████████████████████████████| 1/1 [00:01&amp;lt;00:00,  1.73s/it]
2026-02-19 14:16:26.056 - 过滤掉 SenseVoice 可能输出的情感/事件标签
2026-02-19 14:16:26.056 - INFO - wyoming-funasr-stt-server - 识别结果原文：可能输出的情感/事件标签- Result: &amp;lt;|zh|&amp;gt;&amp;lt;|NEUTRAL|&amp;gt;&amp;lt;|Speech|&amp;gt;&amp;lt;|withitn|&amp;gt;换货。
2026-02-19 14:16:26.057 - INFO - wyoming-funasr-stt-server - Result: 换货。
2026-02-19 14:16:26.057 - 识别结果: 换货。
2026-02-19 14:16:35.711 - INFO - wyoming-funasr-stt-server - Transcribe received
2026-02-19 14:16:35.711 - INFO - wyoming-funasr-stt-server - AudioStart received
2026-02-19 14:16:38.141 - INFO - wyoming-funasr-stt-server - AudioStop received. Processing...  78400 bytes of audio...
2026-02-19 14:16:38.143 - 音频长度: 39200 samples, 2.45 秒
2026-02-19 14:16:38.143 - INFO - wyoming-funasr-stt-server - Audio length: 39200 samples, 2.45 s
rtf_avg: 0.028: 100%|█████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00&amp;lt;00:00, 14.81it/s]
rtf_avg: 1.693: 100%|█████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01&amp;lt;00:00,  1.53s/it]
rtf_avg: 0.624, time_speech:  2.450, time_escape: 1.530: 100%|████████████████████████████████████████████| 1/1 [00:01&amp;lt;00:00,  1.53s/it]
2026-02-19 14:16:39.742 - 过滤掉 SenseVoice 可能输出的情感/事件标签
2026-02-19 14:16:39.742 - INFO - wyoming-funasr-stt-server - 识别结果原文：可能输出的情感/事件标签- Result: &amp;lt;|zh|&amp;gt;&amp;lt;|NEUTRAL|&amp;gt;&amp;lt;|Speech|&amp;gt;&amp;lt;|withitn|&amp;gt;关火。
2026-02-19 14:16:39.742 - INFO - wyoming-funasr-stt-server - Result: 关火。
2026-02-19 14:16:39.742 - 识别结果: 关火。
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;pip3 list&lt;/p&gt;
&lt;pre class="language-csharp"&gt;&lt;code&gt;Package                Version
---------------------- --------
aliyun-python-sdk-core 2.16.0
aliyun-python-sdk-kms  2.16.5
antlr4-python3-runtime 4.9.3
audioread              3.1.0
certifi                2026.1.4
cffi                   2.0.0
charset-normalizer     3.4.4
crcmod                 1.7
cryptography           46.0.3
decorator              5.2.1
editdistance           0.8.1
filelock               3.20.3
fsspec                 2026.1.0
funasr                 1.3.0
hydra-core             1.3.2
idna                   3.11
ifaddr                 0.2.0
jaconv                 0.4.1
jamo                   0.4.1
jieba                  0.42.1
Jinja2                 3.1.6
jmespath               0.10.0
joblib                 1.5.3
kaldiio                2.18.1
lazy_loader            0.4
librosa                0.11.0
llvmlite               0.46.0
MarkupSafe             3.0.3
modelscope             1.34.0
mpmath                 1.3.0
msgpack                1.1.2
networkx               3.6.1
numba                  0.63.1
numpy                  1.26.4
omegaconf              2.3.0
oss2                   2.19.1
packaging              26.0
pip                    23.0.1
platformdirs           4.5.1
pooch                  1.8.2
protobuf               6.33.4
pycparser              3.0
pycryptodome           3.23.0
pynndescent            0.6.0
pytorch-wpe            0.0.1
PyYAML                 6.0.3
requests               2.32.5
scikit-learn           1.8.0
scipy                  1.17.0
sentencepiece          0.2.1
setuptools             66.1.1
six                    1.17.0
soundfile              0.13.1
soxr                   1.0.0
sympy                  1.14.0
tensorboardX           2.6.4
threadpoolctl          3.6.0
torch                  2.1.0
torch_complex          0.4.4
torchaudio             2.1.0
tqdm                   4.67.1
typing_extensions      4.15.0
umap-learn             0.5.11
urllib3                2.6.3
wyoming                1.8.0
zeroconf               0.148.0
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 id="mcetoc_1jh1j919c1"&gt;Strategies to reduce latency&lt;/h2&gt;
&lt;p&gt;1.Use smaller models&lt;/p&gt;
&lt;p&gt;FunASR has paraformer-zh-small or paraformer-zh-medium&lt;/p&gt;
&lt;p&gt;2.VAD pre-filtering&lt;/p&gt;
&lt;p&gt;Skip silence chunks &amp;rarr; speech &amp;rarr; Skip silence chunks&lt;/p&gt;
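As a sketch of the pre-filtering idea (a crude energy gate, not the FSMN or Silero VAD models used elsewhere in this series; the 500 threshold is an arbitrary assumption):

```python
import struct

def rms(chunk):
    """Root-mean-square energy of a chunk of 16-bit little-endian PCM."""
    samples = struct.unpack(f"{len(chunk) // 2}h", chunk)
    return (sum(s * s for s in samples) / max(len(samples), 1)) ** 0.5

def speech_chunks(chunks, threshold=500):
    # keep only chunks whose energy exceeds the threshold (crude VAD)
    return [c for c in chunks if rms(c) > threshold]

silence = struct.pack("160h", *([0] * 160))
tone = struct.pack("160h", *([8000, -8000] * 80))
print(len(speech_chunks([silence, tone, silence])))  # 1: only the loud chunk survives
```

Dropping silent chunks before buffering shrinks the audio the model must decode, which directly cuts the per-utterance latency on CPU.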
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;</content>
  </entry>
  <entry>
    <id>http://blog.matterxiaomi.com/blog/create-wyoming-server-home-assistant-part3/</id>
    <title>Create Wyoming server for Home assistant Part3 - stt -  wyoming-funasr arm64 sherpa-onnx</title>
    <updated>2026-02-19T19:01:24Z</updated>
    <published>2026-02-04T00:53:12Z</published>
    <link href="http://blog.matterxiaomi.com/blog/create-wyoming-server-home-assistant-part3/" />
    <author>
      <name>test@example.com</name>
      <email>blog.matterxiaomi.com</email>
    </author>
    <content type="html">&lt;p&gt;stt -&amp;nbsp; wyoming-funasr arm64 onnx&lt;/p&gt;
&lt;p&gt;Wyoming asr&amp;nbsp; server for Home assistant: Step-by-Step Guide for Developers&lt;/p&gt;
&lt;p&gt;Building FunASR with sherpa-onnx on an ARM64 (aarch64) system.&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;To make an STT server work with Home Assistant, the industry standard is using the Wyoming Protocol.&lt;/p&gt;
&lt;p&gt;sherpa-onnx: The "engine" that runs the stt. It manages the model files and utilizes the RPi5&amp;rsquo;s CPU.&lt;/p&gt;
&lt;p&gt;Local Control: You use the Wyoming Protocol Integration&amp;nbsp; bridge to link the stt server to Home Assistant's "Assist" pipeline.&lt;/p&gt;
&lt;div class="mce-toc"&gt;
&lt;h2&gt;Table of Contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#mcetoc_1jgp9detg3"&gt;Quick start&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#mcetoc_1jgp96qm81"&gt;Create Python virtual environment&amp;nbsp;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#mcetoc_1jgj084196"&gt;Install sherpa-onnx&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#mcetoc_1jgj084197"&gt;Install&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#mcetoc_1jgj0hqlhe"&gt;Download Model&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#mcetoc_1jglk9fvq1"&gt;download pre-trained&amp;nbsp; models(SenseVoice)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#mcetoc_1jgjc7k9k1"&gt;server.py&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#mcetoc_1jgjc8q7q3"&gt;step 1. install&amp;nbsp;numpy&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 id="mcetoc_1jgp9detg3"&gt;Quick start&lt;/h2&gt;
&lt;pre class="language-python"&gt;&lt;code&gt;# Install sherpa-onnx and the Wyoming library
pip3 install sherpa-onnx sherpa-onnx-bin
pip3 install wyoming==1.8.0
pip3 install numpy&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 id="mcetoc_1jgp96qm81"&gt;Create Python virtual environment&amp;nbsp;&lt;/h2&gt;
&lt;pre class="language-python"&gt;&lt;code&gt;mkdir -p /funasr-wyoming-sherpa-onnx
cd /funasr-wyoming-sherpa-onnx
python3 -m venv venv
source venv/bin/activate&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 id="mcetoc_1jgj084196"&gt;Install sherpa-onnx&lt;/h2&gt;
&lt;h3 id="mcetoc_1jgj084197"&gt;Install&lt;/h3&gt;
&lt;pre class="language-python"&gt;&lt;code&gt;pip3 install sherpa-onnx sherpa-onnx-bin
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre class="language-csharp"&gt;&lt;code&gt;Installing collected packages: sherpa-onnx-core, sherpa-onnx-bin, sherpa-onnx
Successfully installed sherpa-onnx-1.12.23 sherpa-onnx-bin-1.12.23 sherpa-onnx-core-1.12.23&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;https://k2-fsa.github.io/sherpa/onnx/python/install.html#method-1-from-pre-compiled-wheels-cpu-only&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre class="language-markup"&gt;&lt;code&gt;# pip3 show sherpa-onnx&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;output&lt;/p&gt;
&lt;pre class="language-markup"&gt;&lt;code&gt;Name: sherpa_onnx
Version: 1.12.23
Summary: 
Home-page: https://github.com/k2-fsa/sherpa-onnx
Author: The sherpa-onnx development team
Author-email: dpovey@gmail.com
License: Apache licensed, as found in the LICENSE file
Location: /funasr-wyoming-sherpa-onnx/venv/lib/python3.11/site-packages&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 id="mcetoc_1jgj0hqlhe"&gt;Download Model&lt;/h2&gt;
&lt;p&gt;This section describes how to download pre-trained SenseVoice models.&lt;/p&gt;
&lt;h3 id="mcetoc_1jglk9fvq1"&gt;download pre-trained&amp;nbsp; models(SenseVoice)&lt;/h3&gt;
&lt;pre class="language-csharp"&gt;&lt;code&gt;cd /funasr-wyoming-sherpa-onnx



wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-sense-voice-zh-en-ja-ko-yue-int8-2025-09-09.tar.bz2

tar xvf sherpa-onnx-sense-voice-zh-en-ja-ko-yue-int8-2025-09-09.tar.bz2

rm sherpa-onnx-sense-voice-zh-en-ja-ko-yue-int8-2025-09-09.tar.bz2&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;or&lt;/p&gt;
&lt;pre class="language-python"&gt;&lt;code&gt;# 安装 ModelScope
pip install modelscope

# SDK 模型下载
from modelscope import snapshot_download
model_dir = snapshot_download('xiaowangge/sherpa-onnx-sense-voice-small')&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;output&lt;/p&gt;
&lt;pre class="language-csharp"&gt;&lt;code&gt;root@raspberrypi:/funasr-wyoming-sherpa-onnx# tree -L 2
.
├── sherpa-onnx-sense-voice-zh-en-ja-ko-yue-int8-2025-09-09
│   ├── model.int8.onnx
│   ├── README.md
│   ├── test_wavs
│   └── tokens.txt
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 id="mcetoc_1jgjc7k9k1"&gt;server.py&lt;/h2&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 id="mcetoc_1jgjc8q7q3"&gt;step 1. install&amp;nbsp;numpy&lt;/h3&gt;
&lt;pre class="language-python"&gt;&lt;code&gt;pip3 install numpy
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;output&lt;/p&gt;
&lt;pre class="language-csharp"&gt;&lt;code&gt;Installing collected packages: numpy
Successfully installed numpy-2.4.2
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;pre class="language-python"&gt;&lt;code&gt;pip3 install wyoming==1.8.0&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;output&lt;/p&gt;
&lt;pre class="language-csharp"&gt;&lt;code&gt;  Downloading https://www.piwheels.org/simple/wyoming/wyoming-1.8.0-py3-none-any.whl (39 kB)
Installing collected packages: wyoming
Successfully installed wyoming-1.8.0
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
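&lt;p&gt;The guide starts python3 server.py without listing its contents. Below is a rough, untested sketch of what such a file could contain, assuming wyoming 1.8.0 and the model path used above; the handler name SenseVoiceHandler and port 10300 are arbitrary choices, not part of the original guide.&lt;/p&gt;

```python
# server.py - hypothetical minimal Wyoming STT bridge around sherpa-onnx.
# The handler buffers incoming audio chunks and decodes them on AudioStop.
import asyncio

import numpy as np


def pcm16_to_float32(audio: bytes) -> np.ndarray:
    """Convert little-endian 16-bit PCM bytes to float32 samples in [-1, 1]."""
    return np.frombuffer(audio, dtype=np.int16).astype(np.float32) / 32768.0


async def main():
    # Heavy dependencies are imported here so pcm16_to_float32 stays
    # usable without wyoming/sherpa-onnx installed.
    import sherpa_onnx
    from wyoming.asr import Transcript
    from wyoming.audio import AudioChunk, AudioStop
    from wyoming.event import Event
    from wyoming.server import AsyncEventHandler, AsyncServer

    model_dir = (
        "/funasr-wyoming-sherpa-onnx/"
        "sherpa-onnx-sense-voice-zh-en-ja-ko-yue-int8-2025-09-09"
    )
    recognizer = sherpa_onnx.OfflineRecognizer.from_sense_voice(
        model=f"{model_dir}/model.int8.onnx",
        tokens=f"{model_dir}/tokens.txt",
        use_itn=True,
    )

    class SenseVoiceHandler(AsyncEventHandler):
        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            self.audio = bytearray()
            self.rate = 16000

        async def handle_event(self, event: Event) -> bool:
            if AudioChunk.is_type(event.type):
                chunk = AudioChunk.from_event(event)
                self.rate = chunk.rate
                self.audio.extend(chunk.audio)
            elif AudioStop.is_type(event.type):
                # Decode the whole buffered utterance and send the text back.
                stream = recognizer.create_stream()
                stream.accept_waveform(self.rate, pcm16_to_float32(bytes(self.audio)))
                recognizer.decode_stream(stream)
                await self.write_event(Transcript(text=stream.result.text).event())
                self.audio.clear()
            return True

    server = AsyncServer.from_uri("tcp://0.0.0.0:10300")
    await server.run(SenseVoiceHandler)


if __name__ == "__main__":
    asyncio.run(main())
```

&lt;p&gt;A production handler would also answer the Wyoming integration's Describe event with an Info message so Home Assistant can discover the service; that part is omitted here for brevity.&lt;/p&gt;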
&lt;pre class="language-python"&gt;&lt;code&gt;cd /funasr-wyoming-sherpa-onnx
source venv/bin/activate
python3 server.py&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;Useful links&lt;/p&gt;
&lt;p&gt;How to download pre-trained SenseVoice models:&lt;/p&gt;
&lt;p&gt;https://k2-fsa.github.io/sherpa/onnx/sense-voice/index.html&lt;/p&gt;
&lt;p&gt;https://k2-fsa.github.io/sherpa/onnx/sense-voice/pretrained.html#sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;https://k2-fsa.github.io/sherpa/onnx/sense-voice/pretrained.html#sherpa-onnx-sense-voice-zh-en-ja-ko-yue-int8-2025-09-09-chinese-english-japanese-korean-cantonese&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;</content>
  </entry>
  <entry>
    <id>http://blog.matterxiaomi.com/blog/home-assistant-automation-part1/</id>
    <title>How to repeat an action until another (specific) device trigger is received in Home Assistant</title>
    <updated>2026-02-03T22:23:10Z</updated>
    <published>2026-01-31T13:53:45Z</published>
    <link href="http://blog.matterxiaomi.com/blog/home-assistant-automation-part1/" />
    <author>
      <name>test@example.com</name>
      <email>blog.matterxiaomi.com</email>
    </author>
    <category term="automation" />
    <content type="html">&lt;p&gt;How to repeat an action until another (specific) device trigger is received in home assistant&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;To repeat an action in Home Assistant until a specific device trigger is received, use a repeat-until loop, either keyed on a trigger condition or combined with a wait_for_trigger action.&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;span style="color: #222222; font-family: 'Open Sans', Ubuntu, 'Nimbus Sans L', Avenir, AvenirNext, 'Segoe UI', Helvetica, Arial, sans-serif; font-size: 19px;"&gt;until&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;img src="/Posts/files/repeat-until-trigger-ed-_639055750356970042.jpg" alt="repeat-until-trigger-ed-.jpg" width="1020" height="805" /&gt;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;img src="/Posts/files/repeat-until-trigger-id-2_639055750357620317.jpg" alt="repeat-until-trigger-id-2.jpg" width="1569" height="899" /&gt;&lt;/p&gt;
&lt;pre class="language-python"&gt;&lt;code&gt;alias: New automation find my mobile phone - 找手机
description: &amp;gt;-
  - http://localhost:4999/boards/topic/16760/notify - ## 
  http://localhost:4999/boards/topic/16760 - ##
  http://192.168.2.125:8123/config/automation/edit/1747045045725
triggers:
  - trigger: conversation
    command: "[找手机|手机|手机在哪儿|我手机在哪儿]"
    id: id_find_my_phone
  - trigger: state
    entity_id:
      - binary_sensor.sm_g9910_private_interactive
    from:
      - "off"
    to:
      - "on"
    id: id_phone_interactive_turn_on
    enabled: true
conditions: []
actions:


  - repeat:
      until:
        - condition: trigger
          id:
            - id_phone_interactive_turn_on
      sequence:
        - action: script.turn_on
          metadata: {}
          data: {}
          target:
            entity_id: script.find_my_phone
        - delay:
            hours: 0
            minutes: 0
            seconds: 5
            milliseconds: 0
    enabled: false
 

mode: single
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;Way 2: run the repeat loop and a wait_for_trigger in parallel, using a variable to stop the loop.&lt;/p&gt;
&lt;pre class="language-csharp"&gt;&lt;code&gt;sequence:
  - variables:
      stopVar: false
  - parallel:
      - repeat:
          until:
            - condition: template
              value_template: "{{ stopVar == true }}"
          sequence:
            - delay:
                hours: 0
                minutes: 0
                seconds: 1
                milliseconds: 0
            - action: input_boolean.toggle
              metadata: {}
              data: {}
              target:
                entity_id: input_boolean.quick_toggle
      - sequence:
          - wait_for_trigger:
              - trigger: state
                entity_id:
                  - button.push
          - variables:
              stopVar: true
alias: Parallel Test
description: ""&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;</content>
  </entry></feed>