Run Piper on an external server

There are two ways to go about starting the containers.

First, you could define a docker compose file with both services inside.

Second, you can execute each docker run command on its own, since the containers don’t need special configuration. We will go with the second option.
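For reference, the first option could look roughly like the sketch below, which writes a compose file with both services. It is not the setup used in the rest of this tutorial. The rhasspy/wyoming-whisper image name, its --model and --language arguments, the ./piper-data and ./whisper-data directories, and the restart policy are assumptions added for illustration.

```shell
#!/usr/bin/env bash
# Write a minimal compose file for Piper (TTS) and Whisper (STT).
# Assumed, not taken from this tutorial: the whisper image and its flags,
# the local data directories, and the restart policy.
cat > docker-compose.yml <<'EOF'
services:
  piper:
    image: rhasspy/wyoming-piper
    command: --voice en_US-lessac-medium
    ports:
      - "10200:10200"
    volumes:
      - ./piper-data:/data
    restart: unless-stopped
  whisper:
    image: rhasspy/wyoming-whisper
    command: --model tiny-int8 --language en
    ports:
      - "10300:10300"
    volumes:
      - ./whisper-data:/data
    restart: unless-stopped
EOF
echo "wrote docker-compose.yml"
```

With the file in place, docker compose up -d would start both containers from the same directory.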

 

This tutorial explains how to run a single-container text-to-speech service with Docker, either on your local machine or on another host.

GitHub - rhasspy/wyoming-piper: Wyoming protocol server for Piper

Prerequisites

  • The Docker image used by the HA OS add-on (rhasspy/wyoming-piper)
  • Docker


 

The easier way is to use the official Docker image. You can run it on another host if you want; just specify the right IP address and ports when you configure the integrations. You can either use a Docker Compose setup or run the containers manually.

Quick Start

Pull the image:

docker pull rhasspy/wyoming-piper 

Change the data path and rename the container.

In the run command, change

-v /path/to/local/data:/data

to

-v /opt/whisper-piper/data:/data

Run the container.

Whisper and Piper are then discovered by Home Assistant.

Add the Wyoming integration, once for each service.

For the host I entered localhost; for the port, 10200 (Piper) and 10300 (Whisper).

 

 

Step 1. Find the Docker image

Use the Docker image of the HA OS add-on. I used wyoming-piper for Piper and wyoming-faster-whisper for Whisper.

Home Assistant Add-on: Whisper

Home Assistant add-on that uses faster-whisper (https://github.com/guillaumekln/faster-whisper/) for speech-to-text.

 

Home Assistant Add-on: Piper

Home Assistant add-on that uses piper (https://github.com/rhasspy/piper/) for text-to-speech.

The documentation within the repo says nothing beyond this one sentence.

Source: https://github.com/home-assistant/addons/tree/master/piper

Details: https://www.matterxiaomi.com//boards/topic/16775/how-to-manually-install-piper-and-whisper-on-home-assistant-os#23820

Step 2. Pull the image and run it interactively

docker pull rhasspy/wyoming-piper

Run the image in interactive mode:

mkdir -p /root/piper-data
docker run -it -p 10200:10200 \
  -v /root/piper-data:/data \
  rhasspy/wyoming-piper \
  --voice de_DE-kerstin-low

Voice model

The --voice argument selects the voice model. You can find the available models here: https://huggingface.co/rhasspy/piper-voices/tree/v1.0.0
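Voice names such as de_DE-kerstin-low appear to follow the pattern locale-name-quality, and the model files in the voices repository seem to be arranged in matching directories. The sketch below splits a voice name into those parts and prints the relative path where the .onnx file would be expected. Both the directory layout and the helper name voice_repo_path are assumptions made for illustration; check the repository before relying on them.

```shell
#!/usr/bin/env bash
# voice_repo_path VOICE: print the assumed relative path of a voice model,
# e.g. de_DE-kerstin-low -> de/de_DE/kerstin/low/de_DE-kerstin-low.onnx
# (layout assumed from browsing the repository, not documented here).
voice_repo_path() {
  local voice="$1"
  local locale="${voice%%-*}"    # part before the first hyphen, e.g. de_DE
  local rest="${voice#*-}"       # part after the first hyphen, e.g. kerstin-low
  local name="${rest%-*}"        # speaker name, e.g. kerstin
  local quality="${rest##*-}"    # quality level, e.g. low
  local family="${locale%%_*}"   # language family, e.g. de
  printf '%s/%s/%s/%s/%s.onnx\n' "$family" "$locale" "$name" "$quality" "$voice"
}

voice_repo_path de_DE-kerstin-low
voice_repo_path en_US-lessac-medium
```

The split assumes that speaker names contain no hyphens, which holds for the two voices used in this tutorial.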

Step 3. Start Piper in detached mode

This will start the container in detached mode:

mkdir -p /root/piper-data
docker run -d -p 10200:10200 \
  -v /root/piper-data:/data \
  rhasspy/wyoming-piper \
  --voice de_DE-kerstin-low
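For a service that should come back after a reboot, the same command can be wrapped in a small helper that also names the container. This is only a sketch: the function name start_piper, the container name piper, and the --restart unless-stopped policy are choices made here for illustration and are not part of the original setup.

```shell
#!/usr/bin/env bash
# start_piper DATA_DIR VOICE: create the data directory if needed and start
# wyoming-piper in the background under a fixed container name.
# The container name and restart policy are illustrative assumptions.
start_piper() {
  local data_dir="$1"
  local voice="$2"
  mkdir -p "$data_dir"
  docker run -d --name piper --restart unless-stopped \
    -p 10200:10200 \
    -v "$data_dir:/data" \
    rhasspy/wyoming-piper \
    --voice "$voice"
}
```

Calling start_piper /root/piper-data de_DE-kerstin-low would then start the container. With a fixed name, docker logs -f piper follows the log output and docker stop piper stops the service.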
 


Step 4. Run the container to start the wyoming-piper text to speech service

 

The --voice argument can also be the path to a custom voice file (<voice>.onnx). The matching voice config file must be named <voice>.onnx.json.
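Before pointing --voice at a custom model, it can help to confirm that both files are present under the expected names. The helper below is a hypothetical convenience check, not part of wyoming-piper; its name check_custom_voice is made up for this sketch.

```shell
#!/usr/bin/env bash
# check_custom_voice MODEL: verify that MODEL (<voice>.onnx) and its config
# (<voice>.onnx.json) both exist. Returns non-zero if either is missing.
check_custom_voice() {
  local model="$1"
  if [ ! -f "$model" ]; then
    echo "missing model: $model" >&2
    return 1
  fi
  if [ ! -f "${model}.json" ]; then
    echo "missing config: ${model}.json" >&2
    return 1
  fi
  echo "ok: $model"
}
```

Keep in mind that the path given to --voice has to be valid inside the container, so the two files would need to live in a mounted directory such as /data.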

Run a piper server that anyone can connect to:

docker run -it -p 10200:10200 -v /path/to/local/data:/data rhasspy/wyoming-piper \
    --voice en_US-lessac-medium

 

Source: https://github.com/rhasspy/wyoming-piper

Step 5. Verify the running container

Open another terminal and list the containers to confirm that the service is running:

# docker ps -a
CONTAINER ID   IMAGE                                          COMMAND                  CREATED          STATUS                      PORTS                                           NAMES
df86968bbd6c   rhasspy/wyoming-piper                          "bash /run.sh --voic…"   11 minutes ago   Up 11 minutes               0.0.0.0:10200->10200/tcp, :::10200->10200/tcp   brave_diffie
 
 

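Use the container ID from the first column (df86968bbd6c here) to inspect the container. The part to check is the volume binding under HostConfig:

"HostConfig": {
            "Binds": [
                "/path/to/local/data:/data"

The full output follows: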

# docker inspect df86968bbd6c
[
    {
        "Id": "df86968bbd6c40c1655f1e5631dd513a05a37948b63fe6a5fb6f5cd9dee2ba84",
        "Created": "2024-11-10T17:34:24.957032992Z",
        "Path": "bash",
        "Args": [
            "/run.sh",
            "--voice",
            "en_US-lessac-medium"
        ],
        "State": {
            "Status": "running",
            "Running": true,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 27490,
            "ExitCode": 0,
            "Error": "",
            "StartedAt": "2024-11-10T17:34:26.514749258Z",
            "FinishedAt": "0001-01-01T00:00:00Z"
        },
        "Image": "sha256:103222f9522bf53fcd342ced9433f8666accca17603b583abcbc939962825c11",
        "ResolvConfPath": "/var/lib/docker/containers/df86968bbd6c40c1655f1e5631dd513a05a37948b63fe6a5fb6f5cd9dee2ba84/resolv.conf",
        "HostnamePath": "/var/lib/docker/containers/df86968bbd6c40c1655f1e5631dd513a05a37948b63fe6a5fb6f5cd9dee2ba84/hostname",
        "HostsPath": "/var/lib/docker/containers/df86968bbd6c40c1655f1e5631dd513a05a37948b63fe6a5fb6f5cd9dee2ba84/hosts",
        "LogPath": "/var/lib/docker/containers/df86968bbd6c40c1655f1e5631dd513a05a37948b63fe6a5fb6f5cd9dee2ba84/df86968bbd6c40c1655f1e5631dd513a05a37948b63fe6a5fb6f5cd9dee2ba84-json.log",
        "Name": "/brave_diffie",
        "RestartCount": 0,
        "Driver": "overlayfs",
        "Platform": "linux",
        "MountLabel": "",
        "ProcessLabel": "",
        "AppArmorProfile": "",
        "ExecIDs": null,
        "HostConfig": {
            "Binds": [
                "/path/to/local/data:/data"
            ],
            "ContainerIDFile": "",
            "LogConfig": {
                "Type": "json-file",
                "Config": {}
            },
            "NetworkMode": "bridge",
            "PortBindings": {
                "10200/tcp": [
                    {
                        "HostIp": "",
                        "HostPort": "10200"
                    }
                ]
            },
            "RestartPolicy": {
                "Name": "no",
                "MaximumRetryCount": 0
            },
            "AutoRemove": false,
            "VolumeDriver": "",
            "VolumesFrom": null,
            "ConsoleSize": [
                26,
                162
            ],
            "CapAdd": null,
            "CapDrop": null,
            "CgroupnsMode": "private",
            "Dns": [],
            "DnsOptions": [],
            "DnsSearch": [],
            "ExtraHosts": null,
            "GroupAdd": null,
            "IpcMode": "private",
            "Cgroup": "",
            "Links": null,
            "OomScoreAdj": 0,
            "PidMode": "",
            "Privileged": false,
            "PublishAllPorts": false,
            "ReadonlyRootfs": false,
            "SecurityOpt": null,
            "UTSMode": "",
            "UsernsMode": "",
            "ShmSize": 67108864,
            "Runtime": "runc",
            "Isolation": "",
            "CpuShares": 0,
            "Memory": 0,
            "NanoCpus": 0,
            "CgroupParent": "",
            "BlkioWeight": 0,
            "BlkioWeightDevice": [],
            "BlkioDeviceReadBps": [],
            "BlkioDeviceWriteBps": [],
            "BlkioDeviceReadIOps": [],
            "BlkioDeviceWriteIOps": [],
            "CpuPeriod": 0,
            "CpuQuota": 0,
            "CpuRealtimePeriod": 0,
            "CpuRealtimeRuntime": 0,
            "CpusetCpus": "",
            "CpusetMems": "",
            "Devices": [],
            "DeviceCgroupRules": null,
            "DeviceRequests": null,
            "MemoryReservation": 0,
            "MemorySwap": 0,
            "MemorySwappiness": null,
            "OomKillDisable": null,
            "PidsLimit": null,
            "Ulimits": [],
            "CpuCount": 0,
            "CpuPercent": 0,
            "IOMaximumIOps": 0,
            "IOMaximumBandwidth": 0,
            "MaskedPaths": [
                "/proc/asound",
                "/proc/acpi",
                "/proc/kcore",
                "/proc/keys",
                "/proc/latency_stats",
                "/proc/timer_list",
                "/proc/timer_stats",
                "/proc/sched_debug",
                "/proc/scsi",
                "/sys/firmware",
                "/sys/devices/virtual/powercap"
            ],
            "ReadonlyPaths": [
                "/proc/bus",
                "/proc/fs",
                "/proc/irq",
                "/proc/sys",
                "/proc/sysrq-trigger"
            ]
        },
        "GraphDriver": {
            "Data": null,
            "Name": "overlayfs"
        },
        "Mounts": [
            {
                "Type": "bind",
                "Source": "/path/to/local/data",
                "Destination": "/data",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            }
        ],
        "Config": {
            "Hostname": "df86968bbd6c",
            "Domainname": "",
            "User": "",
            "AttachStdin": true,
            "AttachStdout": true,
            "AttachStderr": true,
            "ExposedPorts": {
                "10200/tcp": {}
            },
            "Tty": true,
            "OpenStdin": true,
            "StdinOnce": true,
            "Env": [
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
            ],
            "Cmd": [
                "--voice",
                "en_US-lessac-medium"
            ],
            "Image": "rhasspy/wyoming-piper",
            "Volumes": null,
            "WorkingDir": "/",
            "Entrypoint": [
                "bash",
                "/run.sh"
            ],
            "OnBuild": null,
            "Labels": {}
        },
        "NetworkSettings": {
            "Bridge": "",
            "SandboxID": "a354ae0488cb1ce0a2128350f87708f8587ce66888ffca20ea0729ba7780e69c",
            "SandboxKey": "/var/run/docker/netns/a354ae0488cb",
            "Ports": {
                "10200/tcp": [
                    {
                        "HostIp": "0.0.0.0",
                        "HostPort": "10200"
                    },
                    {
                        "HostIp": "::",
                        "HostPort": "10200"
                    }
                ]
            },
            "HairpinMode": false,
            "LinkLocalIPv6Address": "",
            "LinkLocalIPv6PrefixLen": 0,
            "SecondaryIPAddresses": null,
            "SecondaryIPv6Addresses": null,
            "EndpointID": "790a9b8e1a61173a9c3b05323019ce6be08a2733cb1640a8411b059010dc59b5",
            "Gateway": "172.17.0.1",
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "IPAddress": "172.17.0.2",
            "IPPrefixLen": 16,
            "IPv6Gateway": "",
            "MacAddress": "02:",
            "Networks": {
                "bridge": {
                    "IPAMConfig": null,
                    "Links": null,
                    "Aliases": null,
                    "MacAddress": "02",
                    "DriverOpts": null,
                    "NetworkID": "f1ea7b00d3270239d95954d07d6d563fd382a88fb891d0a786bf74c358b56c34",
                    "EndpointID": "790a9b8e1a61173a9c3b05323019ce6be08a2733cb1640a8411b059010dc59b5",
                    "Gateway": "172.17.0.1",
                    "IPAddress": "172.17.0.2",
                    "IPPrefixLen": 16,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "DNSNames": null
                }
            }
        }
    }
]
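The full inspect output is long. If only a couple of fields matter, Docker's --format option and the docker port subcommand can narrow it down. The helper names show_binds and show_port are made up for this sketch.

```shell
#!/usr/bin/env bash
# show_binds CONTAINER: print only the bind mounts (HostConfig.Binds) as JSON.
show_binds() {
  docker inspect --format '{{json .HostConfig.Binds}}' "$1"
}

# show_port CONTAINER: print the host address(es) published for 10200/tcp.
show_port() {
  docker port "$1" 10200/tcp
}
```

For the container above, show_binds df86968bbd6c should print the Binds array shown in the inspect output, and show_port df86968bbd6c should list the host side of the 10200 port mapping.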
 

Next, check that the service is reachable on port 10200. Wyoming is a plain TCP protocol rather than HTTP, so curl will not return a useful response; the request only shows whether something is listening on the port:

curl "http://localhost:10200"
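Because curl speaks HTTP and the Wyoming server does not, a plain TCP connection test may be a more direct way to see whether the port is open. The sketch below uses bash's /dev/tcp redirection together with the coreutils timeout command; the function name check_port and the 3-second limit are arbitrary choices for this example.

```shell
#!/usr/bin/env bash
# check_port HOST PORT: succeed if a TCP connection can be opened within 3 s.
# Relies on bash's /dev/tcp pseudo-device and on coreutils `timeout`.
check_port() {
  local host="$1"
  local port="$2"
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Probe the Piper port and the Whisper port on this machine.
for port in 10200 10300; do
  if check_port localhost "$port"; then
    echo "port $port: open"
  else
    echo "port $port: closed"
  fi
done
```

When the containers run on another host, replace localhost with that host's IP address.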
 

Once Home Assistant has synthesized speech through Piper, the generated audio files show up in the tts directory of the Home Assistant configuration folder:

@raspberrypi:/data/homeassistant202405/tts# ls -l
total 80
-rw-r--r-- 1 root root 56315 Nov 10 17:59 3caeb43ede21da438bb30ea4128096d7132655ae_en-us_c5ee8c087b_tts.piper.mp3
-rw-r--r-- 1 root root  8135 Nov 10 17:57 b802f384302cb24fbab0a44997e820bf2e8507bb_en-us_c5ee8c087b_tts.piper.mp3
-rw-r--r-- 1 root root 12545 Nov 10 17:58 f44761887c59dbbd377ca5960da9e15f3b7df652_en-us_c5ee8c087b_tts.piper.mp3
 
 


Summary

In this tutorial, you learned how to run the wyoming-piper text-to-speech service as a single container on Docker.


 

 

Useful links

Run a single-container speech-to-text service on Docker

https://developer.ibm.com/tutorials/run-a-single-container-speech-to-text-service-on-docker/
