Part 3 - Run piper on external server
There are two ways to go about starting the containers.
First, you could define a docker compose file with both services inside.
Second, you can execute each docker run command on its own, since the containers don’t need special configuration. We will go with the second option.
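As a rough sketch of that second option, the two docker run invocations could be wrapped in small shell functions. The container names, the separate data directories, the rhasspy/wyoming-whisper image with its --model and --language arguments, and the DOCKER variable are assumptions made for illustration, not details taken from this setup; check each image's README before relying on them.

```shell
#!/bin/sh
# Sketch only: start Piper and Whisper as two independent containers.
# Set DOCKER=echo to print the commands instead of running them.
DOCKER="${DOCKER:-docker}"

# start_piper DATA_DIR VOICE
start_piper() {
    $DOCKER run -d --name wyoming-piper -p 10200:10200 \
        -v "$1:/data" rhasspy/wyoming-piper --voice "$2"
}

# start_whisper DATA_DIR MODEL LANGUAGE
# (image name and arguments are assumptions, see the note above)
start_whisper() {
    $DOCKER run -d --name wyoming-whisper -p 10300:10300 \
        -v "$1:/data" rhasspy/wyoming-whisper --model "$2" --language "$3"
}
```

For example, start_piper /opt/whisper-piper/piper en_US-lessac-medium followed by start_whisper /opt/whisper-piper/whisper tiny-int8 en would bring up both services; tiny-int8 is likewise an assumed model name.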
This tutorial explains how you can run a single-container text to speech service on your local machine using Docker.
GitHub - rhasspy/wyoming-piper: Wyoming protocol server for Piper
Prerequisites
- The Docker image used by the HA OS add-on (rhasspy/wyoming-piper)
- Docker
The easier way is to use the official Docker image. You can run it on another host if you want; just specify the right IP address and ports when you configure the integrations. You can use a working solution based on docker compose or run the containers manually.
Quick Start
Pull the image:
docker pull rhasspy/wyoming-piper
Change the path and rename the container. Change
-v /path/to/local/data:/data
to
-v /opt/whisper-piper/data:/data
Then run the container.
Whisper and Piper are discovered by Home Assistant.
Add the Wyoming integration.
For the host I selected localhost, and for the ports 10200 (Piper) and 10300 (Whisper).
Step 1. Find the Docker image
Use the Docker image of the HA OS add-on. I used wyoming-piper for Piper and wyoming-faster-whisper for Whisper.
Home Assistant Add-on: Whisper
Home Assistant add-on that uses faster-whisper (https://github.com/guillaumekln/faster-whisper/) for speech-to-text.
Home Assistant Add-on: Piper
Home Assistant add-on that uses piper (https://github.com/rhasspy/piper/) for text-to-speech.
The documentation within the repo only says:
Home Assistant add-on that uses https://github.com/rhasspy/piper/ for text-to-speech.
Source: https://github.com/home-assistant/addons/tree/master/piper
Step 2. Pull the Docker image and run it interactively
docker pull rhasspy/wyoming-piper
Run the image in interactive mode:
mkdir -p /root/piper-data
docker run -it -p 10200:10200 \
-v /root/piper-data:/data \
rhasspy/wyoming-piper \
--voice de_DE-kerstin-low
Or, with an English voice and a different data path:
docker run -p 10200:10200 -v /path/to/local/data:/data rhasspy/wyoming-piper \
--voice en_US-lessac-medium
Step 3. Start Piper in detached mode
This will start the container in detached mode:
mkdir -p /root/piper-data
docker run -d -p 10200:10200 \
-v /root/piper-data:/data \
rhasspy/wyoming-piper \
--voice de_DE-kerstin-low
Step 4. Run the container to start the wyoming-piper text to speech service
The --voice argument can be the path to a custom voice file (<voice>.onnx). The voice config file must be named <voice>.onnx.json.
Run a piper server that anyone can connect to:
docker run -it -p 10200:10200 -v /path/to/local/data:/data rhasspy/wyoming-piper \
--voice en_US-lessac-medium
Source: https://github.com/rhasspy/wyoming-piper
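The naming rule for custom voices in this step (a <voice>.onnx file with a <voice>.onnx.json config next to it) can be checked before starting the container. This is a minimal sketch; the function name check_voice and its messages are hypothetical and not part of wyoming-piper.

```shell
#!/bin/sh
# check_voice MODEL_PATH
# Verifies the naming rule for a custom voice: <voice>.onnx must sit
# next to <voice>.onnx.json. Prints "ok: ..." and returns 0 on success,
# returns 2 for a wrong extension and 1 for a missing file.
check_voice() {
    model="$1"
    case "$model" in
        *.onnx) ;;
        *) echo "not an .onnx file: $model" >&2; return 2 ;;
    esac
    if [ ! -f "$model" ]; then
        echo "missing voice file: $model" >&2
        return 1
    fi
    if [ ! -f "$model.json" ]; then
        echo "missing voice config: $model.json" >&2
        return 1
    fi
    echo "ok: $model"
}
```

Since the container only sees the mounted /data directory, a custom voice would presumably have to be placed in the host data directory and referenced by its path inside the container; that detail is an assumption and worth confirming in the wyoming-piper README.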
Step 5. Check the wyoming-piper text-to-speech service
Open up another terminal to check the service. To start, list the containers:
# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
df86968bbd6c rhasspy/wyoming-piper "bash /run.sh --voic…" 11 minutes ago Up 11 minutes 0.0.0.0:10200->10200/tcp, :::10200->10200/tcp brave_diffie
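The first column is the container ID (df86968bbd6c). Pass it to docker inspect to see the full configuration, including the HostConfig.Binds entry that maps /path/to/local/data to /data:
# docker inspect df86968bbd6c
You will see output similar to the following: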
[
{
"Id": "df86968bbd6c40c1655f1e5631dd513a05a37948b63fe6a5fb6f5cd9dee2ba84",
"Created": "2024-11-10T17:34:24.957032992Z",
"Path": "bash",
"Args": [
"/run.sh",
"--voice",
"en_US-lessac-medium"
],
"State": {
"Status": "running",
"Running": true,
"Paused": false,
"Restarting": false,
"OOMKilled": false,
"Dead": false,
"Pid": 27490,
"ExitCode": 0,
"Error": "",
"StartedAt": "2024-11-10T17:34:26.514749258Z",
"FinishedAt": "0001-01-01T00:00:00Z"
},
"Image": "sha256:103222f9522bf53fcd342ced9433f8666accca17603b583abcbc939962825c11",
"ResolvConfPath": "/var/lib/docker/containers/df86968bbd6c40c1655f1e5631dd513a05a37948b63fe6a5fb6f5cd9dee2ba84/resolv.conf",
"HostnamePath": "/var/lib/docker/containers/df86968bbd6c40c1655f1e5631dd513a05a37948b63fe6a5fb6f5cd9dee2ba84/hostname",
"HostsPath": "/var/lib/docker/containers/df86968bbd6c40c1655f1e5631dd513a05a37948b63fe6a5fb6f5cd9dee2ba84/hosts",
"LogPath": "/var/lib/docker/containers/df86968bbd6c40c1655f1e5631dd513a05a37948b63fe6a5fb6f5cd9dee2ba84/df86968bbd6c40c1655f1e5631dd513a05a37948b63fe6a5fb6f5cd9dee2ba84-json.log",
"Name": "/brave_diffie",
"RestartCount": 0,
"Driver": "overlayfs",
"Platform": "linux",
"MountLabel": "",
"ProcessLabel": "",
"AppArmorProfile": "",
"ExecIDs": null,
"HostConfig": {
"Binds": [
"/path/to/local/data:/data"
],
"ContainerIDFile": "",
"LogConfig": {
"Type": "json-file",
"Config": {}
},
"NetworkMode": "bridge",
"PortBindings": {
"10200/tcp": [
{
"HostIp": "",
"HostPort": "10200"
}
]
},
"RestartPolicy": {
"Name": "no",
"MaximumRetryCount": 0
},
"AutoRemove": false,
"VolumeDriver": "",
"VolumesFrom": null,
"ConsoleSize": [
26,
162
],
"CapAdd": null,
"CapDrop": null,
"CgroupnsMode": "private",
"Dns": [],
"DnsOptions": [],
"DnsSearch": [],
"ExtraHosts": null,
"GroupAdd": null,
"IpcMode": "private",
"Cgroup": "",
"Links": null,
"OomScoreAdj": 0,
"PidMode": "",
"Privileged": false,
"PublishAllPorts": false,
"ReadonlyRootfs": false,
"SecurityOpt": null,
"UTSMode": "",
"UsernsMode": "",
"ShmSize": 67108864,
"Runtime": "runc",
"Isolation": "",
"CpuShares": 0,
"Memory": 0,
"NanoCpus": 0,
"CgroupParent": "",
"BlkioWeight": 0,
"BlkioWeightDevice": [],
"BlkioDeviceReadBps": [],
"BlkioDeviceWriteBps": [],
"BlkioDeviceReadIOps": [],
"BlkioDeviceWriteIOps": [],
"CpuPeriod": 0,
"CpuQuota": 0,
"CpuRealtimePeriod": 0,
"CpuRealtimeRuntime": 0,
"CpusetCpus": "",
"CpusetMems": "",
"Devices": [],
"DeviceCgroupRules": null,
"DeviceRequests": null,
"MemoryReservation": 0,
"MemorySwap": 0,
"MemorySwappiness": null,
"OomKillDisable": null,
"PidsLimit": null,
"Ulimits": [],
"CpuCount": 0,
"CpuPercent": 0,
"IOMaximumIOps": 0,
"IOMaximumBandwidth": 0,
"MaskedPaths": [
"/proc/asound",
"/proc/acpi",
"/proc/kcore",
"/proc/keys",
"/proc/latency_stats",
"/proc/timer_list",
"/proc/timer_stats",
"/proc/sched_debug",
"/proc/scsi",
"/sys/firmware",
"/sys/devices/virtual/powercap"
],
"ReadonlyPaths": [
"/proc/bus",
"/proc/fs",
"/proc/irq",
"/proc/sys",
"/proc/sysrq-trigger"
]
},
"GraphDriver": {
"Data": null,
"Name": "overlayfs"
},
"Mounts": [
{
"Type": "bind",
"Source": "/path/to/local/data",
"Destination": "/data",
"Mode": "",
"RW": true,
"Propagation": "rprivate"
}
],
"Config": {
"Hostname": "df86968bbd6c",
"Domainname": "",
"User": "",
"AttachStdin": true,
"AttachStdout": true,
"AttachStderr": true,
"ExposedPorts": {
"10200/tcp": {}
},
"Tty": true,
"OpenStdin": true,
"StdinOnce": true,
"Env": [
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
],
"Cmd": [
"--voice",
"en_US-lessac-medium"
],
"Image": "rhasspy/wyoming-piper",
"Volumes": null,
"WorkingDir": "/",
"Entrypoint": [
"bash",
"/run.sh"
],
"OnBuild": null,
"Labels": {}
},
"NetworkSettings": {
"Bridge": "",
"SandboxID": "a354ae0488cb1ce0a2128350f87708f8587ce66888ffca20ea0729ba7780e69c",
"SandboxKey": "/var/run/docker/netns/a354ae0488cb",
"Ports": {
"10200/tcp": [
{
"HostIp": "0.0.0.0",
"HostPort": "10200"
},
{
"HostIp": "::",
"HostPort": "10200"
}
]
},
"HairpinMode": false,
"LinkLocalIPv6Address": "",
"LinkLocalIPv6PrefixLen": 0,
"SecondaryIPAddresses": null,
"SecondaryIPv6Addresses": null,
"EndpointID": "790a9b8e1a61173a9c3b05323019ce6be08a2733cb1640a8411b059010dc59b5",
"Gateway": "172.17.0.1",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"IPAddress": "172.17.0.2",
"IPPrefixLen": 16,
"IPv6Gateway": "",
"MacAddress": "02:",
"Networks": {
"bridge": {
"IPAMConfig": null,
"Links": null,
"Aliases": null,
"MacAddress": "02",
"DriverOpts": null,
"NetworkID": "f1ea7b00d3270239d95954d07d6d563fd382a88fb891d0a786bf74c358b56c34",
"EndpointID": "790a9b8e1a61173a9c3b05323019ce6be08a2733cb1640a8411b059010dc59b5",
"Gateway": "172.17.0.1",
"IPAddress": "172.17.0.2",
"IPPrefixLen": 16,
"IPv6Gateway": "",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"DNSNames": null
}
}
}
}
]
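The full dump is long, and only a few fields matter here. As a sketch, docker inspect's --format option with a Go template can print just those fields; the function names and the DOCKER variable below are hypothetical helpers for illustration.

```shell
#!/bin/sh
# Sketch: print selected fields of a container instead of the full dump.
# Set DOCKER=echo to print the commands instead of running them.
DOCKER="${DOCKER:-docker}"

# container_binds CONTAINER  -> volume bindings as JSON
container_binds() {
    $DOCKER inspect --format '{{json .HostConfig.Binds}}' "$1"
}

# container_ports CONTAINER  -> published ports as JSON
container_ports() {
    $DOCKER inspect --format '{{json .NetworkSettings.Ports}}' "$1"
}

# container_status CONTAINER -> the State.Status field
container_status() {
    $DOCKER inspect --format '{{.State.Status}}' "$1"
}
```

With the container above, container_binds df86968bbd6c should report the single /path/to/local/data:/data binding and container_ports df86968bbd6c the 10200/tcp mapping seen in the dump.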
Next, check that the service answers on port 10200. Wyoming is not an HTTP protocol, so curl only confirms that the port accepts connections:
curl "http://localhost:10200"
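Wyoming exchanges newline-delimited JSON events over a raw TCP socket, so a slightly more telling check than curl is to send a describe event and look at the reply. The sketch below assumes that nc (netcat) is installed and that the server answers describe with an info event, as the Wyoming protocol documentation describes; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: ask a Wyoming server to describe itself over plain TCP.

# describe_event: print the one-line JSON event that requests server info
describe_event() {
    printf '%s\n' '{"type": "describe"}'
}

# wyoming_describe HOST PORT: send the event and print whatever comes back
# (requires nc; -w 2 gives up after two seconds of silence)
wyoming_describe() {
    describe_event | nc -w 2 "$1" "$2"
}
```

Running wyoming_describe localhost 10200 against the container started earlier should print a JSON reply that lists the loaded voice, if the assumptions above hold.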
After Home Assistant has used the service, the generated audio files appear in its tts directory:
@raspberrypi:/data/homeassistant202405/tts# ls -l
total 80
-rw-r--r-- 1 root root 56315 Nov 10 17:59 3caeb43ede21da438bb30ea4128096d7132655ae_en-us_c5ee8c087b_tts.piper.mp3
-rw-r--r-- 1 root root 8135 Nov 10 17:57 b802f384302cb24fbab0a44997e820bf2e8507bb_en-us_c5ee8c087b_tts.piper.mp3
-rw-r--r-- 1 root root 12545 Nov 10 17:58 f44761887c59dbbd377ca5960da9e15f3b7df652_en-us_c5ee8c087b_tts.piper.mp3
Each file name ends in tts.piper.mp3, which suggests that the audio was produced through the Piper text-to-speech engine.
Summary
In this tutorial, you learned how to run a single-container text-to-speech service (wyoming-piper) on Docker.
Useful links
Run a single-container speech-to-text service on Docker
https://developer.ibm.com/tutorials/run-a-single-container-speech-to-text-service-on-docker/