OpenClaw, previously known as ClawBot and MoltBot, is a personal AI agent that you run on your own infrastructure. You control OpenClaw via a chat app of your choice, such as WhatsApp, and you can also use speech-to-text (STT) and/or text-to-speech (TTS). This means you don’t have to type messages—you can speak to OpenClaw instead.
In this guide, we explain how to use text-to-speech with OpenClaw on Ubuntu/Debian, which options are available, and the main trade-offs and benefits of each choice.
- Before you start this guide, make sure you have a VPS/computer/laptop with OpenClaw, and that you’ve completed the onboarding.
- Are you using a local model and want the fastest performance and the most natural output? Then we recommend 4 CPU cores, but even with 2 CPU cores, local TTS performs well.
Which TTS options does OpenClaw offer?
OpenClaw offers a number of TTS options:
- The built-in TTS tool (recommended): supports OpenAI, ElevenLabs and Edge (free).
- SAG: a skill connected to ElevenLabs. You’ll need an ElevenLabs API key for this option.
- sherpa-onnx-tts: local TTS that requires some additional command-line configuration. The quality is comparable to
- espeak-ng (unofficial): the fastest option, running locally on your VPS. It sounds much more robotic and is configured via the command line.
TTS speed vs naturalness
| Option | Speed | Naturalness |
|---|---|---|
| OpenAI gpt-4o-mini-tts | ⭐⭐⭐⭐ | ⭐⭐⭐⭐½ |
| OpenAI tts-1 | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| OpenAI tts-1-hd | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Edge (local) | ⭐⭐⭐⭐½ | ⭐⭐⭐ |
| ElevenLabs | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| sherpa-onnx-tts | ⭐⭐⭐⭐⭐ | ⭐⭐⭐→⭐⭐⭐⭐ (depends on model: faster vs more natural) |
| espeak-ng | ⭐⭐⭐⭐⭐ (very fast, even on CPU) | ⭐→⭐⭐ (if you like robotic voices) |
The built-in TTS tool
OpenClaw has a built-in TTS tool that’s easy to work with, because you can manage it in several places:
- The chat in the web dashboard
- The OpenClaw command-line TUI (`openclaw tui`)
- A communication channel such as WhatsApp. Note that when the tool is enabled, replies via a communication channel will be slower, simply because every response is also converted to speech.
The TTS tool does not interfere with sherpa-onnx-tts or espeak-ng: if OpenClaw sends an audio message back via one of those tools, it recognises this and the TTS tool is not triggered as well.
Adding API key(s)
First add your API key(s) via the command line as follows (not in a chat conversation):
echo OPENAI_API_KEY=sk-proj-......... >> ~/.openclaw/.env
echo ELEVENLABS_API_KEY=sk_.......... >> ~/.openclaw/.env
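Note that `>>` appends, so running a command twice leaves duplicate lines in `.env`. To check what ended up in the file without printing the secret values, you can list just the variable names. A small demo on a throwaway file (so no real secrets are involved):

```shell
# Append keys the same way the commands above do, but into a temporary file:
ENV_FILE="$(mktemp)"
echo "OPENAI_API_KEY=sk-proj-demo" >> "$ENV_FILE"
echo "ELEVENLABS_API_KEY=sk_demo" >> "$ENV_FILE"

# Print only the variable names, keeping the secret values off the screen:
KEYS="$(cut -d= -f1 "$ENV_FILE")"
echo "$KEYS"
rm "$ENV_FILE"
```

On your server, the same one-liner works against the real file: `cut -d= -f1 ~/.openclaw/.env`.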
Simply start a conversation with OpenClaw via one of the options above and use the commands below to manage the TTS tool.
Checking TTS status
/tts status
Turning TTS on
/tts on
Turning TTS off
/tts off
Changing the TTS provider
Use one of: openai, elevenlabs or edge:
/tts provider openai
Configuring SAG TTS
The SAG TTS skill is connected to ElevenLabs. ElevenLabs produces the most natural-sounding voices for TTS.
Whether OpenClaw actually uses the SAG skill is up to the agent: it depends, among other things, on the quality of the LLM and the instructions you give OpenClaw (editable in ~/.openclaw/workspace/TOOLS.md and SOUL.md). Results are mixed, and if you prefer speed and reliability, we recommend the built-in TTS tool.
Step 1
Enable SAG in the OpenClaw dashboard by navigating to ‘Skills’ (1), entering your ElevenLabs API key and saving it (2), and clicking ‘Enable’ (3).

Step 2
By default, OpenClaw stores the API key directly in openclaw.json. For security reasons, we recommend moving it to an environment variable:
nano ~/.openclaw/openclaw.json
Change the ‘skills’ section so the SAG part looks as follows:
"skills": {
"install": {
"nodeManager": "npm"
},
"entries": {
"sag": {
"enabled": true,
"apiKey": "${SAG_API_KEY}"
}
}
},Add your API key to the .env file (replace sk_etc with your own API-key):
echo SAG_API_KEY=sk_3c34083<redacted>b39d5a91e6c7d3 >> ~/.openclaw/.env
Sherpa-onnx-tts
How well sherpa-onnx-tts works depends heavily on the LLM you choose (more expensive models tend to perform better). If you use this option, you’ll probably need to add an instruction in ~/.openclaw/workspace/SOUL.md telling OpenClaw to use this skill and not to call it more than once per run.
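As an illustration, such a SOUL.md instruction could be worded along these lines (hypothetical phrasing, not official wording; adjust to taste):

```
When the user asks for a spoken or audio reply, use the sherpa-onnx-tts skill
to generate the audio. Call it at most once per run.
```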
Step 1
OpenClaw has a skill called sherpa-onnx-tts that runs locally and doesn’t require a cloud TTS service.
First download the sherpa-onnx runtime:
mkdir -p ~/.openclaw/tools/sherpa-onnx-tts/runtime
cd ~/.openclaw/tools/sherpa-onnx-tts/runtime
curl -L -o sherpa-onnx-runtime.tar.bz2 \
https://github.com/k2-fsa/sherpa-onnx/releases/download/v1.12.24/sherpa-onnx-v1.12.24-linux-x64-shared.tar.bz2
tar -xjf sherpa-onnx-runtime.tar.bz2 --strip-components=1
The latest version at the time of writing is 1.12.24. You can find an overview of available versions at https://github.com/k2-fsa/sherpa-onnx/releases/.
Step 2
Download a sherpa-onnx model. In this example, we download lessac-medium (US English). Optionally replace lessac-medium with lessac-low (faster) or lessac-high (better quality). You can find a full overview of available models at https://github.com/k2-fsa/sherpa-onnx/releases/tag/tts-models.
mkdir -p ~/.openclaw/tools/sherpa-onnx-tts/models
cd ~/.openclaw/tools/sherpa-onnx-tts/models
curl -L -o vits-piper-en_US-lessac-medium.tar.bz2 \
https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-piper-en_US-lessac-medium.tar.bz2
tar -xjf vits-piper-en_US-lessac-medium.tar.bz2
rm vits-piper-en_US-lessac-medium.tar.bz2
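Before wiring the skill up, you can optionally check that the runtime and model work together. A sketch, assuming the binary and flag names from the sherpa-onnx documentation (`sherpa-onnx-offline-tts`, `--vits-model`, and so on) and the paths created in the steps above:

```shell
cd ~/.openclaw/tools/sherpa-onnx-tts
# Point the loader at the bundled shared libraries, then synthesise a test file.
LD_LIBRARY_PATH="$PWD/runtime/lib" ./runtime/bin/sherpa-onnx-offline-tts \
  --vits-model=models/vits-piper-en_US-lessac-medium/en_US-lessac-medium.onnx \
  --vits-tokens=models/vits-piper-en_US-lessac-medium/tokens.txt \
  --vits-data-dir=models/vits-piper-en_US-lessac-medium/espeak-ng-data \
  --output-filename=/tmp/sherpa-test.wav \
  "Hello from sherpa-onnx"
```

If this produces /tmp/sherpa-test.wav, the runtime and model are in place.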
Step 3
Open openclaw.json to reference the sherpa-onnx runtime and the model you’ve chosen:
nano ~/.openclaw/openclaw.json
"skills": {
  "entries": {
    "sherpa-onnx-tts": {
      "enabled": true
    }
  }
}
You’ll often already have some entries here. The full section might then look like this:
"skills": {
"install": {
"nodeManager": "npm"
},
"entries": {
"sherpa-onx-tts": {
"enabled": true
},
"openai-whisper": {
"enabled": false
},
"openai-whisper-api": {
"enabled": false
}
}
},
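After saving openclaw.json, restart your OpenClaw gateway so the updated skills section is picked up:

```shell
openclaw gateway restart
```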
Espeak-ng (experimental)
It’s important that the LLM you’re using supports tool-calling if you go with espeak-ng. Some self-hosted models, such as gpt-oss:20b, perform poorly here, so check the documentation for the relevant LLM before you implement this option.
Step 1
Install espeak-ng and the FFmpeg encoder if you don’t already have them.
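On Ubuntu/Debian, both are available from the standard repositories:

```shell
sudo apt update
sudo apt install -y espeak-ng ffmpeg
```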
Step 2
Create a ‘fast TTS’ script that the OpenClaw skill can use later. For safety, the text is passed to the script via environment variables rather than directly as shell arguments.
Skills in the ~/.openclaw/skills/ folder are available to all agents on your server.
---
name: fast-tts
description: Ultra-fast local TTS (espeak-ng + ffmpeg) for OpenClaw chat channels. Use when you want audio-only replies (prefer no written text) in WhatsApp/Telegram. Includes a minimal plugin skeleton for OpenClaw voice-call telephony TTS with local espeak-ng.
metadata: {"openclaw":{"requires":{"bins":["espeak-ng","ffmpeg"]}}}
---
When the user wants an audio-only reply in chat channels:
1) Put the full reply text into env var `TTS_TEXT` (do NOT put the text on the shell command line).
2) Call the exec tool with:
- command: "{baseDir}/bin/fast-tts"
- env: { "TTS_TEXT": "<your reply text>", "TTS_CHANNEL": "<channel>" }
- Set `TTS_CHANNEL` from message context. Supported values: `whatsapp`, `telegram`. Defaults to `whatsapp`.
3) The command prints a single `MEDIA:` line.
4) Respond with exactly that `MEDIA:` line and nothing else.
Notes:
- The script emits OGG/Opus at 48kHz mono and appends `[[audio_as_voice]]` automatically for Telegram voice bubbles.
- Optional env vars: `TTS_SPEED` (default `185`), `TTS_VOICE` (espeak voice id), `OUT_DIR` (output directory).
- Use `assets/openclaw-espeak-telephony-plugin/` as the minimal starting point for local `espeak-ng` telephony TTS in the OpenClaw voice-call path.
- Integration notes for the voice-call fork are in `references/voice-call-provider-skeleton.md`.
- Config guidance for audio-only chat behavior is in `references/openclaw-config-for-audio-only.md`.
Restart your OpenClaw gateway so that skills are reloaded:
openclaw gateway restart
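The skill description above references a `{baseDir}/bin/fast-tts` helper that isn’t shown. Below is a minimal sketch of what such a script could look like; it assumes espeak-ng’s `--stdin`/`--stdout` flags and ffmpeg’s libopus encoder, and the `MEDIA:` output convention is inferred from the notes above, so treat it as a starting point rather than the official script. Remember to make it executable with `chmod +x`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of {baseDir}/bin/fast-tts. Variable names follow the
# skill description above; the MEDIA: line format is an assumption.
set -euo pipefail

TEXT="${TTS_TEXT:?set TTS_TEXT to the reply text}"  # text arrives via env, not argv
CHANNEL="${TTS_CHANNEL:-whatsapp}"
SPEED="${TTS_SPEED:-185}"
VOICE="${TTS_VOICE:-en}"
OUT_DIR="${OUT_DIR:-/tmp}"
OUT="$OUT_DIR/fast-tts-$(date +%s).ogg"

# espeak-ng reads the text from stdin (safe for arbitrary content) and writes
# WAV to stdout; ffmpeg re-encodes it to OGG/Opus at 48 kHz mono.
printf '%s' "$TEXT" |
  espeak-ng --stdin --stdout -s "$SPEED" -v "$VOICE" |
  ffmpeg -loglevel error -i - -ac 1 -ar 48000 -c:a libopus "$OUT"

# Telegram voice bubbles need the [[audio_as_voice]] marker (see notes above).
SUFFIX=""
if [ "$CHANNEL" = "telegram" ]; then SUFFIX=" [[audio_as_voice]]"; fi
printf 'MEDIA:%s%s\n' "$OUT" "$SUFFIX"
```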
Potential bug
There is an open report that `MEDIA:` lines are sometimes displayed as plain text (without an attachment) with certain model/provider combinations. A temporary workaround is to send the audio via the message tool instead. If you run into this issue, try updating OpenClaw with `openclaw update`.