WebSocket TTS Streaming

Protocol: WSS

Server: wss://api.fish.audio

Channel: tts/live

Messages

Authentication: bearerAuth (type: http)

API key authentication using a Bearer token. Get your API key from https://fish.audio/app/api-keys and pass it in the Authorization header:

Authorization: Bearer YOUR_API_KEY

Headers (type: object)

model (type: string): TTS model to use for this session.
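A minimal Python sketch of the headers for the connection request, assuming they are sent exactly as listed above: `Authorization` for the bearer token and a header literally named `model`. The function name `build_headers` and the example model value are hypothetical.

```python
def build_headers(api_key: str, model: str) -> dict[str, str]:
    """Return the HTTP headers to send with the WebSocket upgrade request.

    Assumption: the server reads the bearer token from `Authorization`
    and the TTS model from a header literally named `model`.
    """
    if not api_key:
        raise ValueError("an API key is required")
    return {
        "Authorization": f"Bearer {api_key}",  # bearerAuth scheme
        "model": model,  # TTS model to use for this session
    }


headers = build_headers("YOUR_API_KEY", "example-model")  # hypothetical model name
```

Pass this dict to your WebSocket client as extra handshake headers; the parameter name for that differs between client libraries.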

Start TTS Session

Type: object (StartEvent)

Initiates a TTS streaming session with configuration. This must be the first message sent after connecting. It contains all the configuration for the voice, audio format, and generation parameters.
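A minimal Python sketch of building the StartEvent as a plain dict before serialization. This page does not list the message fields, so the `event` discriminator, the `request` wrapper, and the `format` and `reference_id` keys are assumptions to check against the schema; only `chunk_length` is named here, and its default of 200 is illustrative.

```python
from typing import Any, Optional


def make_start_event(
    chunk_length: int = 200,  # illustrative default, not taken from the schema
    audio_format: str = "mp3",  # assumed key/value
    reference_id: Optional[str] = None,  # hypothetical voice selector
) -> dict[str, Any]:
    """Build the first message of a session (StartEvent).

    Assumed shape: {"event": "start", "request": {...}}.
    """
    if chunk_length <= 0:
        raise ValueError("chunk_length must be positive")
    request: dict[str, Any] = {
        "text": "",  # text arrives later via TextEvent messages
        "chunk_length": chunk_length,
        "format": audio_format,
    }
    if reference_id is not None:
        request["reference_id"] = reference_id
    return {"event": "start", "request": request}
```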

Send Text Chunk

Type: object (TextEvent)

Sends a chunk of text for synthesis. You can send multiple TextEvent messages in sequence; the server buffers the text and synthesizes it according to the chunk_length parameter from StartEvent.
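A sketch of turning streamed text (for example, tokens from a language model) into a sequence of TextEvent messages. The `{"event": "text", "text": ...}` shape and the helper names are assumptions for illustration.

```python
from typing import Any, Iterable, Iterator


def make_text_event(text: str) -> dict[str, Any]:
    """Build one TextEvent (assumed shape: {"event": "text", "text": ...})."""
    return {"event": "text", "text": text}


def text_events(pieces: Iterable[str]) -> Iterator[dict[str, Any]]:
    """Turn a stream of text pieces into TextEvents, skipping empty pieces.

    The client does no chunking of its own: the server buffers what it
    receives and decides when to synthesize based on chunk_length.
    """
    for piece in pieces:
        if piece:
            yield make_text_event(piece)
```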

Flush Buffered Text

Type: object

Forces immediate synthesis of all buffered text. Use this when you want audio generated right away instead of waiting for more text or for the buffer to fill, for example to keep latency low in interactive applications.
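One pattern, sketched under the same assumed message shapes, is to flush whenever a piece of text ends a sentence so that each finished sentence is voiced immediately. The `{"event": "flush"}` shape, the punctuation heuristic, and the function name are hypothetical.

```python
from typing import Any, Iterable, Iterator

FLUSH_MESSAGE: dict[str, Any] = {"event": "flush"}  # assumed shape
SENTENCE_END = (".", "!", "?")  # simple heuristic, not part of the API


def with_sentence_flushes(pieces: Iterable[str]) -> Iterator[dict[str, Any]]:
    """Emit a TextEvent per piece, plus a flush message after any piece
    that ends a sentence, so buffered text is synthesized right away."""
    for piece in pieces:
        if not piece:
            continue
        yield {"event": "text", "text": piece}
        if piece.rstrip().endswith(SENTENCE_END):
            yield dict(FLUSH_MESSAGE)  # copy so callers cannot mutate the template
```

Flushing on every sentence trades some prosody context for latency; for non-interactive use, letting the buffer fill is usually the better default.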

End TTS Session

Type: object

Signals the end of the text stream. After you send this event, the server finishes synthesizing any remaining buffered text and sends a FinishEvent before closing the connection.
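Putting the client-side messages in order, a session is a StartEvent, any number of TextEvents, and a final end-of-stream message. This sketch assumes the end message is `{"event": "stop"}` and reuses the assumed `event`/`request` shapes; none of these field names are given on this page and should be verified.

```python
from typing import Any, Iterable


def build_outgoing(pieces: Iterable[str], chunk_length: int = 200) -> list[dict[str, Any]]:
    """Order the client messages for one session: start, text..., stop."""
    messages: list[dict[str, Any]] = [
        {"event": "start", "request": {"text": "", "chunk_length": chunk_length}}
    ]
    messages += [{"event": "text", "text": p} for p in pieces if p]
    # End-of-stream message (assumed name): the server synthesizes any
    # remaining buffered text, then replies with a FinishEvent.
    messages.append({"event": "stop"})
    return messages
```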

Audio Chunk

Type: object (AudioEvent)

Contains generated audio bytes. You receive multiple AudioEvent messages as audio is generated; each one carries a chunk of audio in the format you specified. Concatenate all chunks in the order received to get the complete audio.
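A sketch of reassembling the audio from decoded server messages, assuming each AudioEvent is a map like `{"event": "audio", "audio": <bytes>}`. The key names are assumptions.

```python
from typing import Any, Iterable


def collect_audio(messages: Iterable[dict[str, Any]]) -> bytes:
    """Concatenate AudioEvent payloads in arrival order.

    Other event types (e.g. the FinishEvent) are ignored here.
    """
    buf = bytearray()
    for msg in messages:
        if msg.get("event") == "audio":
            buf += msg["audio"]
    return bytes(buf)
```

For live playback you would feed each chunk to the audio sink as it arrives rather than buffering the whole stream.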

Session Complete

Type: object (FinishEvent)

Signals that the TTS session has completed. The reason field indicates the outcome:

reason='stop': synthesis completed successfully.
reason='error': an error occurred; the client should handle it gracefully.

The WebSocket connection closes after this event.
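A sketch of interpreting the FinishEvent. The two `reason` values come from this page; the `event` key, the exception class, and the choice to raise on `error` are illustrative.

```python
from typing import Any


class TTSSessionError(RuntimeError):
    """Hypothetical exception for a session that ends with reason='error'."""


def handle_finish(msg: dict[str, Any]) -> bool:
    """Return True on a clean stop; raise if the server reported an error.

    Assumed shape: {"event": "finish", "reason": "stop" | "error"}.
    """
    if msg.get("event") != "finish":
        raise ValueError(f"not a finish event: {msg!r}")
    reason = msg.get("reason")
    if reason == "stop":
        return True  # synthesis completed successfully
    if reason == "error":
        raise TTSSessionError("server reported an error; audio may be incomplete")
    raise ValueError(f"unexpected finish reason: {reason!r}")
```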

The WebSocket TTS endpoint enables bidirectional, low-latency text-to-speech streaming; messages are serialized with MessagePack.
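To show what MessagePack serialization means on the wire, here is a stdlib-only Python sketch that encodes the subset of types these messages are likely to need. It is not a full implementation (no extension types, no 64-bit integers, no decoder); in practice you would use the `msgpack` package, whose `msgpack.packb` and `msgpack.unpackb` handle both directions.

```python
import struct
from typing import Any, Optional


def _len_header(n: int, fix_base: Optional[int], fix_limit: int,
                c8: Optional[int], c16: int, c32: int) -> bytes:
    """Pick the shortest length prefix for a str/bin/array/map of size n."""
    if fix_base is not None and n < fix_limit:
        return bytes([fix_base | n])  # length packed into the type byte
    if c8 is not None and n <= 0xFF:
        return bytes([c8, n])
    if n <= 0xFFFF:
        return bytes([c16]) + struct.pack(">H", n)
    return bytes([c32]) + struct.pack(">I", n)


def packb(obj: Any) -> bytes:
    """Encode None, bool, int, float, str, bytes, list and dict as MessagePack."""
    if obj is None:
        return b"\xc0"  # nil
    if isinstance(obj, bool):  # check before int: bool subclasses int
        return b"\xc3" if obj else b"\xc2"
    if isinstance(obj, int):
        if 0 <= obj <= 0x7F:
            return bytes([obj])  # positive fixint
        if -32 <= obj < 0:
            return struct.pack(">b", obj)  # negative fixint (0xe0-0xff)
        # Wider ints always use 32-bit forms here: valid, if not always minimal.
        if 0 < obj <= 0xFFFFFFFF:
            return b"\xce" + struct.pack(">I", obj)  # uint32
        if -(2**31) <= obj < 0:
            return b"\xd2" + struct.pack(">i", obj)  # int32
        raise OverflowError("integer outside the range this sketch supports")
    if isinstance(obj, float):
        return b"\xcb" + struct.pack(">d", obj)  # float64
    if isinstance(obj, str):
        raw = obj.encode("utf-8")
        return _len_header(len(raw), 0xA0, 32, 0xD9, 0xDA, 0xDB) + raw
    if isinstance(obj, (bytes, bytearray)):
        return _len_header(len(obj), None, 0, 0xC4, 0xC5, 0xC6) + bytes(obj)
    if isinstance(obj, (list, tuple)):
        return _len_header(len(obj), 0x90, 16, None, 0xDC, 0xDD) + b"".join(
            packb(item) for item in obj)
    if isinstance(obj, dict):
        return _len_header(len(obj), 0x80, 16, None, 0xDE, 0xDF) + b"".join(
            packb(k) + packb(v) for k, v in obj.items())
    raise TypeError(f"cannot encode {type(obj).__name__}")


frame = packb({"event": "text", "text": "hi"})
# frame == b'\x82\xa5event\xa4text\xa4text\xa2hi'
```

This page does not say which WebSocket frame type carries these bytes, though binary frames are the natural fit for a binary format.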
