Text to Speech

Overview

Transform any text into natural, expressive speech using Fish Audio’s advanced TTS models. Choose from pre-made voices or use your own cloned voices. Discover the world’s best cloned voice models on our Discovery page.

Quick Start

Web Interface

The easiest way to generate speech:

Visit Playground: Go to fish.audio and log in.
Enter Your Text: Type or paste the text you want to convert.
Choose a Voice: Select from available voices or use your own.
Generate: Click “Generate” and download your audio.

Using the SDK

Python

Install the SDK

pip install fish-audio-sdk

Basic Usage

Generate speech with just a few lines of code:

from fishaudio import FishAudio
from fishaudio.utils import save

# Initialize client
client = FishAudio(api_key="your_api_key_here")

# Generate speech
audio = client.tts.convert(
    text="Hello, world!",
    reference_id="your_voice_model_id"
)
save(audio, "output.mp3")

print("✓ Audio saved to output.mp3")
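Hard-coding the key as in the example above works for a quick test, but reading it from an environment variable keeps it out of source control. A minimal sketch, assuming the key is stored in a variable named `FISH_AUDIO_API_KEY` (a name chosen here for illustration; the SDK is not known to read it automatically):

```python
import os


def load_api_key(env_var: str = "FISH_AUDIO_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    The variable name is a hypothetical choice for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} before creating the client")
    return key


# Usage (requires the SDK and a real key):
# client = FishAudio(api_key=load_api_key())
```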

Voice Options

Using Pre-made Voices

Browse and select voices from the playground:

Python

# Use a voice from the playground
audio = client.tts.convert(
    text="Welcome to Fish Audio!",
    reference_id="7f92f8afb8ec43bf81429cc1c9199cb1"
)

Using Your Cloned Voice

Use voices you’ve created:

Python

# Use your own cloned voice
audio = client.tts.convert(
    text="This is my custom voice speaking",
    reference_id="your_model_id"
)

Using Reference Audio

Provide reference audio directly:

Python

from fishaudio.types import ReferenceAudio

# Use reference audio on-the-fly
with open("voice_sample.wav", "rb") as f:
    audio = client.tts.convert(
        text="Hello from reference audio",
        references=[
            ReferenceAudio(
                audio=f.read(),
                text="Sample text from the audio"
            )
        ]
    )

Model Selection

Choose the right model for your needs:

Model	Best For	Quality	Speed
s1	Latest features	Excellent	Fast
speech-1.6	Stable production	Very Good	Fast
speech-1.5	Legacy support	Good	Fastest

Specify a model in your request. When none is given, the SDK uses the latest model by default:

Python

# Using the latest model (default)
audio = client.tts.convert(text="Hello world")

Advanced Options

Audio Formats

Choose your output format:

Python

audio = client.tts.convert(
    text="Your text here",
    format="mp3",  # Options: "mp3", "wav", "pcm", "opus"
    mp3_bitrate=128  # For MP3: 64, 128, or 192
)
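The format and bitrate values listed in the comments above can be checked locally before a request is sent. The helper below is a sketch only: `format_options` is a hypothetical name, the allowed values are copied from the example above, and the 128 default simply mirrors that example.

```python
ALLOWED_FORMATS = {"mp3", "wav", "pcm", "opus"}
ALLOWED_MP3_BITRATES = {64, 128, 192}


def format_options(audio_format: str = "mp3", mp3_bitrate: int = 128) -> dict:
    """Return keyword arguments for the format settings, validated locally."""
    if audio_format not in ALLOWED_FORMATS:
        raise ValueError(f"Unsupported format: {audio_format!r}")
    options = {"format": audio_format}
    # The bitrate only applies to MP3 output, so it is omitted otherwise
    if audio_format == "mp3":
        if mp3_bitrate not in ALLOWED_MP3_BITRATES:
            raise ValueError(f"Unsupported MP3 bitrate: {mp3_bitrate}")
        options["mp3_bitrate"] = mp3_bitrate
    return options


# Usage (requires the SDK client):
# audio = client.tts.convert(text="Your text here", **format_options("mp3", 192))
```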

Chunk Length

Control text processing chunks:

Python

audio = client.tts.convert(
    text="Long text content...",
    chunk_length=200  # 100-300 characters per chunk
)

Latency Mode

Optimize for speed or quality:

Python

audio = client.tts.convert(
    text="Quick response needed",
    latency="balanced"  # "normal" or "balanced"
)

Balanced mode reduces latency to ~300ms but may slightly decrease stability.

Direct API Usage

For direct API calls without the SDK:

Python

import httpx
import ormsgpack

# Prepare request
request_data = {
    "text": "Hello, world!",
    "reference_id": "your_model_id",
    "format": "mp3"
}

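# Make API call (named http_client so it does not shadow the SDK client)
with httpx.Client() as http_client:
    response = http_client.post(
        "https://api.fish.audio/v1/tts",
        content=ormsgpack.packb(request_data),
        headers={
            "authorization": "Bearer YOUR_API_KEY",
            "content-type": "application/msgpack",
            "model": "s1"
        }
    )
    # Fail early on HTTP errors instead of saving an error body as audio
    response.raise_for_status()

    # Save audio
    with open("output.mp3", "wb") as f:
        f.write(response.content)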
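In the direct call above, the model is selected through the `model` header rather than the request body. The sketch below builds those headers and rejects model names that are not in the table on this page. `build_tts_headers` is a hypothetical helper, and the model list may be incomplete.

```python
# Model names taken from the Model Selection table on this page
KNOWN_MODELS = {"s1", "speech-1.6", "speech-1.5"}


def build_tts_headers(api_key: str, model: str = "s1") -> dict:
    """Return headers for a msgpack TTS request using the given model."""
    if model not in KNOWN_MODELS:
        raise ValueError(f"Unknown model: {model!r}")
    return {
        "authorization": f"Bearer {api_key}",
        "content-type": "application/msgpack",
        "model": model,
    }
```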

Streaming Audio

Stream audio for real-time applications:

Python

# Stream audio chunks
audio_stream = client.tts.stream(
    text="Streaming this text in real-time",
    reference_id="model_id"
)

with open("stream_output.mp3", "wb") as f:
    for chunk in audio_stream:
        f.write(chunk)
        # Process chunk immediately for real-time playback
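The write loop above can be wrapped in a small reusable function. This sketch assumes the stream yields `bytes` objects, as the `f.write(chunk)` call above implies. `write_stream` is a hypothetical name, and it accepts any iterable so it can be tried without the SDK.

```python
from typing import Iterable


def write_stream(chunks: Iterable[bytes], path: str) -> int:
    """Write audio chunks to path as they arrive; return total bytes written."""
    total = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            if not chunk:  # skip empty chunks
                continue
            f.write(chunk)
            total += len(chunk)
    return total


# Usage (requires the SDK client):
# write_stream(client.tts.stream(text="...", reference_id="model_id"), "stream_output.mp3")
```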

Adding Emotions

Make your speech more expressive:

Python

# Add emotion markers to your text
emotional_text = """
(excited) I just won the lottery!
(sad) But then I lost the ticket.
(laughing) Just kidding, I found it!
"""

audio = client.tts.convert(
    text=emotional_text,
    reference_id="model_id"
)

Available emotions:

Basic: (happy), (sad), (angry), (excited), (calm)
Tones: (shouting), (whispering), (soft tone)
Effects: (laughing), (sighing), (crying)
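When text is assembled programmatically, a small helper can prefix each sentence with a marker and catch typos early. This is a sketch: `with_emotion` is a hypothetical name, and the set contains only the markers listed above, so the service may accept others.

```python
EMOTION_MARKERS = {
    "happy", "sad", "angry", "excited", "calm",  # basic
    "shouting", "whispering", "soft tone",       # tones
    "laughing", "sighing", "crying",             # effects
}


def with_emotion(marker: str, sentence: str) -> str:
    """Prefix a sentence with an emotion marker such as (excited)."""
    name = marker.strip().strip("()").lower()
    if name not in EMOTION_MARKERS:
        raise ValueError(f"Unknown emotion marker: {marker!r}")
    return f"({name}) {sentence.strip()}"


lines = [
    with_emotion("excited", "I just won the lottery!"),
    with_emotion("sad", "But then I lost the ticket."),
]
emotional_text = "\n".join(lines)
```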

For more precise control over pronunciation and additional paralanguage features like pauses and breathing, see Fine-grained Control.

Best Practices

Text Preparation

Do:

Use proper punctuation for natural pauses
Add emotion markers for expression
Break long texts into paragraphs
Use consistent formatting

Don’t:

Use ALL CAPS (unless shouting)
Mix multiple languages randomly
Include special characters unnecessarily
Forget punctuation
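Some of these guidelines can be applied automatically before text is sent for synthesis. The sketch below splits text into paragraphs, collapses stray whitespace, adds missing end punctuation, and flags all-caps input. Both function names are hypothetical, and the rules are one interpretation of the lists above, not behaviour of the service.

```python
import re


def prepare_text(raw: str) -> list[str]:
    """Split raw text into cleaned paragraphs ready for synthesis."""
    paragraphs = []
    for block in re.split(r"\n\s*\n", raw):
        # Collapse runs of whitespace inside a paragraph
        text = " ".join(block.split())
        if not text:
            continue
        # Add closing punctuation when it is missing
        if text[-1] not in ".!?…\"')":
            text += "."
        paragraphs.append(text)
    return paragraphs


def is_all_caps(text: str) -> bool:
    """Return True when every letter in the text is upper case."""
    letters = [c for c in text if c.isalpha()]
    return bool(letters) and all(c.isupper() for c in letters)
```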

Performance Tips

Batch Processing: Process multiple texts efficiently
Cache Models: Store frequently used model IDs
Optimize Chunk Size: Use 200 characters for best balance
Handle Errors: Implement retry logic for network issues

Quality Optimization

For best results:

Use high-quality reference audio for cloning
Choose appropriate emotion markers
Test different latency modes
Monitor API rate limits

Troubleshooting

Common Issues

No audio output:

Check API key validity
Verify model ID exists
Ensure proper audio format

Poor quality:

Use better reference audio
Try normal latency mode
Check text formatting

Slow generation:

Use balanced latency mode
Reduce chunk length
Check network connection

Code Examples

Batch Processing

Python

from fishaudio.utils import save

texts = [
    "First announcement",
    "Second announcement",
    "Third announcement"
]

for i, text in enumerate(texts):
    audio = client.tts.convert(
        text=text,
        reference_id="model_id"
    )
    save(audio, f"output_{i}.mp3")
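The loop above processes one text at a time. If throughput matters, the requests could be issued concurrently. The sketch below takes any callable, so it runs without the SDK. It assumes the client can safely be shared between threads, which this page does not confirm, and it uses a small worker count to stay clear of rate limits.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")


def convert_batch(
    texts: List[str],
    synthesize: Callable[[str], T],
    max_workers: int = 3,
) -> List[T]:
    """Run synthesize over texts concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the order of the inputs
        return list(pool.map(synthesize, texts))


# Usage (requires the SDK client):
# results = convert_batch(
#     texts,
#     lambda t: client.tts.convert(text=t, reference_id="model_id"),
# )
```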

Error Handling

Python

import time
from fishaudio.exceptions import FishAudioError

def generate_with_retry(text, max_retries=3):
    for attempt in range(max_retries):
        try:
            audio = client.tts.convert(
                text=text,
                reference_id="model_id"
            )
            return audio
        except FishAudioError as e:
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # Exponential backoff
            else:
                raise  # Re-raise the last error with its original traceback

API Reference

Request Parameters

Parameter	Type	Description	Default
text	string	Text to convert	Required
reference_id	string	Model/voice ID	None
format	string	Audio format	"mp3"
chunk_length	integer	Characters per chunk	200
normalize	boolean	Normalize text	true
latency	string	Speed vs quality	"normal"
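The table above can be mirrored in a small data class so that defaults live in one place. This is a sketch: `TTSRequest` and `to_payload` are hypothetical names (the SDK may ship its own request type), and the defaults are copied from the table.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class TTSRequest:
    """Request parameters with the defaults listed in the table."""

    text: str
    reference_id: Optional[str] = None
    format: str = "mp3"
    chunk_length: int = 200
    normalize: bool = True
    latency: str = "normal"

    def to_payload(self) -> dict:
        """Return the request as a dict, dropping unset optional fields."""
        return {k: v for k, v in asdict(self).items() if v is not None}
```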

Response

Returns audio data in the specified format as a binary stream.

Get Support

Need help with text-to-speech?

API Reference
Discord Community: Join our Discord
Email Support: support@fish.audio
