Overview

Transform any text into natural, expressive speech using Fish Audio’s advanced TTS models. Choose from pre-made voices or use your own cloned voices. Discover the world’s best cloned voice models on our Discovery page.

Quick Start

Web Interface

The easiest way to generate speech:
  1. Visit Playground: Go to fish.audio and log in
  2. Enter Your Text: Type or paste the text you want to convert
  3. Choose a Voice: Select from available voices or use your own
  4. Generate: Click “Generate” and download your audio

Using the SDK

  1. Install the SDK

pip install fish-audio-sdk

  2. Basic Usage

Generate speech with just a few lines of code:
from fishaudio import FishAudio
from fishaudio.utils import save

# Initialize client
client = FishAudio(api_key="your_api_key_here")

# Generate speech
audio = client.tts.convert(
    text="Hello, world!",
    reference_id="your_voice_model_id"
)
save(audio, "output.mp3")

print("✓ Audio saved to output.mp3")
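
Hardcoding the API key as in the example above is convenient for a first test, but it is easy to leak that way. A minimal sketch of reading it from the environment instead; the variable name `FISH_API_KEY` and the helper `load_api_key` are assumptions made for this example, not part of the SDK:

```python
import os


def load_api_key(env_var: str = "FISH_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    The variable name is an assumption of this sketch, not an SDK requirement.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key
```

The result can then be passed explicitly, for example `FishAudio(api_key=load_api_key())`.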

Voice Options

Using Pre-made Voices

Browse and select voices from the playground:
# Use a voice from the playground
audio = client.tts.convert(
    text="Welcome to Fish Audio!",
    reference_id="7f92f8afb8ec43bf81429cc1c9199cb1"
)

Using Your Cloned Voice

Use voices you’ve created:
# Use your own cloned voice
audio = client.tts.convert(
    text="This is my custom voice speaking",
    reference_id="your_model_id"
)

Using Reference Audio

Provide reference audio directly:
from fishaudio.types import ReferenceAudio

# Use reference audio on-the-fly
with open("voice_sample.wav", "rb") as f:
    audio = client.tts.convert(
        text="Hello from reference audio",
        references=[
            ReferenceAudio(
                audio=f.read(),
                text="Sample text from the audio"
            )
        ]
    )

Model Selection

Choose the right model for your needs:
Model      | Best For          | Quality   | Speed
s1         | Latest features   | Excellent | Fast
speech-1.6 | Stable production | Very Good | Fast
speech-1.5 | Legacy support    | Good      | Fastest
Specify a model in your request. When no model is given, the latest model is used by default:
# Using the latest model (default)
audio = client.tts.convert(text="Hello world")
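
The direct API example later in this guide sends the model name in a `model` request header. Under that assumption, a small helper can build the headers for any of the three models listed above and reject typos before a request is made. The helper name `build_tts_headers` is hypothetical:

```python
KNOWN_MODELS = ("s1", "speech-1.6", "speech-1.5")  # names from the model table


def build_tts_headers(api_key: str, model: str = "s1") -> dict:
    """Build request headers for a direct API call with the chosen model."""
    if model not in KNOWN_MODELS:
        raise ValueError(f"Unknown model {model!r}; expected one of {KNOWN_MODELS}")
    return {
        "authorization": f"Bearer {api_key}",
        "content-type": "application/msgpack",
        "model": model,
    }
```

For example, `build_tts_headers("YOUR_API_KEY", "speech-1.6")` selects the stable production model.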

Advanced Options

Audio Formats

Choose your output format:
audio = client.tts.convert(
    text="Your text here",
    format="mp3",  # Options: "mp3", "wav", "pcm", "opus"
    mp3_bitrate=128  # For MP3: 64, 128, or 192
)
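
To catch an unsupported combination before it reaches the API, the options in the comments above can be checked locally. This is a sketch only: `audio_options` is a hypothetical helper, and the allowed values are copied from the example comments rather than from a full specification.

```python
from typing import Optional

SUPPORTED_FORMATS = ("mp3", "wav", "pcm", "opus")
MP3_BITRATES = (64, 128, 192)


def audio_options(fmt: str = "mp3", mp3_bitrate: Optional[int] = None) -> dict:
    """Return keyword arguments for the format options after basic validation."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported format {fmt!r}; expected one of {SUPPORTED_FORMATS}")
    options = {"format": fmt}
    if mp3_bitrate is not None:
        if fmt != "mp3":
            raise ValueError("mp3_bitrate only applies when format is 'mp3'")
        if mp3_bitrate not in MP3_BITRATES:
            raise ValueError(f"Unsupported MP3 bitrate {mp3_bitrate}; expected one of {MP3_BITRATES}")
        options["mp3_bitrate"] = mp3_bitrate
    return options
```

The returned dictionary can be unpacked into the call, for example `client.tts.convert(text="Your text here", **audio_options("mp3", 192))`.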

Chunk Length

Control how many characters of text are processed per chunk:
audio = client.tts.convert(
    text="Long text content...",
    chunk_length=200  # 100-300 characters per chunk
)
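
If the chunk length comes from user input or configuration, it may be worth clamping it to the 100-300 range noted above before sending the request. A minimal sketch; `clamp_chunk_length` is a hypothetical helper, and how the API itself treats out-of-range values is not covered here:

```python
def clamp_chunk_length(value: int, low: int = 100, high: int = 300) -> int:
    """Limit a requested chunk length to the documented 100-300 range."""
    return max(low, min(high, value))
```

For example, `clamp_chunk_length(50)` yields 100 and `clamp_chunk_length(1000)` yields 300.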

Latency Mode

Optimize for speed or quality:
audio = client.tts.convert(
    text="Quick response needed",
    latency="balanced"  # "normal" or "balanced"
)
Balanced mode reduces latency to ~300ms but may slightly decrease stability.
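
One way to see what the two modes mean for your own text and network is to time the same request in each. The sketch below assumes a callable that you supply; `time_latency_modes` is a hypothetical helper. It measures the full duration of each call, which is not the same as time to first audio, so it will not reproduce the ~300ms figure directly.

```python
import time
from typing import Callable, Dict, Iterable


def time_latency_modes(
    synthesize: Callable[[str], object],
    modes: Iterable[str] = ("normal", "balanced"),
) -> Dict[str, float]:
    """Call synthesize(mode) once per mode and return elapsed seconds per mode."""
    timings = {}
    for mode in modes:
        start = time.perf_counter()
        synthesize(mode)
        timings[mode] = time.perf_counter() - start
    return timings
```

A possible call is `time_latency_modes(lambda mode: client.tts.convert(text="Quick response needed", latency=mode))`.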

Direct API Usage

For direct API calls without the SDK:
import httpx
import ormsgpack

# Prepare request
request_data = {
    "text": "Hello, world!",
    "reference_id": "your_model_id",
    "format": "mp3"
}

# Make API call
with httpx.Client() as http_client:
    response = http_client.post(
        "https://api.fish.audio/v1/tts",
        content=ormsgpack.packb(request_data),
        headers={
            "authorization": "Bearer YOUR_API_KEY",
            "content-type": "application/msgpack",
            "model": "s1"
        }
    )
    # Fail early on HTTP errors instead of writing an error body to disk
    response.raise_for_status()

    # Save audio
    with open("output.mp3", "wb") as f:
        f.write(response.content)

Streaming Audio

Stream audio for real-time applications:
# Stream audio chunks
audio_stream = client.tts.stream(
    text="Streaming this text in real-time",
    reference_id="model_id"
)

with open("stream_output.mp3", "wb") as f:
    for chunk in audio_stream:
        f.write(chunk)
        # Process chunk immediately for real-time playback
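
The write loop above can be wrapped in a small helper that also reports how many bytes arrived, which is a quick check that the stream was not empty. This sketch assumes only that the stream yields `bytes` objects; `write_stream` is a hypothetical name:

```python
from typing import Iterable


def write_stream(chunks: Iterable[bytes], path: str) -> int:
    """Write audio chunks to path as they arrive and return the total byte count."""
    total = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            if not chunk:
                continue  # skip empty chunks
            f.write(chunk)
            total += len(chunk)
    return total
```

With the stream from the example, this would be called as `write_stream(audio_stream, "stream_output.mp3")`.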

Adding Emotions

Make your speech more expressive:
# Add emotion markers to your text
emotional_text = """
(excited) I just won the lottery!
(sad) But then I lost the ticket.
(laughing) Just kidding, I found it!
"""

audio = client.tts.convert(
    text=emotional_text,
    reference_id="model_id"
)
Available emotions:
  • Basic: (happy), (sad), (angry), (excited), (calm)
  • Tones: (shouting), (whispering), (soft tone)
  • Effects: (laughing), (sighing), (crying)
For more precise control over pronunciation and additional paralanguage features like pauses and breathing, see Fine-grained Control.
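
When scripts are assembled programmatically, a helper can attach the marker and catch misspelled emotions early. The sketch below knows only the markers listed in this section, so it would reject any others the service may support; `with_emotion` and `build_script` are hypothetical names.

```python
EMOTION_MARKERS = {
    "happy", "sad", "angry", "excited", "calm",  # basic
    "shouting", "whispering", "soft tone",       # tones
    "laughing", "sighing", "crying",             # effects
}


def with_emotion(marker: str, sentence: str) -> str:
    """Prefix a sentence with an emotion marker such as (excited)."""
    if marker not in EMOTION_MARKERS:
        raise ValueError(f"Unknown emotion marker: {marker!r}")
    return f"({marker}) {sentence.strip()}"


def build_script(lines) -> str:
    """Join (marker, sentence) pairs into one text, one sentence per line."""
    return "\n".join(with_emotion(marker, sentence) for marker, sentence in lines)
```

For example, `build_script([("excited", "I just won the lottery!"), ("sad", "But then I lost the ticket.")])` produces text in the same form as `emotional_text` above.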

Best Practices

Text Preparation

Do:
  • Use proper punctuation for natural pauses
  • Add emotion markers for expression
  • Break long texts into paragraphs
  • Use consistent formatting
Don’t:
  • Use ALL CAPS (unless shouting)
  • Mix multiple languages randomly
  • Include special characters unnecessarily
  • Forget punctuation
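
Some of these checks can be automated before text is sent for synthesis. The following is a rough sketch, not an official preprocessing step: `prepare_text` collapses stray whitespace and adds a missing final period to each paragraph, and `shouted_words` flags all-caps words for review. It will also flag acronyms, so treat its output as a hint.

```python
import re

# Characters that count as an acceptable paragraph ending in this sketch
_TERMINAL = tuple('.!?…"\')]”’')


def prepare_text(text: str) -> str:
    """Normalize whitespace and make sure every paragraph ends with punctuation."""
    cleaned = []
    for paragraph in re.split(r"\n\s*\n", text):
        paragraph = re.sub(r"\s+", " ", paragraph).strip()
        if not paragraph:
            continue
        if not paragraph.endswith(_TERMINAL):
            paragraph += "."
        cleaned.append(paragraph)
    return "\n\n".join(cleaned)


def shouted_words(text: str) -> list:
    """Return words of three or more letters written entirely in capitals."""
    return re.findall(r"\b[A-Z]{3,}\b", text)
```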

Performance Tips

  1. Batch Processing: Process multiple texts efficiently
  2. Cache Models: Store frequently used model IDs
  3. Optimize Chunk Size: Use 200 characters for best balance
  4. Handle Errors: Implement retry logic for network issues

Quality Optimization

For best results:
  • Use high-quality reference audio for cloning
  • Choose appropriate emotion markers
  • Test different latency modes
  • Monitor API rate limits

Troubleshooting

Common Issues

No audio output:
  • Check API key validity
  • Verify model ID exists
  • Ensure proper audio format
Poor quality:
  • Use better reference audio
  • Try normal latency mode
  • Check text formatting
Slow generation:
  • Use balanced latency mode
  • Reduce chunk length
  • Check network connection

Code Examples

Batch Processing

from fishaudio.utils import save

texts = [
    "First announcement",
    "Second announcement",
    "Third announcement"
]

for i, text in enumerate(texts):
    audio = client.tts.convert(
        text=text,
        reference_id="model_id"
    )
    save(audio, f"output_{i}.mp3")
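
In the loop above, a single failed request stops the whole batch. A variation that records failures and keeps going might look like the sketch below. `run_batch` is a hypothetical helper, and the `synthesize` callable is supplied by you, for example a function that wraps `client.tts.convert`.

```python
from typing import Callable, Dict, Iterable, Tuple


def run_batch(
    texts: Iterable[str],
    synthesize: Callable[[str], object],
) -> Tuple[Dict[int, object], Dict[int, Exception]]:
    """Synthesize every text, collecting results and errors by input index."""
    results: Dict[int, object] = {}
    failures: Dict[int, Exception] = {}
    for index, text in enumerate(texts):
        try:
            results[index] = synthesize(text)
        except Exception as exc:  # keep going; report the failure afterwards
            failures[index] = exc
    return results, failures
```

After the run, the entries in `failures` can be retried or logged, while those in `results` are saved as before.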

Error Handling

import time
from fishaudio.exceptions import FishAudioError

def generate_with_retry(text, max_retries=3):
    for attempt in range(max_retries):
        try:
            audio = client.tts.convert(
                text=text,
                reference_id="model_id"
            )
            return audio
        except FishAudioError as e:
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # Exponential backoff
            else:
                raise  # Re-raise the last error once retries are exhausted
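
To make the backoff schedule explicit: the function above sleeps only between attempts, so three attempts mean two waits of 1 and 2 seconds. The sketch below lists the waits for a given retry count. The `cap` parameter is an addition of this example, not part of the code above, and `backoff_delays` is a hypothetical name.

```python
def backoff_delays(max_retries: int, cap: float = 30.0) -> list:
    """Return the sleep durations (seconds) used between attempts."""
    return [min(float(2 ** attempt), cap) for attempt in range(max_retries - 1)]
```

With `max_retries=3` this gives `[1.0, 2.0]`; with larger counts the cap keeps individual waits from growing without bound.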

API Reference

Request Parameters

Parameter    | Type    | Description          | Default
text         | string  | Text to convert      | Required
reference_id | string  | Model/voice ID       | None
format       | string  | Audio format         | "mp3"
chunk_length | integer | Characters per chunk | 200
normalize    | boolean | Normalize text       | true
latency      | string  | Speed vs quality     | "normal"
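
The parameters and defaults in this table can be mirrored in a small data class, which is handy when building payloads for direct API calls. This is a sketch under the assumption that unset optional fields are simply left out of the payload; `TTSRequest` here is a hypothetical class defined for the example, not an SDK type.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class TTSRequest:
    """Request parameters with the defaults listed in the table."""
    text: str
    reference_id: Optional[str] = None
    format: str = "mp3"
    chunk_length: int = 200
    normalize: bool = True
    latency: str = "normal"

    def to_payload(self) -> dict:
        """Return a dict for serialization, omitting fields that are None."""
        return {key: value for key, value in asdict(self).items() if value is not None}
```

For example, `TTSRequest(text="Hello, world!", reference_id="your_model_id").to_payload()` yields a dictionary that could be passed to `ormsgpack.packb` in the direct API example.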

Response

Returns audio data in the specified format as a binary stream.

Get Support

Need help with text-to-speech?