
Overview

Real-time streaming generates speech as text is typed or spoken, making it a natural fit for chatbots, virtual assistants, and other live applications.

When to Use Streaming

Perfect for:
  • Live chat applications
  • Virtual assistants
  • Interactive storytelling
  • Real-time translations
  • Gaming dialogue
Not ideal for:
  • Pre-recorded content
  • Batch processing

Getting Started

Web Playground

Try real-time streaming instantly:
  1. Visit fish.audio
  2. Enable “Streaming Mode”
  3. Start typing and hear voice generation in real-time

Using the SDK

Stream text as it’s being written:
from fishaudio import FishAudio

# Initialize client
client = FishAudio(api_key="your_api_key")

# Stream text word by word
def stream_text():
    text = "Hello, this is being generated in real time"
    for word in text.split():
        yield word + " "

# Generate speech as text streams
audio_stream = client.tts.stream_websocket(
    stream_text(),
    reference_id="your_voice_model_id",
    temperature=0.7,  # Controls variation
    top_p=0.7,  # Controls diversity
    latency="balanced"
)

with open("output.mp3", "wb") as f:
    for audio_chunk in audio_stream:
        f.write(audio_chunk)

Configuration Options

Speed vs Quality

Latency Modes:
  • Normal: Best quality, ~500ms latency
  • Balanced: Good quality, ~300ms latency
# Use latency parameter with stream_websocket
audio_stream = client.tts.stream_websocket(
    text_chunks(),
    reference_id="model_id",
    latency="balanced"  # For faster response
)

Voice Control

Temperature (0.1 - 1.0):
  • Lower: More consistent, predictable
  • Higher: More varied, expressive
Top-p (0.1 - 1.0):
  • Lower: More focused
  • Higher: More diverse

Real-time Applications

Chatbot Integration

Stream responses as they’re generated:
def chatbot_response(user_input):
    # Get the AI response as a streaming text generator
    ai_text = get_ai_response(user_input)

    # Convert to speech in real time
    audio_stream = client.tts.stream_websocket(
        ai_text,
        reference_id="your_voice_model_id"
    )
    for audio_chunk in audio_stream:
        play_audio(audio_chunk)

Live Translation

Translate and speak simultaneously:
def live_translate(source_audio, target_language):
    # Transcribe source audio
    text = transcribe(source_audio)

    # Translate text
    translated = translate(text, target_language)

    # Stream translated speech
    for chunk in stream_text(translated):
        generate_speech(chunk)

Best Practices

Text Buffering

Do:
  • Send complete words with spaces
  • Use punctuation for natural pauses
  • Buffer 5-10 words for smoothness
Don’t:
  • Send individual characters
  • Forget spaces between words
  • Send huge chunks at once
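The buffering guidance above can be sketched as a small generator. This is a generic helper, not part of the SDK; the word-based splitting and `buffer_size` default are assumptions:

```python
def buffered_words(text, buffer_size=8):
    """Group words into chunks of roughly buffer_size words.

    Sends complete words (never single characters) and keeps a
    trailing space so word boundaries stay intact downstream.
    """
    words = text.split()
    for i in range(0, len(words), buffer_size):
        yield " ".join(words[i:i + buffer_size]) + " "
```

Pass the resulting generator straight to `stream_websocket` instead of yielding one word at a time.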

Connection Management

  1. Keep connections alive for multiple generations
  2. Handle disconnections gracefully
  3. Implement retry logic for reliability
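Retry logic (point 3) can be sketched as a generic wrapper with exponential backoff; the `connect` callable is a placeholder for whatever opens your WebSocket session, not an SDK function:

```python
import time

def with_retries(connect, max_attempts=3, base_delay=0.5):
    """Call connect() until it succeeds, backing off exponentially.

    Re-raises the last error once max_attempts is exhausted.
    """
    for attempt in range(max_attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise
            # Wait 0.5s, 1s, 2s, ... between attempts
            time.sleep(base_delay * 2 ** attempt)
```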

Audio Playback

For smooth playback:
  • Buffer 2-3 audio chunks
  • Use cross-fading between chunks
  • Handle network delays gracefully
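The prebuffering advice can be sketched like this; `play_audio` stands in for your actual audio output and is an assumption:

```python
from collections import deque

def play_with_buffer(audio_stream, play_audio, prebuffer=3):
    """Hold back the first few chunks so brief network stalls
    don't cause audible gaps, then play while continuing to fill.
    """
    queue = deque()
    for chunk in audio_stream:
        queue.append(chunk)
        if len(queue) >= prebuffer:
            play_audio(queue.popleft())
    # Drain whatever remains once the stream ends
    while queue:
        play_audio(queue.popleft())
```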

Common Use Cases

Interactive Story

def interactive_story():
    story_parts = [
        "Once upon a time,",
        "in a land far away,",
        "there lived a brave knight..."
    ]
    
    for part in story_parts:
        # Generate and play each part
        stream_speech(part)
        # Wait for user input
        user_choice = get_user_input()
        # Continue based on choice

Virtual Assistant

def virtual_assistant():
    while True:
        # Listen for wake word
        if detect_wake_word():
            # Start streaming response
            response = process_command()
            stream_speech(response)

Live Commentary

def live_commentary(event_stream):
    for event in event_stream:
        # Generate commentary
        commentary = generate_commentary(event)
        # Stream immediately
        stream_speech(commentary)

Troubleshooting

Audio Gaps

Problem: Gaps between audio chunks
Solution:
  • Increase buffer size
  • Use balanced latency mode
  • Check network connection

Delayed Response

Problem: Long wait before audio starts
Solution:
  • Use balanced latency mode
  • Send initial text immediately
  • Reduce chunk size

Choppy Playback

Problem: Audio cuts in and out
Solution:
  • Buffer more chunks before playing
  • Check network stability
  • Use consistent chunk sizes
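"Consistent chunk sizes" can be sketched as a re-chunking generator that normalizes whatever sizes arrive off the wire; this is a generic byte-stream helper, not part of the SDK:

```python
def rechunk(stream, size=4096):
    """Re-slice an incoming byte stream into fixed-size chunks,
    yielding any short remainder at the end of the stream.
    """
    buf = b""
    for piece in stream:
        buf += piece
        while len(buf) >= size:
            yield buf[:size]
            buf = buf[size:]
    if buf:
        yield buf
```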

Advanced Features

Dynamic Voice Switching

Change voices mid-stream:
# Start with one voice
def text1():
    yield "Hello from voice one."

audio1 = client.tts.stream_websocket(text1(), reference_id="voice1")
for chunk in audio1:
    play_audio(chunk)

# Switch to another
def text2():
    yield "And now voice two!"

audio2 = client.tts.stream_websocket(text2(), reference_id="voice2")
for chunk in audio2:
    play_audio(chunk)

Emotion Injection

Add emotions dynamically:
def emotional_speech(text, emotion):
    emotional_text = f"({emotion}) {text}"
    stream_speech(emotional_text)

Speed Control

Adjust speaking speed:
# Use the speed parameter with stream_websocket
audio_stream = client.tts.stream_websocket(
    text_chunks(),
    reference_id="model_id",
    speed=1.5  # 1.5x speed
)
# Note: for full prosody control (including volume), use TTSConfig

Performance Tips

  1. Pre-load voices for instant start
  2. Use connection pooling for multiple streams
  3. Monitor latency and adjust settings
  4. Cache common phrases for instant playback
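Tip 4, caching common phrases, can be sketched with a plain dict; `synthesize` is a placeholder for a full (non-streaming) TTS call returning audio bytes:

```python
_phrase_cache = {}

def cached_speech(text, synthesize):
    """Return audio bytes for text, synthesizing only on a cache miss.

    Useful for fixed prompts such as greetings or error messages
    that an assistant repeats verbatim.
    """
    if text not in _phrase_cache:
        _phrase_cache[text] = synthesize(text)
    return _phrase_cache[text]
```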

Get Support

Need help with streaming?