
Overview

Real-time streaming generates speech as text is typed or spoken, making it a natural fit for chatbots, virtual assistants, and other live applications.

When to Use Streaming

Perfect for:
  • Live chat applications
  • Virtual assistants
  • Interactive storytelling
  • Real-time translations
  • Gaming dialogue
Not ideal for:
  • Pre-recorded content
  • Batch processing

Getting Started

Web Playground

Try real-time streaming instantly:
  1. Visit fish.audio
  2. Enable “Streaming Mode”
  3. Start typing and hear voice generation in real-time

Using the SDK

Stream text as it’s being written:
from fishaudio import FishAudio

# Initialize client
client = FishAudio(api_key="your_api_key")

# Stream text word by word
def stream_text():
    text = "Hello, this is being generated in real time"
    for word in text.split():
        yield word + " "

# Generate speech as text streams
audio_stream = client.tts.stream_websocket(
    stream_text(),
    reference_id="your_voice_model_id",
    temperature=0.7,  # Controls variation
    top_p=0.7,  # Controls diversity
    latency="balanced"
)

with open("output.mp3", "wb") as f:
    for audio_chunk in audio_stream:
        f.write(audio_chunk)

Configuration Options

Speed vs Quality

Latency Modes:
  • Normal: Best quality, ~500ms latency
  • Balanced: Good quality, ~300ms latency
# Use latency parameter with stream_websocket
audio_stream = client.tts.stream_websocket(
    text_chunks(),
    reference_id="model_id",
    latency="balanced"  # For faster response
)

Voice Control

Temperature (0.1 - 1.0):
  • Lower: More consistent, predictable
  • Higher: More varied, expressive
Top-p (0.1 - 1.0):
  • Lower: More focused
  • Higher: More diverse

Real-time Applications

Chatbot Integration

Stream responses as they’re generated:
def chatbot_response(user_input):
    # Get the AI response as a streaming text generator
    ai_text = get_ai_response(user_input)

    # Convert to speech in real time
    audio_stream = client.tts.stream_websocket(
        ai_text,
        reference_id="your_voice_model_id"
    )
    for audio_chunk in audio_stream:
        play_audio(audio_chunk)

Live Translation

Translate and speak simultaneously:
def live_translate(source_audio, target_language):
    # Transcribe source audio
    text = transcribe(source_audio)

    # Translate text
    translated = translate(text, target_language)

    # Stream translated speech
    for chunk in stream_text(translated):
        generate_speech(chunk)

Best Practices

Text Buffering

Do:
  • Send complete words with spaces
  • Use punctuation for natural pauses
  • Buffer 5-10 words for smoothness
Don’t:
  • Send individual characters
  • Forget spaces between words
  • Send huge chunks at once
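The buffering guidance above can be sketched as a small generator. This is a generic helper, not part of the SDK; the word-based splitting and `buffer_size` default are assumptions:

```python
def buffered_words(text, buffer_size=8):
    """Group words into chunks of roughly buffer_size words.

    Sends complete words (never single characters) and keeps a
    trailing space so word boundaries stay intact downstream.
    """
    words = text.split()
    for i in range(0, len(words), buffer_size):
        yield " ".join(words[i:i + buffer_size]) + " "
```

Pass the resulting generator straight to `stream_websocket` instead of yielding one word at a time.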

Connection Management

  1. Keep connections alive for multiple generations
  2. Handle disconnections gracefully
  3. Implement retry logic for reliability
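Retry logic (point 3) can be sketched as a generic wrapper with exponential backoff; the `connect` callable is a placeholder for whatever opens your WebSocket session, not an SDK function:

```python
import time

def with_retries(connect, max_attempts=3, base_delay=0.5):
    """Call connect() until it succeeds, backing off exponentially.

    Re-raises the last error once max_attempts is exhausted.
    """
    for attempt in range(max_attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise
            # Wait 0.5s, 1s, 2s, ... between attempts
            time.sleep(base_delay * 2 ** attempt)
```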

Audio Playback

For smooth playback:
  • Buffer 2-3 audio chunks
  • Use cross-fading between chunks
  • Handle network delays gracefully
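The prebuffering advice can be sketched like this; `play_audio` stands in for your actual audio output and is an assumption:

```python
from collections import deque

def play_with_buffer(audio_stream, play_audio, prebuffer=3):
    """Hold back the first few chunks so brief network stalls
    don't cause audible gaps, then play while continuing to fill.
    """
    queue = deque()
    for chunk in audio_stream:
        queue.append(chunk)
        if len(queue) >= prebuffer:
            play_audio(queue.popleft())
    # Drain whatever remains once the stream ends
    while queue:
        play_audio(queue.popleft())
```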

Common Use Cases

Interactive Story

def interactive_story():
    story_parts = [
        "Once upon a time,",
        "in a land far away,",
        "there lived a brave knight..."
    ]
    
    for part in story_parts:
        # Generate and play each part
        stream_speech(part)
        # Wait for user input
        user_choice = get_user_input()
        # Continue based on choice

Virtual Assistant

def virtual_assistant():
    while True:
        # Listen for wake word
        if detect_wake_word():
            # Start streaming response
            response = process_command()
            stream_speech(response)

Live Commentary

def live_commentary(event_stream):
    for event in event_stream:
        # Generate commentary
        commentary = generate_commentary(event)
        # Stream immediately
        stream_speech(commentary)

Troubleshooting

Audio Gaps

Problem: Gaps between audio chunks
Solution:
  • Increase buffer size
  • Use balanced latency mode
  • Check network connection

Delayed Response

Problem: Long wait before audio starts
Solution:
  • Use balanced latency mode
  • Send initial text immediately
  • Reduce chunk size

Choppy Playback

Problem: Audio cuts in and out
Solution:
  • Buffer more chunks before playing
  • Check network stability
  • Use consistent chunk sizes
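"Consistent chunk sizes" can be sketched as a re-chunking generator that normalizes whatever sizes arrive off the wire; this is a generic byte-stream helper, not part of the SDK:

```python
def rechunk(stream, size=4096):
    """Re-slice an incoming byte stream into fixed-size chunks,
    yielding any short remainder at the end of the stream.
    """
    buf = b""
    for piece in stream:
        buf += piece
        while len(buf) >= size:
            yield buf[:size]
            buf = buf[size:]
    if buf:
        yield buf
```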

Advanced Features

Dynamic Voice Switching

Change voices mid-stream:
# Start with one voice
def text1():
    yield "Hello from voice one."

audio1 = client.tts.stream_websocket(text1(), reference_id="voice1")
for chunk in audio1:
    play_audio(chunk)

# Switch to another
def text2():
    yield "And now voice two!"

audio2 = client.tts.stream_websocket(text2(), reference_id="voice2")
for chunk in audio2:
    play_audio(chunk)

Emotion Injection

Add emotions dynamically:
def emotional_speech(text, emotion):
    emotional_text = f"({emotion}) {text}"
    stream_speech(emotional_text)

Speed Control

Adjust speaking speed:
# Use the speed parameter with stream_websocket
audio_stream = client.tts.stream_websocket(
    text_chunks(),
    reference_id="model_id",
    speed=1.5  # 1.5x speed
)
# Note: for full prosody control (including volume), use TTSConfig

Performance Tips

  1. Pre-load voices for instant start
  2. Use connection pooling for multiple streams
  3. Monitor latency and adjust settings
  4. Cache common phrases for instant playback
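Tip 4, caching common phrases, can be sketched with a plain dict; `synthesize` is a placeholder for a full (non-streaming) TTS call returning audio bytes:

```python
_phrase_cache = {}

def cached_speech(text, synthesize):
    """Return audio bytes for text, synthesizing only on a cache miss.

    Useful for fixed prompts such as greetings or error messages
    that an assistant repeats verbatim.
    """
    if text not in _phrase_cache:
        _phrase_cache[text] = synthesize(text)
    return _phrase_cache[text]
```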

Get Support

Need help with streaming?