fishaudio.types.voices

Voice and model management types.

Sample Objects

class Sample(BaseModel)

An audio sample for a voice model. Attributes:

title - Title/name of the audio sample
text - Transcription of the spoken content in the sample
task_id - Unique identifier for the sample task
audio - URL or path to the audio file

Author Objects

class Author(BaseModel)

Voice model author information. Attributes:

id - Unique author identifier
nickname - Author’s display name
avatar - URL to author’s avatar image

Voice Objects

class Voice(BaseModel)

A voice model. Represents a TTS voice that can be used for synthesis. Attributes:

id - Unique voice model identifier (use as reference_id in TTS)
type - Model type. Options: “svc” (singing voice conversion), “tts” (text-to-speech)
title - Voice model title/name
description - Detailed description of the voice model
cover_image - URL to the voice model’s cover image
train_mode - Training mode used. Options: “fast”
state - Current model state (e.g., “ready”, “training”, “failed”)
tags - List of tags for categorization (e.g., [“male”, “english”, “young”])
samples - List of audio samples demonstrating the voice
created_at - Timestamp when the model was created
updated_at - Timestamp when the model was last updated
languages - List of supported language codes (e.g., [“en”, “zh”])
visibility - Model visibility. Options: “public”, “private”, “unlisted”
lock_visibility - Whether visibility setting is locked
like_count - Number of likes the model has received
mark_count - Number of bookmarks/favorites
shared_count - Number of times the model has been shared
task_count - Number of times the model has been used for generation
liked - Whether the current user has liked this model. Default: False
marked - Whether the current user has bookmarked this model. Default: False
author - Information about the voice model’s creator
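As a rough illustration of how these attributes fit together, the sketch below picks a voice id to pass as reference_id. VoiceSketch and pick_reference_id are hypothetical stand-ins written with the standard library. They are not part of the SDK, and they model only the fields the filter needs.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VoiceSketch:
    """Hypothetical stand-in for a few Voice fields; not the SDK class."""
    id: str
    type: str   # "svc" or "tts"
    title: str
    state: str  # e.g. "ready", "training", "failed"
    tags: List[str] = field(default_factory=list)
    languages: List[str] = field(default_factory=list)


def pick_reference_id(
    voices: List[VoiceSketch], language: str, tag: Optional[str] = None
) -> Optional[str]:
    """Return the id of the first ready TTS voice supporting `language` (and `tag`, if given)."""
    for voice in voices:
        if voice.type != "tts" or voice.state != "ready":
            continue  # skip singing-conversion models and models that are not usable yet
        if language not in voice.languages:
            continue
        if tag is not None and tag not in voice.tags:
            continue
        return voice.id  # usable as reference_id in a TTS request
    return None
```

Skipping non-"ready" models avoids sending a reference_id for a voice that is still training or has failed.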

fishaudio.types.tts

TTS-related types.

ReferenceAudio Objects

class ReferenceAudio(BaseModel)

Reference audio for voice cloning/style. Attributes:

audio - Audio file bytes for the reference sample
text - Transcription of what is spoken in the reference audio. Should match exactly what’s spoken and include punctuation for proper prosody.

Prosody Objects

class Prosody(BaseModel)

Speech prosody settings (speed and volume). Attributes:

speed - Speech speed multiplier. Range: 0.5-2.0. Default: 1.0. Examples: 1.5 = 50% faster, 0.8 = 20% slower
volume - Volume adjustment in decibels. Range: -20.0 to 20.0. Default: 0.0 (no change). Positive values increase volume, negative values decrease it.
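A minimal sketch of the documented ranges, assuming they are checked when the object is constructed (the real model may validate differently). ProsodySketch is a hypothetical stand-in. Its amplitude_ratio method is not an SDK method; it only shows what a decibel adjustment means as a linear amplitude factor, using the 20 * log10 convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProsodySketch:
    """Hypothetical mirror of Prosody that enforces the documented ranges."""
    speed: float = 1.0   # multiplier, 0.5-2.0
    volume: float = 0.0  # adjustment in dB, -20.0 to 20.0

    def __post_init__(self) -> None:
        if not 0.5 <= self.speed <= 2.0:
            raise ValueError(f"speed {self.speed} is outside 0.5-2.0")
        if not -20.0 <= self.volume <= 20.0:
            raise ValueError(f"volume {self.volume} is outside -20.0 to 20.0")

    def amplitude_ratio(self) -> float:
        """Linear amplitude factor equivalent to the dB adjustment (0 dB -> 1.0, +20 dB -> 10.0)."""
        return 10 ** (self.volume / 20)
```

For example, ProsodySketch(volume=-6.0) roughly halves the amplitude, while ProsodySketch(speed=2.5) raises ValueError.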

from_speed_override

@classmethod
def from_speed_override(cls,
                        speed: float,
                        base: Optional["Prosody"] = None) -> "Prosody"

Create a Prosody with the given speed, preserving the volume from base. Arguments:

speed - Speed value to use
base - Base prosody to preserve volume from (if any)

Returns: New Prosody instance with overridden speed
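The behavior described above can be sketched as follows. ProsodySketch is a hypothetical stand-in. The sketch assumes that when base is None, the volume falls back to the default of 0.0; the reference does not state this explicitly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProsodySketch:
    """Hypothetical stand-in for Prosody (speed multiplier, volume in dB)."""
    speed: float = 1.0
    volume: float = 0.0

    @classmethod
    def from_speed_override(
        cls, speed: float, base: Optional["ProsodySketch"] = None
    ) -> "ProsodySketch":
        """Return a new instance with `speed`, keeping base's volume when base is given."""
        # Assumption: without a base, use the default volume (0.0 dB, no change).
        volume = base.volume if base is not None else cls().volume
        return cls(speed=speed, volume=volume)
```

Because a new instance is returned, base itself is left unchanged.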

TTSConfig Objects

class TTSConfig(BaseModel)

TTS generation configuration. Reusable configuration for text-to-speech requests. Create once, use multiple times. All parameters have sensible defaults. Attributes:

format - Audio output format. Options: “mp3”, “wav”, “pcm”, “opus”. Default: “mp3”
sample_rate - Audio sample rate in Hz. If None, uses format-specific default.
mp3_bitrate - MP3 bitrate in kbps. Options: 64, 128, 192. Default: 128
opus_bitrate - Opus bitrate in kbps. Options: -1000, 24, 32, 48, 64. Default: 32
normalize - Whether to normalize/clean the input text. Default: True
chunk_length - Characters per generation chunk. Range: 100-300. Default: 200. Lower values = faster initial response, higher values = better quality
latency - Generation mode. Options: “normal” (higher quality), “balanced” (faster). Default: “balanced”
reference_id - Voice model ID from fish.audio (e.g., “802e3bc2b27e49c2995d23ef70e6ac89”). Find IDs in voice URLs or via voices.list()
references - List of reference audio samples for instant voice cloning. Default: []
prosody - Speech speed and volume settings. Default: None (uses natural prosody)
top_p - Nucleus sampling parameter for token selection. Range: 0.0-1.0. Default: 0.7
temperature - Randomness in generation. Range: 0.0-1.0. Default: 0.7. Higher = more varied, lower = more consistent
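A minimal sketch of the create-once, reuse pattern, assuming a frozen dataclass that mirrors a subset of the documented defaults and options (references and prosody are omitted for brevity). TTSConfigSketch is hypothetical. dataclasses.replace derives variants without mutating the original, and __post_init__ re-runs the checks on every copy.

```python
from dataclasses import dataclass, replace
from typing import Optional

_FORMATS = {"mp3", "wav", "pcm", "opus"}
_MP3_BITRATES = {64, 128, 192}
_OPUS_BITRATES = {-1000, 24, 32, 48, 64}
_LATENCIES = {"normal", "balanced"}


@dataclass(frozen=True)
class TTSConfigSketch:
    """Hypothetical mirror of TTSConfig defaults that checks the documented options."""
    format: str = "mp3"
    sample_rate: Optional[int] = None  # None -> format-specific default
    mp3_bitrate: int = 128
    opus_bitrate: int = 32
    normalize: bool = True
    chunk_length: int = 200
    latency: str = "balanced"
    reference_id: Optional[str] = None
    top_p: float = 0.7
    temperature: float = 0.7

    def __post_init__(self) -> None:
        if self.format not in _FORMATS:
            raise ValueError(f"unsupported format {self.format!r}")
        if self.mp3_bitrate not in _MP3_BITRATES:
            raise ValueError(f"unsupported mp3_bitrate {self.mp3_bitrate}")
        if self.opus_bitrate not in _OPUS_BITRATES:
            raise ValueError(f"unsupported opus_bitrate {self.opus_bitrate}")
        if not 100 <= self.chunk_length <= 300:
            raise ValueError(f"chunk_length {self.chunk_length} is outside 100-300")
        if self.latency not in _LATENCIES:
            raise ValueError(f"unsupported latency {self.latency!r}")
        for name in ("top_p", "temperature"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} {value} is outside 0.0-1.0")
```

For example, base = TTSConfigSketch(reference_id="802e3bc2b27e49c2995d23ef70e6ac89") can be reused as is, while replace(base, format="wav") yields a WAV variant and base keeps its MP3 settings.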

TTSRequest Objects

class TTSRequest(BaseModel)

Request parameters for text-to-speech generation. This model is used internally for WebSocket streaming. For the HTTP API, parameters are passed directly to methods. Attributes:

text - Text to synthesize into speech
chunk_length - Characters per generation chunk. Range: 100-300. Default: 200
format - Audio output format. Options: “mp3”, “wav”, “pcm”, “opus”. Default: “mp3”
sample_rate - Audio sample rate in Hz. If None, uses format-specific default
mp3_bitrate - MP3 bitrate in kbps. Options: 64, 128, 192. Default: 128
opus_bitrate - Opus bitrate in kbps. Options: -1000, 24, 32, 48, 64. Default: 32
references - List of reference audio samples for voice cloning. Default: []
reference_id - Voice model ID for using a specific voice. Default: None
normalize - Whether to normalize/clean the input text. Default: True
latency - Generation mode. Options: “normal”, “balanced”. Default: “balanced”
prosody - Speech speed and volume settings. Default: None
top_p - Nucleus sampling for token selection. Range: 0.0-1.0. Default: 0.7
temperature - Randomness in generation. Range: 0.0-1.0. Default: 0.7

StartEvent Objects

class StartEvent(BaseModel)

WebSocket start event to initiate TTS streaming. Attributes:

event - Event type identifier, always “start”
request - TTS configuration for the streaming session

TextEvent Objects

class TextEvent(BaseModel)

WebSocket event to send a text chunk for synthesis. Attributes:

event - Event type identifier, always “text”
text - Text chunk to synthesize

FlushEvent Objects

class FlushEvent(BaseModel)

WebSocket event to force immediate audio generation from buffered text. Use this to ensure all buffered text is synthesized without waiting for more input. Attributes:

event - Event type identifier, always “flush”

CloseEvent Objects

class CloseEvent(BaseModel)

WebSocket event to end the streaming session. Attributes:

event - Event type identifier, always “stop”
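The event types above imply an ordering for one streaming session: start, then text chunks, then stop. The sketch below builds the messages as plain dicts. streaming_events is a hypothetical helper, the request contents are placeholders, and the wire encoding and connection handling are outside its scope. Sending a flush before stop is a conservative assumption, not a documented requirement.

```python
from typing import Any, Dict, Iterable, List


def streaming_events(request: Dict[str, Any], chunks: Iterable[str]) -> List[Dict[str, Any]]:
    """Return the messages for one session in order: start, text..., flush, stop."""
    events: List[Dict[str, Any]] = [{"event": "start", "request": request}]
    for chunk in chunks:
        events.append({"event": "text", "text": chunk})
    # Assumption: flush explicitly so any buffered text is synthesized before closing.
    events.append({"event": "flush"})
    events.append({"event": "stop"})  # CloseEvent uses event="stop"
    return events
```

For example, streaming_events({"text": "", "format": "mp3"}, ["Hello, ", "world."]) yields five messages with the event types start, text, text, flush and stop.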

fishaudio.types.account

Account-related types (credits, packages, etc.).

Credits Objects

class Credits(BaseModel)

User’s API credit balance. Attributes:

id - Unique credits record identifier
user_id - User identifier
credit - Current credit balance (decimal for precise accounting)
created_at - Timestamp when the credits record was created
updated_at - Timestamp when the credits were last updated
has_phone_sha256 - Whether the user has a verified phone number. Optional
has_free_credit - Whether the user has received free credits. Optional

Package Objects

class Package(BaseModel)

User’s prepaid package information. Attributes:

id - Unique package identifier
user_id - User identifier
type - Package type identifier
total - Total units in the package
balance - Remaining units in the package
created_at - Timestamp when the package was purchased
updated_at - Timestamp when the package was last updated
finished_at - Timestamp when the package was fully consumed. None if still active
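A short sketch of values derivable from a package record. PackageSketch is a hypothetical stand-in that assumes integer units; the actual field types may differ.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class PackageSketch:
    """Hypothetical subset of Package fields."""
    total: int
    balance: int
    finished_at: Optional[datetime] = None

    @property
    def used(self) -> int:
        """Units consumed so far."""
        return self.total - self.balance

    @property
    def is_active(self) -> bool:
        """A package counts as active until finished_at is set."""
        return self.finished_at is None

    def percent_used(self) -> float:
        """Share of the package consumed, as a percentage (0.0 for an empty package)."""
        return 100.0 * self.used / self.total if self.total else 0.0
```

For example, PackageSketch(total=1000, balance=250) reports 750 units used and 75.0 percent consumed.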

fishaudio.types.asr

ASR (Automatic Speech Recognition) related types.

ASRSegment Objects

class ASRSegment(BaseModel)

A timestamped segment of transcribed text. Attributes:

text - The transcribed text for this segment
start - Segment start time in seconds
end - Segment end time in seconds

ASRResponse Objects

class ASRResponse(BaseModel)

Response from speech-to-text transcription. Attributes:

text - Complete transcription of the entire audio
duration - Total audio duration in milliseconds
segments - List of timestamped text segments. Empty if include_timestamps=False

duration

Duration in milliseconds
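Note the unit mismatch: segment start and end are in seconds, while duration is in milliseconds. The sketch below converts both consistently and renders segments as SRT subtitles. Segment, to_srt_time, to_srt and duration_seconds are hypothetical helpers, not SDK functions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """Hypothetical mirror of ASRSegment."""
    text: str
    start: float  # seconds
    end: float    # seconds


def duration_seconds(duration_ms: float) -> float:
    """Convert ASRResponse.duration (milliseconds) to seconds."""
    return duration_ms / 1000.0


def to_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: List[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(f"{index}\n{to_srt_time(seg.start)} --> {to_srt_time(seg.end)}\n{seg.text}\n")
    return "\n".join(cues)
```

A useful sanity check is that the last segment's end does not exceed duration_seconds(response.duration); comparing it directly against the raw millisecond value would hide unit errors.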

fishaudio.types.shared

Shared types used across the SDK.

PaginatedResponse Objects

class PaginatedResponse(BaseModel, Generic[T])

Generic paginated response. Attributes:

total - Total number of items across all pages
items - List of items on the current page
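A sketch of walking every page, assuming pages are requested by a 1-based page number through a caller-supplied fetch function. Page, page_count and iterate_all are hypothetical. The loop stops once total items have been seen or an empty page comes back, which also guards against a total that changes mid-iteration.

```python
from dataclasses import dataclass, field
from typing import Callable, Generic, Iterator, List, TypeVar

T = TypeVar("T")


@dataclass
class Page(Generic[T]):
    """Hypothetical mirror of PaginatedResponse: total across pages, items on this page."""
    total: int
    items: List[T] = field(default_factory=list)


def page_count(total: int, page_size: int) -> int:
    """Number of pages needed for `total` items (ceiling division)."""
    return (total + page_size - 1) // page_size


def iterate_all(fetch: Callable[[int], Page[T]]) -> Iterator[T]:
    """Yield items from fetch(1), fetch(2), ... until all `total` items are seen."""
    seen = 0
    page_number = 1
    while True:
        page = fetch(page_number)
        if not page.items:
            return  # empty page: nothing more to read
        yield from page.items
        seen += len(page.items)
        if seen >= page.total:
            return
        page_number += 1
```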
