Prerequisites
Create a Fish Audio account
Sign up for a free Fish Audio account to get started with our API.
Go to fish.audio/auth/signup
Fill in your details to create an account, then complete the steps to verify it.
Log in to your account and navigate to the API section
Once you have an account, you’ll need an API key to authenticate your requests.
Log in to your Fish Audio Dashboard
Navigate to the API Keys section
Click “Create New Key”, give it a descriptive name, and set an expiration if desired
Copy your key and store it securely
Keep your API key secret! Never commit it to version control or share it publicly.
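One way to keep the key out of source code is to read it from an environment variable at startup and fail fast when it is missing. The sketch below is illustrative only: the variable name FISH_API_KEY and the requireApiKey helper are assumptions, so check the SDK documentation for the variable it actually reads.

```typescript
// Hypothetical helper; the default variable name FISH_API_KEY is an assumption.
type Env = Record<string, string | undefined>;

function requireApiKey(env: Env, variable = "FISH_API_KEY"): string {
  // Treat an empty or whitespace-only value the same as a missing one.
  const key = env[variable]?.trim();
  if (!key) {
    throw new Error(`Missing ${variable}: set it in your shell or in a git-ignored .env file`);
  }
  return key;
}

// Usage in Node.js: const apiKey = requireApiKey(process.env);
```

Failing at startup surfaces a missing key immediately instead of as an authentication error on the first request.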
Overview
Voice cloning allows you to generate speech that matches a specific voice using reference audio. Fish Audio supports two approaches:
Using pre-trained voice models (reference_id)
Providing reference audio directly in your request
Use reference_id when you’ll reuse a voice multiple times: it’s faster and more efficient. Use references for one-off voice cloning or for testing different voices without creating models.
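The choice between the two approaches can be made explicit in code. The following sketch uses the reference_id and references field names from this guide; the VoiceSource type and the buildVoiceFields helper are hypothetical and not part of the SDK.

```typescript
// Illustrative only: VoiceSource and buildVoiceFields are not SDK names.
// R stands for whatever reference type you use (for example ReferenceAudio).
type VoiceSource<R> =
  | { kind: "model"; modelId: string }      // saved voice model, reused across requests
  | { kind: "inline"; references: R[] };    // reference audio sent with the request

function buildVoiceFields<R>(
  source: VoiceSource<R>
): { reference_id: string } | { references: R[] } {
  if (source.kind === "model") {
    return { reference_id: source.modelId };
  }
  if (source.references.length === 0) {
    throw new Error("Inline cloning needs at least one reference sample");
  }
  return { references: source.references };
}
```

You could then spread the result into a request, for example { text: "Hello, world!", ...buildVoiceFields(source) }, so that each request carries exactly one of the two fields.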
Using Reference Audio
Clone a voice by providing reference audio directly:
import { FishAudioClient } from "fish-audio";
import type { TTSRequest, ReferenceAudio } from "fish-audio";
import { readFile } from "fs/promises";

const fishAudio = new FishAudioClient();

// Load the reference sample and wrap it in a File
const audioBuffer = await readFile("voice_sample.wav");
const referenceFile = new File([audioBuffer], "voice_sample.wav");

// text is the transcript of what is spoken in the reference audio
const referenceAudio: ReferenceAudio = {
  audio: referenceFile,
  text: "Text spoken in the reference audio",
};

const request: TTSRequest = {
  text: "Hello, world!",
  references: [referenceAudio],
};

const audio = await fishAudio.textToSpeech.convert(request);
Multiple References
Improve voice quality by providing multiple reference samples:
import type { TTSRequest, ReferenceAudio } from "fish-audio";
import { readFile } from "fs/promises";

// Read sample_0.wav, sample_1.wav, and sample_2.wav, each paired with its transcript
const references: ReferenceAudio[] = [];
for (const i of [0, 1, 2]) {
  const buf = await readFile(`sample_${i}.wav`);
  references.push({ audio: new File([buf], `sample_${i}.wav`), text: `Text from sample ${i}` });
}

const request: TTSRequest = {
  text: "Better voice quality with multiple references",
  references,
};
Creating Voice Models
For repeated use, create a persistent voice model:
import { FishAudioClient } from "fish-audio";
import { createReadStream } from "fs";

const fishAudio = new FishAudioClient();

// Create a voice model from samples
const response = await fishAudio.voices.ivc.create({
  title: "My Custom Voice",
  voices: [
    createReadStream("voice_0.wav"),
    createReadStream("voice_1.wav"),
    createReadStream("voice_2.wav"),
  ],
  cover_image: createReadStream("cover.png"),
});
console.log("Created model:", response._id);

// Use the model
const audio = await fishAudio.textToSpeech.convert({
  text: "Using my saved voice model",
  reference_id: response._id,
});
Best Practices
Audio Quality
For best results, reference audio should:
Be 10-30 seconds long per sample
Have clear speech without background noise
Match the language you’ll generate
Include varied intonation and emotion
Sample Text
The text parameter in ReferenceAudio should:
Match exactly what’s spoken in the audio
Include punctuation for proper prosody
Be in the same language as generation
Pre-upload models for frequently used voices
Use 2-3 reference samples for optimal quality
Keep samples under 30 seconds each
Normalize audio levels before uploading
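Normalizing levels can be done in several ways. As a minimal sketch, the function below applies peak normalization to decoded samples in the range -1 to 1; the normalizePeak helper and the 0.9 target are assumptions chosen for illustration, and loudness-based normalization is a different technique that may suit your material better.

```typescript
// Peak normalization sketch: scales samples so the loudest one reaches targetPeak.
// Assumes decoded floating-point PCM in the range [-1, 1].
function normalizePeak(samples: Float32Array, targetPeak = 0.9): Float32Array {
  let peak = 0;
  samples.forEach((s) => {
    peak = Math.max(peak, Math.abs(s));
  });
  if (peak === 0) {
    return samples.slice(); // pure silence: nothing to scale
  }
  const gain = targetPeak / peak;
  return samples.map((s) => s * gain); // returns a new array, input is unchanged
}
```

Applying the same target to every sample keeps the references at a comparable level before you upload them.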
Supported formats for reference audio:
WAV (recommended)
MP3
M4A
Other common audio formats
Sample rates:
16kHz minimum
44.1kHz recommended
Mono or stereo (converted to mono)
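You can check a WAV sample against the duration and sample-rate guidance before sending it. The sketch below reads the standard RIFF header fields of an uncompressed PCM WAV file; inspectWav and checkReference are hypothetical helpers, and the thresholds simply restate the recommendations in this guide rather than limits enforced by the API.

```typescript
interface WavInfo {
  sampleRate: number;
  channels: number;
  durationSeconds: number;
}

// Walks the RIFF chunks and reads the "fmt " and "data" chunk headers.
// Assumes uncompressed PCM; other encodings may report a misleading duration.
function inspectWav(bytes: Uint8Array): WavInfo {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const tag = (o: number): string =>
    String.fromCharCode(view.getUint8(o), view.getUint8(o + 1), view.getUint8(o + 2), view.getUint8(o + 3));

  if (bytes.length < 12 || tag(0) !== "RIFF" || tag(8) !== "WAVE") {
    throw new Error("Not a RIFF/WAVE file");
  }

  let sampleRate = 0;
  let channels = 0;
  let byteRate = 0;
  let dataBytes = -1;
  let offset = 12; // first chunk follows the 12-byte RIFF header

  while (offset + 8 <= bytes.length) {
    const id = tag(offset);
    const size = view.getUint32(offset + 4, true); // little-endian chunk size
    if (id === "fmt ") {
      channels = view.getUint16(offset + 10, true);
      sampleRate = view.getUint32(offset + 12, true);
      byteRate = view.getUint32(offset + 16, true);
    } else if (id === "data") {
      dataBytes = size;
      break;
    }
    offset += 8 + size + (size % 2); // chunks are padded to an even length
  }

  if (byteRate === 0 || dataBytes < 0) {
    throw new Error("Missing fmt or data chunk");
  }
  return { sampleRate, channels, durationSeconds: dataBytes / byteRate };
}

// Returns a list of problems; an empty list means the sample matches the guidance.
function checkReference(info: WavInfo): string[] {
  const problems: string[] = [];
  if (info.sampleRate < 16000) problems.push("sample rate below 16 kHz");
  if (info.durationSeconds < 10 || info.durationSeconds > 30) {
    problems.push("duration outside 10-30 seconds");
  }
  return problems;
}
```

In Node.js, the Buffer returned by readFile is a Uint8Array, so it can be passed to inspectWav directly.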
Example: Voice Bank
Build a library of cloned voices:
import { FishAudioClient } from "fish-audio";

const fishAudio = new FishAudioClient();

// Map each model title to its ID
async function createVoiceBank() {
  const voiceBank: Record<string, string> = {};
  const models = await fishAudio.voices.search();
  for (const m of models.items ?? []) voiceBank[m.title] = m._id as string;
  return voiceBank;
}

// Look up a voice by name and generate speech with it
async function generateWithVoice(text: string, voiceName: string) {
  const bank = await createVoiceBank();
  const modelId = bank[voiceName];
  if (!modelId) throw new Error(`Voice '${voiceName}' not found`);
  return fishAudio.textToSpeech.convert({ text, reference_id: modelId });
}
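Because the bank is keyed by title, two models with the same title would silently overwrite each other. As a sketch of one way to surface that, the hypothetical buildVoiceBank helper below indexes items by title, keeps the first match, and reports duplicates; it assumes each item has the title and _id fields used in the example above.

```typescript
// Illustrative helper, not part of the SDK. Assumes items expose title and _id.
interface VoiceItem {
  _id: string;
  title: string;
}

function buildVoiceBank(items: VoiceItem[]): { bank: Map<string, string>; duplicates: string[] } {
  const bank = new Map<string, string>();
  const duplicates: string[] = [];
  for (const item of items) {
    if (bank.has(item.title)) {
      duplicates.push(item.title); // keep the first model, report the clash
    } else {
      bank.set(item.title, item._id);
    }
  }
  return { bank, duplicates };
}
```

Note that generateWithVoice rebuilds the bank on every call, so building it once and reusing the result avoids a search request per generation.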
Combining with Emotions
Add emotions to cloned voices:
// With a saved model
await fishAudio.textToSpeech.convert({
  text: "(happy) This is exciting news! (calm) Let me explain the details.",
  reference_id: "your_model_id",
});

// Or with direct references
await fishAudio.textToSpeech.convert({
  text: "(excited) Amazing discovery!",
  references: [referenceAudio],
});
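If you assemble tagged text programmatically, a small helper can keep the parenthesized tags consistent. The sketch below is hypothetical: the Segment type and renderSegments function are not SDK names, and the helper does not check whether a given emotion tag is supported.

```typescript
// Illustrative only: builds "(emotion) text" segments in the style shown above.
interface Segment {
  emotion?: string; // omit for untagged text
  text: string;
}

function renderSegments(segments: Segment[]): string {
  return segments
    .map((s) => (s.emotion ? `(${s.emotion}) ${s.text.trim()}` : s.text.trim()))
    .join(" ");
}
```

The resulting string can be passed as the text field of a request, with either reference_id or references.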
Error Handling
Common issues and solutions:
try {
  await fishAudio.textToSpeech.convert({ text: "Test speech", references: [referenceAudio] });
} catch (e: any) {
  const msg = String(e?.message || e);
  if (msg.includes("Invalid audio format")) console.error("Check audio format - use WAV or MP3");
  else if (msg.includes("Audio too short")) console.error("Reference audio should be at least 10 seconds");
  else throw e; // rethrow anything unrecognized
}
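As the list of known errors grows, the if/else chain can be replaced by a lookup table. The sketch below reuses the two message fragments from the example above; the hintForError helper is hypothetical, and matching on message text is an assumption that may break if the API changes its wording.

```typescript
// Maps a fragment of the error message to a suggested fix.
// The fragments come from the example above; extend the table as needed.
const ERROR_HINTS: [string, string][] = [
  ["Invalid audio format", "Check audio format - use WAV or MP3"],
  ["Audio too short", "Reference audio should be at least 10 seconds"],
];

function hintForError(error: unknown): string | undefined {
  const message = error instanceof Error ? error.message : String(error);
  const match = ERROR_HINTS.find(([fragment]) => message.includes(fragment));
  return match ? match[1] : undefined; // undefined means: unrecognized, rethrow
}
```

In a catch block you might log the hint when one is returned and rethrow the original error otherwise.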