Speech Synthesis
Use MiniMax Speech HD models to generate speech from text. Voice management APIs can list system voices and create reusable private voice_id values through voice cloning or voice design.
When To Use It
- Generate voiceover for short videos, ads, courses, narration, or digital humans.
- Use a public voice from the voice list.
- Clone or design a private voice, then pass its voice_id into a speech task.
Supported Models
| Model | Type | Recommended use |
|---|---|---|
| minimax-speech-2.8-hd | Text to speech | Natural short-form speech, emotional narration, ads, digital-human voiceover. |
| minimax-speech-02-hd | Text to speech | Audiobooks, course narration, customer-service voice, news reading, long-form narration. |
Endpoints
POST /v1/audio/tasks
GET /v1/audio/tasks/{task_id}
GET /v1/audio/voices
POST /v1/audio/voices/clone
POST /v1/audio/voices/design
Authentication
Authorization: Bearer sk-***
Content-Type: application/json
Speech Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | minimax-speech-2.8-hd or minimax-speech-02-hd. |
| text | string | Yes | Text to synthesize. |
| voice_id | string | Yes | Public or private voice ID. |
| speed | number | No | Speaking speed when supported by the model. |
| volume | number | No | Output volume when supported. |
| pitch | number | No | Pitch adjustment when supported. |
| format | string | No | Output format, such as mp3 or wav, when supported. |
| language | string | No | Language hint for multilingual text. |
| audio_setting | object | No | Advanced audio settings passed through to the provider when supported. |
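The parameter table can be mirrored by a small request-body builder. The following is a minimal Python sketch, not an official client: the function name `build_speech_payload` and the keyword `audio_format` (sent as `format`) are illustrative choices, and rejecting unknown models on the client side is an assumption rather than documented behavior.

```python
from typing import Any, Dict, Optional

# Model names taken from the Supported Models table.
SUPPORTED_MODELS = {"minimax-speech-2.8-hd", "minimax-speech-02-hd"}


def build_speech_payload(
    model: str,
    text: str,
    voice_id: str,
    *,
    speed: Optional[float] = None,
    volume: Optional[float] = None,
    pitch: Optional[float] = None,
    audio_format: Optional[str] = None,  # sent as "format"
    language: Optional[str] = None,
    audio_setting: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Return a JSON-serializable body for POST /v1/audio/tasks."""
    # Required parameters: fail early instead of waiting for an API error.
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"unsupported model: {model!r}")
    if not text or not text.strip():
        raise ValueError("text is required")
    if not voice_id:
        raise ValueError("voice_id is required")

    payload: Dict[str, Any] = {"model": model, "text": text, "voice_id": voice_id}

    # Optional parameters are included only when the caller set them.
    optional = {
        "speed": speed,
        "volume": volume,
        "pitch": pitch,
        "format": audio_format,
        "language": language,
        "audio_setting": audio_setting,
    }
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload
```

Omitting unset optional fields keeps the request valid for models that do not support a given setting.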
Speech Request Example
curl -X POST "{BASE_URL}/v1/audio/tasks" \
-H "Authorization: Bearer sk-***" \
-H "Content-Type: application/json" \
-d '{
"model": "minimax-speech-2.8-hd",
"text": "Welcome back. Today we will introduce a faster way to build AI applications.",
"voice_id": "voice_xxx",
"format": "mp3",
"speed": 1
}'
Submit Response
{
"task_id": "task_xxx",
"status": "queued",
"model": "minimax-speech-2.8-hd",
"created_at": 1773980459
}
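For callers who prefer a script over curl, the same submission could look roughly like the Python sketch below, using only the standard library. The environment variable names `SPEECH_BASE_URL` and `SPEECH_API_KEY` and the placeholder host are assumptions for illustration. The `task_id` field is read as shown in the submit response above.

```python
import json
import os
import urllib.request

# Assumption: base URL and API key come from environment variables with
# these hypothetical names; substitute your own configuration.
BASE_URL = os.environ.get("SPEECH_BASE_URL", "https://api.example.com")
API_KEY = os.environ.get("SPEECH_API_KEY", "sk-***")


def make_submit_request(
    payload: dict, base_url: str = BASE_URL, api_key: str = API_KEY
) -> urllib.request.Request:
    """Build (but do not send) the POST /v1/audio/tasks request."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/audio/tasks",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit_speech_task(payload: dict) -> str:
    """Send the request and return the task_id from the submit response."""
    with urllib.request.urlopen(make_submit_request(payload), timeout=30) as resp:
        body = json.load(resp)
    return body["task_id"]
```

Keeping request construction separate from sending makes the URL, headers, and body easy to inspect before any network call is made.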
Query Task Status
curl "{BASE_URL}/v1/audio/tasks/task_xxx" \
-H "Authorization: Bearer sk-***"
Query Response
{
"task_id": "task_xxx",
"status": "succeeded",
"progress": "100%",
"output": {
"audio_url": "https://example.com/speech.mp3"
},
"error": null
}
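Because a task is submitted first and queried later, a client usually polls until the task finishes. The Python sketch below shows one way to do that. The statuses `queued` and `succeeded` come from the examples above, while the `failed` status name, treating a non-null `error` as failure, and the interval and timeout defaults are assumptions.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_task(base_url: str, api_key: str, task_id: str) -> dict:
    """GET /v1/audio/tasks/{task_id} and return the parsed JSON body."""
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/audio/tasks/{task_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_audio_url(
    get_task: Callable[[], dict],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_task() until the task succeeds, then return output.audio_url."""
    deadline = time.monotonic() + timeout_s
    while True:
        task = get_task()
        status = task.get("status")
        if status == "succeeded":
            return task["output"]["audio_url"]
        # Assumption: "failed" is the failure status and a non-null "error"
        # also signals failure.
        if status == "failed" or task.get("error"):
            raise RuntimeError(f"task failed: {task.get('error')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task still {status!r} after {timeout_s} seconds")
        sleep(interval_s)
```

A call might look like `wait_for_audio_url(lambda: fetch_task(base_url, api_key, "task_xxx"))`, where `base_url` and `api_key` are your own configuration values.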
Voice Management
List available voices:
curl "{BASE_URL}/v1/audio/voices" \
-H "Authorization: Bearer sk-***"
Clone a voice from reference audio:
curl -X POST "{BASE_URL}/v1/audio/voices/clone" \
-H "Authorization: Bearer sk-***" \
-H "Content-Type: application/json" \
-d '{
"name": "brand-narrator",
"audio_url": "https://example.com/reference.wav"
}'
Design a voice from text:
curl -X POST "{BASE_URL}/v1/audio/voices/design" \
-H "Authorization: Bearer sk-***" \
-H "Content-Type: application/json" \
-d '{
"name": "warm-host",
"description": "Warm, clear, young adult narrator for product explainers"
}'
Use the returned voice_id in later speech tasks.
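The design-then-speak flow can be chained in a few lines. This Python sketch assumes that the design endpoint responds synchronously and returns `voice_id` as a top-level field, neither of which is confirmed by the examples above. `post_json` is a hypothetical helper supplied by the caller that POSTs a JSON body to a path and returns the parsed response.

```python
from typing import Callable

# (path, body) -> parsed JSON response; a caller-supplied, hypothetical helper.
PostJson = Callable[[str, dict], dict]


def design_voice_and_speak(
    post_json: PostJson,
    name: str,
    description: str,
    text: str,
    model: str = "minimax-speech-2.8-hd",
) -> str:
    """Create a private voice from a description, then submit a speech task."""
    voice = post_json(
        "/v1/audio/voices/design",
        {"name": name, "description": description},
    )
    # Assumption: the response exposes the new voice as a top-level "voice_id".
    voice_id = voice["voice_id"]

    task = post_json(
        "/v1/audio/tasks",
        {"model": model, "text": text, "voice_id": voice_id},
    )
    return task["task_id"]
```

Because the private voice is reusable, the `voice_id` can be stored and the design step skipped in later runs.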
Billing Notes
Speech generation is billed by generated audio usage. Voice cloning and voice design may be billed separately because they create reusable private voices. For exact prices, refer to the current product pricing.
Common Errors
- Missing text or voice_id.
- Passing a private voice_id that is not visible to the current account.
- Submitting very long text without splitting it into smaller tasks.
- Using an unsupported output format.
- Insufficient balance or disabled API key.
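One way to avoid the long-text error is to split the input on sentence boundaries before submitting one task per chunk. The Python sketch below is illustrative only. No length limit is stated here, so the `max_chars` default of 1000 is an arbitrary assumption, and the splitter only recognizes `.`, `!`, and `?` followed by whitespace.

```python
import re
from typing import List

# Split after sentence-ending punctuation that is followed by whitespace.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_text(text: str, max_chars: int = 1000) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")

    sentences = [s for s in _SENTENCE_BOUNDARY.split(text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-wrapped.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]

        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # The sentence does not fit: close the current chunk and start a new one.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be submitted as its own speech task, and the resulting audio files played or concatenated in order.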