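MiniMax Speech HD Speech Generation

This document describes how to use the MiniMax Speech HD family of asynchronous speech generation models in aijisu.

Currently supported models:

Model name	Type	Recommended scenarios
`minimax-speech-2.8-hd`	Text-to-speech	Short-video voice-overs, ad voice-overs, digital-human speech, emotional narration, natural conversational speech
`minimax-speech-02-hd`	Text-to-speech	Audiobooks, course lectures, customer-service announcements, news broadcasts, long-text narration, multilingual speech

The API uses an asynchronous task model:

Operation	Method	Path
Submit a speech task	`POST`	`/v1/audio/tasks`
Query a speech task	`GET`	`/v1/audio/tasks/{task_id}`

Example base URL: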

https://api.xxx.xx

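1. Model overview

1.1 minimax-speech-2.8-hd

minimax-speech-2.8-hd is the newer-generation HD speech generation model. It suits speech content that needs more natural delivery, a conversational feel, well-placed pauses, and emotional expressiveness.

Suitable scenarios:

Short-video voice-overs
Ad voice-overs
Digital-human presentations
Emotional character lines
Podcast intros
Content with natural vocal details such as laughter, sighs, and pauses

Recommended text example: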

欢迎回来。<#0.5#> 今天我们聊一个很有意思的话题。(laughs)

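1.2 minimax-speech-02-hd

minimax-speech-02-hd is a mature, stable HD speech generation model. It suits production workloads that prioritize stability, and long-text speech tasks.

Suitable scenarios:

Audiobooks
Course lectures
Corporate training
News broadcasts
Customer-service speech
Long-text narration
Multilingual speech generation

Recommended text example: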

本节课我们将学习函数的基本概念。函数可以帮助我们封装重复逻辑，提高代码复用性。

2. Authentication

Every request must carry an API key.

Request headers:

Authorization: Bearer YOUR_API_KEY
Content-Type: application/json

3. Submit a speech generation task

Request URL:

POST https://api.xxx.xx/v1/audio/tasks

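3.1 Request parameters

Parameter	Type	Required	Description
`model`	string	Yes	Model name; supported values are `minimax-speech-2.8-hd` and `minimax-speech-02-hd`
`text`	string	Yes	Text to synthesize; may be supplied through `input` instead
`input`	string	No	Alias of `text`, for compatibility with some OpenAI-style requests
`voice_id`	string	No	Voice ID, for example a preset voice provided by the platform or a cloned voice ID
`voice`	string	No	Voice name; compatibility field
`speed`	number	No	Speaking rate; the commonly used range is `0.5` to `2.0`
`emotion`	string	No	Emotion, such as `happy`, `sad`, `angry`, `fearful`, `disgusted`, `surprised`, `neutral`
`language`	string	No	Language hint, such as `Chinese`, `English`, `Japanese`, `auto`
`output_format`	string	No	Output format; `url` is recommended
`response_format`	string	No	Response format; `url` is recommended
`sample_rate`	number	No	Sample rate in Hz, such as `32000` or `44100`
`pronunciation_dict`	object	No	Custom pronunciation dictionary
`timber_weights`	array	No	Mixed-voice weights; advanced usage
`subtitle_enable`	boolean	No	Whether to attempt to generate subtitle information
`metadata`	object	No	Custom business information
`extra_body`	object	No	Extension point for advanced parameters

Notes:

Provide either text or input; one of them is enough.
Prefer text.
If both text and input are provided, their contents must be identical.
This is an asynchronous task API: after submitting a task, query the result by task_id.
Billing is based on the number of input characters. Chinese characters, English letters, digits, punctuation, spaces, line breaks, emoji, pause tags, and sound tags all count toward the total.
The voice_id in the examples is for demonstration only; replace it with a voice ID that is actually available on the site.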
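
The text/input rules above can be checked on the client before a request is sent. The following is a minimal sketch, not the service's own validation: the helper name `resolve_text` is hypothetical, and the server may apply additional checks that are not documented here.

```python
def resolve_text(payload: dict) -> str:
    """Return the text to synthesize from a request payload.

    Follows the documented rules: either `text` or `input` is enough,
    and if both are present their contents must be identical.
    (Hypothetical client-side helper; the server remains authoritative.)
    """
    text = payload.get("text")
    alias = payload.get("input")

    # Both fields present: they must carry the same content.
    if text is not None and alias is not None and text != alias:
        raise ValueError("`text` and `input` must match when both are provided")

    value = text if text is not None else alias
    if not value:
        raise ValueError("either `text` or `input` is required and must be non-empty")
    return value
```

Calling `resolve_text({"input": "hello"})` returns `"hello"`, while a payload whose `text` and `input` differ raises `ValueError` before any billable request is made.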

3.2 Minimal request example

curl -X POST "https://api.xxx.xx/v1/audio/tasks" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax-speech-2.8-hd",
    "text": "你好，欢迎使用 aijisu 语音生成服务。"
  }'

3.3 Example response for a successful submission

{
  "id": "task_xxxxxxxxxxxxx",
  "task_id": "task_xxxxxxxxxxxxx",
  "object": "audio.generation.job",
  "status": "queued",
  "raw_status": "SUBMITTED",
  "progress": "0%",
  "audio_url": null,
  "result": null,
  "error": null
}

4. Query the task result

Request URL:

GET https://api.xxx.xx/v1/audio/tasks/{task_id}

Request example:

curl -X GET "https://api.xxx.xx/v1/audio/tasks/task_xxxxxxxxxxxxx" \
  -H "Authorization: Bearer YOUR_API_KEY"

4.1 Example response while generation is in progress

{
  "id": "task_xxxxxxxxxxxxx",
  "task_id": "task_xxxxxxxxxxxxx",
  "object": "audio.generation.job",
  "status": "in_progress",
  "raw_status": "IN_PROGRESS",
  "progress": "45%",
  "audio_url": null,
  "result": null,
  "error": null
}

4.2 Example response when generation is complete

{
  "id": "task_xxxxxxxxxxxxx",
  "task_id": "task_xxxxxxxxxxxxx",
  "object": "audio.generation.job",
  "status": "completed",
  "raw_status": "SUCCESS",
  "progress": "100%",
  "audio_url": "https://example.com/audio.mp3",
  "result": {
    "audio_url": "https://example.com/audio.mp3",
    "outputs": [
      "https://example.com/audio.mp3"
    ],
    "audios": [
      {
        "url": "https://example.com/audio.mp3"
      }
    ]
  },
  "error": null
}
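
The completed response above carries the same URL in several places (`audio_url`, `result.audio_url`, `result.outputs`, `result.audios[].url`). A minimal sketch of reading it defensively follows. The function name `extract_audio_url` is hypothetical, and the fallback order is an assumption: the document does not say which field is canonical or whether all of them are always present.

```python
from typing import Optional


def extract_audio_url(task: dict) -> Optional[str]:
    """Return the first audio URL found in a task response, or None.

    Assumed lookup order: top-level `audio_url`, then `result.audio_url`,
    then `result.outputs[0]`, then `result.audios[0].url`.
    """
    if task.get("audio_url"):
        return task["audio_url"]

    # `result` is null until the task completes, so fall back to an empty dict.
    result = task.get("result") or {}
    if result.get("audio_url"):
        return result["audio_url"]

    outputs = result.get("outputs") or []
    if outputs:
        return outputs[0]

    audios = result.get("audios") or []
    if audios and audios[0].get("url"):
        return audios[0]["url"]

    return None
```

For a queued or failed task, where `audio_url` and `result` are both null, the function returns `None` instead of raising.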

4.3 Example response when generation fails

{
  "id": "task_xxxxxxxxxxxxx",
  "task_id": "task_xxxxxxxxxxxxx",
  "object": "audio.generation.job",
  "status": "failed",
  "raw_status": "FAILURE",
  "progress": "100%",
  "audio_url": null,
  "result": null,
  "error": {
    "message": "audio task failed"
  }
}

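5. Task statuses

status	Description
`queued`	Submitted, waiting to be processed
`in_progress`	Generating
`processing`	Processing
`completed`	Generation finished
`failed`	Generation failed

Query the task status every 2 to 5 seconds; high-frequency polling is not recommended.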
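
The status table and polling advice can be wrapped in a small reusable helper. This is a sketch under stated assumptions: `wait_for_task`, its `fetch` callback, and the `max_attempts` cap are hypothetical names and choices, not part of the API. The helper treats `completed` and `failed` as terminal and every other status as "keep waiting".

```python
import time
from typing import Callable

# Statuses after which the task will not change any more (per the table above).
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_task(
    fetch: Callable[[], dict],
    interval: float = 3.0,          # within the recommended 2 to 5 seconds
    max_attempts: int = 100,        # arbitrary cap so a stuck task cannot loop forever
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call `fetch` until the task reaches a terminal status, then return it."""
    for attempt in range(max_attempts):
        task = fetch()
        if task.get("status") in TERMINAL_STATUSES:
            return task
        # Do not sleep after the final attempt; we are about to give up.
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError("task did not reach a terminal status in time")
```

In practice `fetch` would wrap the query request for one task, for example a function that issues `GET /v1/audio/tasks/{task_id}` and returns the parsed JSON. Injecting `fetch` and `sleep` keeps the helper testable without network access.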

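6. Sound tags and pauses

minimax-speech-2.8-hd is the better fit for natural sound tags and pause markers.

Syntax	Description
`<#0.5#>`	0.5-second pause
`<#1.0#>`	1-second pause
`(laughs)`	Laughter
`(sighs)`	Sigh
`(coughs)`	Cough
`(clears throat)`	Throat clearing
`(gasps)`	Gasp
`(sniffs)`	Sniff
`(groans)`	Groan
`(yawns)`	Yawn

Example: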

你终于来了。<#0.8#> 我还以为，你已经忘了这个约定。(sighs)
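
When text is assembled programmatically, the markers above can be generated rather than typed by hand. The sketch below is illustrative only: `pause` and `sound` are hypothetical helpers, the tag set is copied from the table above, and formatting whole seconds as `1.0` simply mirrors the table (whether `<#1#>` is also accepted is not stated in this document).

```python
# Sound tags listed in the table above.
SOUND_TAGS = {
    "laughs", "sighs", "coughs", "clears throat",
    "gasps", "sniffs", "groans", "yawns",
}


def pause(seconds: float) -> str:
    """Format a pause marker such as <#0.5#>."""
    if seconds <= 0:
        raise ValueError("pause length must be positive")
    # round() avoids artifacts like 0.30000000000000004 in the marker.
    return f"<#{round(float(seconds), 2)}#>"


def sound(name: str) -> str:
    """Format a sound tag such as (sighs), rejecting names not in the table."""
    if name not in SOUND_TAGS:
        raise ValueError(f"unknown sound tag: {name}")
    return f"({name})"


# Rebuilds the example line shown above.
line = "你终于来了。" + pause(0.8) + " 我还以为，你已经忘了这个约定。" + sound("sighs")
```

Remember that the generated markers are part of the input text and therefore count toward billed characters.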

7. Usage scenario examples

7.1 Chinese short-video voice-over

curl -X POST "https://api.xxx.xx/v1/audio/tasks" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax-speech-2.8-hd",
    "text": "今天给大家分享一个提高效率的小技巧。<#0.4#> 很简单，但真的很有用。",
    "voice_id": "Wise_Woman",
    "speed": 1.05,
    "emotion": "happy",
    "output_format": "url"
  }'

7.2 Course lecture

curl -X POST "https://api.xxx.xx/v1/audio/tasks" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax-speech-02-hd",
    "text": "本节课我们将学习函数的基本概念。函数可以帮助我们把重复的逻辑封装起来，提高代码的复用性。",
    "voice_id": "Wise_Woman",
    "speed": 0.95,
    "emotion": "neutral",
    "output_format": "url"
  }'

7.3 Ad voice-over

curl -X POST "https://api.xxx.xx/v1/audio/tasks" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax-speech-2.8-hd",
    "text": "全新升级，限时开启。<#0.3#> 现在下单，享受专属优惠！",
    "voice_id": "Wise_Woman",
    "speed": 1.12,
    "emotion": "happy",
    "output_format": "url"
  }'

7.4 Audiobook narration

curl -X POST "https://api.xxx.xx/v1/audio/tasks" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax-speech-02-hd",
    "text": "夜色渐深，街边的灯一盏接一盏亮起。她站在窗前，安静地望着远处的城市。",
    "voice_id": "Wise_Woman",
    "speed": 0.88,
    "emotion": "neutral",
    "output_format": "url"
  }'

7.5 Emotional character line

curl -X POST "https://api.xxx.xx/v1/audio/tasks" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax-speech-2.8-hd",
    "text": "你终于来了。<#0.6#> 我还以为，你已经忘了这个约定。(sighs)",
    "voice_id": "Wise_Woman",
    "speed": 0.92,
    "emotion": "sad",
    "output_format": "url"
  }'

7.6 English podcast intro

curl -X POST "https://api.xxx.xx/v1/audio/tasks" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax-speech-2.8-hd",
    "text": "Hey, welcome back to the show. <#0.4#> Today we are talking about how AI is changing creative work. (laughs)",
    "voice_id": "Wise_Woman",
    "speed": 1.0,
    "emotion": "happy",
    "language": "English",
    "output_format": "url"
  }'

7.7 Multilingual customer-service greeting

curl -X POST "https://api.xxx.xx/v1/audio/tasks" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax-speech-02-hd",
    "text": "您好，欢迎致电客户服务中心。Please hold on for a moment. 我们将尽快为您服务。",
    "voice_id": "Wise_Woman",
    "speed": 1.0,
    "language": "auto",
    "emotion": "neutral",
    "output_format": "url"
  }'

7.8 News broadcast

curl -X POST "https://api.xxx.xx/v1/audio/tasks" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax-speech-02-hd",
    "text": "今天的主要新闻包括：人工智能产业持续增长，多家企业发布新一代智能创作工具。",
    "voice_id": "Wise_Woman",
    "speed": 1.0,
    "emotion": "neutral",
    "output_format": "url"
  }'

7.9 Digital-human voice-over

curl -X POST "https://api.xxx.xx/v1/audio/tasks" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax-speech-2.8-hd",
    "text": "大家好，我是你的 AI 助手。<#0.4#> 接下来，我会用一分钟带你了解今天的重点内容。",
    "voice_id": "Wise_Woman",
    "speed": 1.03,
    "emotion": "happy",
    "output_format": "url"
  }'

7.10 Children's story

curl -X POST "https://api.xxx.xx/v1/audio/tasks" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax-speech-2.8-hd",
    "text": "很久很久以前，森林里住着一只勇敢的小兔子。它每天最喜欢做的事情，就是去河边看星星。",
    "voice_id": "Wise_Woman",
    "speed": 0.9,
    "emotion": "happy",
    "output_format": "url"
  }'

7.11 Corporate training speech

curl -X POST "https://api.xxx.xx/v1/audio/tasks" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax-speech-02-hd",
    "text": "欢迎参加本次企业安全培训。请大家认真阅读操作规范，并在实际工作中严格遵守。",
    "voice_id": "Wise_Woman",
    "speed": 0.96,
    "emotion": "neutral",
    "output_format": "url"
  }'

7.12 Slow meditation narration

curl -X POST "https://api.xxx.xx/v1/audio/tasks" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax-speech-2.8-hd",
    "text": "闭上眼睛。<#1.0#> 慢慢吸气。<#1.0#> 再缓缓呼出。",
    "voice_id": "Wise_Woman",
    "speed": 0.82,
    "emotion": "neutral",
    "output_format": "url"
  }'

7.13 Submitting with the input field

curl -X POST "https://api.xxx.xx/v1/audio/tasks" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax-speech-02-hd",
    "input": "这是一条使用 input 字段提交的语音生成任务。",
    "voice_id": "Wise_Woman",
    "output_format": "url"
  }'

7.14 Custom pronunciation dictionary

curl -X POST "https://api.xxx.xx/v1/audio/tasks" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax-speech-2.8-hd",
    "text": "欢迎使用 AI极速，新一代智能创作平台。",
    "voice_id": "Wise_Woman",
    "output_format": "url",
    "pronunciation_dict": {
      "tone_list": [
        "AI极速/(A)(I)(ji2)(su4)"
      ]
    }
  }'

7.15 High-sample-rate audio

curl -X POST "https://api.xxx.xx/v1/audio/tasks" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax-speech-2.8-hd",
    "text": "这是一段用于视频后期制作的高质量配音。",
    "voice_id": "Wise_Woman",
    "sample_rate": 44100,
    "output_format": "url"
  }'

7.16 Customer-service IVR menu announcement

curl -X POST "https://api.xxx.xx/v1/audio/tasks" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax-speech-02-hd",
    "text": "您好，欢迎致电。业务咨询请按一，订单查询请按二，人工服务请按零。",
    "voice_id": "Wise_Woman",
    "speed": 0.98,
    "emotion": "neutral",
    "output_format": "url"
  }'

7.17 Product introduction video narration

curl -X POST "https://api.xxx.xx/v1/audio/tasks" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax-speech-2.8-hd",
    "text": "这是一款为创作者打造的智能工具。<#0.4#> 它可以帮你更快完成脚本、配音和内容生成。",
    "voice_id": "Wise_Woman",
    "speed": 1.02,
    "emotion": "happy",
    "output_format": "url"
  }'

7.18 Serious documentary narration

curl -X POST "https://api.xxx.xx/v1/audio/tasks" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax-speech-02-hd",
    "text": "在漫长的时间里，人类不断探索自然、理解世界，并试图找到自身与时代之间的关系。",
    "voice_id": "Wise_Woman",
    "speed": 0.9,
    "emotion": "neutral",
    "output_format": "url"
  }'

8. JavaScript example

const API_KEY = "YOUR_API_KEY";
const BASE_URL = "https://api.xxx.xx";

async function createAudioTask() {
  const response = await fetch(`${BASE_URL}/v1/audio/tasks`, {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${API_KEY}`,
      "Content-Type": "application/json"
    },
    body: JSON.stringify({
      model: "minimax-speech-2.8-hd",
      text: "你好，这是一段由 aijisu 生成的语音。",
      voice_id: "Wise_Woman",
      speed: 1,
      emotion: "neutral",
      output_format: "url"
    })
  });

  if (!response.ok) {
    throw new Error(await response.text());
  }

  return await response.json();
}

async function getAudioTask(taskId) {
  const response = await fetch(`${BASE_URL}/v1/audio/tasks/${taskId}`, {
    method: "GET",
    headers: {
      "Authorization": `Bearer ${API_KEY}`
    }
  });

  if (!response.ok) {
    throw new Error(await response.text());
  }

  return await response.json();
}

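// Upper bound on polling attempts (an arbitrary value) so a stuck task cannot loop forever.
const MAX_ATTEMPTS = 100;

async function main() {
  const task = await createAudioTask();
  console.log("task_id:", task.task_id);

  // Poll every 3 seconds until the task completes, fails, or the cap is reached.
  for (let attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
    const result = await getAudioTask(task.task_id);
    console.log(result.status, result.progress);

    if (result.status === "completed") {
      console.log("audio_url:", result.audio_url);
      return;
    }

    if (result.status === "failed") {
      console.error("failed:", result.error);
      return;
    }

    await new Promise(resolve => setTimeout(resolve, 3000));
  }

  throw new Error("timed out waiting for the audio task");
}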

main().catch(console.error);

9. Python example

import time
import requests

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://api.xxx.xx"

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

payload = {
    "model": "minimax-speech-02-hd",
    "text": "你好，这是一段通过 Python 提交的语音生成任务。",
    "voice_id": "Wise_Woman",
    "speed": 1,
    "emotion": "neutral",
    "output_format": "url",
}

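# Submit the task. The 30-second timeout is an arbitrary safeguard so that
# a stalled connection cannot hang the script.
create_resp = requests.post(
    f"{BASE_URL}/v1/audio/tasks",
    headers=headers,
    json=payload,
    timeout=30,
)
create_resp.raise_for_status()

task = create_resp.json()
task_id = task["task_id"]

# Poll every 3 seconds; give up after 100 attempts (an arbitrary cap).
for _ in range(100):
    query_resp = requests.get(
        f"{BASE_URL}/v1/audio/tasks/{task_id}",
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=30,
    )
    query_resp.raise_for_status()

    result = query_resp.json()
    print(result["status"], result.get("progress"))

    if result["status"] == "completed":
        print("audio_url:", result.get("audio_url"))
        break

    if result["status"] == "failed":
        print("failed:", result.get("error"))
        break

    time.sleep(3)
else:
    # The loop ended without reaching a terminal status.
    raise TimeoutError("timed out waiting for the audio task")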

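10. Billing

Speech generation is billed by the number of input characters.

The following all count toward the character total:

Chinese characters
English letters
Digits
Punctuation
Spaces
Line breaks
Emoji
Sound tags
Pause tags

Example:

你好，世界！

Character breakdown:

你 好 ， 世 界 ！

6 characters in total.

Actual charges are determined by the site's model pricing, group multipliers, plan rules, and account balance rules.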
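
For a rough pre-submission estimate, the character total can be approximated locally. This sketch assumes one billed character per Unicode code point, which matches the worked example above but is only an assumption: the document does not say how multi-code-point emoji are counted, so treat the result as an estimate and not as the billed amount. The function name `estimate_billable_chars` is hypothetical.

```python
def estimate_billable_chars(text: str) -> int:
    """Estimate billed characters as the number of Unicode code points.

    Spaces, line breaks, punctuation, pause tags, and sound tags are all
    included because they are ordinary characters of the input string.
    """
    return len(text)


# The worked example above: six characters including punctuation.
example_count = estimate_billable_chars("你好，世界！")

# Tags are counted character by character: "<#0.5#>" alone adds 7.
tagged_count = estimate_billable_chars("你好。<#0.5#>(laughs)")
```

Here `example_count` is 6, and `tagged_count` is 18 (3 for the text, 7 for the pause tag, 8 for the sound tag).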

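11. Model selection guidance

11.1 Prefer minimax-speech-2.8-hd

Suitable when you:

Need more natural conversational delivery
Need vocal details such as laughter, sighs, and pauses
Produce short-video voice-overs
Produce ad voice-overs
Produce digital-human voice-overs
Produce emotional character speech

11.2 Prefer minimax-speech-02-hd

Suitable for:

Long-text narration
Audiobooks
Course lectures
Customer-service announcements
News broadcasts
Multilingual content
Scenarios that lean toward stable production use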

12. Recommended templates

12.1 Short-video voice-over template

{
  "model": "minimax-speech-2.8-hd",
  "text": "今天给大家分享一个非常实用的小技巧。<#0.4#> 学会之后，你的效率会明显提升。",
  "voice_id": "Wise_Woman",
  "speed": 1.05,
  "emotion": "happy",
  "output_format": "url"
}

12.2 Audiobook template

{
  "model": "minimax-speech-02-hd",
  "text": "夜色渐深，城市的喧嚣慢慢退去，只剩窗外微弱的风声。",
  "voice_id": "Wise_Woman",
  "speed": 0.88,
  "emotion": "neutral",
  "output_format": "url"
}

12.3 Customer-service announcement template

{
  "model": "minimax-speech-02-hd",
  "text": "您好，欢迎致电客户服务中心。请稍候，我们将尽快为您接通人工服务。",
  "voice_id": "Wise_Woman",
  "speed": 1,
  "emotion": "neutral",
  "output_format": "url"
}

12.4 Emotional character template

{
  "model": "minimax-speech-2.8-hd",
  "text": "你真的要离开吗？<#0.8#> 我以为，我们还有机会。(sighs)",
  "voice_id": "Wise_Woman",
  "speed": 0.92,
  "emotion": "sad",
  "output_format": "url"
}

12.5 English voice-over template

{
  "model": "minimax-speech-2.8-hd",
  "text": "Welcome back. <#0.4#> Today we are going to talk about how creators can use AI to work faster.",
  "voice_id": "Wise_Woman",
  "speed": 1,
  "emotion": "happy",
  "language": "English",
  "output_format": "url"
}

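13. FAQ

13.1 Why is no audio returned immediately after submission?

Speech generation is an asynchronous task. The submit endpoint returns only the task ID; use the query endpoint to obtain the final audio URL.

13.2 What is the difference between `text` and `input`?

input is an alias of text. Prefer text.

13.3 Can one request generate multiple audio files?

The current recommendation is one audio file per request. If you need several audio segments, split them into separate tasks and submit each one individually.

13.4 How should long text be handled?

Split it by paragraph into multiple tasks. This makes it easier to manage retries on failure, paragraph order, and post-production stitching.

13.5 How can the speech sound more natural?

Recommendations:

Keep punctuation in the text
Add pause tags where appropriate
Avoid overly long sentences
Adjust speed to fit the scenario
Prefer minimax-speech-2.8-hd for voice-overs, ads, and digital humans

13.6 Can the audio URL be played directly?

The audio_url returned after a task completes can usually be used directly for playback in a player, download, or further processing. How long the URL stays accessible depends on the site's storage policy.

13.7 How do I troubleshoot a failed request?

Common causes:

The API key is missing or invalid
The model name is misspelled
text or input is empty
Both text and input are provided, but their contents differ
The voice ID is unavailable
A parameter is malformed
The account balance is insufficient, or the account has no permission to use the model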
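
One way to follow this advice is to turn each paragraph into its own request payload and record its position so the audio can be stitched in order later. The sketch below is an assumption-laden illustration: `split_into_payloads` is a hypothetical helper, paragraphs are assumed to be separated by line breaks, and storing the position under `metadata.index` is an illustrative use of the free-form `metadata` field, not a documented convention.

```python
from typing import Dict, List


def split_into_payloads(text: str, model: str = "minimax-speech-02-hd") -> List[Dict]:
    """Build one request payload per non-empty paragraph of `text`."""
    # Treat each line break as a paragraph boundary and drop blank lines.
    paragraphs = [p.strip() for p in text.split("\n") if p.strip()]
    return [
        {
            "model": model,
            "text": paragraph,
            # Hypothetical key: keeps the original order for later stitching.
            "metadata": {"index": index},
        }
        for index, paragraph in enumerate(paragraphs)
    ]
```

Each payload can then be submitted as a separate task, and a failed paragraph can be retried on its own without regenerating the rest.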

14. End-to-end example

Step 1: Submit the task

curl -X POST "https://api.xxx.xx/v1/audio/tasks" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax-speech-2.8-hd",
    "text": "你好，这是一条完整流程测试语音。",
    "voice_id": "Wise_Woman",
    "speed": 1,
    "emotion": "neutral",
    "output_format": "url"
  }'

{
  "id": "task_xxxxxxxxxxxxx",
  "task_id": "task_xxxxxxxxxxxxx",
  "object": "audio.generation.job",
  "status": "queued",
  "raw_status": "SUBMITTED",
  "progress": "0%",
  "audio_url": null,
  "result": null,
  "error": null
}

Step 2: Query the task

curl -X GET "https://api.xxx.xx/v1/audio/tasks/task_xxxxxxxxxxxxx" \
  -H "Authorization: Bearer YOUR_API_KEY"

Step 3: Get the audio URL

When status is completed, read:

{
  "audio_url": "https://example.com/audio.mp3"
}

You can then play or download the generated audio.
