
TextToSpeech

class agentlego.tools.TextToSpeech(model='tts_models/multilingual/multi-dataset/xtts_v2', speaker_embeddings=SPEAKER_EMBEDDING, device='cuda', toolmeta=None)

A tool to convert input text to speech audio.

Parameters:
  • model (str) – The name of the model used for inference. Available models are listed in the Coqui TTS repository (https://github.com/coqui-ai/TTS). Defaults to tts_models/multilingual/multi-dataset/xtts_v2.

  • speaker_embeddings (str | dict) – The speaker embedding for the TTS model. Defaults to the built-in SPEAKER_EMBEDDING.

  • device (str) – The device on which to load the model. Defaults to 'cuda'.

  • toolmeta (None | dict | ToolMeta) – Additional information about the tool. Defaults to None.
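
Because device defaults to 'cuda', loading the tool on a machine without a GPU is likely to fail. The following is a minimal sketch of choosing the device at runtime. The helper pick_device is hypothetical (it is not part of AgentLego), and it assumes PyTorch's torch.cuda.is_available() is an adequate availability check for this tool.

```python
def pick_device(preferred: str = 'cuda') -> str:
    """Return `preferred` if CUDA is usable, otherwise fall back to 'cpu'.

    Hypothetical helper for illustration; not part of AgentLego.
    """
    # Non-CUDA requests (e.g. 'cpu') are passed through unchanged.
    if not preferred.startswith('cuda'):
        return preferred
    try:
        import torch  # imported lazily so the helper also works without PyTorch
    except ImportError:
        return 'cpu'
    return preferred if torch.cuda.is_available() else 'cpu'


device = pick_device()
print(device)
```

The result can then be passed as the device argument, for example load_tool('TextToSpeech', device=pick_device()).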

Default tool information

  • Name: TextToSpeech

  • Description: The tool can speak the input text into audio. The language code should be one of ‘zh-cn’ (Chinese), ‘en’ (English), ‘es’ (Spanish), ‘fr’ (French), ‘de’ (German), ‘it’ (Italian), ‘tr’ (Turkish), ‘ru’ (Russian), ‘ar’ (Arabic), ‘ja’ (Japanese), ‘ko’ (Korean).

  • Inputs:

    • text (str)

    • lang (str): The language code of text.

  • Outputs:

    • AudioIO
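
Since the tool description restricts lang to a fixed set of codes, it can help to validate a code before passing it to the tool. The sketch below is only an illustration: SUPPORTED_LANGS and check_lang are hypothetical names, and the set is copied from the description above rather than queried from the model.

```python
# Language codes listed in the default tool description.
SUPPORTED_LANGS = {
    'zh-cn': 'Chinese', 'en': 'English', 'es': 'Spanish', 'fr': 'French',
    'de': 'German', 'it': 'Italian', 'tr': 'Turkish', 'ru': 'Russian',
    'ar': 'Arabic', 'ja': 'Japanese', 'ko': 'Korean',
}


def check_lang(lang: str) -> str:
    """Return the normalized language code, or raise ValueError if unsupported.

    Hypothetical helper for illustration; not part of AgentLego.
    """
    code = lang.strip().lower()
    if code not in SUPPORTED_LANGS:
        supported = ', '.join(sorted(SUPPORTED_LANGS))
        raise ValueError(f'Unsupported language code {lang!r}; expected one of: {supported}')
    return code


print(check_lang('EN'))  # prints: en
```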

Examples

Use the tool directly (without an agent)

from agentlego.apis import load_tool

# load tool
tool = load_tool('TextToSpeech', device='cuda')

# apply tool
audio = tool('Hello, this is a text to audio demo.')
print(audio)

With Lagent

from lagent import ReAct, GPTAPI, ActionExecutor
from agentlego.apis import load_tool

# load tools and build agent
# please set `OPENAI_API_KEY` in your environment variable.
tool = load_tool('TextToSpeech', device='cuda').to_lagent()
agent = ReAct(GPTAPI(temperature=0.), action_executor=ActionExecutor([tool]))

# agent running with the tool.
ret = agent.chat('Please introduce the highest mountain and speak out.')
for step in ret.inner_steps[1:]:
    print('------')
    print(step['content'])

Set up

Before using the tool, make sure the required dependencies are installed by running the command below.

pip install TTS langid
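
To confirm the installation from Python, one option is to check whether the modules can be found without importing them. This is a minimal sketch: missing_dependencies is a hypothetical helper, and it assumes the importable module names match the package names (TTS and langid).

```python
from importlib.util import find_spec


def missing_dependencies(modules=('TTS', 'langid')):
    """Return the names of top-level modules that cannot be found.

    Hypothetical helper for illustration; not part of AgentLego.
    """
    return [name for name in modules if find_spec(name) is None]


missing = missing_dependencies()
if missing:
    print('Missing packages:', ', '.join(missing))
else:
    print('All dependencies are installed.')
```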

Reference

By default, this tool uses the XTTS-v2 model. See the repo for details.