agents.components.speechtotext

Module Contents

Classes

SpeechToText

This component takes in audio input and outputs a text representation of the audio using Speech-to-Text models (e.g. Whisper).
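Conceptually, the component receives audio messages, passes each audio payload to a model client, and publishes the returned transcription as a string. The sketch below models that flow in plain Python so it can run without ROS. `AudioMessage`, `StubModelClient`, `transcribe` and `SpeechToTextSketch` are hypothetical stand-ins and are not part of the `agents` API. The real component runs inside ROS 2 and publishes to topics rather than calling Python callables.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AudioMessage:
    """Hypothetical stand-in for an Audio topic message: raw audio bytes."""
    data: bytes
    sample_rate: int = 16000


class StubModelClient:
    """Hypothetical client; a real one would query an STT model such as Whisper."""

    def transcribe(self, audio: bytes) -> str:
        # Placeholder "model": report how many bytes it received.
        return f"<{len(audio)} bytes of audio>"


@dataclass
class SpeechToTextSketch:
    """Mimics the component's flow: audio in, model call, text out."""
    client: StubModelClient
    publishers: List[Callable[[str], None]] = field(default_factory=list)

    def on_audio(self, msg: AudioMessage) -> None:
        text = self.client.transcribe(msg.data)
        # Every output "topic" receives the same string payload.
        for publish in self.publishers:
            publish(text)


received: List[str] = []
stt = SpeechToTextSketch(client=StubModelClient(), publishers=[received.append])
stt.on_audio(AudioMessage(data=b"\x00\x01" * 8))
print(received)  # ['<16 bytes of audio>']
```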

API

class agents.components.speechtotext.SpeechToText(*, inputs: list[agents.ros.Topic], outputs: list[agents.ros.Topic], model_client: agents.clients.model_base.ModelClient, config: Optional[agents.config.SpeechToTextConfig] = None, trigger: Union[agents.ros.Topic, list[agents.ros.Topic], float], callback_group=None, component_name: str = 'speechtotext_component', **kwargs)

Bases: agents.components.model_component.ModelComponent

This component takes in audio input and outputs a text representation of the audio using Speech-to-Text models (e.g. Whisper).

Parameters:
  • inputs (list[Topic]) – The input topics for the STT component. This should be a list of Topic objects. Only topics of type Audio are supported.

  • outputs (list[Topic]) – The output topics for the STT component. This should be a list of Topic objects. The String message type is handled automatically.

  • model_client (ModelClient) – The model client for the STT. This should be an instance of ModelClient.

  • config (Optional[SpeechToTextConfig]) – The configuration for the STT. This should be an instance of SpeechToTextConfig. If not provided, defaults to SpeechToTextConfig().

  • trigger (Union[Topic, list[Topic], float]) – The trigger value or topic for the STT component. This is required and can be a single Topic object, a list of Topic objects, or a float value for a timed component.

  • callback_group (str) – An optional callback group for the STT component. If provided, this should be a string. Defaults to None.

  • component_name (str) – The name of the STT component. This should be a string and defaults to “speechtotext_component”.
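The parameter list above implies a small contract: inputs must be Audio topics, the trigger may be one topic, several topics, or a float, and a missing config falls back to defaults. The sketch below checks that contract in plain Python. `Topic` and `SpeechToTextConfigSketch` are hypothetical stand-ins for `agents.ros.Topic` and `SpeechToTextConfig`, and `check_stt_args` is not a library function. The real component performs its own validation, which may differ.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class Topic:
    """Hypothetical stand-in for agents.ros.Topic."""
    name: str
    msg_type: str


@dataclass
class SpeechToTextConfigSketch:
    """Hypothetical stand-in for SpeechToTextConfig; the field is illustrative."""
    enable_vad: bool = False


def check_stt_args(
    inputs: List[Topic],
    outputs: List[Topic],
    trigger: Union[Topic, List[Topic], float],
    config: Optional[SpeechToTextConfigSketch] = None,
) -> dict:
    """Validate and normalize arguments as the parameter list describes them."""
    # Inputs are limited to Audio topics.
    bad_inputs = [t.name for t in inputs if t.msg_type != "Audio"]
    if bad_inputs:
        raise TypeError(f"STT inputs must be Audio topics, got: {bad_inputs}")

    # A trigger is one topic, a list of topics, or a float (timed component).
    if isinstance(trigger, Topic):
        kind, trigger_topics = "topic", [trigger]
    elif isinstance(trigger, list):
        if not all(isinstance(t, Topic) for t in trigger):
            raise TypeError("a trigger list must contain only Topic objects")
        kind, trigger_topics = "topics", list(trigger)
    elif isinstance(trigger, float):
        kind, trigger_topics = "timed", []
    else:
        raise TypeError(f"unsupported trigger: {trigger!r}")

    return {
        "outputs": [t.name for t in outputs],
        "trigger_kind": kind,
        "trigger_topics": [t.name for t in trigger_topics],
        # A missing config falls back to the default configuration.
        "config": config if config is not None else SpeechToTextConfigSketch(),
    }


audio = Topic(name="audio", msg_type="Audio")
result = check_stt_args([audio], [Topic("text", "String")], trigger=audio)
print(result["trigger_kind"], result["trigger_topics"], result["config"])
# topic ['audio'] SpeechToTextConfigSketch(enable_vad=False)
```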

Example usage:

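from agents.ros import Topic
from agents.config import SpeechToTextConfig
from agents.models import Whisper
from agents.clients.model_base import ModelClient
from agents.components.speechtotext import SpeechToText

audio_topic = Topic(name="audio", msg_type="Audio")
text_topic = Topic(name="text", msg_type="String")
config = SpeechToTextConfig(enable_vad=True)
model = Whisper(name="whisper")
model_client = ModelClient(model=model)
stt_component = SpeechToText(
    inputs=[audio_topic],
    outputs=[text_topic],
    model_client=model_client,
    config=config,
    trigger=audio_topic,  # required: run transcription when audio arrives
    component_name="stt_component",
)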
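The example above sets `enable_vad=True`, that is, voice activity detection, so that only audio judged to contain speech is worth transcribing. This page does not say which detector the library uses. The sketch below substitutes a simple RMS energy threshold over fixed-size frames purely to illustrate the idea. `frame_rms`, `detect_speech_frames`, the 512-sample frame and the 0.02 threshold are all hypothetical choices.

```python
import math
from typing import List, Sequence


def frame_rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame of samples in [-1.0, 1.0]."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_speech_frames(
    samples: Sequence[float], frame_size: int = 512, threshold: float = 0.02
) -> List[bool]:
    """Flag each complete frame whose RMS energy exceeds the threshold."""
    flags = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        flags.append(frame_rms(samples[start:start + frame_size]) > threshold)
    return flags


# One frame of silence, one frame of a 440 Hz tone at 16 kHz, then silence.
silence = [0.0] * 512
tone = [0.1 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(512)]
print(detect_speech_frames(silence + tone + silence))  # [False, True, False]
```

A production detector would typically be model-based and more robust to background noise; an energy threshold is simply the easiest way to show what "voice activity" gating means.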