agents.config#

Module Contents#

Classes#

LLMConfig

Configuration for the Large Language Model (LLM) component.

MLLMConfig

Configuration for the Multi-Modal Large Language Model (MLLM) component.

SpeechToTextConfig

Configuration for a Speech-To-Text component.

TextToSpeechConfig

Configuration for a Text-To-Speech component.

SemanticRouterConfig

Configuration parameters for a semantic router component.

MapConfig

Configuration for a MapEncoding component.

VideoMessageMakerConfig

Configuration parameters for a video message maker component.

VisionConfig

Configuration for a detection component.

API#

class agents.config.LLMConfig#

Bases: agents.config.ModelComponentConfig

Configuration for the Large Language Model (LLM) component.

It defines various settings that control how the LLM component operates, including whether to enable chat history, retreival augmented generation (RAG) and more.

Parameters:
  • enable_rag (bool) – Enables or disables Retreival Augmented Generation.

  • collection_name (Optional[str]) – The name of the vectordb collection to use for RAG.

  • distance_func (str) – The distance metric used for nearest neighbor search for RAG. Supported values are “l2”, “ip”, and “cosine”.

  • n_results (int) – The maximum number of results to return for RAG. Defaults to 1. For numbers greater than 1, results will be concatenated together in a single string.

  • chat_history (bool) – Whether to include chat history in the LLM’s prompt.

  • history_reset_phrase (str) – Phrase to reset chat history. Defaults to ‘chat reset’

  • history_size (int) – Number of user messages to keep in chat history. Defaults to 10

  • temperature (float) – Temperature used for sampling tokens during generation. Default is 0.8 and must be greater than 0.0.

  • max_new_tokens – The maximum number of new tokens to generate. Default is 100 and must be greater than 0.

Example of usage:

config = LLMConfig(enable_rag=True, collection_name="my_collection", distance_func="l2")
class agents.config.MLLMConfig#

Bases: agents.config.LLMConfig

Configuration for the Multi-Modal Large Language Model (MLLM) component.

It defines various settings that control how the LLM component operates, including whether to enable chat history, retreival augmented generation (RAG) and more.

Parameters:
  • enable_rag (bool) – Enables or disables Retreival Augmented Generation.

  • collection_name (Optional[str]) – The name of the vectordb collection to use for RAG.

  • distance_func (str) – The distance metric used for nearest neighbor search for RAG. Supported values are “l2”, “ip”, and “cosine”.

  • n_results (int) – The maximum number of results to return for RAG.

  • chat_history (bool) – Whether to include chat history in the MLLM’s prompt.

  • temperature (float) – Temperature used for sampling tokens during generation. Default is 0.7 and must be greater than 0.0.

  • max_new_tokens – The maximum number of new tokens to generate. Default is 100 and must be greater than 0.

Example of usage:

config = MLLMConfig(enable_rag=True, collection_name="my_collection", distance_func="l2")
class agents.config.SpeechToTextConfig#

Bases: agents.config.ModelComponentConfig

Configuration for a Speech-To-Text component.

This class defines the configuration options for a Speech-To-Text component.

Parameters:
  • enable_vad (bool) – Enable Voice Activity Detection (VAD) to identify when speech is present in continuous input stream from an input audio device. Uses silero-vad model and requires onnxruntime to be installed. Defaults to False.

  • enable_wakeword (bool) – Enable Wakeword Detection to identify a specific key phrase in the audio stream, e.g ‘Hey Jarvis’. Defaults to False.

  • device_audio (int) – Device id (int) to use for audio input. Only effective if enable_vad is set to true. Defaults to 0.

  • vad_threshold (float) – Minimum threshold above which speech is considered present. Only effective if enable_vad is set to true. Defaults to 0.5 (50%).

  • wakeword_threshold (float) – Minimum threshold for detecting the wake word phrase. Only effective if enable_wakeword is set to true. Defaults to 0.6 (60%).

  • min_silence_duration_ms (int) – Minimum duration of silence in milliseconds before considering it as a speaker pause. Only effective if enable_vad is set to true. Defaults to 1000 ms.

  • speech_pad_ms (int) – Duration in milliseconds to pad silence at the start and end of detected speech regions. Only effective if enable_vad is set to true. Defaults to 30 ms.

  • speech_buffer_max_len (int) – Maximum length of the speech buffer in milliseconds. Defaults to 8000ms. Only effective if enable_vad is set to true.

  • device_vad (str) – Device type for VAD processing (‘cpu’ or ‘gpu’). Only effective if enable_vad is set to true. Defaults to ‘cpu’.

  • device_wakeword (str) – Device type for Wakeword detection (‘cpu’ or ‘gpu’). Only effective if enable_wakeword is set to true. Defaults to ‘cpu’.

  • ncpu_vad (int) – Number of CPU cores to use for VAD processing. Only effective if device_vad is ‘cpu’. Defaults to 1.

  • ncpu_wakeword (int) – Number of CPU cores to use for Wakeword detection. Only effective if device_wakeword is ‘cpu’. Defaults to 1.

  • vad_model_path (str) – File path or URL to the VAD model file. Defaults to the URL for Silero VAD ONNX model.

  • melspectrogram_model_path (str) – File path or URL to the melspectrogram model file used by the Wakeword detection. Defaults to the URL for melspectrogram ONNX model provided by openWakeWord.

  • embedding_model_path (str) – File path or URL to the audio embedding model file used by the Wakeword detection. Defaults to the URL for embedding ONNX model provided by openWakeWord, which is a reimplmentation of audio embeddings model provided by Google. License Apache-2.0.

  • wakeword_model_path (str) – File path or URL to the Wakeword model file. Defaults to the URL for pretrained ‘Hey Jarvis’ wakeword ONNX model provided by openWakeWord. To train your custom wakeword model, follow the tutorial provided by openWakeWord.

Example of usage:

config = SpeechToTextConfig(
    enable_vad=True,
    enable_wakeword=True,
    device_audio=1,
    vad_threshold=0.5,
    wakeword_threshold=0.6,
    min_silence_duration_ms=1000,
    speech_pad_ms=30,
    speech_buffer_max_len=8000,
)
class agents.config.TextToSpeechConfig#

Bases: agents.config.ModelComponentConfig

Configuration for a Text-To-Speech component.

This class defines the configuration options for a Text-To-Speech component.

Parameters:
  • play_on_device (bool) – Whether to play the audio on available audio device (default: False).

  • device – Device id (int) or name (sub-string) for playing the audio. Only effective if play_on_device is True (default: ‘default’).

  • buffer_size (int) – Size of the buffer for playing audio on device. Only effective if play_on_device is True (default: 20).

  • block_size (int) – Size of the audio block to be read for playing audio on device. Only effective if play_on_device is True (default: 1024).

  • get_bytes (bool) – Whether the model should return the speech data as bytes instead of base64 encoded string(default: False).

Example of usage:

config = TextToSpeechConfig(play_on_device=True, get_bytes=False)
class agents.config.SemanticRouterConfig#

Bases: agents.ros.BaseComponentConfig

Configuration parameters for a semantic router component.

Parameters:
  • router_name (str) – The name of the router.

  • distance_func (str) – The function used to calculate distance from route samples in vectordb. Can be one of “l2” (L2 distance), “ip” (Inner Product), or “cosine” (Cosine similarity). Default is “l2”.

  • maximum_distance (float) – The maximum distance threshold for routing. A value between 0.1 and 1.0. Defaults to 0.4

Example of usage:

config = SemanticRouterConfig(router_name="my_router")
# or
config = SemanticRouterConfig(router_name="my_router", distance_func="ip", maximum_distance=0.7)
class agents.config.MapConfig#

Bases: agents.ros.BaseComponentConfig

Configuration for a MapEncoding component.

Parameters:
  • map_name (str) – The name of the map.

  • distance_func (str) – The function used to calculate distance when retreiving information from the map collection. Can be one of “l2” (L2 distance), “ip” (Inner Product), or “cosine” (Cosine similarity). Default is “l2”.

Example of usage:

config = MapConfig(map_name="my_map", distance_func="ip")
class agents.config.VideoMessageMakerConfig#

Bases: agents.ros.BaseComponentConfig

Configuration parameters for a video message maker component.

Parameters:
  • min_video_frames (int) – The minimum number of frames in a video segment. Default is 15, assuming a 0.5 second video at 30 fps.

  • max_video_frames (int) – The maximum number of frames in a video segment. Default is 600, assuming a 20 second video at 30 fps.

  • motion_estimation_func (Optional[str]) – The function used for motion estimation. Can be one of “frame_difference” or “optical_flow”. Default is None.

  • threshold (float) – The threshold value for motion detection. A float between 0.1 and 5.0. Default is 0.3.

  • flow_kwargs – Additional keyword arguments for the optical flow algorithm. Default is a dictionary with reasonable values.

Example of usage:

config = VideoMessageMakerConfig()
# or
config = VideoMessageMakerConfig(min_video_frames=30, motion_estimation_func="optical_flow", threshold=0.5)
class agents.config.VisionConfig#

Bases: agents.config.ModelComponentConfig

Configuration for a detection component.

The config allows you to customize the detection and/or tracking process.

Parameters:
  • threshold (float) – The confidence threshold for object detection, ranging from 0.1 to 1.0 (default: 0.5).

  • get_data_labels (bool) – Whether to return data labels along with detections (default: True).

  • labels_to_track (Optional[list]) – A list of specific labels to track, when the model is used as a tracker (default: None).

Example of usage:

config = DetectionConfig(threshold=0.3)