agents.config

Module Contents

Classes

LLMConfig

Configuration for the Large Language Model (LLM) component.

MLLMConfig

Configuration for the Multi-Modal Large Language Model (MLLM) component.

VisionConfig

Configuration for a detection component.

TextToSpeechConfig

Configuration for a Text-To-Speech component.

SpeechToTextConfig

Configuration for a Speech-To-Text component.

MapConfig

Configuration for a MapEncoding component.

SemanticRouterConfig

Configuration parameters for a semantic router component.

VideoMessageMakerConfig

Configuration parameters for a video message maker component.

API

class agents.config.LLMConfig

Bases: agents.ros.BaseComponentConfig

Configuration for the Large Language Model (LLM) component.

It defines the settings that control how the LLM component operates, including whether to enable chat history, retrieval-augmented generation (RAG), and more.

Parameters:
  • enable_rag (bool) – Enables or disables Retrieval-Augmented Generation (RAG).

  • collection_name (Optional[str]) – The name of the vectordb collection to use for RAG.

  • distance_func (str) – The distance metric used for nearest neighbor search for RAG. Supported values are “l2”, “ip”, and “cosine”.

  • n_results (int) – The maximum number of results to return for RAG. Defaults to 1. For values greater than 1, the results are concatenated into a single string.

  • chat_history (bool) – Whether to include chat history in the LLM’s prompt.

  • history_reset_phrase (str) – Phrase that resets the chat history. Defaults to ‘chat reset’.

  • history_size (int) – Number of user messages to keep in the chat history. Defaults to 10.

  • temperature (float) – Temperature used for sampling tokens during generation. Default is 0.8 and must be greater than 0.0.

  • max_new_tokens – The maximum number of new tokens to generate. Default is 100 and must be greater than 0.
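The chat_history, history_size and history_reset_phrase parameters work together: the history grows with each exchange, is trimmed to the most recent history_size user messages, and is cleared when the user sends the reset phrase. The sketch below is a hypothetical illustration of that behavior in plain Python, not the component's actual implementation; the ChatHistory class, its methods, and the case-insensitive match on the reset phrase are assumptions.

```python
from typing import List, Tuple


class ChatHistory:
    """Hypothetical rolling chat history; an illustration, not the library's code."""

    def __init__(self, history_size: int = 10, history_reset_phrase: str = "chat reset") -> None:
        self.history_size = history_size
        self.history_reset_phrase = history_reset_phrase
        self.turns: List[Tuple[str, str]] = []  # (role, text) pairs in arrival order

    def add_user(self, text: str) -> bool:
        """Record a user message. Returns False if the message was the reset phrase."""
        # Assumption: the reset phrase is matched case-insensitively after trimming spaces.
        if text.strip().lower() == self.history_reset_phrase.lower():
            self.turns.clear()
            return False
        self.turns.append(("user", text))
        self._trim()
        return True

    def add_assistant(self, text: str) -> None:
        self.turns.append(("assistant", text))

    def _trim(self) -> None:
        # Keep the last history_size user messages and everything after the oldest kept one.
        user_positions = [i for i, (role, _) in enumerate(self.turns) if role == "user"]
        if len(user_positions) > self.history_size:
            self.turns = self.turns[user_positions[-self.history_size]:]


history = ChatHistory(history_size=2)
for question in ["hello", "where is the kitchen?", "is the door open?"]:
    history.add_user(question)
    history.add_assistant(f"reply to: {question}")

print([text for role, text in history.turns if role == "user"])
# ['where is the kitchen?', 'is the door open?']
```

Trimming on user turns (rather than on raw message count) keeps each kept question paired with its reply, which matches the parameter's description as a count of user messages.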

Example of usage:

from agents.config import LLMConfig

config = LLMConfig(enable_rag=True, collection_name="my_collection", distance_func="l2")
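With enable_rag=True, the component looks up the n_results nearest entries in the named collection using distance_func and joins them into a single string. The sketch below illustrates that flow on an in-memory dictionary of toy two-dimensional embeddings; the retrieve helper, the newline separator, and converting the "ip" and "cosine" similarities into distances as 1 minus the similarity (a common vector-database convention) are assumptions, not the library's code.

```python
import math
from typing import Callable, Dict, Sequence

Vector = Sequence[float]


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def _norm(a: Vector) -> float:
    return math.sqrt(_dot(a, a))


# Assumption: similarities become distances as 1 - similarity, so smaller always means closer.
DISTANCE_FUNCS: Dict[str, Callable[[Vector, Vector], float]] = {
    "l2": lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b))),
    "ip": lambda a, b: 1.0 - _dot(a, b),
    "cosine": lambda a, b: 1.0 - _dot(a, b) / (_norm(a) * _norm(b)),
}


def retrieve(query: Vector, collection: Dict[str, Vector], distance_func: str = "l2", n_results: int = 1) -> str:
    """Hypothetical RAG lookup: rank documents by distance and join the top n_results."""
    dist = DISTANCE_FUNCS[distance_func]
    ranked = sorted(collection, key=lambda doc: dist(query, collection[doc]))
    return "\n".join(ranked[:n_results])  # assumption: results joined with newlines


collection = {
    "The kitchen is on the left.": [1.0, 0.0],
    "The charger is in the hallway.": [0.0, 1.0],
    "The fridge is in the kitchen.": [0.9, 0.1],
}
print(retrieve([1.0, 0.0], collection, distance_func="l2", n_results=2))
```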
class agents.config.MLLMConfig

Bases: agents.config.LLMConfig

Configuration for the Multi-Modal Large Language Model (MLLM) component.

It defines the settings that control how the MLLM component operates, including whether to enable chat history, retrieval-augmented generation (RAG), and more.

Parameters:
  • enable_rag (bool) – Enables or disables Retrieval-Augmented Generation (RAG).

  • collection_name (Optional[str]) – The name of the vectordb collection to use for RAG.

  • distance_func (str) – The distance metric used for nearest neighbor search for RAG. Supported values are “l2”, “ip”, and “cosine”.

  • n_results (int) – The maximum number of results to return for RAG.

  • chat_history (bool) – Whether to include chat history in the MLLM’s prompt.

  • temperature (float) – Temperature used for sampling tokens during generation. Default is 0.7 and must be greater than 0.0.

  • max_new_tokens – The maximum number of new tokens to generate. Default is 100 and must be greater than 0.

Example of usage:

from agents.config import MLLMConfig

config = MLLMConfig(enable_rag=True, collection_name="my_collection", distance_func="l2")
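The requirement that temperature be greater than 0.0 follows from how sampling works: the model's raw scores (logits) are divided by the temperature before the softmax, so 0.0 would divide by zero, lower values sharpen the distribution, and higher values flatten it. The sketch below is the standard temperature-scaled softmax in plain Python; the helper names are hypothetical, and in practice sampling is performed by whichever model backend serves the MLLM, not by this config class.

```python
import math
import random
from typing import List, Sequence


def softmax_with_temperature(logits: Sequence[float], temperature: float) -> List[float]:
    """Standard temperature-scaled softmax over raw model scores (logits)."""
    if temperature <= 0.0:
        # Dividing by zero (or flipping signs) is why the config requires temperature > 0.0.
        raise ValueError("temperature must be greater than 0.0")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits: Sequence[float], temperature: float, rng: random.Random) -> int:
    """Hypothetical helper: draw one token index from the tempered distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.0]
for t in (0.3, 0.7, 1.5):
    top = softmax_with_temperature(logits, t)[0]
    print(f"temperature={t}: P(top token)={top:.3f}")
```

Running the loop shows the probability of the highest-scoring token shrinking as the temperature rises, which is why lower defaults (such as 0.7 here versus 0.8 for LLMConfig) give somewhat more deterministic output.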
class agents.config.VisionConfig

Bases: agents.ros.BaseComponentConfig

Configuration for a detection component.

The config allows you to customize the detection and/or tracking process.

Parameters:
  • threshold (float) – The confidence threshold for object detection, ranging from 0.1 to 1.0 (default: 0.5).

  • get_data_labels (bool) – Whether to return data labels along with detections (default: True).

  • labels_to_track (Optional[list]) – A list of specific labels to track, when the model is used as a tracker (default: None).

Example of usage:

from agents.config import VisionConfig

config = VisionConfig(threshold=0.3)
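The threshold and labels_to_track parameters act as filters on the model's raw detections. The sketch below shows that filtering on a hypothetical Detection record; the record's fields, the inclusive comparison against threshold, and applying labels_to_track outside tracking mode are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Detection:
    """Hypothetical detection record; the component's real message type differs."""
    label: str
    score: float
    box: Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def filter_detections(
    detections: Sequence[Detection],
    threshold: float = 0.5,
    labels_to_track: Optional[List[str]] = None,
) -> List[Detection]:
    if not 0.1 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.1 and 1.0")
    # Assumption: a detection is kept when its score is at least the threshold.
    kept = [d for d in detections if d.score >= threshold]
    if labels_to_track is not None:
        wanted = set(labels_to_track)
        kept = [d for d in kept if d.label in wanted]
    return kept


raw = [
    Detection("person", 0.91, (10, 20, 110, 220)),
    Detection("chair", 0.42, (200, 150, 260, 240)),
    Detection("cup", 0.63, (300, 180, 320, 205)),
]
print([d.label for d in filter_detections(raw)])  # ['person', 'cup']
print([d.label for d in filter_detections(raw, threshold=0.3, labels_to_track=["chair"])])  # ['chair']
```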
class agents.config.TextToSpeechConfig

Bases: agents.ros.BaseComponentConfig

Configuration for a Text-To-Speech component.

This class defines the configuration options for a Text-To-Speech component.

Parameters:
  • play_on_device (bool) – Whether to play the audio on an available audio device (default: False).

  • device – Device id (int) or name (sub-string) of the device used to play the audio. Only effective if play_on_device is True (default: ‘default’).

  • buffer_size (int) – Size of the buffer used when playing audio on the device. Only effective if play_on_device is True (default: 20).

  • block_size (int) – Size of each audio block read when playing audio on the device. Only effective if play_on_device is True (default: 1024).

  • get_bytes (bool) – Whether the model should return the speech data as bytes instead of a base64-encoded string (default: False).

Example of usage:

from agents.config import TextToSpeechConfig

config = TextToSpeechConfig(play_on_device=True, get_bytes=False)
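One plausible reading of the playback parameters: audio is read in chunks of block_size and queued in a buffer holding at most buffer_size chunks, while get_bytes decides whether the speech arrives as raw bytes or as a base64-encoded string. The sketch below models both ideas with the standard library only. It treats block_size as a byte count (the real component may count audio frames instead), and its consumer thread simply collects blocks where a real player would write them to the audio device; the function names are hypothetical.

```python
import base64
import queue
import threading
from typing import List, Optional, Union


def to_audio_bytes(payload: Union[bytes, str]) -> bytes:
    """get_bytes=True yields raw bytes; get_bytes=False yields a base64-encoded string."""
    if isinstance(payload, bytes):
        return payload
    return base64.b64decode(payload)


def play_in_blocks(audio: bytes, block_size: int = 1024, buffer_size: int = 20) -> List[bytes]:
    """Hypothetical bounded-buffer playback loop; returns the blocks in 'played' order."""
    buffer: "queue.Queue[Optional[bytes]]" = queue.Queue(maxsize=buffer_size)
    played: List[bytes] = []

    def player() -> None:
        while True:
            block = buffer.get()
            if block is None:  # end-of-stream marker
                break
            played.append(block)  # a real player would write the block to the audio device here

    worker = threading.Thread(target=player)
    worker.start()
    for start in range(0, len(audio), block_size):
        # put() blocks whenever buffer_size blocks are already waiting to be played
        buffer.put(audio[start:start + block_size])
    buffer.put(None)
    worker.join()
    return played


audio = bytes(range(256)) * 10  # 2560 bytes of stand-in audio
encoded = base64.b64encode(audio).decode("ascii")  # what get_bytes=False would deliver
blocks = play_in_blocks(to_audio_bytes(encoded), block_size=1024, buffer_size=4)
print([len(b) for b in blocks])  # [1024, 1024, 512]
```

The bounded queue is what lets a larger buffer_size absorb jitter between producing and playing audio, at the cost of more memory and latency.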
class agents.config.SpeechToTextConfig

Bases: agents.ros.BaseComponentConfig

Configuration for a Speech-To-Text component.

This class defines the configuration options for a Speech-To-Text component.

Parameters:
  • enable_vad (bool) – Enable Voice Activity Detection (VAD) to identify when speech is present in a continuous input stream from an audio input device. Uses the silero-vad model and requires PyTorch to be installed. Defaults to False.

  • device (Union[int, str]) – Device id (int) or name (sub-string) to use for audio input. Only effective if enable_vad is set to True. Defaults to ‘default’.

  • sample_rate (int) – Sample rate of the audio stream in Hz. Must be 8000 or 16000. Only effective if enable_vad is set to True. Defaults to 16000.

  • threshold (float) – Minimum speech probability above which speech is considered present. Only effective if enable_vad is set to True. Defaults to 0.5 (50%).

  • min_silence_duration_ms (int) – Minimum duration of silence, in milliseconds, before it is treated as a speaker pause. Only effective if enable_vad is set to True. Defaults to 500 ms.

  • speech_pad_ms – Duration in milliseconds of silence padding added at the start and end of detected speech regions. Only effective if enable_vad is set to True. Defaults to 30 ms.

Example of usage:

from agents.config import SpeechToTextConfig

config = SpeechToTextConfig(
    enable_vad=True,
    device="my_device",
    sample_rate=16000,
    threshold=0.5,
    min_silence_duration_ms=500,
    speech_pad_ms=30,
)
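In a typical VAD pipeline the parameters above combine as follows: each audio chunk receives a speech probability, chunks at or above threshold count as speech, a pause only ends a segment once it lasts min_silence_duration_ms, and each finished segment is widened by speech_pad_ms on both sides. The sketch below is a simplified, hypothetical version of that segmentation over precomputed probabilities; it is not silero-vad's actual algorithm, and the 512-sample chunk size assumed in chunk_duration_ms is an illustration only.

```python
from typing import List, Sequence, Tuple


def chunk_duration_ms(sample_rate: int, chunk_samples: int = 512) -> float:
    """Duration of one VAD chunk; the 512-sample default is an assumption."""
    if sample_rate not in (8000, 16000):
        raise ValueError("sample_rate must be 8000 or 16000")
    return 1000.0 * chunk_samples / sample_rate


def segment_speech(
    probs: Sequence[float],
    chunk_ms: float,
    threshold: float = 0.5,
    min_silence_duration_ms: float = 500,
    speech_pad_ms: float = 30,
) -> List[Tuple[float, float]]:
    """Hypothetical segmentation of per-chunk speech probabilities into (start_ms, end_ms)."""
    segments: List[Tuple[float, float]] = []
    start = None          # start time of the open speech segment
    silence_start = None  # start time of the current pause inside that segment
    for i, p in enumerate(probs):
        t = i * chunk_ms
        if p >= threshold:
            if start is None:
                start = t
            silence_start = None  # speech resumed, so the pause is forgotten
        elif start is not None:
            if silence_start is None:
                silence_start = t
            # Close the segment only once the pause is long enough.
            if t + chunk_ms - silence_start >= min_silence_duration_ms:
                segments.append((start, silence_start))
                start, silence_start = None, None
    total_ms = len(probs) * chunk_ms
    if start is not None:  # the stream ended while speech was still open
        segments.append((start, silence_start if silence_start is not None else total_ms))
    # Widen each segment by speech_pad_ms, clamped to the stream bounds.
    return [(max(0, s - speech_pad_ms), min(total_ms, e + speech_pad_ms)) for s, e in segments]


probs = [0.1, 0.9, 0.8, 0.2, 0.9, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.7, 0.7]
print(segment_speech(probs, chunk_ms=100))  # [(70, 530), (1070, 1300)]
```

Note how the single low-probability chunk at 300 ms does not split the first utterance: a 100 ms dip is shorter than min_silence_duration_ms.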
class agents.config.MapConfig

Bases: agents.ros.BaseComponentConfig

Configuration for a MapEncoding component.

Parameters:
  • map_name (str) – The name of the map.

  • distance_func (str) – The function used to calculate distance when retrieving information from the map collection. Can be one of “l2” (L2 distance), “ip” (Inner Product), or “cosine” (Cosine similarity). Default is “l2”.

Example of usage:

from agents.config import MapConfig

config = MapConfig(map_name="my_map", distance_func="ip")
class agents.config.SemanticRouterConfig

Bases: agents.ros.BaseComponentConfig

Configuration parameters for a semantic router component.

Parameters:
  • router_name (str) – The name of the router.

  • distance_func (str) – The function used to calculate distance from route samples in vectordb. Can be one of “l2” (L2 distance), “ip” (Inner Product), or “cosine” (Cosine similarity). Default is “l2”.

  • maximum_distance (float) – The maximum distance threshold for routing. A value between 0.1 and 1.0. Defaults to 0.4.

Example of usage:

from agents.config import SemanticRouterConfig

config = SemanticRouterConfig(router_name="my_router")
# or
config = SemanticRouterConfig(router_name="my_router", distance_func="ip", maximum_distance=0.7)
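Routing can be read as a nearest-neighbor lookup with a cutoff: the input is embedded, compared against every route's stored samples with distance_func, and sent to the closest route only if that distance does not exceed maximum_distance. The sketch below assumes cosine distance and toy two-dimensional embeddings; the route helper, the layout of the routes dictionary, and returning None when nothing is close enough are hypothetical stand-ins for the component's real behavior.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def route(query: Vector, routes: Dict[str, List[Vector]], maximum_distance: float = 0.4) -> Optional[str]:
    """Hypothetical router: the nearest route sample wins if it is within maximum_distance."""
    best_route, best_distance = None, math.inf
    for name, samples in routes.items():
        for sample in samples:
            d = cosine_distance(query, sample)
            if d < best_distance:
                best_route, best_distance = name, d
    # Assumption: None stands in for whatever the component does when no route is close enough.
    return best_route if best_distance <= maximum_distance else None


routes = {
    "goto": [[1.0, 0.0], [0.9, 0.2]],  # toy embeddings of navigation-style samples
    "chat": [[0.0, 1.0]],
}
print(route([0.1, 1.0], routes))   # chat
print(route([-1.0, 0.0], routes))  # None
```

Raising maximum_distance (for example to 0.7, as in the second example above) makes the router accept looser matches instead of rejecting them.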
class agents.config.VideoMessageMakerConfig

Bases: agents.ros.BaseComponentConfig

Configuration parameters for a video message maker component.

Parameters:
  • min_video_frames (int) – The minimum number of frames in a video segment. Default is 15, which corresponds to 0.5 seconds of video at 30 fps.

  • max_video_frames (int) – The maximum number of frames in a video segment. Default is 600, which corresponds to 20 seconds of video at 30 fps.

  • motion_estimation_func (Optional[str]) – The function used for motion estimation. Can be one of “frame_difference” or “optical_flow”. Default is None.

  • threshold (float) – The threshold value for motion detection. A float between 0.1 and 5.0. Default is 0.3.

  • flow_kwargs – Additional keyword arguments for the optical flow algorithm. Default is a dictionary with reasonable values.

Example of usage:

from agents.config import VideoMessageMakerConfig

config = VideoMessageMakerConfig()
# or
config = VideoMessageMakerConfig(min_video_frames=30, motion_estimation_func="optical_flow", threshold=0.5)
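With motion_estimation_func set to "frame_difference", motion can be estimated as the mean absolute pixel change between consecutive frames; a segment is then emitted once motion stops or the segment reaches max_video_frames, provided it holds at least min_video_frames frames. The sketch below illustrates this with flat lists of grayscale intensities. The helper names, the flat frame layout, the half-open (start, end) index ranges, and the scale of threshold are assumptions, and the "optical_flow" option is not modeled.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[float]  # hypothetical: a flattened grayscale frame


def frame_difference(prev: Frame, curr: Frame) -> float:
    """Mean absolute per-pixel change between two frames."""
    return sum(abs(c - p) for p, c in zip(prev, curr)) / len(curr)


def motion_segments(
    frames: Sequence[Frame],
    threshold: float = 0.3,
    min_video_frames: int = 15,
    max_video_frames: int = 600,
) -> List[Tuple[int, int]]:
    """Hypothetical segmenter returning half-open (start, end) frame index ranges."""
    segments: List[Tuple[int, int]] = []
    start = None

    def close(end: int) -> None:
        # Discard segments that are too short to be worth sending.
        if end - start >= min_video_frames:
            segments.append((start, end))

    for i in range(1, len(frames)):
        moving = frame_difference(frames[i - 1], frames[i]) > threshold
        if moving:
            if start is None:
                start = i - 1  # include the frame just before motion was detected
            if i + 1 - start >= max_video_frames:  # cap reached: emit and start fresh
                close(i + 1)
                start = None
        elif start is not None:  # motion stopped
            close(i)
            start = None
    if start is not None:
        close(len(frames))
    return segments


frames = [[v] for v in [0, 0, 1, 2, 3, 4, 4, 4]]  # 1-pixel frames for brevity
print(motion_segments(frames, threshold=0.5, min_video_frames=3, max_video_frames=10))  # [(1, 6)]
```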