agents.config#

Module Contents#

Classes#

LLMConfig

Configuration for the Large Language Model (LLM) component.

MLLMConfig

Configuration for the Multi-Modal Large Language Model (MLLM) component.

SpeechToTextConfig

Configuration for a Speech-To-Text component.

TextToSpeechConfig

Configuration for a Text-To-Speech component.

SemanticRouterConfig

Configuration parameters for a semantic router component.

MapConfig

Configuration for a MapEncoding component.

VideoMessageMakerConfig

Configuration parameters for a video message maker component.

VisionConfig

Configuration for a detection component.

API#

class agents.config.LLMConfig#

Bases: agents.config.ModelComponentConfig

Configuration for the Large Language Model (LLM) component.

It defines various settings that control how the LLM component operates, including whether to enable chat history, retrieval augmented generation (RAG), and more.

Parameters:
  • enable_rag (bool) – Enables or disables Retrieval Augmented Generation (RAG).

  • collection_name (Optional[str]) – The name of the vectordb collection to use for RAG.

  • distance_func (str) – The distance metric used for nearest-neighbor search in RAG. Supported values are "l2", "ip", and "cosine".

  • n_results (int) – The maximum number of results to return for RAG. Defaults to 1. For numbers greater than 1, results are concatenated into a single string.

  • chat_history (bool) – Whether to include chat history in the LLM's prompt.

  • history_reset_phrase (str) – Phrase to reset chat history. Defaults to 'chat reset'.

  • history_size (int) – Number of user messages to keep in chat history. Defaults to 10.

  • temperature (float) – Temperature used for sampling tokens during generation. Must be greater than 0.0. Defaults to 0.8.

  • max_new_tokens (int) – The maximum number of new tokens to generate. Must be greater than 0. Defaults to 100.

  • stream (bool) – Publish the LLM output as a stream of tokens. Useful when sending output to a user-facing client or to a TTS component. Cannot be used in conjunction with tool calling. Defaults to False.

  • break_character (str) – A character marking that the stream output received so far should be published. Only takes effect when stream is set to True. Since stream output arrives token by token, it is often useful to publish full sentences rather than individual tokens as the component's output (for example, for downstream text-to-speech conversion). Set to an empty string to publish output token by token. Defaults to '.' (period).

  • response_terminator (str) – A string token marking the end of a single response from the model. Only used with persistent clients, such as a websocket client, and when stream is set to True. It is not published. This value cannot be an empty string. Defaults to '<>'.

Example of usage:

config = LLMConfig(enable_rag=True, collection_name="my_collection", distance_func="l2")
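
For streaming output to a downstream TTS component, a configuration along the following lines can be used (a sketch with illustrative values, using only the parameters documented above):

config = LLMConfig(
    stream=True,
    break_character=".",  # publish complete sentences rather than single tokens
    chat_history=True,
    history_size=10,
    max_new_tokens=200,
)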
get_inference_params() → Dict#

Get inference params from model components

class agents.config.MLLMConfig#

Bases: agents.config.LLMConfig

Configuration for the Multi-Modal Large Language Model (MLLM) component.

It defines various settings that control how the MLLM component operates, including whether to enable chat history, retrieval augmented generation (RAG), and more.

Parameters:
  • enable_rag (bool) – Enables or disables Retrieval Augmented Generation (RAG).

  • collection_name (Optional[str]) – The name of the vectordb collection to use for RAG.

  • distance_func (str) – The distance metric used for nearest-neighbor search in RAG. Supported values are "l2", "ip", and "cosine".

  • n_results (int) – The maximum number of results to return for RAG. Defaults to 1. For numbers greater than 1, results are concatenated into a single string.

  • chat_history (bool) – Whether to include chat history in the LLM's prompt.

  • history_reset_phrase (str) – Phrase to reset chat history. Defaults to 'chat reset'.

  • history_size (int) – Number of user messages to keep in chat history. Defaults to 10.

  • temperature (float) – Temperature used for sampling tokens during generation. Must be greater than 0.0. Defaults to 0.8.

  • max_new_tokens (int) – The maximum number of new tokens to generate. Must be greater than 0. Defaults to 100.

  • stream (bool) – Publish the LLM output as a stream of tokens. Useful when sending output to a user-facing client or to a TTS component. Cannot be used in conjunction with tool calling. Defaults to False.

  • break_character (str) – A character marking that the stream output received so far should be published. Only takes effect when stream is set to True. Since stream output arrives token by token, it is often useful to publish full sentences rather than individual tokens as the component's output (for example, for downstream text-to-speech conversion). Set to an empty string to publish output token by token. Defaults to '.' (period).

  • response_terminator (str) – A string token marking the end of a single response from the model. Only used with persistent clients, such as a websocket client, and when stream is set to True. It is not published. This value cannot be an empty string. Defaults to '<>'.

Example of usage:

config = MLLMConfig(enable_rag=True, collection_name="my_collection", distance_func="l2")
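
A variant that retrieves several RAG results and keeps a short chat history might look as follows (a sketch; the values are illustrative):

config = MLLMConfig(
    enable_rag=True,
    collection_name="my_collection",
    n_results=3,  # the top 3 results get concatenated into a single string
    chat_history=True,
    history_size=5,
)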
get_inference_params() → Dict#

Get inference params from model components

class agents.config.SpeechToTextConfig#

Bases: agents.config.ModelComponentConfig

Configuration for a Speech-To-Text component.

This class defines the configuration options for speech transcription, voice activity detection, wakeword detection, and audio streaming.


Transcription

Parameters:
  • initial_prompt (str or None) – Optional initial prompt to guide transcription (e.g. speaker name or topic). Defaults to None.

  • language (str) – Language code for transcription (e.g. "en", "zh"). Must be one of the supported language codes. Defaults to "en".

  • max_new_tokens (int or None) – Maximum number of tokens to generate. If None, no limit is applied. Defaults to None.


Voice Activity Detection (VAD)

Parameters:
  • enable_vad (bool) – Enable VAD to detect when speech is present in audio input. Requires onnxruntime and the silero-vad model. Defaults to False.

  • device_audio (Optional[int]) – Audio input device ID. Only used if enable_vad is True. Defaults to None.

  • vad_threshold (float) – Threshold above which speech is considered present. Only used if enable_vad is True. Range: 0.0–1.0. Defaults to 0.5.

  • min_silence_duration_ms (int) – Minimum silence duration (ms) before it is treated as a pause. Only used if enable_vad is True. Defaults to 300.

  • speech_pad_ms (int) – Silence padding (ms) added to the start and end of detected speech regions. Only used if enable_vad is True. Defaults to 30.

  • speech_buffer_max_len (int) – Maximum length of the speech buffer in ms. Only used if enable_vad is True. Defaults to 30000.

  • device_vad (str) – Device for VAD ('cpu' or 'gpu'). Only used if enable_vad is True. Defaults to 'cpu'.

  • ncpu_vad (int) – Number of CPU cores to use for VAD (if device_vad is 'cpu'). Defaults to 1.


Wakeword Detection

Parameters:
  • enable_wakeword (bool) – Enable detection of a wakeword phrase (e.g. 'Hey Jarvis'). Requires enable_vad to be True. Defaults to False.

  • wakeword_threshold (float) – Minimum confidence score to trigger wakeword detection. Only used if enable_wakeword is True. Defaults to 0.6.

  • device_wakeword (str) – Device for wakeword detection ('cpu' or 'gpu'). Only used if enable_wakeword is True. Defaults to 'cpu'.

  • ncpu_wakeword (int) – Number of CPU cores for wakeword detection (if device_wakeword is 'cpu'). Defaults to 1.


Streaming

Parameters:
  • stream (bool) – Send audio as a stream to a persistent client (e.g. websockets). Requires enable_vad to be True. Useful for real-time transcription. Defaults to False.

  • min_chunk_size (int) – Audio chunk size in ms to send when streaming. Requires stream to be True. Must be greater than 100 ms. Defaults to 2000.


Model Paths

Parameters:
  • vad_model_path (str) – Path or URL to the VAD ONNX model. Defaults to the Silero VAD model URL.

  • melspectrogram_model_path (str) – Path or URL to the melspectrogram model used in wakeword detection. Defaults to the openWakeWord model URL.

  • embedding_model_path (str) – Path or URL to the audio embedding model for wakeword detection. Defaults to the openWakeWord model URL.

  • wakeword_model_path (str) – Path or URL to the wakeword ONNX model (e.g. 'Hey Jarvis'). Defaults to a pretrained openWakeWord model. For custom models, see: dscripka/openWakeWord


Example of usage:

config = SpeechToTextConfig(
    enable_vad=True,
    enable_wakeword=True,
    vad_threshold=0.5,
    wakeword_threshold=0.6,
    min_silence_duration_ms=1000,
    speech_pad_ms=30,
    speech_buffer_max_len=8000,
)
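
For real-time transcription over a persistent client, streaming can be enabled on top of VAD (a sketch with illustrative values; stream requires enable_vad to be True, and min_chunk_size must be greater than 100 ms):

config = SpeechToTextConfig(
    enable_vad=True,
    stream=True,
    min_chunk_size=2000,  # send audio in 2 second chunks
)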
get_inference_params() → Dict#

Get inference params from model components

class agents.config.TextToSpeechConfig#

Bases: agents.config.ModelComponentConfig

Configuration for a Text-To-Speech component.

This class defines the configuration options for a Text-To-Speech component.

Parameters:
  • play_on_device (bool) – Whether to play the audio on an available audio device (default: False).

  • device (Optional[int]) – Device id for playing the audio. Only effective if play_on_device is True (default: None).

  • buffer_size (int) – Size of the buffer for playing audio on device. Only effective if play_on_device is True (default: 20).

  • block_size (int) – Size of the audio block to be read for playing audio on device. Only effective if play_on_device is True (default: 1024).

  • thread_shutdown_timeout (int) – Timeout (in seconds) after which a playback thread is shut down if no data is received. Only effective if play_on_device is True (default: 5).

  • stream (bool) – Stream output when used with a WebSocketClient. Useful when model output is large and broken into chunks by the server (default: True).

Example of usage:

config = TextToSpeechConfig(play_on_device=True)
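
On-device playback can also be tuned explicitly (a sketch; the device id is hypothetical and depends on the available audio hardware):

config = TextToSpeechConfig(
    play_on_device=True,
    device=0,  # hypothetical audio device id
    buffer_size=20,
    block_size=1024,
)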
get_inference_params() → Dict#

Get inference params from model components

class agents.config.SemanticRouterConfig#

Bases: agents.ros.BaseComponentConfig

Configuration parameters for a semantic router component.

Parameters:
  • router_name (str) – The name of the router.

  • distance_func (str) – The function used to calculate distance from route samples in the vectordb. Can be one of "l2" (L2 distance), "ip" (inner product), or "cosine" (cosine similarity). Defaults to "l2".

  • maximum_distance (float) – The maximum distance threshold for routing. A value between 0.1 and 1.0. Defaults to 0.4.

Example of usage:

config = SemanticRouterConfig(router_name="my_router")
# or
config = SemanticRouterConfig(router_name="my_router", distance_func="ip", maximum_distance=0.7)
class agents.config.MapConfig#

Bases: agents.ros.BaseComponentConfig

Configuration for a MapEncoding component.

Parameters:
  • map_name (str) – The name of the map.

  • distance_func (str) – The function used to calculate distance when retrieving information from the map collection. Can be one of "l2" (L2 distance), "ip" (inner product), or "cosine" (cosine similarity). Defaults to "l2".

Example of usage:

config = MapConfig(map_name="my_map", distance_func="ip")
class agents.config.VideoMessageMakerConfig#

Bases: agents.ros.BaseComponentConfig

Configuration parameters for a video message maker component.

Parameters:
  • min_video_frames (int) – The minimum number of frames in a video segment. Default is 15, assuming a 0.5-second video at 30 fps.

  • max_video_frames (int) – The maximum number of frames in a video segment. Default is 600, assuming a 20-second video at 30 fps.

  • motion_estimation_func (Optional[str]) – The function used for motion estimation. Can be one of "frame_difference" or "optical_flow". Default is None.

  • threshold (float) – The threshold value for motion detection. A float between 0.1 and 5.0. Default is 0.3.

  • flow_kwargs – Additional keyword arguments for the optical flow algorithm. Default is a dictionary with reasonable values.

Example of usage:

config = VideoMessageMakerConfig()
# or
config = VideoMessageMakerConfig(min_video_frames=30, motion_estimation_func="optical_flow", threshold=0.5)
class agents.config.VisionConfig#

Bases: agents.config.ModelComponentConfig

Configuration for a detection component.

The config allows you to customize the detection and/or tracking process.

Parameters:
  • threshold (float) – The confidence threshold for object detection, ranging from 0.1 to 1.0 (default: 0.5).

  • get_dataset_labels (bool) – Whether to return dataset labels along with detections (default: True).

  • labels_to_track (Optional[list]) – A list of specific labels to track, when the model is used as a tracker (default: None).

  • enable_visualization (Optional[bool]) – Whether to enable visualization of detections (default: False). Useful for testing vision component output.

  • enable_local_classifier (bool) – Whether to enable a local classifier model for detections (default: False). If a model client is given to the component, then this has no effect.

  • input_height (int) – Height of the input to the local classifier model in pixels (default: 640). Only effective when enable_local_classifier is set to True.

  • input_width (int) – Width of the input to the local classifier in pixels (default: 640). Only effective when enable_local_classifier is set to True.

  • dataset_labels (dict) – A dictionary mapping label indices to names, used to interpret model outputs (default: COCO labels). Only effective when enable_local_classifier is set to True.
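
Example of usage (a sketch using the parameters above; the tracked labels are illustrative):

config = VisionConfig(
    threshold=0.5,
    enable_visualization=True,
    labels_to_track=["person", "car"],
)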

get_inference_params() → Dict#

Get inference params from model components