agents.config

Module Contents#

Classes#

LLMConfig – Configuration for the Large Language Model (LLM) component.
MLLMConfig – Configuration for the Multi-Modal Large Language Model (MLLM) component.
SpeechToTextConfig – Configuration for a Speech-To-Text component.
TextToSpeechConfig – Configuration for a Text-To-Speech component.
SemanticRouterConfig – Configuration parameters for a semantic router component.
MapConfig – Configuration for a MapEncoding component.
VideoMessageMakerConfig – Configuration parameters for a video message maker component.
VisionConfig – Configuration for a detection component.
API#
- class agents.config.LLMConfig#
Bases:
agents.config.ModelComponentConfig
Configuration for the Large Language Model (LLM) component.
It defines various settings that control how the LLM component operates, including whether to enable chat history, retrieval augmented generation (RAG), and more.
- Parameters:
enable_rag (bool) – Enables or disables Retrieval Augmented Generation.
collection_name (Optional[str]) – The name of the vectordb collection to use for RAG.
distance_func (str) – The distance metric used for nearest neighbor search for RAG. Supported values are "l2", "ip", and "cosine".
n_results (int) – The maximum number of results to return for RAG. Defaults to 1. For numbers greater than 1, results will be concatenated into a single string.
chat_history (bool) – Whether to include chat history in the LLM's prompt.
history_reset_phrase (str) – Phrase that resets the chat history. Defaults to "chat reset".
history_size (int) – Number of user messages to keep in chat history. Defaults to 10.
temperature (float) – Temperature used for sampling tokens during generation. Default is 0.8; must be greater than 0.0.
max_new_tokens (int) – The maximum number of new tokens to generate. Default is 100; must be greater than 0.
stream (bool) – Publish the LLM output as a stream of tokens, useful when sending output to a user-facing client or to a TTS component. Cannot be used in conjunction with tool calling. Default is False.
break_character (str) – A string marking that the stream output received so far should be published. Only takes effect when stream is True. Since stream output is received token by token, it is often useful to publish full sentences rather than individual tokens as the component's output (for example, for downstream text-to-speech conversion). Set to an empty string to publish output token by token. Default is "." (period).
response_terminator (str) – A string token marking the end of a single response from the model. Only used with persistent clients, such as a websocket client, and when stream is True. It is not published. This value cannot be an empty string. Default is "<>".
Example of usage:
config = LLMConfig(enable_rag=True, collection_name="my_collection", distance_func="l2")
- get_inference_params() Dict #
Get inference params from model components
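The interaction between stream and break_character described above can be sketched in plain Python. This is an illustrative buffer, not the component's actual implementation; the function name chunk_stream is hypothetical.

```python
def chunk_stream(tokens, break_character="."):
    """Buffer streamed tokens and publish a chunk whenever break_character
    appears; an empty break_character publishes token by token."""
    if break_character == "":
        yield from tokens
        return
    buffer = ""
    for token in tokens:
        buffer += token
        if break_character in buffer:
            yield buffer  # a full sentence is ready for downstream TTS
            buffer = ""
    if buffer:
        yield buffer  # flush whatever remains at end of stream

# Tokens arrive one by one; full sentences are published downstream.
chunks = list(chunk_stream(["Hello", " world", ".", " Bye", "."]))
# -> ["Hello world.", " Bye."]
```

With break_character="", each token would be published as soon as it arrives.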
- class agents.config.MLLMConfig#
Bases:
agents.config.LLMConfig
Configuration for the Multi-Modal Large Language Model (MLLM) component.
It defines various settings that control how the MLLM component operates, including whether to enable chat history, retrieval augmented generation (RAG), and more.
- Parameters:
enable_rag (bool) – Enables or disables Retrieval Augmented Generation.
collection_name (Optional[str]) – The name of the vectordb collection to use for RAG.
distance_func (str) – The distance metric used for nearest neighbor search for RAG. Supported values are "l2", "ip", and "cosine".
n_results (int) – The maximum number of results to return for RAG. Defaults to 1. For numbers greater than 1, results will be concatenated into a single string.
chat_history (bool) – Whether to include chat history in the LLM's prompt.
history_reset_phrase (str) – Phrase that resets the chat history. Defaults to "chat reset".
history_size (int) – Number of user messages to keep in chat history. Defaults to 10.
temperature (float) – Temperature used for sampling tokens during generation. Default is 0.8; must be greater than 0.0.
max_new_tokens (int) – The maximum number of new tokens to generate. Default is 100; must be greater than 0.
stream (bool) – Publish the LLM output as a stream of tokens, useful when sending output to a user-facing client or to a TTS component. Cannot be used in conjunction with tool calling. Default is False.
break_character (str) – A string marking that the stream output received so far should be published. Only takes effect when stream is True. Since stream output is received token by token, it is often useful to publish full sentences rather than individual tokens as the component's output (for example, for downstream text-to-speech conversion). Set to an empty string to publish output token by token. Default is "." (period).
response_terminator (str) – A string token marking the end of a single response from the model. Only used with persistent clients, such as a websocket client, and when stream is True. It is not published. This value cannot be an empty string. Default is "<>".
Example of usage:
config = MLLMConfig(enable_rag=True, collection_name="my_collection", distance_func="l2")
- get_inference_params() Dict #
Get inference params from model components
- class agents.config.SpeechToTextConfig#
Bases:
agents.config.ModelComponentConfig
Configuration for a Speech-To-Text component.
This class defines the configuration options for speech transcription, voice activity detection, wakeword detection, and audio streaming.
Transcription
- Parameters:
initial_prompt (str or None) – Optional initial prompt to guide transcription (e.g. speaker name or topic). Defaults to None.
language (str) – Language code for transcription (e.g. "en", "zh"). Must be one of the supported language codes. Defaults to "en".
max_new_tokens (int or None) – Maximum number of tokens to generate. If None, no limit is applied. Defaults to None.
Voice Activity Detection (VAD)
- Parameters:
enable_vad (bool) – Enable VAD to detect when speech is present in audio input. Requires onnxruntime and the silero-vad model. Defaults to False.
device_audio (Optional[int]) – Audio input device ID. Only used if enable_vad is True. Defaults to None.
vad_threshold (float) – Threshold above which speech is considered present. Only used if enable_vad is True. Range: 0.0–1.0. Defaults to 0.5.
min_silence_duration_ms (int) – Minimum silence duration (ms) before it is treated as a pause. Only used if enable_vad is True. Defaults to 300.
speech_pad_ms (int) – Silence padding (ms) added to the start and end of detected speech regions. Only used if enable_vad is True. Defaults to 30.
speech_buffer_max_len (int) – Maximum length of the speech buffer in ms. Only used if enable_vad is True. Defaults to 30000.
device_vad (str) – Device for VAD ("cpu" or "gpu"). Only used if enable_vad is True. Defaults to "cpu".
ncpu_vad (int) – Number of CPU cores to use for VAD (if device_vad is "cpu"). Defaults to 1.
Wakeword Detection
- Parameters:
enable_wakeword (bool) – Enable detection of a wakeword phrase (e.g. "Hey Jarvis"). Requires enable_vad to be True. Defaults to False.
wakeword_threshold (float) – Minimum confidence score to trigger wakeword detection. Only used if enable_wakeword is True. Defaults to 0.6.
device_wakeword (str) – Device for wakeword detection ("cpu" or "gpu"). Only used if enable_wakeword is True. Defaults to "cpu".
ncpu_wakeword (int) – Number of CPU cores for wakeword detection (if device_wakeword is "cpu"). Defaults to 1.
Streaming
- Parameters:
stream (bool) – Send audio as a stream to a persistent client (e.g. websockets). Requires enable_vad to be True. Useful for real-time transcription. Defaults to False.
min_chunk_size (int) – Audio chunk size in ms to send when streaming. Requires stream to be True. Must be > 100 ms. Defaults to 2000.
Model Paths
- Parameters:
vad_model_path (str) – Path or URL to the VAD ONNX model. Defaults to the Silero VAD model URL.
melspectrogram_model_path (str) – Path or URL to the melspectrogram model used in wakeword detection. Defaults to the openWakeWord model URL.
embedding_model_path (str) – Path or URL to the audio embedding model for wakeword detection. Defaults to the openWakeWord model URL.
wakeword_model_path (str) – Path or URL to a wakeword ONNX model (e.g. "Hey Jarvis"). Defaults to a pretrained openWakeWord model. For custom models, see: dscripka/openWakeWord
Example of usage:
config = SpeechToTextConfig(
    enable_vad=True,
    enable_wakeword=True,
    vad_threshold=0.5,
    wakeword_threshold=0.6,
    min_silence_duration_ms=1000,
    speech_pad_ms=30,
    speech_buffer_max_len=8000,
)
- get_inference_params() Dict #
Get inference params from model components
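How vad_threshold and min_silence_duration_ms work together can be sketched as follows. This is a hypothetical post-processing loop over per-frame speech probabilities, not the silero-vad model itself; the function name detect_speech_regions is an assumption for illustration.

```python
def detect_speech_regions(probs, frame_ms, vad_threshold=0.5,
                          min_silence_duration_ms=300):
    """Frames whose speech probability exceeds vad_threshold count as
    speech; silences shorter than min_silence_duration_ms do not end an
    ongoing speech region. Returns (start, end) frame-index pairs."""
    regions, start, end, silence_ms = [], None, None, 0
    for i, p in enumerate(probs):
        if p > vad_threshold:
            if start is None:
                start = i        # a new speech region begins
            end = i + 1          # region currently ends after this frame
            silence_ms = 0
        elif start is not None:
            silence_ms += frame_ms
            if silence_ms >= min_silence_duration_ms:
                regions.append((start, end))  # pause long enough: close region
                start, silence_ms = None, 0
    if start is not None:
        regions.append((start, end))
    return regions

# 100 ms frames: the 300 ms pause splits the audio into two regions.
regions = detect_speech_regions([0.9, 0.9, 0.1, 0.1, 0.1, 0.9], frame_ms=100)
# -> [(0, 2), (5, 6)]
```

A shorter pause (below min_silence_duration_ms) would be bridged, yielding a single region; speech_pad_ms would then widen each region by a fixed margin.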
- class agents.config.TextToSpeechConfig#
Bases:
agents.config.ModelComponentConfig
Configuration for a Text-To-Speech component.
This class defines the configuration options for a Text-To-Speech component.
- Parameters:
play_on_device (bool) – Whether to play the audio on an available audio device (default: False).
device – Optional device ID (int) for playing the audio. Only effective if play_on_device is True (default: None).
buffer_size (int) – Size of the buffer for playing audio on device. Only effective if play_on_device is True (default: 20).
block_size (int) – Size of the audio block to be read for playing audio on device. Only effective if play_on_device is True (default: 1024).
thread_shutdown_timeout (int) – Timeout (in seconds) after which a playback thread is shut down if no data is received. Only effective if play_on_device is True (default: 5).
stream – Stream output when used with a WebSocketClient. Useful when model output is large and broken into chunks by the server (default: True).
Example of usage:
config = TextToSpeechConfig(play_on_device=True, get_bytes=False)
- get_inference_params() Dict #
Get inference params from model components
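The role of thread_shutdown_timeout can be sketched with a plain bounded queue. This is an illustrative stand-in for the component's playback thread, not its actual implementation; playback_loop and play_block are hypothetical names.

```python
import queue

def playback_loop(audio_queue, play_block, thread_shutdown_timeout=5):
    """Read audio blocks from a bounded buffer and hand them to a playback
    callback; shut down once no data arrives within thread_shutdown_timeout
    seconds."""
    while True:
        try:
            block = audio_queue.get(timeout=thread_shutdown_timeout)
        except queue.Empty:
            break  # no data for too long: shut the playback thread down
        play_block(block)

# A bounded queue models buffer_size; fixed-size chunks model block_size reads.
buf = queue.Queue(maxsize=20)
for chunk in (b"\x00" * 1024, b"\x01" * 1024):
    buf.put(chunk)
played = []
playback_loop(buf, played.append, thread_shutdown_timeout=0.1)
```

After draining both blocks, the loop waits thread_shutdown_timeout seconds for more data and then exits cleanly.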
- class agents.config.SemanticRouterConfig#
Bases:
agents.ros.BaseComponentConfig
Configuration parameters for a semantic router component.
- Parameters:
router_name (str) – The name of the router.
distance_func (str) – The function used to calculate distance from route samples in vectordb. Can be one of "l2" (L2 distance), "ip" (inner product), or "cosine" (cosine similarity). Default is "l2".
maximum_distance (float) – The maximum distance threshold for routing. A value between 0.1 and 1.0. Defaults to 0.4.
Example of usage:
config = SemanticRouterConfig(router_name="my_router")
# or
config = SemanticRouterConfig(router_name="my_router", distance_func="ip", maximum_distance=0.7)
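The routing decision controlled by distance_func and maximum_distance can be sketched as below. These are simplified versions of the three metrics ("ip" and "cosine" are written as 1 minus the similarity, so that smaller is always closer); the exact formulas used by the underlying vectordb may differ, and the function names are hypothetical.

```python
import math

def distance(a, b, distance_func="l2"):
    """Illustrative versions of the three supported distance metrics."""
    if distance_func == "l2":
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    if distance_func == "ip":
        return 1.0 - sum(x * y for x, y in zip(a, b))  # larger dot product = closer
    if distance_func == "cosine":
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return 1.0 - dot / norm
    raise ValueError(f"unsupported distance_func: {distance_func}")

def route(query_vec, routes, maximum_distance=0.4, distance_func="l2"):
    """Pick the closest route; return None when nothing is within threshold."""
    best = min(routes, key=lambda name: distance(query_vec, routes[name], distance_func))
    if distance(query_vec, routes[best], distance_func) <= maximum_distance:
        return best
    return None

routes = {"greeting": [1.0, 0.0], "weather": [0.0, 1.0]}
route([0.9, 0.1], routes)   # -> "greeting" (distance ~0.14, within 0.4)
route([0.5, 0.5], routes)   # -> None (both routes farther than 0.4)
```

Raising maximum_distance makes the router more permissive; an ambiguous query that falls outside the threshold is simply not routed.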
- class agents.config.MapConfig#
Bases:
agents.ros.BaseComponentConfig
Configuration for a MapEncoding component.
- Parameters:
map_name (str) – The name of the map.
distance_func (str) – The function used to calculate distance when retrieving information from the map collection. Can be one of "l2" (L2 distance), "ip" (inner product), or "cosine" (cosine similarity). Default is "l2".
Example of usage:
config = MapConfig(map_name="my_map", distance_func="ip")
- class agents.config.VideoMessageMakerConfig#
Bases:
agents.ros.BaseComponentConfig
Configuration parameters for a video message maker component.
- Parameters:
min_video_frames (int) – The minimum number of frames in a video segment. Default is 15, assuming a 0.5-second video at 30 fps.
max_video_frames (int) – The maximum number of frames in a video segment. Default is 600, assuming a 20-second video at 30 fps.
motion_estimation_func (Optional[str]) – The function used for motion estimation. Can be one of "frame_difference" or "optical_flow". Default is None.
threshold (float) – The threshold value for motion detection. A float between 0.1 and 5.0. Default is 0.3.
flow_kwargs – Additional keyword arguments for the optical flow algorithm. Default is a dictionary with reasonable values.
Example of usage:
config = VideoMessageMakerConfig()
# or
config = VideoMessageMakerConfig(min_video_frames=30, motion_estimation_func="optical_flow", threshold=0.5)
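How threshold, min_video_frames, and max_video_frames might interact with "frame_difference" motion estimation can be sketched as follows. This is a toy version on flat pixel lists, not the component's actual algorithm; the function names are hypothetical.

```python
def motion_score(prev_frame, frame):
    """Mean absolute pixel difference between two frames
    (frames modeled as flat lists of pixel values)."""
    return sum(abs(a - b) for a, b in zip(prev_frame, frame)) / len(frame)

def segment_frames(frames, threshold=0.3, min_video_frames=15, max_video_frames=600):
    """Grow a segment while motion stays above threshold; emit it only if it
    reaches min_video_frames, and never let it exceed max_video_frames."""
    segment = [frames[0]]
    for prev, cur in zip(frames, frames[1:]):
        if motion_score(prev, cur) >= threshold and len(segment) < max_video_frames:
            segment.append(cur)
        else:
            break  # motion stopped (or segment is full): end the segment
    return segment if len(segment) >= min_video_frames else None

# Motion between the first three frames, then a static frame ends the segment.
frames = [[0, 0], [1, 1], [2, 2], [2, 2]]
segment_frames(frames, threshold=0.5, min_video_frames=2, max_video_frames=10)
# -> [[0, 0], [1, 1], [2, 2]]
```

A segment that never reaches min_video_frames is dropped (None), which is why the default of 15 frames corresponds to a minimum 0.5-second clip at 30 fps.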
- class agents.config.VisionConfig#
Bases:
agents.config.ModelComponentConfig
Configuration for a detection component.
The config allows you to customize the detection and/or tracking process.
- Parameters:
threshold (float) – The confidence threshold for object detection, ranging from 0.1 to 1.0 (default: 0.5).
get_dataset_labels (bool) – Whether to return data labels along with detections (default: True).
labels_to_track (Optional[list]) – A list of specific labels to track, when the model is used as a tracker (default: None).
enable_visualization (Optional[bool]) – Whether to enable visualization of detections (default: False). Useful for testing vision component output.
enable_local_classifier (bool) – Whether to enable a local classifier model for detections (default: False). If a model client is given to the component, then this has no effect.
input_height (int) – Height of the input to the local classifier model in pixels (default: 640). Only effective when enable_local_classifier is True.
input_width (int) – Width of the input to the local classifier in pixels (default: 640). Only effective when enable_local_classifier is True.
dataset_labels – A dictionary mapping label indices to names, used to interpret model outputs (default: COCO labels). Only effective when enable_local_classifier is True.
- get_inference_params() Dict #
Get inference params from model components
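The effect of threshold and labels_to_track on a detection stream can be sketched as simple post-filtering. This is illustrative only; the function name and the detection dict shape are assumptions, not the component's API.

```python
def filter_detections(detections, threshold=0.5, labels_to_track=None):
    """Drop detections below the confidence threshold and, when tracking,
    keep only the labels of interest."""
    kept = [d for d in detections if d["score"] >= threshold]
    if labels_to_track is not None:
        kept = [d for d in kept if d["label"] in labels_to_track]
    return kept

detections = [
    {"label": "person", "score": 0.9},
    {"label": "car", "score": 0.3},
    {"label": "dog", "score": 0.8},
]
filter_detections(detections)                             # -> person and dog
filter_detections(detections, labels_to_track=["person"]) # -> person only
```

With labels_to_track=None every label above the threshold passes through, matching the default configuration.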