agents.models#
The following model specification classes define a common interface for the initialization parameters of ML models across the supported model serving platforms.
Module Contents#
Classes#
| Class | Description |
| --- | --- |
| TransformersLLM | An LLM model that needs to be initialized with any LLM checkpoint available on HuggingFace transformers. This model can be used with a roboml client. |
| TransformersMLLM | An MLLM model that needs to be initialized with any MLLM checkpoint available on HuggingFace transformers. This model can be used with a roboml client. |
| OllamaModel | An Ollama model that needs to be initialized with an ollama tag as checkpoint. |
| Whisper | Whisper is an automatic speech recognition (ASR) system by OpenAI trained on 680,000 hours of multilingual and multitask supervised data collected from the web. Details |
| SpeechT5 | A model for text-to-speech synthesis developed by Microsoft. Details |
| Bark | A model for text-to-speech synthesis developed by SunoAI. Details |
| MeloTTS | A model for text-to-speech synthesis developed by MyShell AI using the MeloTTS engine. |
| VisionModel | Object Detection Model with Optional Tracking. |
API#
- class agents.models.TransformersLLM#
Bases:
agents.models.LLM
An LLM model that needs to be initialized with any LLM checkpoint available on HuggingFace transformers. This model can be used with a roboml client.
- Parameters:
name (str) – An arbitrary name given to the model.
checkpoint (str) – The name of the pre-trained model’s checkpoint. Default is “microsoft/Phi-3-mini-4k-instruct”. For available checkpoints consult HuggingFace LLM Models
quantization (str or None) – The quantization scheme used by the model. Can be one of “4bit”, “8bit” or None (default is “4bit”).
init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.
Example usage:
llm = TransformersLLM(name='llm', checkpoint="meta-llama/Meta-Llama-3.1-8B-Instruct")
- get_init_params() → Dict#
Get init params from models
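A slightly fuller sketch, using only the parameters documented above; the assumption that get_init_params() simply returns these configured values as a dictionary is inferred from its signature, not stated elsewhere:
llm = TransformersLLM(name='llm_8bit', checkpoint="microsoft/Phi-3-mini-4k-instruct", quantization="8bit")  # 8-bit instead of the default "4bit" quantization
params = llm.get_init_params()  # assumed: a dict of the model's init parameters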
- class agents.models.TransformersMLLM#
Bases:
agents.models.LLM
An MLLM model that needs to be initialized with any MLLM checkpoint available on HuggingFace transformers. This model can be used with a roboml client.
- Parameters:
name (str) – An arbitrary name given to the model.
checkpoint (str) – The name of the pre-trained model’s checkpoint. Default is “HuggingFaceM4/idefics2-8b”. For available checkpoints consult HuggingFace Image-Text to Text Models
quantization (str or None) – The quantization scheme used by the model. Can be one of “4bit”, “8bit” or None (default is “4bit”).
init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.
Example usage:
mllm = TransformersMLLM(name='mllm', checkpoint="HuggingFaceM4/idefics2-8b")
- get_init_params() → Dict#
Get init params from models
- class agents.models.OllamaModel#
Bases:
agents.models.LLM
An Ollama model that needs to be initialized with an ollama tag as checkpoint.
- Parameters:
name (str) – An arbitrary name given to the model.
checkpoint (str) – The name of the pre-trained model’s checkpoint. For available checkpoints consult Ollama Models
init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.
options – Optional dictionary to configure generation behavior. Options that conflict with component config options (such as num_predict and temperature) will be overridden if set in the component config. Only the following keys, with their specified value types, are allowed. For details, check the Ollama API documentation:
num_keep: int
seed: int
num_predict: int
top_k: int
top_p: float
min_p: float
typical_p: float
repeat_last_n: int
temperature: float
repeat_penalty: float
presence_penalty: float
frequency_penalty: float
penalize_newline: bool
stop: list of strings
numa: bool
num_ctx: int
num_batch: int
num_gpu: int
main_gpu: int
use_mmap: bool
num_thread: int
Example usage:
llm = OllamaModel(name='ollama1', checkpoint="gemma2:latest", options={"temperature": 0.7, "num_predict": 50})
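A further sketch exercising a few more of the option keys listed above; the concrete values (seed, context size, stop string) are purely illustrative:
llm2 = OllamaModel(name='ollama2', checkpoint="gemma2:latest", options={"seed": 42, "num_ctx": 4096, "stop": ["\n\n"]})  # fixed seed, larger context window, stop on a blank line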
- get_init_params() → Dict#
Get init params from models
- class agents.models.Whisper#
Bases:
agents.models.Model
Whisper is an automatic speech recognition (ASR) system by OpenAI trained on 680,000 hours of multilingual and multitask supervised data collected from the web. Details
- Parameters:
name (str) – An arbitrary name given to the model.
checkpoint (str) – Size of the model to use (tiny, tiny.en, base, base.en, small, small.en, distil-small.en, medium, medium.en, distil-medium.en, large-v1, large-v2, large-v3, large, distil-large-v2, distil-large-v3, large-v3-turbo, or turbo). For more information check here
compute_type (str or None) – The compute type used by the model. Can be one of “int8”, “fp16”, “fp32”, None (default is “int8”).
init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.
Example usage:
whisper = Whisper(name='s2t', checkpoint="small") # Initialize with a different checkpoint
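A variation using the compute_type parameter documented above; "distil-small.en" is one of the listed checkpoint sizes and "fp16" one of the allowed compute types:
whisper_fast = Whisper(name='s2t_fast', checkpoint="distil-small.en", compute_type="fp16")  # English-only distilled checkpoint with fp16 compute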
- get_init_params() → Dict#
Get init params from models
- class agents.models.SpeechT5#
Bases:
agents.models.Model
A model for text-to-speech synthesis developed by Microsoft. Details
- Parameters:
name (str) – An arbitrary name given to the model.
checkpoint (str) – The name of the pre-trained model’s checkpoint. Default is “microsoft/speecht5_tts”.
voice – The voice to use for synthesis. Can be one of “awb”, “bdl”, “clb”, “jmk”, “ksp”, “rms”, or “slt”. Default is “clb”.
init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.
Example usage:
speecht5 = SpeechT5(name='t2s1', voice="bdl") # Initialize with a different voice
- get_init_params() → Dict#
Get init params from models
- class agents.models.Bark#
Bases:
agents.models.Model
A model for text-to-speech synthesis developed by SunoAI. Details
- Parameters:
name (str) – An arbitrary name given to the model.
checkpoint (str) – The name of the pre-trained model’s checkpoint. Bark checkpoints on HuggingFace. Default is “suno/bark-small”.
attn_implementation – The attention implementation to use for the model. Default is “flash_attention_2”.
voice – The voice to use for synthesis. More choices are available here. Default is “v2/en_speaker_6”.
init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.
Example usage:
bark = Bark(name='t2s2', voice="v2/en_speaker_1") # Initialize with a different voice
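A variation that also sets the checkpoint and attention implementation; "suno/bark" is the full-size sibling of the default "suno/bark-small", and "sdpa" is assumed here to be an accepted alternative to "flash_attention_2" on machines without FlashAttention support:
bark_full = Bark(name='t2s3', checkpoint="suno/bark", attn_implementation="sdpa", voice="v2/en_speaker_6")  # "sdpa" assumed as a fallback attention implementation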
- get_init_params() → Dict#
Get init params from models
- class agents.models.MeloTTS#
Bases:
agents.models.Model
A model for text-to-speech synthesis developed by MyShell AI using the MeloTTS engine.
- Parameters:
name (str) – An arbitrary name given to the model.
language (str) – The language for speech synthesis. Supported values: [“EN”, “ES”, “FR”, “ZH”, “JP”, “KR”]. Default is “EN”.
speaker_id (str) – The speaker ID for the chosen language. Default is “EN-US”. For details check here
init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.
Example usage:
melotts = MeloTTS(name='melo1', language='JP', speaker_id='JP-1')
- get_init_params() → Dict#
Get init params from models
- class agents.models.VisionModel#
Bases:
agents.models.Model
Object Detection Model with Optional Tracking.
This vision model provides a flexible framework for object detection and tracking using the mmdet framework. It can be used as a standalone detector or as a tracker to follow detected objects over time. It can be initialized with any checkpoint available in the mmdet framework.
- Parameters:
name (str) – An arbitrary name given to the model.
checkpoint (str) – The name of the pre-trained model’s checkpoint. All available checkpoints in the mmdet framework. Default is “dino-4scale_r50_8xb2-12e_coco”.
cache_dir (str) – The directory where downloaded models are cached. Default is ‘mmdet’.
setup_trackers (bool) – Whether to set up trackers using norfair or not. Default is False.
tracking_distance_function (str) – The function used to calculate the distance between detected objects. This can be any distance metric string available in scipy.spatial.distance.cdist. Default is “euclidean”.
tracking_distance_threshold (int) – The threshold for determining whether two objects in consecutive frames are close enough to be considered the same object. Default is 30, with a minimum value of 1.
deploy_tensorrt (bool) – Deploy the vision model using NVIDIA TensorRT. To utilize this feature with roboml, check out the instructions here. Default is False.
_num_trackers (int) – The number of trackers to use. This number depends on the number of input image streams being given to the component. It is set automatically if setup_trackers is True.
init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.
Example usage:
model = VisionModel(name='detection1', setup_trackers=True, tracking_distance_threshold=20) # Initialize the model with tracking enabled
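A second sketch using only parameters documented above, with the default checkpoint written out and TensorRT deployment enabled; this presumes a roboml setup prepared as per the linked instructions:
model_trt = VisionModel(name='detection2', checkpoint="dino-4scale_r50_8xb2-12e_coco", cache_dir='mmdet', deploy_tensorrt=True)  # requires a TensorRT-ready roboml installation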
- get_init_params() → Dict#
Get init params from models