agents.models#
The following model specification classes define a common interface for initialization parameters of ML models across supported model serving platforms.
Module Contents#
Classes#
- GenericLLM: A generic LLM configuration for OpenAI-compatible /v1/chat/completions APIs.
- GenericMLLM: A generic Multimodal LLM configuration for OpenAI-compatible APIs.
- GenericTTS: A generic Text-to-Speech model for OpenAI-compatible /v1/audio/speech APIs.
- GenericSTT: A generic Speech-to-Text model for OpenAI-compatible /v1/audio/transcriptions APIs.
- TransformersLLM: An LLM model that needs to be initialized with any LLM checkpoint available on HuggingFace transformers. Can be used with a roboml client.
- TransformersMLLM: An MLLM model that needs to be initialized with any MLLM checkpoint available on HuggingFace transformers. Can be used with a roboml client.
- OllamaModel: An Ollama model that needs to be initialized with an ollama tag as checkpoint.
- Whisper: An automatic speech recognition (ASR) system by OpenAI trained on 680,000 hours of multilingual and multitask supervised data collected from the web.
- SpeechT5: A model for text-to-speech synthesis developed by Microsoft.
- Bark: A model for text-to-speech synthesis developed by SunoAI.
- MeloTTS: A model for text-to-speech synthesis developed by MyShell AI using the MeloTTS engine.
- VisionModel: Object Detection Model with Optional Tracking.
API#
- class agents.models.GenericLLM#
Bases: agents.models.Model
A generic LLM configuration for OpenAI-compatible /v1/chat/completions APIs.
This class supports any model served via an OpenAI-compatible endpoint (e.g., vLLM, LMDeploy, DeepSeek, Groq, or OpenAI itself).
- Parameters:
name (str) – An arbitrary name given to the model.
checkpoint (str) – The model identifier on the remote server (e.g., “gpt-4o”, “meta-llama/Llama-3-70b”). For OpenAI models, consult: https://platform.openai.com/docs/models
init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.
options (dict, optional) –
Optional dictionary to configure default inference behavior. Options that conflict with component config options (such as max_tokens and temperature) are overridden by values set in the component config. Supported keys match standard OpenAI API parameters:
temperature (float): Sampling temperature (0-2).
top_p (float): Nucleus sampling probability.
max_tokens (int): Max tokens to generate.
presence_penalty (float): Penalty for new tokens (-2.0 to 2.0).
frequency_penalty (float): Penalty for frequent tokens (-2.0 to 2.0).
stop (str or list): Stop sequences.
seed (int): Random seed for deterministic sampling.
Example usage:
gpt4 = GenericLLM(name='gpt4', checkpoint="gpt-4o", options={"temperature": 0.7, "max_tokens": 500})
- get_init_params() Dict#
Get init params from models
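A minimal sketch of how the options dict composes with the common model interface (assuming the serving endpoint is configured on the client side; the exact contents of the dict returned by get_init_params() are illustrative):
from agents.models import GenericLLM

# Default inference behavior; component config values (e.g. max_tokens,
# temperature) take precedence over these at runtime.
llm = GenericLLM(
    name="assistant_llm",
    checkpoint="gpt-4o",
    init_timeout=30,
    options={"temperature": 0.2, "top_p": 0.9, "max_tokens": 256},
)

# Inspect the resolved initialization parameters.
print(llm.get_init_params())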
- class agents.models.GenericMLLM#
Bases: agents.models.GenericLLM
A generic Multimodal LLM configuration for OpenAI-compatible APIs.
Use this for models that accept image/audio inputs alongside text (e.g., GPT-4o, Claude 3.5 Sonnet via wrapper, Gemini via OpenAI adapter).
- Parameters:
name (str) – An arbitrary name given to the model.
checkpoint (str) – The model identifier. Consult provider documentation.
options (dict, optional) – Optional dictionary for default inference parameters (see GenericLLM).
Example usage:
gpt4_vision = GenericMLLM(name='gpt4v', checkpoint="gpt-4o")
- get_init_params() Dict#
Get init params from models
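A short sketch showing that the inherited options dict applies to multimodal models as well (the checkpoint below is only an example of a vision-capable model behind an OpenAI-compatible endpoint):
from agents.models import GenericMLLM

vlm = GenericMLLM(
    name="scene_describer",
    checkpoint="gpt-4o",  # example checkpoint; consult your provider's docs
    options={"temperature": 0.0, "max_tokens": 300},
)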
- class agents.models.GenericTTS#
Bases: agents.models.Model
A generic Text-to-Speech model for OpenAI-compatible /v1/audio/speech APIs.
- Parameters:
name (str) – An arbitrary name given to the model.
checkpoint (str) – The model identifier (e.g., “tts-1”, “tts-1-hd”). For details: https://platform.openai.com/docs/models/tts
voice (str) – The voice ID to use. OpenAI standard voices: ‘alloy’, ‘echo’, ‘fable’, ‘onyx’, ‘nova’, ‘shimmer’. Other providers may have different IDs.
speed (float) – The speed of the generated audio. Select a value from 0.25 to 4.0. Default is 1.0.
init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.
Example usage:
tts = GenericTTS(name='openai_tts', checkpoint="tts-1-hd", voice="nova", speed=1.2)
- get_init_params() Dict#
Get init params from models
- class agents.models.GenericSTT#
Bases: agents.models.Model
A generic Speech-to-Text model for OpenAI-compatible /v1/audio/transcriptions APIs.
- Parameters:
name (str) – An arbitrary name given to the model.
checkpoint (str) – The model identifier (e.g., “whisper-1”). For details: https://platform.openai.com/docs/models/whisper
language (str, optional) – The language of the input audio (ISO-639-1 format, e.g., ‘en’, ‘fr’). Improves accuracy if known. Default is None (auto-detect).
temperature (float) – The sampling temperature (0-1). Lower values are more deterministic. Default is 0.
init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.
Example usage:
stt = GenericSTT(name='openai_stt', checkpoint="whisper-1", language="en", temperature=0.2)
- get_init_params() Dict#
Get init params from models
- class agents.models.TransformersLLM#
Bases: agents.models.LLM
An LLM model that needs to be initialized with any LLM checkpoint available on HuggingFace transformers. This model can be used with a roboml client.
- Parameters:
name (str) – An arbitrary name given to the model.
checkpoint (str) – The name of the pre-trained model’s checkpoint. Default is “microsoft/Phi-3-mini-4k-instruct”. For available checkpoints consult HuggingFace LLM Models
quantization (str or None) – The quantization scheme used by the model. Can be one of “4bit”, “8bit” or None (default is “4bit”).
init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.
Example usage:
llm = TransformersLLM(name='llm', checkpoint="meta-llama/Meta-Llama-3.1-8B-Instruct")
- get_init_params() Dict#
Get init params from models
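A minimal sketch of choosing a quantization scheme (assuming the model is served through a roboml client configured elsewhere; the checkpoint is the documented default and can be swapped for any HuggingFace LLM checkpoint):
from agents.models import TransformersLLM

# 4-bit quantization (the default) trades some accuracy for memory.
llm_4bit = TransformersLLM(name="llm_4bit", checkpoint="microsoft/Phi-3-mini-4k-instruct")

# Full-precision weights; larger models may need a longer init timeout.
llm_full = TransformersLLM(
    name="llm_full",
    checkpoint="microsoft/Phi-3-mini-4k-instruct",
    quantization=None,
    init_timeout=600,
)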
- class agents.models.TransformersMLLM#
Bases: agents.models.TransformersLLM
An MLLM model that needs to be initialized with any MLLM checkpoint available on HuggingFace transformers. This model can be used with a roboml client.
- Parameters:
name (str) – An arbitrary name given to the model.
checkpoint (str) – The name of the pre-trained model’s checkpoint. Default is “HuggingFaceM4/idefics2-8b”. For available checkpoints consult HuggingFace Image-Text to Text Models
quantization (str or None) – The quantization scheme used by the model. Can be one of “4bit”, “8bit” or None (default is “4bit”).
init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.
Example usage:
mllm = TransformersMLLM(name='mllm', checkpoint="HuggingFaceM4/idefics2-8b")
- get_init_params() Dict#
Get init params from models
- class agents.models.OllamaModel#
Bases: agents.models.LLM
An Ollama model that needs to be initialized with an Ollama tag as its checkpoint.
- Parameters:
name (str) – An arbitrary name given to the model.
checkpoint (str) – The name of the pre-trained model’s checkpoint. For available checkpoints consult Ollama Models
init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.
options (dict, optional) –
Optional dictionary to configure generation behavior. Options that conflict with component config options (such as num_predict and temperature) are overridden by values set in the component config. Only the following keys, with the value types shown, are allowed. For details check the Ollama API documentation:
num_keep: int
seed: int
num_predict: int
top_k: int
top_p: float
min_p: float
typical_p: float
repeat_last_n: int
temperature: float
repeat_penalty: float
presence_penalty: float
frequency_penalty: float
penalize_newline: bool
stop: list of strings
numa: bool
num_ctx: int
num_batch: int
num_gpu: int
main_gpu: int
use_mmap: bool
num_thread: int
Example usage:
llm = OllamaModel(name='ollama1', checkpoint="gemma2:latest", options={"temperature": 0.7, "num_predict": 50})
- get_init_params() Dict#
Get init params from models
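A slightly fuller sketch of the allowed option keys (values are illustrative, and the tag must already be pulled on the serving Ollama instance):
from agents.models import OllamaModel

llm = OllamaModel(
    name="planner_llm",
    checkpoint="llama3.2:3b",  # any valid Ollama tag
    options={
        "temperature": 0.3,
        "num_predict": 128,    # overridden by component config if set there
        "num_ctx": 4096,
        "seed": 42,
        "stop": ["</answer>"],
    },
)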
- class agents.models.Whisper#
Bases: agents.models.Model
Whisper is an automatic speech recognition (ASR) system by OpenAI trained on 680,000 hours of multilingual and multitask supervised data collected from the web. Details
- Parameters:
name (str) – An arbitrary name given to the model.
checkpoint (str) – Size of the model to use (tiny, tiny.en, base, base.en, small, small.en, distil-small.en, medium, medium.en, distil-medium.en, large-v1, large-v2, large-v3, large, distil-large-v2, distil-large-v3, large-v3-turbo, or turbo). For more information check here
compute_type (str or None) – The compute type used by the model. Can be one of “int8”, “fp16”, “fp32”, or None (default is “int8”).
init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.
Example usage:
whisper = Whisper(name='s2t', checkpoint="small") # Initialize with a different checkpoint
- get_init_params() Dict#
Get init params from models
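A minimal sketch contrasting checkpoint size and compute type (using "fp16" with a large checkpoint assumes a GPU is available on the serving side):
from agents.models import Whisper

# Small English-only model with the default int8 compute type: fast and light.
whisper_fast = Whisper(name="s2t_fast", checkpoint="small.en")

# Larger multilingual model in half precision for higher accuracy.
whisper_accurate = Whisper(
    name="s2t_accurate",
    checkpoint="large-v3",
    compute_type="fp16",
)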
- class agents.models.SpeechT5#
Bases: agents.models.Model
A model for text-to-speech synthesis developed by Microsoft. Details
- Parameters:
name (str) – An arbitrary name given to the model.
checkpoint (str) – The name of the pre-trained model’s checkpoint. Default is “microsoft/speecht5_tts”.
voice – The voice to use for synthesis. Can be one of “awb”, “bdl”, “clb”, “jmk”, “ksp”, “rms”, or “slt”. Default is “clb”.
init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.
Example usage:
speecht5 = SpeechT5(name='t2s1', voice="bdl") # Initialize with a different voice
- get_init_params() Dict#
Get init params from models
- class agents.models.Bark#
Bases: agents.models.Model
A model for text-to-speech synthesis developed by SunoAI. Details
- Parameters:
name (str) – An arbitrary name given to the model.
checkpoint (str) – The name of the pre-trained model’s checkpoint. Bark checkpoints on HuggingFace. Default is “suno/bark-small”.
attn_implementation – The attention implementation to use for the model. Default is “flash_attention_2”.
voice – The voice to use for synthesis. More choices are available here. Default is “v2/en_speaker_6”.
init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.
Example usage:
bark = Bark(name='t2s2', voice="v2/en_speaker_1") # Initialize with a different voice
- get_init_params() Dict#
Get init params from models
- class agents.models.MeloTTS#
Bases: agents.models.Model
A model for text-to-speech synthesis developed by MyShell AI using the MeloTTS engine.
- Parameters:
name (str) – An arbitrary name given to the model.
language (str) – The language for speech synthesis. Supported values: [“EN”, “ES”, “FR”, “ZH”, “JP”, “KR”]. Default is “EN”.
speaker_id (str) – The speaker ID for the chosen language. Default is “EN-US”. For details check here
init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.
Example usage:
melotts = MeloTTS(name='melo1', language='JP', speaker_id='JP-1')
- get_init_params() Dict#
Get init params from models
- class agents.models.VisionModel#
Bases: agents.models.Model
Object Detection Model with Optional Tracking.
This vision model provides a flexible framework for object detection and tracking using the mmdet framework. It can be used as a standalone detector or as a tracker to follow detected objects over time, and it can be initialized with any checkpoint available in the mmdet framework.
- Parameters:
name (str) – An arbitrary name given to the model.
checkpoint (str) – The name of the pre-trained model’s checkpoint. All available checkpoints in the mmdet framework. Default is “dino-4scale_r50_8xb2-12e_coco”.
cache_dir (str) – The directory where downloaded models are cached. Default is ‘mmdet’.
setup_trackers (bool) – Whether to set up trackers using norfair or not. Default is False.
tracking_distance_function (str) – The function used to calculate the distance between detected objects. This can be any distance metric string available in scipy.spatial.distance.cdist. Default is “euclidean”.
tracking_distance_threshold (int) – The threshold for determining whether two detections in consecutive frames are close enough to be considered the same object. Default is 30, with a minimum value of 1.
deploy_tensorrt (bool) – Deploy the vision model using NVIDIA TensorRT. To utilize this feature with roboml, check out the instructions here. Default is False.
_num_trackers (int) – The number of trackers to use. This number depends on the number of input image streams being given to the component. It is set automatically if setup_trackers is True.
init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.
Example usage:
model = VisionModel(name='detection1', setup_trackers=True, tracking_distance_threshold=20) # Initialize the model with tracking enabled
- get_init_params() Dict#
Get init params from models
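A minimal sketch of the two modes, detection only and detection with tracking (the checkpoint is the documented default from the mmdet model zoo; the number of trackers is derived automatically from the component's input image streams):
from agents.models import VisionModel

# Standalone detector using the default mmdet checkpoint.
detector = VisionModel(name="detector", checkpoint="dino-4scale_r50_8xb2-12e_coco")

# Detector with norfair tracking enabled.
tracker = VisionModel(
    name="tracker",
    setup_trackers=True,
    tracking_distance_function="euclidean",
    tracking_distance_threshold=20,
)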