agents.models#

The following model specification classes define a common interface for the initialization parameters of ML models across supported model serving platforms.

Module Contents#

Classes#

TransformersLLM

An LLM model that needs to be initialized with any LLM checkpoint available on HuggingFace transformers. This model can be used with a roboml client.

TransformersMLLM

An MLLM model that needs to be initialized with any MLLM checkpoint available on HuggingFace transformers. This model can be used with a roboml client.

OllamaModel

An Ollama model that needs to be initialized with an ollama tag as checkpoint.

Whisper

Whisper is an automatic speech recognition (ASR) system by OpenAI trained on 680,000 hours of multilingual and multitask supervised data collected from the web.

SpeechT5

A model for text-to-speech synthesis developed by Microsoft.

Bark

A model for text-to-speech synthesis developed by SunoAI.

MeloTTS

A model for text-to-speech synthesis developed by MyShell AI using the MeloTTS engine.

VisionModel

Object Detection Model with Optional Tracking.

API#

class agents.models.TransformersLLM#

Bases: agents.models.LLM

An LLM model that needs to be initialized with any LLM checkpoint available on HuggingFace transformers. This model can be used with a roboml client.

Parameters:
  • name (str) – An arbitrary name given to the model.

  • checkpoint (str) – The name of the pre-trained model’s checkpoint. Default is “microsoft/Phi-3-mini-4k-instruct”. For available checkpoints consult HuggingFace LLM Models

  • quantization (str or None) – The quantization scheme used by the model. Can be one of “4bit”, “8bit” or None (default is “4bit”).

  • init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.

Example usage:

llm = TransformersLLM(name='llm', checkpoint="meta-llama/Meta-Llama-3.1-8B-Instruct")
get_init_params() Dict#

Get init params from models
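To illustrate the idea of a specification class that exposes its initialization parameters as a dictionary, here is a minimal, self-contained sketch. It does not reproduce the agents.models implementation: the class name SketchLLMSpec is hypothetical, and the choice to leave name and init_timeout out of the returned dictionary is an assumption made for illustration only.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class SketchLLMSpec:
    """Hypothetical stand-in for a model specification (not the agents.models class)."""

    name: str
    checkpoint: str = "microsoft/Phi-3-mini-4k-instruct"
    quantization: Optional[str] = "4bit"
    init_timeout: Optional[int] = None

    def __post_init__(self) -> None:
        # Mirror the documented constraint on the quantization parameter.
        if self.quantization not in ("4bit", "8bit", None):
            raise ValueError(f"Unsupported quantization: {self.quantization!r}")

    def get_init_params(self) -> Dict[str, Any]:
        # Assumption: only model-side settings are passed to the serving
        # platform; name and init_timeout are treated as client-side values.
        params = asdict(self)
        params.pop("name")
        params.pop("init_timeout")
        return params


spec = SketchLLMSpec(name="llm", checkpoint="meta-llama/Meta-Llama-3.1-8B-Instruct")
init_params = spec.get_init_params()
print(init_params)
```

Keeping the parameters in a plain dictionary is one way to let different serving platforms consume the same specification without depending on the class itself.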

class agents.models.TransformersMLLM#

Bases: agents.models.LLM

An MLLM model that needs to be initialized with any MLLM checkpoint available on HuggingFace transformers. This model can be used with a roboml client.

Parameters:
  • name (str) – An arbitrary name given to the model.

  • checkpoint (str) – The name of the pre-trained model’s checkpoint. Default is “HuggingFaceM4/idefics2-8b”. For available checkpoints consult HuggingFace Image-Text to Text Models

  • quantization (str or None) – The quantization scheme used by the model. Can be one of “4bit”, “8bit” or None (default is “4bit”).

  • init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.

Example usage:

mllm = TransformersMLLM(name='mllm', checkpoint="HuggingFaceM4/idefics2-8b")
get_init_params() Dict#

Get init params from models

class agents.models.OllamaModel#

Bases: agents.models.LLM

An Ollama model that needs to be initialized with an ollama tag as checkpoint.

Parameters:
  • name (str) – An arbitrary name given to the model.

  • checkpoint (str) – The name of the pre-trained model’s checkpoint. For available checkpoints consult Ollama Models

  • init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.

  • options

    Optional dictionary to configure generation behavior. Options that overlap with component config options (such as num_predict and temperature) are overridden by the component config values when those are set. Only the following keys, with the value types shown, are allowed. For details, check the Ollama API documentation:

    • num_keep: int

    • seed: int

    • num_predict: int

    • top_k: int

    • top_p: float

    • min_p: float

    • typical_p: float

    • repeat_last_n: int

    • temperature: float

    • repeat_penalty: float

    • presence_penalty: float

    • frequency_penalty: float

    • penalize_newline: bool

    • stop: list of strings

    • numa: bool

    • num_ctx: int

    • num_batch: int

    • num_gpu: int

    • main_gpu: int

    • use_mmap: bool

    • num_thread: int

Example usage:

llm = OllamaModel(
    name='ollama1',
    checkpoint="gemma2:latest",
    options={"temperature": 0.7, "num_predict": 50}
)
get_init_params() Dict#

Get init params from models
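The restriction on the options dictionary for OllamaModel (only the listed keys, each with its specified value type) can be checked before a model is constructed. The sketch below is a hypothetical helper, not part of agents.models: the names ALLOWED_OPTIONS and check_options are invented here, and the strict handling of types (for example, rejecting an int where a float is listed) is an assumption rather than documented library behavior.

```python
from typing import Any, Dict

# Allowed keys and value types, taken from the options list for OllamaModel.
ALLOWED_OPTIONS: Dict[str, type] = {
    "num_keep": int, "seed": int, "num_predict": int, "top_k": int,
    "top_p": float, "min_p": float, "typical_p": float, "repeat_last_n": int,
    "temperature": float, "repeat_penalty": float, "presence_penalty": float,
    "frequency_penalty": float, "penalize_newline": bool, "stop": list,
    "numa": bool, "num_ctx": int, "num_batch": int, "num_gpu": int,
    "main_gpu": int, "use_mmap": bool, "num_thread": int,
}


def check_options(options: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: raise on unknown keys or mismatched value types."""
    for key, value in options.items():
        expected = ALLOWED_OPTIONS.get(key)
        if expected is None:
            raise ValueError(f"Unknown option: {key!r}")
        # bool is a subclass of int, so reject True/False for non-bool options.
        type_ok = isinstance(value, expected) and (
            expected is bool or not isinstance(value, bool)
        )
        if not type_ok:
            raise TypeError(
                f"Option {key!r} expects {expected.__name__}, "
                f"got {type(value).__name__}"
            )
        if key == "stop" and not all(isinstance(item, str) for item in value):
            raise TypeError("Option 'stop' expects a list of strings")
    return options


checked = check_options({"temperature": 0.7, "num_predict": 50})
print(checked)
```

A misspelled key such as "temprature" raises ValueError in this sketch, which surfaces the mistake before any request is sent to the model server.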

class agents.models.Whisper#

Bases: agents.models.Model

Whisper is an automatic speech recognition (ASR) system by OpenAI trained on 680,000 hours of multilingual and multitask supervised data collected from the web.

Parameters:
  • name (str) – An arbitrary name given to the model.

  • checkpoint (str) – Size of the model to use (tiny, tiny.en, base, base.en, small, small.en, distil-small.en, medium, medium.en, distil-medium.en, large-v1, large-v2, large-v3, large, distil-large-v2, distil-large-v3, large-v3-turbo, or turbo). For more information check here

  • compute_type (str or None) – The compute type used by the model. Can be one of “int8”, “fp16”, “fp32”, None (default is “int8”).

  • init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.

Example usage:

whisper = Whisper(name='s2t', checkpoint="small") # Initialize with a different checkpoint
get_init_params() Dict#

Get init params from models

class agents.models.SpeechT5#

Bases: agents.models.Model

A model for text-to-speech synthesis developed by Microsoft.

Parameters:
  • name (str) – An arbitrary name given to the model.

  • checkpoint (str) – The name of the pre-trained model’s checkpoint. Default is “microsoft/speecht5_tts”.

  • voice – The voice to use for synthesis. Can be one of “awb”, “bdl”, “clb”, “jmk”, “ksp”, “rms”, or “slt”. Default is “clb”.

  • init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.

Example usage:

speecht5 = SpeechT5(name='t2s1', voice="bdl")  # Initialize with a different voice
get_init_params() Dict#

Get init params from models

class agents.models.Bark#

Bases: agents.models.Model

A model for text-to-speech synthesis developed by SunoAI.

Parameters:
  • name (str) – An arbitrary name given to the model.

  • checkpoint (str) – The name of the pre-trained model’s checkpoint. Default is “suno/bark-small”. For available checkpoints consult Bark checkpoints on HuggingFace.

  • attn_implementation – The attention implementation to use for the model. Default is “flash_attention_2”.

  • voice – The voice to use for synthesis. More choices are available here. Default is “v2/en_speaker_6”.

  • init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.

Example usage:

bark = Bark(name='t2s2', voice="v2/en_speaker_1")  # Initialize with a different voice
get_init_params() Dict#

Get init params from models

class agents.models.MeloTTS#

Bases: agents.models.Model

A model for text-to-speech synthesis developed by MyShell AI using the MeloTTS engine.

Parameters:
  • name (str) – An arbitrary name given to the model.

  • language (str) – The language for speech synthesis. Supported values: [“EN”, “ES”, “FR”, “ZH”, “JP”, “KR”]. Default is “EN”.

  • speaker_id (str) – The speaker ID for the chosen language. Default is “EN-US”. For details check here

  • init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.

Example usage:

melotts = MeloTTS(name='melo1', language='JP', speaker_id='JP-1')
get_init_params() Dict#

Get init params from models

class agents.models.VisionModel#

Bases: agents.models.Model

Object Detection Model with Optional Tracking.

This vision model provides a flexible framework for object detection and tracking using the mmdet framework. It can be used as a standalone detector or as a tracker that follows detected objects over time. It can be initialized with any checkpoint available in the mmdet framework.

Parameters:
  • name (str) – An arbitrary name given to the model.

  • checkpoint (str) – The name of the pre-trained model’s checkpoint. Any checkpoint available in the mmdet framework can be used. Default is “dino-4scale_r50_8xb2-12e_coco”.

  • cache_dir (str) – The directory where downloaded models are cached. Default is ‘mmdet’.

  • setup_trackers (bool) – Whether to set up trackers using norfair or not. Default is False.

  • tracking_distance_function (str) – The function used to calculate the distance between detected objects. This can be any distance metric string available in scipy.spatial.distance.cdist. Default is “euclidean”.

  • tracking_distance_threshold (int) – The threshold for deciding whether two objects in consecutive frames are close enough to be treated as the same object. Default is 30, with a minimum value of 1.

  • deploy_tensorrt (bool) – Deploy the vision model using NVIDIA TensorRT. To use this feature with roboml, check out the instructions here. Default is False.

  • _num_trackers (int) – The number of trackers to use. This number depends on the number of input image streams given to the component. It is set automatically if setup_trackers is True.

  • init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.

Example usage:

model = VisionModel(name='detection1', setup_trackers=True, num_trackers=1, tracking_distance_threshold=20)  # Initialize the model with one tracker, i.e. for a single input image stream
get_init_params() Dict#

Get init params from models
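The role of tracking_distance_threshold can be illustrated with a small, self-contained sketch. It is a conceptual illustration only, not the norfair or mmdet implementation: the function same_object is hypothetical, it assumes detections are reduced to 2D center points, it uses the Euclidean metric (the documented default), and the choice of an inclusive comparison is an assumption.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def same_object(prev_center: Point, curr_center: Point, threshold: int = 30) -> bool:
    """Return True if two detection centers lie within the distance threshold."""
    if threshold < 1:
        # Mirror the documented minimum value of 1.
        raise ValueError("threshold must be at least 1")
    # Euclidean distance between the centers in consecutive frames.
    return math.dist(prev_center, curr_center) <= threshold


# Centers 20.0 apart: treated as the same object with threshold=20.
print(same_object((100.0, 100.0), (112.0, 116.0), threshold=20))
# Centers 50.0 apart: treated as different objects with threshold=20.
print(same_object((100.0, 100.0), (130.0, 140.0), threshold=20))
```

Under these assumptions, a lower threshold makes the tracker stricter about how far an object may move between frames before it is treated as a new object.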