agents.models#

The model specification classes below define a common interface for the initialization parameters of ML models across supported model serving platforms.
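As a rough illustration of what such a shared specification interface can look like, the sketch below uses plain dataclasses. `ModelSpec`, `LLMSpec`, and the validation logic are hypothetical stand-ins written for this example, not the classes in `agents.models`. Only the parameter names (`name`, `checkpoint`, `quantization`, `system_prompt`, `init_timeout`) and the `get_init_params` method name come from the API documented on this page. Leaving `name` out of the returned parameters is likewise an assumption.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class ModelSpec:
    """Hypothetical stand-in for a platform-agnostic model specification."""

    name: str
    checkpoint: str
    init_timeout: Optional[int] = None  # seconds; None means no timeout

    def get_init_params(self) -> Dict[str, Any]:
        """Collect the initialization parameters (assumed: all but the name)."""
        params = asdict(self)
        params.pop("name")
        return params


@dataclass
class LLMSpec(ModelSpec):
    """Hypothetical LLM specification adding quantization and a system prompt."""

    quantization: Optional[str] = "4bit"
    system_prompt: Optional[str] = None

    def __post_init__(self) -> None:
        # Mirror the documented choices: "4bit", "8bit" or None.
        if self.quantization not in ("4bit", "8bit", None):
            raise ValueError(f"Unsupported quantization: {self.quantization!r}")


spec = LLMSpec(name="llama", checkpoint="meta-llama/Meta-Llama-3-8B-Instruct")
init_params = spec.get_init_params()
```

Keeping every platform-independent setting in one small, validated object means a serving backend only has to consume a single dictionary of parameters.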

Module Contents#

Classes#

Encoder

A text encoder model that can be used with vector DBs.

Llama3

A pre-trained language model from MetaAI for tasks such as text generation, question answering, and more. Details

Llama3_1

A pre-trained language model from MetaAI for tasks such as text generation, question answering, and more. Details

OllamaModel

An Ollama model that needs to be initialized with an ollama tag as checkpoint.

Idefics2

A pre-trained visual language model from HuggingFace for tasks such as visual question answering. Details

Llava

LLaVA is an open-source chatbot trained by fine-tuning an LLM on multimodal instruction-following data. It is an auto-regressive language model based on the transformer architecture. Details

Whisper

Whisper is an automatic speech recognition (ASR) system by OpenAI trained on 680,000 hours of multilingual and multitask supervised data collected from the web. Details

InstructBlip

An open-source, general-purpose vision language model by Salesforce built using instruction tuning. Details

SpeechT5

A model for text-to-speech synthesis developed by Microsoft. Details

Bark

A model for text-to-speech synthesis developed by SunoAI. Details

VisionModel

Object Detection Model with Optional Tracking.

API#

class agents.models.Encoder#

Bases: agents.models.Model

A text encoder model that can be used with vector DBs.

Parameters:
  • name (str) – An arbitrary name given to the model.

  • checkpoint (str) – The name of the pre-trained model’s checkpoint. Default is “BAAI/bge-small-en”.

  • init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.

class agents.models.Llama3#

Bases: agents.models.TransformersLLM

A pre-trained language model from MetaAI for tasks such as text generation, question answering, and more. Details

Parameters:
  • name (str) – An arbitrary name given to the model.

  • checkpoint (str) – The name of the pre-trained model’s checkpoint. Default is “meta-llama/Meta-Llama-3-8B-Instruct”. For available checkpoints, consult Llama3 checkpoints on HuggingFace.

  • quantization (str or None) – The quantization scheme used by the model. Can be one of “4bit”, “8bit” or None (default is “4bit”).

  • system_prompt (str or None) – The system prompt used to initialize the model. If not provided, defaults to None.

  • init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.

Example usage:

llama = Llama3(name='llama', checkpoint="other_checkpoint_name")  # Initialize with a custom checkpoint
class agents.models.Llama3_1#

Bases: agents.models.TransformersLLM

A pre-trained language model from MetaAI for tasks such as text generation, question answering, and more. Details

Parameters:
  • name (str) – An arbitrary name given to the model.

  • checkpoint (str) – The name of the pre-trained model’s checkpoint. Default is “meta-llama/Meta-Llama-3.1-8B-Instruct”. For available checkpoints, consult Llama3 checkpoints on HuggingFace.

  • quantization (str or None) – The quantization scheme used by the model. Can be one of “4bit”, “8bit” or None (default is “4bit”).

  • system_prompt (str or None) – The system prompt used to initialize the model. If not provided, defaults to None.

  • init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.

Example usage:

llama = Llama3_1(name='llama', checkpoint="other_checkpoint_name")  # Initialize with a custom checkpoint
class agents.models.OllamaModel#

Bases: agents.models.LLM

An Ollama model that needs to be initialized with an ollama tag as checkpoint.

Parameters:
  • name (str) – An arbitrary name given to the model.

  • checkpoint (str) – The name of the pre-trained model’s checkpoint, given as an Ollama tag. For available checkpoints, consult Ollama Models.

  • quantization (str or None) – The quantization scheme used by the model. Can be one of “4bit”, “8bit” or None (default is “4bit”).

  • system_prompt (str or None) – The system prompt used to initialize the model. If not provided, defaults to None.

  • init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.

Example usage:

llm = OllamaModel(name='ollama1', checkpoint="gemma2:latest")
class agents.models.Idefics2#

Bases: agents.models.TransformersMLLM

A pre-trained visual language model from HuggingFace for tasks such as visual question answering. Details

Parameters:
  • name (str) – An arbitrary name given to the model.

  • checkpoint (str) – The name of the pre-trained model’s checkpoint. Default is “HuggingFaceM4/idefics2-8b”. For available checkpoints, consult Idefics2 checkpoints on HuggingFace.

  • system_prompt (str or None) – The system prompt used to initialize the model. If not provided, defaults to None.

  • quantization (str or None) – The quantization scheme used by the model. Can be one of “4bit”, “8bit” or None (default is “4bit”).

  • init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.

Example usage:

idefics = Idefics2(name='mllm1', quantization="8bit")  # Initialize with 8-bit quantization
class agents.models.Llava#

Bases: agents.models.TransformersMLLM

LLaVA is an open-source chatbot trained by fine-tuning an LLM on multimodal instruction-following data. It is an auto-regressive language model based on the transformer architecture. Details

Parameters:
  • name (str) – An arbitrary name given to the model.

  • checkpoint (str) – The name of the pre-trained model’s checkpoint. Default is “liuhaotian/llava-v1.6-mistral-7b”. For available checkpoints, consult Llava checkpoints on HuggingFace.

  • system_prompt (str or None) – The system prompt used to initialize the model. If not provided, defaults to None.

  • quantization (str or None) – The quantization scheme used by the model. Can be one of “4bit”, “8bit” or None (default is “4bit”).

  • init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.

Example usage:

llava = Llava(name='mllm2', quantization="4bit")
class agents.models.Whisper#

Bases: agents.models.Model

Whisper is an automatic speech recognition (ASR) system by OpenAI trained on 680,000 hours of multilingual and multitask supervised data collected from the web. Details

Parameters:
  • name (str) – An arbitrary name given to the model.

  • checkpoint (str) – The name of the pre-trained model’s checkpoint. Default is “openai/whisper-small.en”. For available checkpoints, consult Whisper checkpoints on HuggingFace.

  • quantization (str or None) – The quantization scheme used by the model. Can be one of “4bit”, “8bit” or None (default is “4bit”).

  • init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.

Example usage:

whisper = Whisper(name='s2t', checkpoint="openai/whisper-medium") # Initialize with a different checkpoint
get_init_params() -> Dict#

Get init params for model initialization.

class agents.models.InstructBlip#

Bases: agents.models.TransformersMLLM

An open-source, general-purpose vision language model by Salesforce built using instruction tuning. Details

Parameters:
  • name (str) – An arbitrary name given to the model.

  • checkpoint (str) – The name of the pre-trained model’s checkpoint. Default is “Salesforce/instructblip-vicuna-7b”. For available checkpoints, consult InstructBlip checkpoints on HuggingFace.

  • history_reset_phrase (str) – A phrase used to reset the conversation history. Defaults to “chat reset”.

  • quantization (str or None) – The quantization scheme used by the model. Can be one of “4bit”, “8bit” or None (default is “4bit”).

  • init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.

Example usage:

blip = InstructBlip(name='mllm3', quantization="4bit")
class agents.models.SpeechT5#

Bases: agents.models.Model

A model for text-to-speech synthesis developed by Microsoft. Details

Parameters:
  • name (str) – An arbitrary name given to the model.

  • checkpoint (str) – The name of the pre-trained model’s checkpoint. Default is “microsoft/speecht5_tts”.

  • voice – The voice to use for synthesis. Can be one of “awb”, “bdl”, “clb”, “jmk”, “ksp”, “rms”, or “slt”. Default is “clb”.

  • init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.

Example usage:

speecht5 = SpeechT5(name='t2s1', voice="bdl")  # Initialize with a different voice
class agents.models.Bark#

Bases: agents.models.Model

A model for text-to-speech synthesis developed by SunoAI. Details

Parameters:
  • name (str) – An arbitrary name given to the model.

  • checkpoint (str) – The name of the pre-trained model’s checkpoint. Default is “suno/bark-small”. For available checkpoints, consult Bark checkpoints on HuggingFace.

  • attn_implementation – The attention implementation to use for the model. Default is “flash_attention_2”.

  • voice – The voice to use for synthesis. More choices are available here. Default is “v2/en_speaker_6”.

  • init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.

Example usage:

bark = Bark(name='t2s2', voice="v2/en_speaker_1")  # Initialize with a different voice
get_init_params() -> Dict#

Get init params for model initialization.

class agents.models.VisionModel#

Bases: agents.models.Model

Object Detection Model with Optional Tracking.

This vision model provides a flexible framework for object detection and tracking using the mmdet framework. It can be used as a standalone detector or as a tracker that follows detected objects over time. It can be initialized with any checkpoint available in the mmdet framework.

Parameters:
  • name (str) – An arbitrary name given to the model.

  • checkpoint (str) – The name of the pre-trained model’s checkpoint. Any checkpoint available in the mmdet framework can be used. Default is “dino-4scale_r50_8xb2-12e_coco”.

  • cache_dir (str) – The directory where downloaded models are cached. Default is ‘mmdet’.

  • setup_trackers (bool) – Whether to set up trackers using norfair or not. Default is False.

  • tracking_distance_function (str) – The function used to calculate the distance between detected objects. This can be any distance metric string available in scipy.spatial.distance.cdist. Default is “euclidean”.

  • tracking_distance_threshold (int) – The distance threshold below which two detections in consecutive frames are treated as the same object. Default is 30, with a minimum value of 1.

  • deploy_tensorrt (bool) – Deploy the vision model using NVIDIA TensorRT. To utilize this feature with roboml, check out the instructions here. Default is False.

  • _num_trackers (int) – The number of trackers to use. This number depends on the number of input image streams given to the component. It is set automatically if setup_trackers is True.

  • init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.

Example usage:

model = VisionModel(name='detection1', setup_trackers=True, tracking_distance_threshold=20)  # Initialize the model with tracking enabled
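
To make the role of tracking_distance_threshold more concrete, the following minimal sketch matches object centroids across two consecutive frames by Euclidean distance. It only illustrates the thresholding idea, assuming centroids in pixel coordinates. `match_detections` is a hypothetical helper written for this example, and the matching that norfair performs inside the component is more elaborate.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def match_detections(
    previous: List[Point], current: List[Point], threshold: float = 30
) -> Dict[int, int]:
    """Greedily map each current centroid to the nearest unused previous one.

    A pair is accepted only if its Euclidean distance is below `threshold`.
    Returns a dict of {index in current: index in previous}.
    """
    matches: Dict[int, int] = {}
    used = set()
    for cur_idx, cur in enumerate(current):
        best_idx, best_dist = None, threshold
        for prev_idx, prev in enumerate(previous):
            if prev_idx in used:
                continue  # each previous detection is matched at most once
            dist = math.dist(cur, prev)
            if dist < best_dist:
                best_idx, best_dist = prev_idx, dist
        if best_idx is not None:
            matches[cur_idx] = best_idx
            used.add(best_idx)
    return matches


previous_frame = [(100.0, 100.0), (300.0, 200.0)]
current_frame = [(110.0, 105.0), (400.0, 400.0)]
matches = match_detections(previous_frame, current_frame, threshold=30)
```

In this sketch, a lower threshold makes the matcher stricter, so fast-moving objects are more likely to be treated as new, while a higher threshold tolerates larger jumps at the risk of confusing nearby objects.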