`agents.models`#

The model specification classes below define a common interface for the initialization parameters of ML models across the supported model serving platforms.
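To make the idea of a common interface concrete, here is a minimal sketch that uses stand-in dataclasses rather than the library's own classes. `ModelSpec` and `QuantizedSpec` are hypothetical names, and the body of `get_init_params` is an assumption: it borrows only the method name documented for `Whisper` and `Bark` further down and simply returns every field as a dictionary.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class ModelSpec:
    """Hypothetical stand-in for a model specification (not the library's class)."""

    name: str
    checkpoint: str
    init_timeout: Optional[int] = None

    def get_init_params(self) -> Dict[str, Any]:
        # Assumption: expose every field as a plain dict that a serving
        # platform could consume, whatever the concrete model type is.
        return asdict(self)


@dataclass
class QuantizedSpec(ModelSpec):
    """Hypothetical stand-in that adds the documented quantization option."""

    quantization: Optional[str] = "4bit"


encoder = ModelSpec(name="encoder1", checkpoint="BAAI/bge-small-en")
llm = QuantizedSpec(name="llama", checkpoint="meta-llama/Meta-Llama-3-8B-Instruct")

# Both specifications reduce to the same kind of dictionary.
for spec in (encoder, llm):
    print(spec.get_init_params())
```

The point of the sketch is only that different model types share one shape for their initialization parameters; the real classes may collect or filter their fields differently.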

Module Contents#

Classes#

`Encoder`	A text encoder model that can be used with vector DBs.
`Llama3`	A pre-trained language model from MetaAI for tasks such as text generation, question answering, and more. Details
`Llama3_1`	A pre-trained language model from MetaAI for tasks such as text generation, question answering, and more. Details
`OllamaModel`	An Ollama model that needs to be initialized with an ollama tag as checkpoint.
`Idefics2`	A pre-trained visual language model from HuggingFace for tasks such as visual question answering. Details
`Llava`	LLaVA is an open-source chatbot trained by fine-tuning an LLM on multimodal instruction-following data. It is an auto-regressive language model, based on the transformer architecture. Details
`Whisper`	Whisper is an automatic speech recognition (ASR) system by OpenAI trained on 680,000 hours of multilingual and multitask supervised data collected from the web. Details
`InstructBlip`	An open-source general-purpose vision language model by Salesforce built using instruction-tuning. Details
`SpeechT5`	A model for text-to-speech synthesis developed by Microsoft. Details
`Bark`	A model for text-to-speech synthesis developed by SunoAI. Details
`VisionModel`	Object Detection Model with Optional Tracking.

API#

class agents.models.Encoder#

Bases: agents.models.Model

A text encoder model that can be used with vector DBs.

Parameters:

name (str) – An arbitrary name given to the model.
checkpoint (str) – The name of the pre-trained model’s checkpoint. Default is “BAAI/bge-small-en”.
init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.
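As an illustration of what a timeout on initialization can mean in practice, the sketch below bounds a blocking setup call with `concurrent.futures` from the standard library. `init_with_timeout` is a hypothetical helper, not part of `agents.models`, and treating `None` as "wait indefinitely" is an assumption about how the library interprets the default.

```python
import concurrent.futures
from typing import Any, Callable, Optional


def init_with_timeout(init_fn: Callable[[], Any], init_timeout: Optional[int] = None) -> Any:
    """Run init_fn and give up after init_timeout seconds.

    Hypothetical helper. A timeout of None waits indefinitely (assumed semantics).
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(init_fn)
    try:
        return future.result(timeout=init_timeout)
    except concurrent.futures.TimeoutError:
        # Normalize to the built-in exception on every Python version.
        raise TimeoutError(f"initialization exceeded {init_timeout} s") from None
    finally:
        # Do not block on a worker that may still be running.
        executor.shutdown(wait=False)
```

A caller would pass the blocking setup step as `init_fn`, for example a function that downloads and loads a checkpoint, and handle `TimeoutError` if the step takes too long.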

class agents.models.Llama3#

Bases: agents.models.TransformersLLM

A pre-trained language model from MetaAI for tasks such as text generation, question answering, and more. Details

Parameters:

name (str) – An arbitrary name given to the model.
checkpoint (str) – The name of the pre-trained model’s checkpoint. Default is “meta-llama/Meta-Llama-3-8B-Instruct”. For available checkpoints, consult Llama3 checkpoints on HuggingFace.
quantization (str or None) – The quantization scheme used by the model. Can be one of “4bit”, “8bit” or None (default is “4bit”).
system_prompt (str or None) – The system prompt used to initialize the model. If not provided, defaults to None.
init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.

Example usage:

llama = Llama3(name='llama', checkpoint="other_checkpoint_name")  # Initialize with a custom checkpoint
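The `quantization` parameter accepts only “4bit”, “8bit” or None. A small sketch of how a caller might check a value before building a specification follows; `check_quantization` and `ALLOWED_QUANTIZATION` are hypothetical names, and whether the library itself raises on other values is not stated here.

```python
from typing import Optional

# The three values documented for the quantization parameter.
ALLOWED_QUANTIZATION = ("4bit", "8bit", None)


def check_quantization(quantization: Optional[str] = "4bit") -> Optional[str]:
    """Return the value unchanged if it is one of the documented options.

    Hypothetical helper; raises ValueError for anything else.
    """
    if quantization not in ALLOWED_QUANTIZATION:
        raise ValueError(
            f"quantization must be one of {ALLOWED_QUANTIZATION}, got {quantization!r}"
        )
    return quantization
```

Passing None is the documented way to select no quantization, so the helper accepts it rather than treating it as a missing value.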

class agents.models.Llama3_1#

Bases: agents.models.TransformersLLM

A pre-trained language model from MetaAI for tasks such as text generation, question answering, and more. Details

Parameters:

name (str) – An arbitrary name given to the model.
checkpoint (str) – The name of the pre-trained model’s checkpoint. Default is “meta-llama/Meta-Llama-3.1-8B-Instruct”. For available checkpoints, consult Llama3 checkpoints on HuggingFace.
quantization (str or None) – The quantization scheme used by the model. Can be one of “4bit”, “8bit” or None (default is “4bit”).
system_prompt (str or None) – The system prompt used to initialize the model. If not provided, defaults to None.
init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.

Example usage:

llama = Llama3_1(name='llama', checkpoint="other_checkpoint_name")  # Initialize with a custom checkpoint

class agents.models.OllamaModel#

Bases: agents.models.LLM

An Ollama model that needs to be initialized with an ollama tag as checkpoint.

Parameters:

name (str) – An arbitrary name given to the model.
checkpoint (str) – The name of the pre-trained model’s checkpoint. For available checkpoints, consult Ollama Models.
quantization (str or None) – The quantization scheme used by the model. Can be one of “4bit”, “8bit” or None (default is “4bit”).
system_prompt (str or None) – The system prompt used to initialize the model. If not provided, defaults to None.
init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.

Example usage:

llm = OllamaModel(name='ollama1', checkpoint="gemma2:latest")

class agents.models.Idefics2#

Bases: agents.models.TransformersMLLM

A pre-trained visual language model from HuggingFace for tasks such as visual question answering. Details

Parameters:

name (str) – An arbitrary name given to the model.
checkpoint (str) – The name of the pre-trained model’s checkpoint. Default is “HuggingFaceM4/idefics2-8b”. For available checkpoints, consult Idefics2 checkpoints on HuggingFace.
system_prompt (str or None) – The system prompt used to initialize the model. If not provided, defaults to None.
quantization (str or None) – The quantization scheme used by the model. Can be one of “4bit”, “8bit” or None (default is “4bit”).
init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.

Example usage:

idefics = Idefics2(name='mllm1', quantization="8bit")  # Initialize with 8-bit quantization

class agents.models.Llava#

Bases: agents.models.TransformersMLLM

LLaVA is an open-source chatbot trained by fine-tuning an LLM on multimodal instruction-following data. It is an auto-regressive language model, based on the transformer architecture. Details

Parameters:

name (str) – An arbitrary name given to the model.
checkpoint (str) – The name of the pre-trained model’s checkpoint. Default is “liuhaotian/llava-v1.6-mistral-7b”. For available checkpoints, consult Llava checkpoints on HuggingFace.
system_prompt (str or None) – The system prompt used to initialize the model. If not provided, defaults to None.
quantization (str or None) – The quantization scheme used by the model. Can be one of “4bit”, “8bit” or None (default is “4bit”).
init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.

Example usage:

llava = Llava(name='mllm2', quantization="4bit")

class agents.models.Whisper#

Bases: agents.models.Model

Whisper is an automatic speech recognition (ASR) system by OpenAI trained on 680,000 hours of multilingual and multitask supervised data collected from the web. Details

Parameters:

name (str) – An arbitrary name given to the model.
checkpoint (str) – The name of the pre-trained model’s checkpoint. Default is “openai/whisper-small.en”. For available checkpoints, consult Whisper checkpoints on HuggingFace.
quantization (str or None) – The quantization scheme used by the model. Can be one of “4bit”, “8bit” or None (default is “4bit”).
init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.

Example usage:

whisper = Whisper(name='s2t', checkpoint="openai/whisper-medium") # Initialize with a different checkpoint

get_init_params() → Dict#: Get init params for model initialization.

class agents.models.InstructBlip#

Bases: agents.models.TransformersMLLM

An open-source general-purpose vision language model by Salesforce built using instruction-tuning. Details

Parameters:

name (str) – An arbitrary name given to the model.
checkpoint (str) – The name of the pre-trained model’s checkpoint. Default is “Salesforce/instructblip-vicuna-7b”. For available checkpoints, consult InstructBlip checkpoints on HuggingFace.
history_reset_phrase (str) – A phrase used to reset the conversation history. Defaults to “chat reset”.
quantization (str or None) – The quantization scheme used by the model. Can be one of “4bit”, “8bit” or None (default is “4bit”).
init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.

Example usage:

blip = InstructBlip(name='mllm3', quantization="4bit")

class agents.models.SpeechT5#

Bases: agents.models.Model

A model for text-to-speech synthesis developed by Microsoft. Details

Parameters:

name (str) – An arbitrary name given to the model.
checkpoint (str) – The name of the pre-trained model’s checkpoint. Default is “microsoft/speecht5_tts”.
voice – The voice to use for synthesis. Can be one of “awb”, “bdl”, “clb”, “jmk”, “ksp”, “rms”, or “slt”. Default is “clb”.
init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.

Example usage:

speecht5 = SpeechT5(name='t2s1', voice="bdl")  # Initialize with a different voice

class agents.models.Bark#

Bases: agents.models.Model

A model for text-to-speech synthesis developed by SunoAI. Details

Parameters:

name (str) – An arbitrary name given to the model.
checkpoint (str) – The name of the pre-trained model’s checkpoint. Default is “suno/bark-small”. For available checkpoints, consult Bark checkpoints on HuggingFace.
attn_implementation – The attention implementation to use for the model. Default is “flash_attention_2”.
voice – The voice to use for synthesis. More choices are available here. Default is “v2/en_speaker_6”.
init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.

Example usage:

bark = Bark(name='t2s2', voice="v2/en_speaker_1")  # Initialize with a different voice

get_init_params() → Dict#: Get init params for model initialization.

class agents.models.VisionModel#

Bases: agents.models.Model

Object Detection Model with Optional Tracking.

This vision model provides a flexible framework for object detection and tracking using the mmdet framework. It can be used as a standalone detector or as a tracker that follows detected objects over time. It can be initialized with any checkpoint available in the mmdet framework.

Parameters:

name (str) – An arbitrary name given to the model.
checkpoint (str) – The name of the pre-trained model’s checkpoint. Any checkpoint available in the mmdet framework can be used. Default is “dino-4scale_r50_8xb2-12e_coco”.
cache_dir (str) – The directory where downloaded models are cached. Default is ‘mmdet’.
setup_trackers (bool) – Whether to set up trackers using norfair or not. Default is False.
tracking_distance_function (str) – The function used to calculate the distance between detected objects. This can be any distance metric string available in scipy.spatial.distance.cdist. Default is “euclidean”.
tracking_distance_threshold (int) – The threshold for determining whether two objects in consecutive frames are close enough to be considered the same object. Default is 30, with a minimum value of 1.
deploy_tensorrt (bool) – Deploy the vision model using NVIDIA TensorRT. To use this feature with roboml, check out the instructions here. Default is False.
_num_trackers (int) – The number of trackers to use. This number depends on the number of input image streams given to the component. It is set automatically if setup_trackers is True.
init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.

Example usage:

model = VisionModel(name='detection1', setup_trackers=True, num_trackers=1, tracking_distance_threshold=20)  # Initialize the model for tracking one object
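To show how a distance function and a threshold interact when tracking, the sketch below decides whether two detections in consecutive frames belong to the same object. It is an illustration only: `is_same_object` is a hypothetical helper, it hardcodes Euclidean distance through `math.dist` instead of going through scipy, and the choice of "at most the threshold" as the matching rule is an assumption.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def is_same_object(prev_center: Point, curr_center: Point, threshold: int = 30) -> bool:
    """Match two detections if their centers are at most `threshold` apart.

    Hypothetical helper using Euclidean distance, the documented default metric.
    """
    if threshold < 1:
        # The documented minimum value for the threshold is 1.
        raise ValueError("threshold must be at least 1")
    return math.dist(prev_center, curr_center) <= threshold


# A small shift between frames is treated as the same object,
# a large jump is treated as a different one.
print(is_same_object((100, 100), (110, 110), threshold=20))
print(is_same_object((100, 100), (130, 140), threshold=20))
```

A lower threshold makes matching stricter, which reduces identity mix-ups between nearby objects but loses track of objects that move quickly between frames.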
