agents.models
The following model specification classes define a common interface for the initialization parameters of ML models across supported model serving platforms.
Module Contents#
Classes#
| Encoder | A text encoder model that can be used with vector DBs. |
| Llama3 | A pre-trained language model from MetaAI for tasks such as text generation, question answering, and more. Details |
| Llama3_1 | A pre-trained language model from MetaAI for tasks such as text generation, question answering, and more. Details |
| OllamaModel | An Ollama model that needs to be initialized with an ollama tag as checkpoint. |
| Idefics2 | A pre-trained visual language model from HuggingFace for tasks such as visual question answering. Details |
| Llava | LLaVA is an open-source chatbot trained by fine-tuning an LLM on multimodal instruction-following data. It is an auto-regressive language model, based on the transformer architecture. Details |
| Whisper | Whisper is an automatic speech recognition (ASR) system by OpenAI trained on 680,000 hours of multilingual and multitask supervised data collected from the web. Details |
| InstructBlip | An open-source general purpose vision language model by Salesforce built using instruction-tuning. Details |
| SpeechT5 | A model for text-to-speech synthesis developed by Microsoft. Details |
| Bark | A model for text-to-speech synthesis developed by SunoAI. Details |
| VisionModel | Object Detection Model with Optional Tracking. |
API#
- class agents.models.Encoder#
Bases:
agents.models.Model
A text encoder model that can be used with vector DBs.
- Parameters:
name (str) – An arbitrary name given to the model.
checkpoint (str) – The name of the pre-trained model’s checkpoint. Default is “BAAI/bge-small-en”.
init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.
- class agents.models.Llama3#
Bases:
agents.models.TransformersLLM
A pre-trained language model from MetaAI for tasks such as text generation, question answering, and more. Details
- Parameters:
name (str) – An arbitrary name given to the model.
checkpoint (str) – The name of the pre-trained model’s checkpoint. Default is “meta-llama/Meta-Llama-3-8B-Instruct”. For available checkpoints, consult Llama3 checkpoints on HuggingFace.
quantization (str or None) – The quantization scheme used by the model. Can be one of “4bit”, “8bit” or None (default is “4bit”).
system_prompt (str or None) – The system prompt used to initialize the model. If not provided, defaults to None.
init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.
Example usage:
from agents.models import Llama3
llama = Llama3(name='llama', checkpoint="other_checkpoint_name") # Initialize with a custom checkpoint
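The quantization option mainly trades memory for precision. As a rough worked example, the sketch below estimates the size of the weights alone for a model with about 8 billion parameters (the "8B" in the default checkpoint name). The helper approx_weight_gb is hypothetical and not part of agents.models, and treating None as 16-bit weights is an assumption; real memory use also includes activations, caches, and framework overhead.

```python
from typing import Optional

# Bits per weight for each documented quantization option.
# Assumption: None is treated as 16-bit (half-precision) weights.
BITS_PER_WEIGHT = {"4bit": 4, "8bit": 8, None: 16}


def approx_weight_gb(num_params: float, quantization: Optional[str] = "4bit") -> float:
    """Rough size of the weights alone, in GB (10^9 bytes). Hypothetical helper."""
    if quantization not in BITS_PER_WEIGHT:
        raise ValueError(
            f"quantization must be '4bit', '8bit' or None, got {quantization!r}"
        )
    # bits -> bytes -> gigabytes
    return num_params * BITS_PER_WEIGHT[quantization] / 8 / 1e9


for q in ("4bit", "8bit", None):
    print(q, approx_weight_gb(8e9, q))
```

Under these assumptions the default “4bit” setting needs roughly a quarter of the weight memory of an unquantized 16-bit model.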
- class agents.models.Llama3_1#
Bases:
agents.models.TransformersLLM
A pre-trained language model from MetaAI for tasks such as text generation, question answering, and more. Details
- Parameters:
name (str) – An arbitrary name given to the model.
checkpoint (str) – The name of the pre-trained model’s checkpoint. Default is “meta-llama/Meta-Llama-3.1-8B-Instruct”. For available checkpoints, consult Llama3 checkpoints on HuggingFace.
quantization (str or None) – The quantization scheme used by the model. Can be one of “4bit”, “8bit” or None (default is “4bit”).
system_prompt (str or None) – The system prompt used to initialize the model. If not provided, defaults to None.
init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.
Example usage:
from agents.models import Llama3_1
llama = Llama3_1(name='llama', checkpoint="other_checkpoint_name") # Initialize with a custom checkpoint
- class agents.models.OllamaModel#
Bases:
agents.models.LLM
An Ollama model that needs to be initialized with an ollama tag as checkpoint.
- Parameters:
name (str) – An arbitrary name given to the model.
checkpoint (str) – The name of the pre-trained model’s checkpoint. For available checkpoints, consult Ollama Models.
quantization (str or None) – The quantization scheme used by the model. Can be one of “4bit”, “8bit” or None (default is “4bit”).
system_prompt (str or None) – The system prompt used to initialize the model. If not provided, defaults to None.
init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.
Example usage:
from agents.models import OllamaModel
llm = OllamaModel(name='ollama1', checkpoint="gemma2:latest")
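An Ollama tag such as “gemma2:latest” has the form model:tag, and Ollama falls back to the tag “latest” when none is given. The sketch below shows one way to split such a string before passing it as checkpoint. The helper split_ollama_tag is hypothetical and not part of agents.models, and it only handles the simple model:tag form, not references that include a registry host or port.

```python
from typing import Tuple


def split_ollama_tag(checkpoint: str) -> Tuple[str, str]:
    """Split 'model:tag' into (model, tag). Hypothetical helper.

    A missing or empty tag falls back to 'latest'.
    """
    model, _, tag = checkpoint.partition(":")
    if not model:
        raise ValueError(f"missing model name in {checkpoint!r}")
    return model, tag or "latest"


print(split_ollama_tag("gemma2:latest"))
print(split_ollama_tag("llama3.1"))
```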
- class agents.models.Idefics2#
Bases:
agents.models.TransformersMLLM
A pre-trained visual language model from HuggingFace for tasks such as visual question answering. Details
- Parameters:
name (str) – An arbitrary name given to the model.
checkpoint (str) – The name of the pre-trained model’s checkpoint. Default is “HuggingFaceM4/idefics2-8b”. For available checkpoints, consult Idefics2 checkpoints on HuggingFace.
system_prompt (str or None) – The system prompt used to initialize the model. If not provided, defaults to None.
quantization (str or None) – The quantization scheme used by the model. Can be one of “4bit”, “8bit” or None (default is “4bit”).
init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.
Example usage:
from agents.models import Idefics2
idefics = Idefics2(name='mllm1', quantization="8bit") # Initialize with 8-bit quantization
- class agents.models.Llava#
Bases:
agents.models.TransformersMLLM
LLaVA is an open-source chatbot trained by fine-tuning an LLM on multimodal instruction-following data. It is an auto-regressive language model, based on the transformer architecture. Details
- Parameters:
name (str) – An arbitrary name given to the model.
checkpoint (str) – The name of the pre-trained model’s checkpoint. Default is “liuhaotian/llava-v1.6-mistral-7b”. For available checkpoints, consult Llava checkpoints on HuggingFace.
system_prompt (str or None) – The system prompt used to initialize the model. If not provided, defaults to None.
quantization (str or None) – The quantization scheme used by the model. Can be one of “4bit”, “8bit” or None (default is “4bit”).
init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.
Example usage:
from agents.models import Llava
llava = Llava(name='mllm2', quantization="4bit")
- class agents.models.Whisper#
Bases:
agents.models.Model
Whisper is an automatic speech recognition (ASR) system by OpenAI trained on 680,000 hours of multilingual and multitask supervised data collected from the web. Details
- Parameters:
name (str) – An arbitrary name given to the model.
checkpoint (str) – The name of the pre-trained model’s checkpoint. Default is “openai/whisper-small.en”. For available checkpoints, consult Whisper checkpoints on HuggingFace.
quantization (str or None) – The quantization scheme used by the model. Can be one of “4bit”, “8bit” or None (default is “4bit”).
init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.
Example usage:
from agents.models import Whisper
whisper = Whisper(name='s2t', checkpoint="openai/whisper-medium") # Initialize with a different checkpoint
- get_init_params() Dict #
Get init params for model initialization.
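This page does not say which fields get_init_params() returns. As a minimal sketch of the idea, the stand-in class below mirrors the Whisper parameters documented above and collects them into a dictionary. WhisperSpec is hypothetical and is not the library's class, and the choice to leave out name and init_timeout is an assumption made only for illustration.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Optional


@dataclass
class WhisperSpec:
    """Hypothetical stand-in mirroring the documented Whisper parameters."""

    name: str
    checkpoint: str = "openai/whisper-small.en"
    quantization: Optional[str] = "4bit"
    init_timeout: Optional[int] = None

    def get_init_params(self) -> Dict:
        # Assumption: only checkpoint and quantization are sent to the
        # serving platform; name and init_timeout stay on the client side.
        params = asdict(self)
        params.pop("name")
        params.pop("init_timeout")
        return params


spec = WhisperSpec(name="s2t", checkpoint="openai/whisper-medium")
print(spec.get_init_params())
```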
- class agents.models.InstructBlip#
Bases:
agents.models.TransformersMLLM
An open-source general purpose vision language model by Salesforce built using instruction-tuning. Details
- Parameters:
name (str) – An arbitrary name given to the model.
checkpoint (str) – The name of the pre-trained model’s checkpoint. Default is “Salesforce/instructblip-vicuna-7b”. For available checkpoints, consult InstructBlip checkpoints on HuggingFace.
history_reset_phrase (str) – A phrase used to reset the conversation history. Defaults to “chat reset”.
quantization (str or None) – The quantization scheme used by the model. Can be one of “4bit”, “8bit” or None (default is “4bit”).
init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.
Example usage:
from agents.models import InstructBlip
blip = InstructBlip(name='mllm3', quantization="4bit")
- class agents.models.SpeechT5#
Bases:
agents.models.Model
A model for text-to-speech synthesis developed by Microsoft. Details
- Parameters:
name (str) – An arbitrary name given to the model.
checkpoint (str) – The name of the pre-trained model’s checkpoint. Default is “microsoft/speecht5_tts”.
voice – The voice to use for synthesis. Can be one of “awb”, “bdl”, “clb”, “jmk”, “ksp”, “rms”, or “slt”. Default is “clb”.
init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.
Example usage:
from agents.models import SpeechT5
speecht5 = SpeechT5(name='t2s1', voice="bdl") # Initialize with a different voice
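Because voice accepts only the seven identifiers listed above, it can be useful to check a user-supplied value before constructing the model. The sketch below is one way to do that. The helper check_voice is hypothetical, and whether the library performs an equivalent check itself is not stated here.

```python
# Voice identifiers listed in the SpeechT5 parameter description.
SPEECHT5_VOICES = ("awb", "bdl", "clb", "jmk", "ksp", "rms", "slt")


def check_voice(voice: str = "clb") -> str:
    """Return the voice unchanged if it is documented, else raise. Hypothetical helper."""
    if voice not in SPEECHT5_VOICES:
        raise ValueError(
            f"voice must be one of {', '.join(SPEECHT5_VOICES)}; got {voice!r}"
        )
    return voice


print(check_voice("bdl"))
```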
- class agents.models.Bark#
Bases:
agents.models.Model
A model for text-to-speech synthesis developed by SunoAI. Details
- Parameters:
name (str) – An arbitrary name given to the model.
checkpoint (str) – The name of the pre-trained model’s checkpoint. For available checkpoints, consult Bark checkpoints on HuggingFace. Default is “suno/bark-small”.
attn_implementation – The attention implementation to use for the model. Default is “flash_attention_2”.
voice – The voice to use for synthesis. More choices are available here. Default is “v2/en_speaker_6”.
init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.
Example usage:
from agents.models import Bark
bark = Bark(name='t2s2', voice="v2/en_speaker_1") # Initialize with a different voice
- get_init_params() Dict #
Get init params for model initialization.
- class agents.models.VisionModel#
Bases:
agents.models.Model
Object Detection Model with Optional Tracking.
This vision model provides a flexible framework for object detection and tracking using the mmdet framework. It can be used as a standalone detector or as a tracker to follow detected objects over time. It can be initialized with any checkpoint available in the mmdet framework.
- Parameters:
name (str) – An arbitrary name given to the model.
checkpoint (str) – The name of the pre-trained model’s checkpoint. Any checkpoint available in the mmdet framework can be used. Default is “dino-4scale_r50_8xb2-12e_coco”.
cache_dir (str) – The directory where downloaded models are cached. Default is ‘mmdet’.
setup_trackers (bool) – Whether to set up trackers using norfair or not. Default is False.
tracking_distance_function (str) – The function used to calculate the distance between detected objects. This can be any distance metric string available in scipy.spatial.distance.cdist. Default is “euclidean”.
tracking_distance_threshold (int) – The maximum distance at which two objects in consecutive frames are still considered the same object. Default is 30, with a minimum value of 1.
deploy_tensorrt (bool) – Deploy the vision model using NVIDIA TensorRT. To utilize this feature with roboml, check out the instructions here. Default is False.
_num_trackers (int) – The number of trackers to use. This number depends on the number of input image streams given to the component. It is set automatically if setup_trackers is True.
init_timeout (int, optional) – The timeout in seconds for the initialization process. Defaults to None.
Example usage:
from agents.models import VisionModel
model = VisionModel(name='detection1', setup_trackers=True, num_trackers=1, tracking_distance_threshold=20) # Initialize the model for tracking one object
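To illustrate how tracking_distance_function and tracking_distance_threshold interact, the sketch below pairs object centroids from two consecutive frames using Euclidean distance and a threshold. It is a simplified greedy matcher written with the standard library only, not the algorithm norfair uses. The function match_by_distance and the sample coordinates are hypothetical, and whether the real threshold comparison is inclusive is an assumption.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def match_by_distance(
    previous: List[Point], current: List[Point], threshold: float = 30
) -> List[Tuple[int, int]]:
    """Greedily pair each previous centroid with its nearest unused current one.

    A pair is kept only if its Euclidean distance is within the threshold
    (inclusive here, which is an assumption).
    """
    matches = []
    used = set()
    for i, p in enumerate(previous):
        best_j, best_d = None, float("inf")
        for j, c in enumerate(current):
            d = math.dist(p, c)  # Euclidean distance, as with the "euclidean" metric
            if j not in used and d < best_d:
                best_j, best_d = j, d
        if best_j is not None and best_d <= threshold:
            matches.append((i, best_j))
            used.add(best_j)
    return matches


prev_centroids = [(100.0, 100.0), (400.0, 250.0)]
curr_centroids = [(405.0, 262.0), (110.0, 100.0), (700.0, 50.0)]
print(match_by_distance(prev_centroids, curr_centroids, threshold=20))
```

With a threshold of 20, both previous objects find a match (distances 10 and 13), while the third detection in the current frame stays unmatched and would be treated as a new object. A smaller threshold makes the tracker stricter about what counts as the same object.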