Models / Vector Databases#
The clients mentioned earlier take as input a model or vector database (DB) specification. These specifications are, in most cases, generic wrappers around a class of models or DBs (e.g., transformers-based LLMs). They are defined as attrs classes and include initialization parameters such as quantization schemes, inference options, and the embedding model (in the case of vector DBs). Their purpose is to standardize model initialization across diverse deployment platforms.
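As a rough illustration of how a specification pairs with a client, consider the sketch below. The module paths (`agents.models`, `agents.clients`) and the constructor arguments are assumptions based on the wrapper and client names used in the tables that follow; consult the API reference for the authoritative signatures.

```python
# Minimal sketch of the spec/client split described above, assuming the
# `agents.models` / `agents.clients` module layout. The spec is a plain
# attrs class: it only carries configuration (no weights are loaded when
# it is created). The client consumes the spec and handles initialization
# and inference.
from agents.clients import OllamaClient
from agents.models import OllamaModel

# Specification: which checkpoint to run and how to initialize it.
llm = OllamaModel(name="llm", checkpoint="llama3.2:3b")

# Client: binds the specification to a deployment platform (here, Ollama).
llm_client = OllamaClient(llm)
```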
Available Model Wrappers#
| Model Name | Description |
|---|---|
| | A generic wrapper for LLMs served via OpenAI-compatible APIs. This wrapper must be used with the GenericHTTPClient. |
| | A generic wrapper for multimodal LLMs (vision-language models) served via OpenAI-compatible APIs. Supports image inputs alongside text. This wrapper must be used with the GenericHTTPClient. |
| | A generic wrapper for text-to-speech (TTS) models served via OpenAI-compatible APIs. This wrapper must be used with the GenericHTTPClient. |
| | A generic wrapper for speech-to-text (STT) models served via OpenAI-compatible APIs. This wrapper must be used with the GenericHTTPClient. |
| OllamaModel | An LLM/VLM model loaded from an Ollama checkpoint. Supports the configurable generation and deployment options available in the Ollama API. A complete list of Ollama models is available here. This wrapper must be used with the OllamaClient. |
| TransformersLLM | LLM models from HuggingFace/ModelScope checkpoints. Supports quantization ("4bit", "8bit"). This model wrapper can be used with the GenericHTTPClient or any of the RoboML clients. |
| TransformersMLLM | Multimodal LLM models from HuggingFace/ModelScope checkpoints for image-text inputs. Supports quantization. This model wrapper can be used with the GenericHTTPClient or any of the RoboML clients. |
| LeRobotPolicy | An interface for loading and running LeRobot policies: vision-language-action (VLA) models trained for robotic manipulation tasks. Supports automatic extraction of feature and action specifications directly from dataset metadata, as well as flexible configuration of policy behavior. The policy can be instantiated from any compatible LeRobot checkpoint hosted on HuggingFace, making it easy to load pretrained models. |
| RoboBrain2 | RoboBrain 2.0 by BAAI supports interactive reasoning with long-horizon planning and closed-loop feedback, spatial perception for precise point and bounding-box prediction from complex instructions, and temporal perception for future trajectory estimation. A default checkpoint is used unless one is specified. |
| Whisper | OpenAI's automatic speech recognition (ASR) model, available in various sizes (e.g., `small`, `medium`, `large`). This model is available on the RoboML platform and can be used with any RoboML client. Recommended client: RoboMLWSClient. |
| SpeechT5 | Microsoft's SpeechT5 model for TTS synthesis. Configurable voice selection. This model is available on the RoboML platform and can be used with any RoboML client. Recommended client: RoboMLWSClient. |
| Bark | SunoAI's Bark TTS model. Allows a selection of voices. This model is available on the RoboML platform and can be used with any RoboML client. Recommended client: RoboMLWSClient. |
| MeloTTS | MyShell's multilingual TTS model. Supports configurable language and speaker options. This model is available on the RoboML platform and can be used with any RoboML client. Recommended client: RoboMLWSClient. |
| VisionModel | A generic wrapper for object detection and tracking models available in the MMDetection framework. Supports optional tracking, configurable thresholds, and deployment with TensorRT. This model is available on the RoboML platform and can be used with any RoboML client. Recommended client: RoboMLRESPClient. |
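To make the table concrete, here is a hedged sketch of configuring two of the wrappers above with one of the clients they support. The class names are taken from the table; the module paths, the exact parameter names (`checkpoint`, `quantization`), and the checkpoints are assumptions for illustration, so verify them against the API reference.

```python
# Hedged sketch of two wrappers from the table above. Connection options
# for the client (URL, API key, etc.) are omitted for brevity.
from agents.clients import GenericHTTPClient
from agents.models import TransformersLLM, TransformersMLLM

# HuggingFace/ModelScope LLM checkpoint with 4-bit quantization; per the
# table it works with the GenericHTTPClient or any of the RoboML clients.
llm = TransformersLLM(
    name="llm",
    checkpoint="Qwen/Qwen2.5-0.5B-Instruct",  # illustrative checkpoint
    quantization="4bit",
)
llm_client = GenericHTTPClient(llm)

# Multimodal (image + text) variant with the same client options.
mllm = TransformersMLLM(
    name="mllm",
    checkpoint="Qwen/Qwen2-VL-2B-Instruct",  # illustrative checkpoint
)
mllm_client = GenericHTTPClient(mllm)
```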
Available Vector Databases#
| Vector DB | Description |
|---|---|
| ChromaDB | Chroma is an open-source AI application database with support for vector search, full-text search, and multimodal retrieval. Supports "ollama" and "sentence-transformers" embedding backends. Can be used with the ChromaClient. |
Note
For ChromaDB, make sure you install the required packages:
```bash
pip install ollama                  # for the Ollama backend (requires the Ollama runtime)
pip install sentence-transformers   # for the Sentence-Transformers backend
```
To use Ollama embedding models (available models), ensure the Ollama server is running and accessible at the specified host and port.
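Putting the note together with the table above, a ChromaDB specification with the Ollama embedding backend might be wired up as in the following sketch. The module path and the parameter names (`embedding_backend`, `host`, `port`) are assumptions, not the authoritative API; check the reference documentation for the actual fields.

```python
# Hedged sketch of a vector DB spec with the Ollama embedding backend.
from agents.clients import ChromaClient   # client named in the table above
from agents.vectordbs import ChromaDB     # module path is an assumption

db = ChromaDB(
    name="object_memory",
    embedding_backend="ollama",  # or "sentence-transformers"
    host="127.0.0.1",            # assumed location of the running Ollama server
    port=11434,                  # Ollama's default port
)
db_client = ChromaClient(db)
```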