Models / Vector Databases

The clients mentioned earlier take a model or vector database (DB) specification as input. In most cases, these specifications are generic wrappers around a class of models or DBs (e.g., transformers-based LLMs). They are defined as attrs classes and include initialization parameters such as quantization schemes, inference options, and, for vector DBs, the embedding model. The specifications aim to standardize model initialization across diverse deployment platforms.
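As a rough illustration of what such a specification looks like, the sketch below declares a model's initialization parameters as class fields and validates them on construction. It uses the standard-library dataclasses module as a dependency-free stand-in for attrs. The class name SketchModelSpec, its field names, and the to_init_params helper are hypothetical and are not the library's API.

```python
from dataclasses import dataclass, field
from typing import Optional

# Quantization values mentioned in this document; None means "no quantization".
ALLOWED_QUANTIZATION = {None, "4bit", "8bit"}


@dataclass(frozen=True)
class SketchModelSpec:
    """Hypothetical stand-in for a model wrapper (not the library's class)."""

    name: str                           # label for this model instance (assumed field)
    checkpoint: str                     # checkpoint identifier (assumed field)
    quantization: Optional[str] = None  # "4bit", "8bit", or None
    inference_options: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject unsupported values at construction time, before any client
        # tries to initialize the model on a deployment platform.
        if self.quantization not in ALLOWED_QUANTIZATION:
            raise ValueError(f"unsupported quantization: {self.quantization!r}")

    def to_init_params(self) -> dict:
        # A plain dictionary that a client could send to a model server.
        return {
            "checkpoint": self.checkpoint,
            "quantization": self.quantization,
            **self.inference_options,
        }


spec = SketchModelSpec(
    name="my_llm",
    checkpoint="some-org/some-model",  # placeholder, not a real checkpoint
    quantization="4bit",
    inference_options={"temperature": 0.7, "max_tokens": 256},
)
print(spec.to_init_params())
```

Keeping the specification separate from the client means the same declarative object can, in principle, be handed to different clients without changing how the model is described.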

Available Model Wrappers

Model Name

Description

GenericLLM

A generic wrapper for LLMs served via OpenAI-compatible /v1/chat/completions APIs (e.g., vLLM, LMDeploy, OpenAI). Supports configurable inference options like temperature and max tokens. This wrapper must be used with the GenericHTTPClient.

GenericMLLM

A generic wrapper for Multimodal LLMs (Vision-Language models) served via OpenAI-compatible APIs. Supports image inputs alongside text. This wrapper must be used with the GenericHTTPClient.

GenericTTS

A generic wrapper for Text-to-Speech models served via OpenAI-compatible /v1/audio/speech APIs. Supports voice selection (voice) and speed (speed) configuration. This wrapper must be used with the GenericHTTPClient.

GenericSTT

A generic wrapper for Speech-to-Text models served via OpenAI-compatible /v1/audio/transcriptions APIs. Supports language hints (language) and temperature settings. This wrapper must be used with the GenericHTTPClient.

OllamaModel

An LLM/VLM loaded from an Ollama checkpoint. Supports the configurable generation and deployment options available in the Ollama API. The complete list of models is available in the Ollama model library. This wrapper must be used with the OllamaClient.

TransformersLLM

LLMs loaded from HuggingFace/ModelScope checkpoints. Supports quantization (“4bit” or “8bit”). This model wrapper can be used with the GenericHTTPClient or any of the RoboML clients.

TransformersMLLM

Multimodal LLMs loaded from HuggingFace/ModelScope checkpoints for image-text inputs. Supports quantization. This model wrapper can be used with the GenericHTTPClient or any of the RoboML clients.

LeRobotPolicy

LeRobotPolicy provides an interface for loading and running LeRobot policies, which are vision-language-action (VLA) models trained for robotic manipulation tasks. It supports automatic extraction of feature and action specifications directly from dataset metadata, as well as flexible configuration of policy behavior. The policy can be instantiated from any compatible LeRobot checkpoint hosted on HuggingFace, which makes it easy to load pretrained models such as smolvla_base. This wrapper must be used with the gRPC-based LeRobotClient.

RoboBrain2

RoboBrain 2.0 by BAAI supports interactive reasoning with long-horizon planning and closed-loop feedback, spatial perception for precise point and bounding-box prediction from complex instructions, and temporal perception for future trajectory estimation. The checkpoint defaults to "BAAI/RoboBrain2.0-7B"; larger variants are also available. This wrapper can be used with any of the RoboML clients.

Whisper

OpenAI’s automatic speech recognition (ASR) model, available in various sizes (e.g., "small", "large-v3"). This model is available on the RoboML platform and can be used with any RoboML client. Recommended client: RoboMLWSClient.

SpeechT5

Microsoft’s model for TTS synthesis with configurable voice selection. This model is available on the RoboML platform and can be used with any RoboML client. Recommended client: RoboMLWSClient.

Bark

SunoAI’s Bark TTS model. Allows a selection of voices. This model is available on the RoboML platform and can be used with any RoboML client. Recommended client: RoboMLWSClient.

MeloTTS

MyShell’s multilingual TTS model. Configured via language (e.g., "JP") and speaker_id (e.g., "JP-1"). This model is available on the RoboML platform and can be used with any RoboML client. Recommended client: RoboMLWSClient.

VisionModel

A generic wrapper for object detection and tracking models available in the MMDetection framework. Supports optional tracking, configurable thresholds, and deployment with TensorRT. This model is available on the RoboML platform and can be used with any RoboML client. Recommended client: RoboMLRESPClient.
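The wrapper-to-client pairings listed in the table above can be collected into a small lookup, which is handy for catching a mismatched pairing before anything is deployed. The sketch below only transcribes the table; the dictionary, the "RoboML*" wildcard convention (standing for "any RoboML client"), and the is_compatible helper are illustrative assumptions, not part of the library.

```python
from fnmatch import fnmatchcase

# Wrapper name -> client name patterns, transcribed from the table above.
# "RoboML*" is a wildcard used here to mean "any RoboML client".
COMPATIBLE_CLIENTS = {
    "GenericLLM": ["GenericHTTPClient"],
    "GenericMLLM": ["GenericHTTPClient"],
    "GenericTTS": ["GenericHTTPClient"],
    "GenericSTT": ["GenericHTTPClient"],
    "OllamaModel": ["OllamaClient"],
    "TransformersLLM": ["GenericHTTPClient", "RoboML*"],
    "TransformersMLLM": ["GenericHTTPClient", "RoboML*"],
    "LeRobotPolicy": ["LeRobotClient"],
    "RoboBrain2": ["RoboML*"],
    "Whisper": ["RoboML*"],
    "SpeechT5": ["RoboML*"],
    "Bark": ["RoboML*"],
    "MeloTTS": ["RoboML*"],
    "VisionModel": ["RoboML*"],
}


def is_compatible(wrapper: str, client: str) -> bool:
    """Return True if the table lists `client` as usable with `wrapper`."""
    patterns = COMPATIBLE_CLIENTS.get(wrapper, [])
    return any(fnmatchcase(client, pattern) for pattern in patterns)


print(is_compatible("Whisper", "RoboMLWSClient"))
print(is_compatible("GenericLLM", "OllamaClient"))
```

The lookup covers compatibility only; the "Recommended client" notes in the table are preferences among the compatible clients and are not encoded here.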

Available Vector Databases

Vector DB

Description

ChromaDB

Chroma is an open-source AI application database with support for vector search, full-text search, and multi-modal retrieval. Supports “ollama” and “sentence-transformers” embedding backends. Can be used with the ChromaClient.

Note

For ChromaDB, make sure you install the package required by your chosen embedding backend:

pip install ollama  # For Ollama backend (requires Ollama runtime)
pip install sentence-transformers  # For Sentence-Transformers backend

To use Ollama embedding models, ensure the Ollama server is running and reachable at the specified host and port.
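One quick way to confirm that the server is reachable before initializing the database is a plain TCP connection check, sketched below with the standard library only. The is_reachable helper is hypothetical. The host and port shown are Ollama's usual local defaults and should be replaced with whatever your deployment uses. A successful connection only shows that something is listening on that port, not that it is an Ollama server.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        # create_connection resolves the host and performs the TCP handshake;
        # the context manager closes the socket again right away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # 11434 is Ollama's default port; adjust host and port for your setup.
    print(is_reachable("127.0.0.1", 11434))
```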