Quick Start πŸš€

Unlike other ROS packages, ROS Agents provides a purely pythonic way of describing the node graph using ROS Sugar. Copy the following code into a Python script and run it.

from agents.clients.ollama import OllamaClient
from agents.components import MLLM
from agents.models import Llava
from agents.ros import Topic, Launcher

# Define input and output topics (pay attention to msg_type)
text0 = Topic(name="text0", msg_type="String")
image0 = Topic(name="image_raw", msg_type="Image")
text1 = Topic(name="text1", msg_type="String")

# Define a model client (working with Ollama in this case)
llava = Llava(name="llava")
llava_client = OllamaClient(llava)

# Define an MLLM component (A component represents a node with a particular functionality)
mllm = MLLM(
    inputs=[text0, image0],
    outputs=[text1],
    model_client=llava_client,
    trigger=[text0],
    component_name="vqa"
)
# Additional prompt settings
mllm.set_topic_prompt(text0, template="""You are an amazing and funny robot.
    Answer the following about this image: {{ text0 }}"""
)
# Launch the component
launcher = Launcher()
launcher.add_pkg(components=[mllm],
                 activate_all_components_on_start=True)
launcher.bringup()
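
For example, you could save this script as quick_start.py (the filename is just an illustration) and run it with your ROS2 environment sourced:

# Source your ROS2 installation first (adjust the distro to match your setup)
source /opt/ros/humble/setup.bash
python3 quick_start.py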

Now let us go through what we have done in this code, step by step. First, we defined the inputs and outputs of our component in the form of ROS Topics. Components automatically create listeners for input topics and publishers for output topics.

# Define input and output topics (pay attention to msg_type)
text0 = Topic(name="text0", msg_type="String")
image0 = Topic(name="image_raw", msg_type="Image")
text1 = Topic(name="text1", msg_type="String")

Important

If you are running ROS Agents on a robot, make sure you change the topic name in the following line to the topic on which the robot’s camera publishes RGB images.

image0 = Topic(name="NAME_OF_THE_TOPIC", msg_type="Image")

Note

If you are running ROS Agents on a testing machine that has a webcam, you can install the ROS2 USB Cam package. Make sure you use the correct name of the image topic, as above.
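
As a rough sketch, assuming a ROS2 Humble installation and the standard usb_cam package, publishing webcam images on /image_raw could look like this:

# Install the usb_cam package (adjust the distro name if you are not on Humble)
sudo apt install ros-humble-usb-cam
# Start the camera driver; by default it publishes images on /image_raw
ros2 run usb_cam usb_cam_node_exe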

Then we will create a multimodal LLM component. Components are functional units in ROS Agents. To learn more about them, check out Basic Concepts. In addition to input/output topics, the MLLM component expects a model client. So first we will create a model client that can utilize a Llava model, with Ollama as the model serving platform.

# Define a model client (working with Ollama in this case)
llava = Llava(name="llava")
llava_client = OllamaClient(llava)

Important

If you are not running Ollama on the same machine (robot) on which you are running ROS Agents, you can specify the host and port of the machine running Ollama in this line:

llava_client = OllamaClient(llava, host="127.0.0.1", port=8000)

Note

If the use of Ollama as a model serving platform is unclear, check out the installation instructions.
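
For example, with the standard Ollama CLI installed, the Llava model is typically pulled ahead of time so it is available when the client connects:

# Download the Llava model to the machine running Ollama
ollama pull llava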

Now we are ready to setup our component.

# Define an MLLM component (A component represents a node with a particular functionality)
mllm = MLLM(
    inputs=[text0, image0],
    outputs=[text1],
    model_client=llava_client,
    trigger=[text0],
    component_name="vqa"
)
# Additional prompt settings
mllm.set_topic_prompt(text0, template="""You are an amazing and funny robot.
    Answer the following about this image: {{ text0 }}"""
)

Note how the MLLM type of component also allows us to set a topic-level or component-level prompt, where a jinja2 template defines how our input string should be embedded. Finally, we will launch the component.

# Launch the component
launcher = Launcher()
launcher.add_pkg(components=[mllm],
                 activate_all_components_on_start=True)
launcher.bringup()

Now we can check that our component is running by using familiar ROS2 commands from a new terminal. We should see our component running as a ROS node, along with its input and output topics in the topic list.

ros2 node list
ros2 topic list

To interact with our component, we can use the tiny web client that is bundled with ROS Agents. We can launch the client by running:

ros2 run ros-agents tiny_web_client

The client displays a web UI at http://localhost:8000. Open this address in a browser. The ROS input and output topic settings for the text topics can be configured from the web UI by pressing the settings icon. Send a question to your ROS Agent and you should get a reply generated by the Llava model.
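
Alternatively, as a quick sketch using standard ROS2 CLI tools and the topic names from the example above (assuming the String type maps to std_msgs/msg/String), you can interact with the component directly from a terminal:

# Watch for the component's replies on the output topic
ros2 topic echo /text1
# In another terminal, publish a question on the trigger topic
ros2 topic pub /text0 std_msgs/msg/String "{data: 'What do you see in the image?'}" --once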