Simple Development

In this chapter, we will introduce the basic usage workflow of D-Robotics-LLM to help you get started quickly. Here, we take the Qwen2.5-Omni-3B model as an example to demonstrate its usage.

Model and Deployment Package Preparation

  • Download the provided D-Robotics_LLM_{version}.tar.gz deployment package and extract it.

  • Download the provided models: Qwen2.5_Omni_3B_Audio.hbm, Qwen2.5_Omni_3B_Visual.hbm, Qwen2.5_Omni_3B_Text.hbm, as well as the input embedding weight file embed_tokens.bin.

Note

For download links to the hbm models, please refer to the resolve_model.txt file located in the model folder of oellm_runtime.

After preparation, integrate the models (*.hbm), the embed_tokens.bin file, and the oellm_runtime SDK from the deployment package. The reference directory structure is as follows:

llm
└── D-Robotics_LLM_{version}
    └── oellm_runtime
        ├── model
        │   ├── resolve_model.txt
        │   ├── Qwen2.5_Omni_3B_Audio.hbm
        │   ├── Qwen2.5_Omni_3B_Visual.hbm
        │   ├── Qwen2.5_Omni_3B_Text.hbm
        │   └── embed_tokens.bin
        ├── example
        │   ├── oellm_omni_offline
        │   │   ├── oellm_omni_offline
        │   │   ├── omni_offline_config.json
        │   │   ├── omni_offline_prompt.json
        │   │   └── draw_guitar.mp4
        │   └── oellm_omni_online
        ├── config
        ├── include
        ├── lib
        └── set_performance_mode.sh
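Before copying everything to the device, it can help to sanity-check the assembled directory. Below is a minimal, hypothetical Python sketch (not part of the SDK) that reports entries missing from the layout above:

```python
import os

# Relative paths expected under oellm_runtime, taken from the layout above.
EXPECTED = [
    "model/resolve_model.txt",
    "model/Qwen2.5_Omni_3B_Audio.hbm",
    "model/Qwen2.5_Omni_3B_Visual.hbm",
    "model/Qwen2.5_Omni_3B_Text.hbm",
    "model/embed_tokens.bin",
    "example/oellm_omni_offline/oellm_omni_offline",
    "example/oellm_omni_offline/omni_offline_config.json",
    "lib",
    "set_performance_mode.sh",
]

def missing_entries(runtime_dir):
    """Return the expected entries that are absent under runtime_dir."""
    return [p for p in EXPECTED if not os.path.exists(os.path.join(runtime_dir, p))]
```

Running `missing_entries` on your local `oellm_runtime` folder before the `scp` step catches a forgotten model file early, which is cheaper than diagnosing a load failure on the device.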

On-Device Runtime Preparation

Create a working directory on the device. Reference commands are as follows:

# Create the working directory
mkdir -p /home/root/llm
cd /home/root/llm

Copy the integrated folder from your development machine to this on-device directory. Reference command:

scp -r llm/* root@{board_ip}:/home/root/llm

Finally, under the path /home/root/llm/D-Robotics_LLM_{version}/oellm_runtime, configure the LD_LIBRARY_PATH. Reference commands:

# Modify hardware register values to switch the device into performance mode
sh set_performance_mode.sh
# Set the environment variable
lib=/home/root/llm/D-Robotics_LLM_{version}/oellm_runtime/lib
export LD_LIBRARY_PATH=${lib}:${LD_LIBRARY_PATH}

On-Device Execution

The Qwen2.5_Omni_3B model supports both online and offline execution modes. Taking offline mode as an example, the reference command to run the model on the device is as follows:

cd ./example/oellm_omni_offline
./oellm_omni_offline --config ./omni_offline_config.json

In offline mode, the executable accepts the following arguments:

Argument | Description | Optional/Required
-h, --help | Display help information. | /
-c, --config | Specify the path to the JSON configuration file for runtime. | Required

Example JSON configuration file:

omni_offline_config.json
{
    "visual_hbm_path": "../../model/Qwen2.5_Omni_3B_Visual.hbm",
    "audio_hbm_path": "../../model/Qwen2.5_Omni_3B_Audio.hbm",
    "text_hbm_path": "../../model/Qwen2.5_Omni_3B_Text.hbm",
    "embed_tokens": "../../model/embed_tokens.bin",
    "tokenizer_dir": "../../config/Qwen2.5_Omni_3B_config/",
    "model_type": 5,
    "online_mode": false
}

Parameters in the JSON configuration file are described below:

Parameter | Description | Optional/Required
visual_hbm_path | Path to the quantized video/image feature extraction model file (*.hbm). | Required
audio_hbm_path | Path to the quantized audio feature extraction model file (*.hbm). | Required
text_hbm_path | Path to the quantized text model file (*.hbm). | Required
embed_tokens | Path to the model's input embedding weights (embed_tokens.bin). | Required
tokenizer_dir | Path to the tokenizer and related initialization data configuration. | Required
model_type | Specifies the model type to run; the current Omni model type is 5. | Required
online_mode | Specifies whether the model runs in online or offline mode. Valid values: true, false. | Required
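A malformed configuration is easier to catch on the development machine than on the device. The sketch below is a hypothetical helper (not part of the SDK) that checks a config JSON string against the required keys listed in the table above:

```python
import json

# Keys the offline runner's configuration requires, per the table above.
REQUIRED_KEYS = {
    "visual_hbm_path", "audio_hbm_path", "text_hbm_path",
    "embed_tokens", "tokenizer_dir", "model_type", "online_mode",
}

def check_config(text):
    """Parse a config JSON string and return the set of missing required keys."""
    cfg = json.loads(text)
    return REQUIRED_KEYS - cfg.keys()
```

An empty return set means every documented key is present; note this does not verify that the referenced *.hbm files actually exist on disk.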

Execution Results

After starting the program, you can interact with it as shown below:

xlm init success
Offline demo of the on-device Omni multimodal large language model conversation.
Please input the path to a JSON file containing input information.
Currently supported multimodal input combinations:
1. Text
2. Audio (mp3|wav)
3. Image (jpg|png|bmp|jpeg)
4. Image (jpg|png|bmp|jpeg) + Text
5. Image (jpg|png|bmp|jpeg) + Audio (mp3|wav)
6. Video with audio (mp4|mkv) + Text
7. Video with audio (mp4|mkv) + Audio (mp3|wav)
8. Video without audio (mp4|mkv)
9. Video without audio (mp4|mkv) + Text
Example JSON content:
{"conversation": [{"role": "system","content": [{"type": "text","text": "You are Qwen, a virtual human developed by the Qwen Team, Alibaba Group, capable of perceiving auditory and visual inputs, as well as generating text and speech."}]},{"role": "user","content": [{"type": "text","text": "Briefly introduce artificial intelligence"}]}]}
User input example:
[User] <<< ./omni_offline_prompt.json
To exit, type 'exit'
[User] <<< omni_offline_prompt.json
Role: system
Type: text
Text: "You are Qwen, a virtual human developed by the Qwen Team, Alibaba Group, capable of perceiving auditory and visual inputs, as well as generating text and speech."
Role: user
Type: video
Video: "draw_guitar.mp4"
VideoPreprocess Time: 1799.69ms
Audio(inVideo)Preprocess Time: 214.787ms
[Assistant] >>> Oh, that's really cool! You're drawing a guitar on the tablet. Have you been practicing drawing for a long time? If you want to practice more, you can try to draw other things like flowers or animals. It's also a great way to relax and have fun. So, what do you think about it?
Performance
prefill: 895.73tokens/s
decode: 14.03tokens/s

When running the Qwen2.5_Omni_3B model, you need to provide the path to a JSON file, in which you configure inputs such as audio, video, images, and text. During execution, the program prints the input information from the JSON file to the terminal.

In this example, the content of omni_offline_prompt.json is as follows:

omni_offline_prompt.json
{
    "conversation": [
        {
            "role": "system",
            "content": [
                {
                    "type": "text",
                    "text": "You are Qwen, a virtual human developed by the Qwen Team, Alibaba Group, capable of perceiving auditory and visual inputs, as well as generating text and speech."
                }
            ]
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "video",
                    "video": "draw_guitar.mp4",
                    "resized_width": 448,
                    "resized_height": 448
                }
            ]
        }
    ]
}
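If you generate prompt files programmatically, the structure above can be assembled with a small helper. The sketch below is hypothetical (make_prompt is not part of the SDK); the field names simply mirror the example file, and the 448x448 defaults are taken from it:

```python
import json

def make_prompt(system_text, video_path, width=448, height=448):
    """Build a conversation dict matching the omni_offline_prompt.json layout."""
    return {
        "conversation": [
            {"role": "system",
             "content": [{"type": "text", "text": system_text}]},
            {"role": "user",
             "content": [{"type": "video", "video": video_path,
                          "resized_width": width, "resized_height": height}]},
        ]
    }

# Serialize to a JSON string that can be written out and passed to the executable.
prompt_json = json.dumps(make_prompt("You are Qwen.", "draw_guitar.mp4"), indent=4)
```

Writing `prompt_json` to a file and entering that file's path at the `[User] <<<` prompt reproduces the interaction shown in the execution log.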

For comprehensive details on supported multimodal inputs, specific instructions for online execution mode, and detailed specifications for filling out the JSON files, please refer to the Advanced Development section.