Simple Development
In this chapter, we introduce the basic workflow of D-Robotics-LLM to help you get started quickly, using the Qwen2.5-Omni-3B model as an example.
Model and Deployment Package Preparation
- Download the provided D-Robotics_LLM_{version}.tar.gz deployment package and extract it.
- Download the provided models: Qwen2.5_Omni_3B_Audio.hbm, Qwen2.5_Omni_3B_Visual.hbm, Qwen2.5_Omni_3B_Text.hbm, as well as the input embedding weight file embed_tokens.bin.
Note
For download links to the hbm models, please refer to the resolve_model.txt file located in the model folder of oellm_runtime.
After preparation, combine the models (*.hbm), the embed_tokens.bin file, and the oellm_runtime SDK from the deployment package into a single directory. The reference directory structure is as follows:
llm
└── D-Robotics_LLM_{version}
└── oellm_runtime
├── model
│ ├── resolve_model.txt
│ ├── Qwen2.5_Omni_3B_Audio.hbm
│ ├── Qwen2.5_Omni_3B_Visual.hbm
│ ├── Qwen2.5_Omni_3B_Text.hbm
│ └── embed_tokens.bin
├── example
│ ├── oellm_omni_offline
│ │ ├── oellm_omni_offline
│ │ ├── omni_offline_config.json
│ │ ├── omni_offline_prompt.json
│ │ └── draw_guitar.mp4
│ └── oellm_omni_online
├── config
├── include
├── lib
└── set_performance_mode.sh
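The layout above can be assembled on the development machine with a few shell commands. This is only a sketch: the version string 1.0.0 stands in for your actual {version}, and the commented-out commands assume the archive and model files sit in the current directory.

```shell
# Assemble the reference layout (placeholder version string "1.0.0").
VERSION=1.0.0
RUNTIME=llm/D-Robotics_LLM_${VERSION}/oellm_runtime

# Extract the deployment package into llm/ (uncomment once the real
# archive has been downloaded):
# mkdir -p llm && tar -xzf D-Robotics_LLM_${VERSION}.tar.gz -C llm

# Place the downloaded models next to resolve_model.txt:
mkdir -p ${RUNTIME}/model
# cp Qwen2.5_Omni_3B_Audio.hbm Qwen2.5_Omni_3B_Visual.hbm \
#    Qwen2.5_Omni_3B_Text.hbm embed_tokens.bin ${RUNTIME}/model/
```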
On-Device Runtime Preparation
Create a working directory on the device. Reference commands are as follows:
# Create working directory
mkdir -p /home/root/llm
cd /home/root/llm
Copy the integrated folder from your development machine to this on-device directory. Reference command:
scp -r llm/* root@{board_ip}:/home/root/llm
Finally, under the path /home/root/llm/D-Robotics_LLM_{version}/oellm_runtime, configure the LD_LIBRARY_PATH. Reference commands:
# Modify hardware register values to switch the device into performance mode
sh set_performance_mode.sh
# Set environment variable
lib=/home/root/llm/D-Robotics_LLM_{version}/oellm_runtime/lib
export LD_LIBRARY_PATH=${lib}:${LD_LIBRARY_PATH}
On-Device Execution
The Qwen2.5_Omni_3B model supports both online and offline execution modes. Taking offline mode as an example, the reference command to run the model on the device is as follows:
cd ./example/oellm_omni_offline
./oellm_omni_offline --config ./omni_offline_config.json
In offline mode, the executable accepts the following arguments:
| Argument | Description | Optional/Required |
|---|---|---|
| -h, --help | Display help information. | / |
| -c, --config | Specify the path to the JSON configuration file for runtime. | Required |
Example JSON configuration file:
omni_offline_config.json
{
"visual_hbm_path": "../../model/Qwen2.5_Omni_3B_Visual.hbm",
"audio_hbm_path": "../../model/Qwen2.5_Omni_3B_Audio.hbm",
"text_hbm_path": "../../model/Qwen2.5_Omni_3B_Text.hbm",
"embed_tokens": "../../model/embed_tokens.bin",
"tokenizer_dir": "../../config/Qwen2.5_Omni_3B_config/",
"model_type": 5,
"online_mode": false
}
Parameters in the JSON configuration file are described below:
| Parameter | Description | Optional/Required |
|---|---|---|
| visual_hbm_path | Path to the quantized video/image feature extraction model file (*.hbm). | Required |
| audio_hbm_path | Path to the quantized audio feature extraction model file (*.hbm). | Required |
| text_hbm_path | Path to the quantized text model file (*.hbm). | Required |
| embed_tokens | Path to the model's input embedding weights (embed_tokens.bin). | Required |
| tokenizer_dir | Path to the tokenizer and related initialization data configuration. | Required |
| model_type | Specifies the model type to run; the current Omni model type is 5. | Required |
| online_mode | Specifies whether the model runs in online (true) or offline (false) mode. | Required |
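Because every path in the configuration is required, verifying them before launch can save a debugging round trip. The following pre-flight check is a sketch, not part of the SDK: the sample config and placeholder files written under /tmp stand in for your real omni_offline_config.json, and python3 is used only to parse the JSON.

```shell
# Hypothetical pre-flight check: confirm each file referenced by the
# config exists. The sample config below is a stand-in for the real one.
cfg=/tmp/omni_offline_config.json
cat > "$cfg" <<'EOF'
{"text_hbm_path": "/tmp/demo.hbm", "embed_tokens": "/tmp/demo.bin"}
EOF
touch /tmp/demo.hbm /tmp/demo.bin   # placeholders for the real files

# Check every path-valued key (extend the list for the full config):
for key in text_hbm_path embed_tokens; do
  path=$(python3 -c "import json; print(json.load(open('$cfg'))['$key'])")
  if [ -e "$path" ]; then echo "ok: $key"; else echo "missing: $key -> $path"; fi
done
```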
Execution Results
After launching, you can interact with the model; a sample session is shown below:
xlm init success
Offline demo of the on-device Omni multimodal large language model conversation. Please input the path to a JSON file containing input information.
Currently supported multimodal input combinations:
1. Text
2. Audio (mp3|wav)
3. Image (jpg|png|bmp|jpeg)
4. Image (jpg|png|bmp|jpeg) + Text
5. Image (jpg|png|bmp|jpeg) + Audio (mp3|wav)
6. Video with audio (mp4|mkv) + Text
7. Video with audio (mp4|mkv) + Audio (mp3|wav)
8. Video without audio (mp4|mkv)
9. Video without audio (mp4|mkv) + Text
Example JSON content:
{"conversation": [{"role": "system","content": [{"type": "text","text": "You are Qwen, a virtual human developed by the Qwen Team, Alibaba Group, capable of perceiving auditory and visual inputs, as well as generating text and speech."}]},{"role": "user","content": [{"type": "text","text": "Briefly introduce artificial intelligence"}]}]}
User input example:
[User] <<< ./omni_offline_prompt.json
To exit, type 'exit'
[User] <<< omni_offline_prompt.json
Role: system
Type: text
Text: "You are Qwen, a virtual human developed by the Qwen Team, Alibaba Group, capable of perceiving auditory and visual inputs, as well as generating text and speech."
Role: user
Type: video
Video: "draw_guitar.mp4"
VideoPreprocess Time: 1799.69ms
Audio(inVideo)Preprocess Time: 214.787ms
[Assistant] >>> Oh, that's really cool! You're drawing a guitar on the tablet. Have you been practicing drawing for a long time? If you want to practice more, you can try to draw other things like flowers or animals. It's also a great way to relax and have fun. So, what do you think about it?
Performance prefill: 895.73tokens/s decode: 14.03tokens/s
When running the Qwen2.5_Omni_3B model, you need to provide the path to a JSON file, in which you configure inputs such as audio, video, images, and text. During execution, the program prints the input information from the JSON file to the terminal.
In this example, the content of omni_offline_prompt.json is as follows:
omni_offline_prompt.json
{
"conversation": [
{
"role": "system",
"content": [
{
"type": "text",
"text": "You are Qwen, a virtual human developed by the Qwen Team, Alibaba Group, capable of perceiving auditory and visual inputs, as well as generating text and speech."
}
]
},
{
"role": "user",
"content": [
{
"type": "video",
"video": "draw_guitar.mp4",
"resized_width": 448,
"resized_height": 448
}
]
}
]
}
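Other input combinations follow the same conversation schema. For instance, a text-only prompt (input combination 1) could be written as below; my_prompt.json is an arbitrary file name chosen for this sketch, and the system text is just an illustrative placeholder:

```shell
# Create a text-only prompt file (input combination 1).
cat > my_prompt.json <<'EOF'
{
  "conversation": [
    {"role": "system",
     "content": [{"type": "text", "text": "You are a helpful assistant."}]},
    {"role": "user",
     "content": [{"type": "text", "text": "Briefly introduce artificial intelligence"}]}
  ]
}
EOF
```

At the [User] <<< prompt, enter ./my_prompt.json to run it.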
For comprehensive details on supported multimodal inputs, specific instructions for online execution mode, and detailed specifications for filling out the JSON files, please refer to the Advanced Development section.