Advanced Development
In this chapter, we will introduce the advanced development workflow of D-Robotics-LLM.
This workflow applies to the following scenarios:
- The model has already been fine-tuned and requires re-quantization.
- Simple single-turn conversations.
- Continuous multi-turn conversations, where the model can remember questions and answers from previous rounds.
- Calculating the model's Perplexity (PPL) metric on the edge device.
For the above scenarios, we will use the DeepSeek-R1-Distill-Qwen-1.5B model as an example to illustrate usage.
Environment Setup
Please ensure you have correctly completed environment setup for both the development host and development board according to the Environment Deployment section.
Deployment Package Preparation
Download the provided D-Robotics_LLM_{version}.tar.gz deployment package and extract it.
Model Preparation
Note
Currently, only the DeepSeek-R1-Distill-Qwen-1.5B and DeepSeek-R1-Distill-Qwen-7B models are supported.
Before downloading a model, please ensure you understand its license terms, dependency requirements, and other necessary information to guarantee proper subsequent usage.
You can obtain the DeepSeek series models from the Hugging Face platform.
Model Quantization
D-Robotics-LLM provides command-line tools to quantize and compile models for edge devices. Here, we demonstrate using the DeepSeek-R1-Distill-Qwen-1.5B model with the following reference command:
oellm_build \
--model_name deepseek-qwen-1_5b \
--input_model_path ./DeepSeek-R1-Distill-Qwen-1.5B \
--output_model_path ./output_hbm \
--march nash-m \
--chunk_size 256 \
--cache_len 4096
Note
For detailed usage instructions and precautions regarding the oellm_build tool, please refer to the oellm_build Tool section.
If you obtain our pre-compiled .hbm models via the links provided in resolve_model.txt, you may skip this quantization step.
All DeepSeek models listed in resolve_model.txt were compiled with chunk_size=256 and two different cache_len configurations (1024 and 4096). This difference is reflected in their filenames.
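Since the cache_len is encoded in the filename, a small helper can recover it programmatically. This is a sketch based only on the naming convention described above (including the `_q4` suffix that appears on some filenames in resolve_model.txt); it is not part of the D-Robotics-LLM toolchain:

```python
import re

def cache_len_from_name(hbm_name: str) -> int:
    """Extract the cache_len encoded in a precompiled .hbm filename.

    Assumes names such as DeepSeek_R1_Distill_Qwen_1.5B_4096.hbm or
    DeepSeek_R1_Distill_Qwen_1.5B_1024_q4.hbm, per resolve_model.txt.
    """
    m = re.search(r"_(1024|4096)(?:_q4)?\.hbm$", hbm_name)
    if m is None:
        raise ValueError(f"no cache_len found in {hbm_name!r}")
    return int(m.group(1))

print(cache_len_from_name("DeepSeek_R1_Distill_Qwen_1.5B_4096_q4.hbm"))  # 4096
```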
Edge Device Runtime Preparation
In the directory D-Robotics_LLM_{version}/oellm_runtime/example, we have pre-prepared compiled executables in respective subdirectories that can be directly run on the edge device. Alternatively, you can execute different build scripts to generate required files yourself. Reference commands are as follows:
# Simple conversation
sh build_oellm_run.sh
# Multi-turn conversation
sh build_oellm_multichat.sh
# PPL evaluation
sh build_oellm_ppl.sh
Next, create a working directory on the edge device with the following command:
# Running on S100/S100P
mkdir -p /home/root/llm
Before deploying to the board, ensure you have prepared the following:
- A functional development board for running programs.
- A deployable model file (*.hbm), which is the output from Model Quantization.
- Executable files (oellm_run, oellm_multichat, and oellm_ppl).
- Runtime dependency libraries. To reduce deployment overhead, you can directly use the contents from the following directories within the D-Robotics-LLM package:
D-Robotics_LLM_{version}/oellm_runtime/set_performance_mode.sh
D-Robotics_LLM_{version}/oellm_runtime/lib
D-Robotics_LLM_{version}/oellm_runtime/config
D-Robotics_LLM_{version}/oellm_runtime/example
After preparation, organize your model files (*.hbm), executables, and dependencies into a unified directory structure as shown below:
root@ubuntu:/home/root/llm
.
|-- model
| |-- resolve_model.txt
| |-- DeepSeek_R1_Distill_Qwen_1.5B_1024.hbm
| |-- DeepSeek_R1_Distill_Qwen_1.5B_1024_q4.hbm
| |-- DeepSeek_R1_Distill_Qwen_1.5B_4096.hbm
| |-- DeepSeek_R1_Distill_Qwen_1.5B_4096_q4.hbm
| |-- DeepSeek_R1_Distill_Qwen_7B_1024.hbm
|-- config
| |-- DeepSeek_R1_Distill_Qwen_1.5B_config
| |-- DeepSeek_R1_Distill_Qwen_7B_config
|-- example
| |-- oellm_run
| | |-- oellm_run
| |-- oellm_multichat
| | |-- oellm_multichat
| | |-- deepseek_multichat_config.json
| |-- oellm_ppl
| | |-- oellm_ppl
| | |-- deepseek_ppl_config.json
| | |-- test-00000-of-00001.bin
|-- include
|-- lib
`-- set_performance_mode.sh
Copy the integrated folder from your development host to the edge device directory using the following command:
scp -r llm/* root@{board_ip}:/home/root/llm
Finally, switch the device to performance mode and configure LD_LIBRARY_PATH under /home/root/llm with the following commands:
# Modify hardware registers to switch the device to performance mode
sh set_performance_mode.sh
# Set environment variables
lib=/home/root/llm/lib
export LD_LIBRARY_PATH=${lib}:${LD_LIBRARY_PATH}
Running on Edge Device
Simple Conversation
Reference command for running on the edge device:
cd ./example/oellm_run
./oellm_run --hbm_path ../../model/DeepSeek_R1_Distill_Qwen_1.5B_4096.hbm \
--tokenizer_dir ../../config/DeepSeek_R1_Distill_Qwen_1.5B_config/ \
--template_path ../../config/DeepSeek_R1_Distill_Qwen_1.5B_config/DeepSeek_R1_Distill_Qwen_1.5B.jinja \
--model_type 1
Program input parameters are as follows:
| Parameter | Description | Optional/Required |
|---|---|---|
| -h, --help | Display help information. | / |
| --hbm_path | Path to the quantized model file (*.hbm). | Required |
| --tokenizer_dir | Path to the tokenizer configuration. | Required |
| --template_path | Path to the conversation template. | Required |
| --model_type | Specifies the model type; currently, DeepSeek models use type 1. | Required |
Multi-turn Conversation
Reference command for running on the edge device:
cd ./example/oellm_multichat
./oellm_multichat -c ./deepseek_multichat_config.json
Program input parameters are as follows:
| Parameter | Description | Optional/Required |
|---|---|---|
| -h, --help | Display help information. | / |
| -c, --config | Path to the JSON configuration file. | Required |
Example JSON configuration file:
deepseek_multichat_config.json
{
    "hbm_path": "../../model/DeepSeek_R1_Distill_Qwen_1.5B_4096.hbm",
    "tokenizer_dir": "../../config/DeepSeek_R1_Distill_Qwen_1.5B_config/",
    "template_path": "../../config/DeepSeek_R1_Distill_Qwen_1.5B_config/DeepSeek_R1_Distill_Qwen_1.5B.jinja",
    "model_type": 1
}
JSON configuration parameters are described below:
| Parameter | Description | Optional/Required |
|---|---|---|
| hbm_path | Path to the quantized model file (*.hbm). | Required |
| tokenizer_dir | Path to the tokenizer configuration. | Required |
| template_path | Path to the conversation template. | Required |
| model_type | Specifies the model type; currently, DeepSeek models use type 1. | Required |
| bpu_core | Specifies the BPU core to use. Default value is -1, meaning any core. | Optional |
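If you run several model variants, the configuration file can also be generated from a script instead of edited by hand. A minimal Python sketch (the paths below match the example above and are purely illustrative):

```python
import json

def make_multichat_config(hbm_path, tokenizer_dir, template_path,
                          model_type=1, bpu_core=-1):
    """Build the oellm_multichat JSON configuration as a dict."""
    return {
        "hbm_path": hbm_path,
        "tokenizer_dir": tokenizer_dir,
        "template_path": template_path,
        "model_type": model_type,
        "bpu_core": bpu_core,  # -1 lets the runtime pick any BPU core
    }

cfg = make_multichat_config(
    "../../model/DeepSeek_R1_Distill_Qwen_1.5B_4096.hbm",
    "../../config/DeepSeek_R1_Distill_Qwen_1.5B_config/",
    "../../config/DeepSeek_R1_Distill_Qwen_1.5B_config/DeepSeek_R1_Distill_Qwen_1.5B.jinja",
)
with open("deepseek_multichat_config.json", "w") as f:
    json.dump(cfg, f, indent=2)
```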
PPL Evaluation
To evaluate the model's PPL (Perplexity) on the device, refer to the following commands:
cd ./example/oellm_ppl
./oellm_ppl -c ./deepseek_ppl_config.json
The program accepts the following input arguments:
| Argument | Description | Optional/Required |
|---|---|---|
| -h, --help | Display help information. | / |
| -c, --config | Specify the path to the JSON configuration file. | Required |
Example of the JSON configuration file:
deepseek_ppl_config.json
{
    "hbm_path": "../../model/DeepSeek_R1_Distill_Qwen_1.5B_4096.hbm",
    "tokenizer_dir": "../../config/DeepSeek_R1_Distill_Qwen_1.5B_config/",
    "model_type": 1,
    "ppl_testcase": "test-00000-of-00001.bin",
    "load_ckpt": false,
    "text_data_num": 0,
    "max_length": 256,
    "stride": 100
}
Descriptions of the JSON configuration parameters are as follows:
| Parameter | Description | Optional/Required |
|---|---|---|
| hbm_path | Specifies the path to the quantized model file (*.hbm). | Required |
| tokenizer_dir | Specifies the path to the tokenizer configuration. | Required |
| model_type | Specifies the model type to run. The current DeepSeek model type is 1. | Required |
| ppl_testcase | Specifies the path to the test file. Currently, only bin format is supported. | Required |
| max_length | Specifies the sequence length fed into the model each time. | Required |
| stride | Specifies the stride length used during evaluation. | Required |
| bpu_core | Specifies the BPU core to use. Default value is -1, meaning any core. | Optional |
| load_ckpt | Whether to resume testing from the last interruption point. Default value is false. | Optional |
| text_data_num | Truncates the input text to this length before testing. If text_data_num <= 0, no truncation occurs. Default value is 0. | Optional |
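The max_length and stride parameters together define a standard sliding-window evaluation: each window feeds max_length tokens to the model, the window advances by stride tokens, and only the tokens not already scored by the previous window contribute new loss terms. The windowing logic can be sketched as follows (illustrative only; the actual oellm_ppl implementation may differ in detail):

```python
def ppl_windows(num_tokens: int, max_length: int, stride: int):
    """Yield (begin, end, target_len) windows for sliding-window PPL.

    target_len is the number of tokens in each window that are scored
    for the first time; the remaining tokens only provide context.
    """
    windows = []
    prev_end = 0
    for begin in range(0, num_tokens, stride):
        end = min(begin + max_length, num_tokens)
        target_len = end - prev_end  # newly scored tokens in this window
        windows.append((begin, end, target_len))
        prev_end = end
        if end == num_tokens:
            break
    return windows

# Example: 600 tokens with max_length=256 and stride=100
for w in ppl_windows(600, 256, 100):
    print(w)
```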
Execution Results
Simple Conversation
Example of a simple conversation test:
[User] <<< Briefly describe the development of artificial intelligence.
[Assistant] >>> <|begin▁of▁sentence|>
The development of Artificial Intelligence (AI) can be divided into several key stages:
1. **Early AI**:
- **Artificial Intelligence**: Initially applied to specific tasks such as gaming and customer service.
- **Machine Learning**: In the 1950s, computers began learning tasks like automatic recognition and speech recognition.
- **Expert Systems**: In the 1970s, systems like “MYCIN” simulated human experts.
2. **Computer Vision**:
- **Image Recognition**: In the 1980s, computers recognized simple images, such as handwritten digits.
- **Natural Language Processing**: In the 1990s, systems performed tasks like automated Wikipedia search and editing.
3. **Deep Learning**:
- **Neural Networks**: In the 1980s, neural networks started processing complex data.
- **Convolutional Neural Networks (CNNs)**: In the 1990s, CNNs were used for image recognition, e.g., in autonomous vehicles.
- **Deep Learning**: In the 2010s, models like GPT and BERT revolutionized natural language processing.
4. **Reinforcement Learning**:
- **Robotics Control**: In the 1980s, robots learned actions through trial and error.
- **Autonomous Driving**: In the 2010s, reinforcement learning enabled self-driving cars.
5. **Deep Learning and Neural Networks**:
- **Image Recognition**: Including classification, segmentation, and generation.
- **Natural Language Processing**: Including text generation, translation, and dialogue.
- **Speech Recognition**: Including transcription and speech synthesis.
6. **AI Applications**:
- **Healthcare**: Such as diagnostics and drug discovery.
- **Transportation**: Including autonomous driving and traffic management systems.
- **Education**: Such as intelligent tutoring systems.
- **Finance**: Including algorithmic trading and risk management.
7. **Ethics and Challenges**:
- **Privacy Issues**: Data breaches and privacy violations.
- **Ethical Concerns**: Algorithmic bias and fairness issues.
8. **Future Outlook**:
- **AI Chips**: Dedicated hardware for training and inference.
- **Edge AI**: Running AI directly on devices to reduce data transmission.
- **Multimodal AI**: Integrating vision, audio, and other modalities.
- **Human Assistants**: Including chatbots and life-support systems.
AI will continue advancing across multiple domains, driving technological progress and societal transformation.
Performance prefill: 2348.62 tokens/s decode: 27.08 tokens/s
Multi-turn Conversation
Example of a multi-turn conversation test:
[User] <<< Briefly introduce the current state of AI development.
[Assistant] >>> <|begin▁of▁sentence|>
AI technology has entered a phase of rapid advancement. From basic speech recognition to complex image understanding, from natural language processing to reinforcement learning, the boundaries of AI continue expanding. These technologies have been widely adopted in healthcare, education, finance, and many other fields. Meanwhile, issues concerning AI explainability and ethics are increasingly drawing attention. In the future, as technology progresses further, AI will evolve toward greater intelligence and human-like interaction.
Performance prefill: 2347.63 tokens/s decode: 27.06 tokens/s
[User] <<< How can robotics integrate with this field?
[Assistant] >>> <|begin▁of▁sentence|>
The integration of AI and robotics enables smarter and more efficient solutions. For example, robots can perform complex tasks in specialized domains—such as surgical simulation in healthcare or automated manufacturing in industry. AI can optimize robotic behaviors to enhance efficiency, while robots effectively execute decisions generated by AI. This synergy will expand robotics into broader application areas. At the same time, ethical considerations must be carefully addressed throughout this process. Through collaborative efforts between AI and robotics, humans can better navigate complex and uncertain environments, achieving more efficient and intelligent outcomes.
Performance prefill: 2338.42 tokens/s decode: 27.01 tokens/s
PPL Evaluation
After completing the PPL evaluation, a {ppl_testcase}.json file will be generated in the same directory. The value corresponding to Perplexity in this file represents the final PPL result. Example output:
{
    "PPL Parameters": {
        "hbm_path": "DeepSeek_R1_Distill_Qwen_7B_1024.hbm",
        "ppl_testcase": "test-00000-of-00001.bin",
        "text_data_num": 0,
        "max_length": 256,
        "stride": 100
    },
    "Average Negative Log-Likelihood": 3.5173572962692465,
    "Perplexity": 33.69526409810056
}
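The two reported values are consistent with the usual definition of perplexity as the exponential of the average negative log-likelihood, which can be checked directly:

```python
import math

# Perplexity = exp(average negative log-likelihood)
avg_nll = 3.5173572962692465
ppl = math.exp(avg_nll)
print(round(ppl, 3))  # ≈ 33.695
```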
Note
- PPL evaluation supports resuming from checkpoints. During execution, the program generates a ppl_ckpt.json file in the working directory. When load_ckpt is set to true, the program reads this file and resumes testing from the interruption point.
- After the PPL program finishes execution, a JSON file containing key test parameters and the PPL result will be generated in the same directory as the bin test file.
- The bin file specified by the ppl_testcase parameter can be converted from a parquet file. An example script convert_parquet_to_bin.py is provided below:
import pandas as pd
from datasets import Dataset

# Load the parquet test set and concatenate all text samples.
df = pd.read_parquet("test-00000-of-00001.parquet")
dataset_test = Dataset.from_pandas(df)
text_data = "\n\n".join(dataset_test["text"])

# Write the concatenated text as UTF-8 bytes to the bin file.
with open("test-00000-of-00001.bin", "wb") as f:
    f.write(text_data.encode("utf-8"))