Advanced Development

In this chapter, we will introduce the advanced development workflow of D-Robotics-LLM.

This workflow applies to the following scenarios:

  1. The model has already been fine-tuned and requires re-quantization.

  2. Simple single-turn conversations.

  3. Continuous multi-turn conversations, where the model can remember questions and answers from previous rounds.

  4. Calculating the model's PPL (Perplexity) metric on the edge device.

Qwen2.5 Model Version Selection

We provide both Base and Instruct versions of the Qwen2.5 model to meet your diverse development and application requirements. Their differences are as follows:

  • The Base version is a foundational text generation model suitable for subsequent model training tasks. Model names for this version do not contain the word Instruct.

  • The Instruct version is derived from the Base version through instruction fine-tuning and is better suited for conversational scenarios. Model names for this version include the word Instruct.

Here, we will use the Qwen2.5-1.5B-Instruct model as an example to illustrate usage instructions.
Please note specifically that continuous multi-turn conversation scenarios only support the Instruct version; all other scenarios support both the Base and Instruct versions.
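The practical difference shows up in prompt construction: an Instruct model expects its conversation template (the .jinja file shipped in the model's config directory) to wrap each turn, while a Base model consumes raw text. As an illustration only, here is a simplified sketch of the ChatML-style wrapping that Qwen2.5-Instruct templates produce; the authoritative format is the shipped .jinja file, and `build_chatml_prompt` is a hypothetical helper, not part of the toolkit:

```python
def build_chatml_prompt(messages):
    """Wrap chat turns in ChatML markers, as Qwen2.5-Instruct templates do.

    Simplified sketch; the real formatting comes from the shipped .jinja file.
    """
    parts = []
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    # Generation begins after an opening assistant marker.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello"},
])
print(prompt)
```

A Base model, by contrast, would be given the user text directly with no special markers.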

Environment Setup

Please ensure you have correctly completed environment setup for both the development host and development board as described in the Environment Deployment section.

Deployment Package Preparation

Download the provided deployment package D-Robotics_LLM_{version}.tar.gz and extract it.

Model Preparation

Note

Currently, only the following models are supported: Qwen2.5-1.5B, Qwen2.5-7B, Qwen2.5-1.5B-Instruct, and Qwen2.5-7B-Instruct.
Before downloading any model, please ensure you understand its license terms, dependency requirements, and other necessary information to guarantee proper functionality afterward.

You can obtain Qwen2.5 series models from the Hugging Face platform.

Model Quantization

D-Robotics-LLM provides command-line tools to quantize and compile models for edge-device deployment. Here, we use the Qwen2.5-1.5B-Instruct model as an example. Reference command:

oellm_build \
  --model_name qwen2_5-1_5b \
  --input_model_path ./Qwen2.5-1.5B-Instruct \
  --output_model_path ./output_hbm \
  --march nash-m \
  --chunk_size 256 \
  --cache_len 1024

Note

For detailed usage instructions and important considerations regarding the oellm_build tool, please refer to the oellm_build Tool section.

If you obtain our pre-compiled .hbm models via the links provided in resolve_model.txt, you may skip this quantization step.

All Qwen2.5 models listed in resolve_model.txt were compiled with chunk_size set to 256 and cache_len set to 1024.

Edge Device Runtime Preparation

In the directory D-Robotics_LLM_{version}/oellm_runtime/example, the subdirectories already contain compiled executables that can be run directly on the edge device. Alternatively, you can run the build scripts to generate the required files yourself. Reference commands:

# Simple conversation
sh build_oellm_run.sh
# Multi-turn conversation
sh build_oellm_multichat.sh
# PPL evaluation
sh build_oellm_ppl.sh

Next, create a working directory on the edge device. Reference command:

# Running on S100/S100P
mkdir -p /home/root/llm

Before deploying to the board, ensure you have prepared the following:

  • A functional development board for executing programs on the edge device.
  • A deployable model file (*.hbm), which is the output of the Model Quantization step.
  • Executable files (oellm_run, oellm_multichat, and oellm_ppl).
  • Runtime dependency libraries. To reduce deployment overhead, you can directly use the contents from the following paths within the D-Robotics-LLM package:
    • D-Robotics_LLM_{version}/oellm_runtime/set_performance_mode.sh
    • D-Robotics_LLM_{version}/oellm_runtime/lib folder
    • D-Robotics_LLM_{version}/oellm_runtime/config folder
    • D-Robotics_LLM_{version}/oellm_runtime/example folder

After preparation, integrate the model files (*.hbm), executables, and dependency libraries together. Reference directory structure:

root@ubuntu:/home/root/llm
.
|-- model
|   |-- resolve_model.txt
|   |-- Qwen2.5_1.5B_1024.hbm
|   |-- Qwen2.5_1.5B_Instruct_1024.hbm
|   |-- Qwen2.5_7B_1024.hbm
|   `-- Qwen2.5_7B_Instruct_1024.hbm
|-- config
|   |-- Qwen2.5_1.5B_config
|   |-- Qwen2.5_1.5B_Instruct_config
|   |-- Qwen2.5_7B_config
|   `-- Qwen2.5_7B_Instruct_config
|-- example
|   |-- oellm_run
|   |   `-- oellm_run
|   |-- oellm_multichat
|   |   |-- oellm_multichat
|   |   `-- qwen_multichat_config.json
|   `-- oellm_ppl
|       |-- oellm_ppl
|       |-- qwen_ppl_config.json
|       `-- test-00000-of-00001.bin
|-- include
|-- lib
`-- set_performance_mode.sh

Edge Device Runtime Setup

Create a working directory on the edge device. Reference commands:

# Create working directory
mkdir -p /home/root/llm
cd /home/root/llm

Copy the integrated folder from your development host to this edge device directory. Reference command:

scp -r llm/* root@{board_ip}:/home/root/llm

Finally, configure LD_LIBRARY_PATH under the path /home/root/llm/D-Robotics_LLM_{version}/oellm_runtime. Reference commands:

# Modify hardware register values to switch the device into performance mode
sh set_performance_mode.sh
# Set environment variables
lib=/home/root/llm/D-Robotics_LLM_{version}/oellm_runtime/lib
export LD_LIBRARY_PATH=${lib}:${LD_LIBRARY_PATH}

Running on Edge Device

Simple Conversation

Reference command for running on the edge device:

cd ./example/oellm_run
./oellm_run --hbm_path ../../model/Qwen2.5_1.5B_Instruct_1024.hbm \
  --tokenizer_dir ../../config/Qwen2.5_1.5B_Instruct_config/ \
  --template_path ../../config/Qwen2.5_1.5B_Instruct_config/Qwen2.5_1.5B_Instruct.jinja \
  --model_type 7

Input parameters for the program:

| Parameter | Description | Optional/Required |
| --------- | ----------- | ----------------- |
| -h, --help | Display help information. | / |
| --hbm_path | Specifies the path to the quantized model file (*.hbm). | Required |
| --tokenizer_dir | Specifies the tokenizer configuration directory. | Required |
| --template_path | Specifies the conversation template path for Instruct models; omit when loading Base models. | Optional |
| --model_type | Specifies the model type to run; currently, Qwen2.5 uses model type 7. | Required |

Multi-turn Conversation

Reference command for running on the edge device:

cd ./example/oellm_multichat
./oellm_multichat -c ./qwen_multichat_config.json

Input parameters for the program:

| Parameter | Description | Optional/Required |
| --------- | ----------- | ----------------- |
| -h, --help | Display help information. | / |
| -c, --config | Specify the path to the JSON configuration file. | Required |

Example JSON configuration file:

qwen_multichat_config.json
{
  "hbm_path": "../../model/Qwen2.5_1.5B_Instruct_1024.hbm",
  "tokenizer_dir": "../../config/Qwen2.5_1.5B_Instruct_config/",
  "template_path": "../../config/Qwen2.5_1.5B_Instruct_config/Qwen2.5_1.5B_Instruct.jinja",
  "model_type": 7
}

Parameter descriptions for the JSON configuration file:

| Parameter | Description | Optional/Required |
| --------- | ----------- | ----------------- |
| hbm_path | Path to the quantized model file (*.hbm). | Required |
| tokenizer_dir | Path to the tokenizer configuration directory. | Required |
| template_path | Path to the conversation template file. | Required |
| model_type | Specifies the model type to run; currently, Qwen2.5 uses model type 7. | Required |
| bpu_core | Specifies the BPU core to use. Default is -1 (any core). | Optional |
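Before launching oellm_multichat, it can help to check the configuration file against the table above. The following sketch is illustrative only; `validate_config` is a hypothetical helper, not part of the toolkit:

```python
import json
import os
import tempfile

# Required keys per the parameter table above; bpu_core is optional.
REQUIRED_KEYS = {"hbm_path", "tokenizer_dir", "template_path", "model_type"}

def validate_config(path):
    """Load a multichat JSON config and return (config, missing required keys)."""
    with open(path) as f:
        cfg = json.load(f)
    return cfg, sorted(REQUIRED_KEYS - cfg.keys())

# Demonstrate with an example config written to a temporary file.
example = {
    "hbm_path": "../../model/Qwen2.5_1.5B_Instruct_1024.hbm",
    "tokenizer_dir": "../../config/Qwen2.5_1.5B_Instruct_config/",
    "template_path": "../../config/Qwen2.5_1.5B_Instruct_config/Qwen2.5_1.5B_Instruct.jinja",
    "model_type": 7,
}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(example, f)
    tmp_path = f.name

cfg, missing = validate_config(tmp_path)
os.unlink(tmp_path)
print("missing keys:", missing)  # an empty list means all required keys are present
```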

PPL Evaluation

To evaluate the model's Perplexity (PPL) on-device, refer to the following commands:

cd ./example/oellm_ppl
./oellm_ppl -c ./qwen_ppl_config.json

Program input parameters:

| Parameter | Description | Optional/Required |
| --------- | ----------- | ----------------- |
| -h, --help | Display help information. | / |
| -c, --config | Specify the path to the JSON configuration file. | Required |

Example JSON configuration file:

qwen_ppl_config.json
{
  "hbm_path": "../../model/Qwen2.5_1.5B_Instruct_1024.hbm",
  "tokenizer_dir": "../../config/Qwen2.5_1.5B_Instruct_config/",
  "model_type": 7,
  "ppl_testcase": "test-00000-of-00001.bin",
  "load_ckpt": false,
  "text_data_num": 0,
  "max_length": 256,
  "stride": 100
}

Parameter descriptions for the JSON configuration file:

| Parameter | Description | Optional/Required |
| --------- | ----------- | ----------------- |
| hbm_path | Path to the quantized model file (*.hbm). | Required |
| tokenizer_dir | Path to the tokenizer configuration directory. | Required |
| model_type | Specifies the model type to run; currently, Qwen2.5 uses model type 7. | Required |
| ppl_testcase | Path to the test file; currently only bin format is supported. | Required |
| max_length | Sequence length fed into the model per evaluation step. | Required |
| stride | Stride used during testing. | Required |
| bpu_core | Specifies the BPU core to use. Default is -1 (any core). | Optional |
| load_ckpt | Whether to resume testing from the last checkpoint if interrupted. Default is false. | Optional |
| text_data_num | Truncates the input text to a specific length before testing. If text_data_num <= 0, no truncation occurs. Default is 0. | Optional |
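To make max_length and stride concrete, the sketch below models a common strided sliding-window schedule for perplexity evaluation: each step feeds at most max_length tokens, the window advances by stride, and only newly covered tokens are scored. This is an assumed illustration of the semantics; the exact windowing inside oellm_ppl may differ.

```python
def ppl_windows(num_tokens, max_length, stride):
    """Yield (start, end, num_scored) spans for strided PPL evaluation.

    Each step feeds tokens [start:end] (at most max_length of them) to the
    model, but scores only tokens not covered by the previous window, so
    every token is counted exactly once.
    """
    windows = []
    prev_end = 0
    for begin in range(0, num_tokens, stride):
        end = min(begin + max_length, num_tokens)
        num_scored = end - prev_end  # only newly covered tokens are scored
        windows.append((begin, end, num_scored))
        prev_end = end
        if end == num_tokens:
            break
    return windows

# Example: 560 tokens with the defaults from the config above.
windows = ppl_windows(num_tokens=560, max_length=256, stride=100)
total_scored = sum(n for _, _, n in windows)
print(windows[0], windows[-1], total_scored)
```

With a small stride, each token is scored with more left context at the cost of more forward passes; stride equal to max_length degenerates to non-overlapping chunks.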

Execution Results

Simple Conversation

Example of a simple conversation test:

[User] <<< Briefly describe the development of artificial intelligence.

[Assistant] >>> The development of Artificial Intelligence (AI) can be briefly described as follows:

1. From Theory to Implementation
- AI research began in the 1920s, but significant progress started with Alan Turing's concept of machine intelligence in 1936. Turing introduced the Turing Machine and proposed that a computation is effective if its results are understandable by humans.
- Subsequent researchers explored simulating neural networks, machine learning, and endowing machines with intelligence.
- In the 1960s, AI entered a period of rapid growth, with mature concepts and techniques emerging, such as genetic algorithms, neural networks, and fuzzy logic.
- In the 1990s, deep learning (deep neural networks) gained traction, significantly advancing machine learning, especially in image and pattern recognition.
- Recently, AI has achieved rapid progress in speech recognition, natural language processing, image recognition, machine translation, and intelligent robotics.

2. From Theory to Practice
- Between 1950 and 2010 (~60 years), AI theory and practice developed rapidly.
- From 2010 to 2014, deep learning surged, marking a new era of accelerated AI advancement. By 2014, over 800,000 AI-related papers had been published, with citations exceeding 500 million.
- By 2017, deep learning achieved high accuracy in image and speech recognition but faced challenges like massive datasets and model complexity. Research shifted toward addressing data overload and model efficiency.

3. From Development to Maturity
- In 2015, AlphaGo defeated world Go champion Lee Sedol, shattering human illusions about AI and signaling technological maturity.
- By 2020, AI was widely applied in healthcare, smart manufacturing, and smart cities.

Performance
prefill: 1939.39 tokens/s
decode: 24.43 tokens/s

Multi-turn Conversation

Example of a multi-turn conversation test:

[User] <<< Briefly describe the current state of AI development.

[Assistant] >>> The current state of AI development is as follows:

1. AI technologies continue to mature, with advancements in deep learning, reinforcement learning, transfer learning, and models like deep neural networks, CNNs, and RNNs driving continuous algorithmic and application progress.
2. Machine learning and deep learning are increasingly applied in image recognition, speech recognition, and NLP, enabling AI to process vast data and deliver accurate, efficient solutions, evident in medical diagnostics, financial risk control, and intelligent customer service.
3. AI is transforming lifestyles through applications in smart homes, transportation, cities, and healthcare, as well as autonomous vehicles, virtual assistants, and VR, ushering in new ways of living and working.
4. AI delivers significant results across healthcare, education, finance, transportation, manufacturing, energy, retail, and entertainment, evolving into a key driver of innovation.
5. AI is becoming a new engine for global economic growth, with countries worldwide increasing investment to lead future competition.

Performance
prefill: 1941.70 tokens/s
decode: 24.07 tokens/s

[User] <<< How can robotics integrate with the aforementioned technologies?

[Assistant] >>> Integrating robotics with current AI technologies can further advance AI and unlock innovative applications. Robotics encompasses machine vision, sensors, motion control, and cognition, which can deeply synergize with AI techniques like machine learning, deep learning, NLP, and reinforcement learning. For example:
- In robotic vision, deep learning enables precise object detection and tracking.
- In motion control, AI algorithms optimize path planning and trajectory execution.
- In intelligent perception, machine learning facilitates data processing, pattern recognition, and decision-making.

Moreover, combining robotics with AI enhances applications in autonomous driving, smart healthcare, and intelligent finance, making robots more intelligent, automated, and versatile.

Performance
prefill: 1929.30 tokens/s
decode: 24.27 tokens/s

[User] <<< Elaborate on the most challenging aspect mentioned above.

[Assistant] >>> The most challenging aspect of integrating robotics with AI is likely **intelligent perception and understanding**. This refers to a robot's ability to sense external information (e.g., vision, audio, touch) and comprehend its meaning. Key challenges include:

1. **Object Recognition Accuracy**: Robots must accurately identify objects considering shape, color, material, dynamics, lighting, and background, all affecting reliability.
2. **Semantic Understanding**: Robots need to infer an object's purpose, function, or state by analyzing shape, size, color, and texture, while interacting with humans via language, vision, or audio.
3. **Intelligent Decision-Making**: Robots must autonomously decide actions based on environmental context, task goals, capabilities, constraints, and risk assessment.

Performance
prefill: 1940.90 tokens/s
decode: 24.14 tokens/s

PPL Evaluation

After PPL evaluation completes, a {ppl_testcase}.json file is generated in the same directory. The value under Perplexity represents the final PPL result. Example:

{
  "PPL Parameters": {
    "hbm_path": "../../model/Qwen2.5_1.5B_1024.hbm",
    "ppl_testcase": "test-00000-of-00001.bin",
    "text_data_num": 0,
    "max_length": 256,
    "stride": 100
  },
  "Average Negative Log-Likelihood": 2.5058889819220984,
  "Perplexity": 12.254448107791829
}
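The two reported metrics are directly related: perplexity is the exponential of the average negative log-likelihood. The example report can be checked with a couple of lines:

```python
import math

# Perplexity is exp(average negative log-likelihood); verify the
# relationship using the values from the example report.
avg_nll = 2.5058889819220984
perplexity = math.exp(avg_nll)
print(perplexity)  # ~12.2544, matching the "Perplexity" field
```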
Note
  1. PPL evaluation supports resuming from checkpoints. During execution, a ppl_ckpt.json file is created in the working directory. If load_ckpt is set to true, the program resumes testing from this checkpoint.

  2. After PPL evaluation finishes, a JSON file containing key test parameters and the PPL result is generated in the directory of the bin test file.

  3. The bin file specified by ppl_testcase can be converted from a parquet file. Below is reference code (convert_parquet_to_bin.py) for conversion:

import pandas as pd
from datasets import Dataset

# Load the parquet test set
df = pd.read_parquet("test-00000-of-00001.parquet")
dataset_test = Dataset.from_pandas(df)

# Join all text samples and write them out as a UTF-8 binary file
text_data = "\n\n".join(dataset_test['text'])
with open("test-00000-of-00001.bin", "wb") as f:
    f.write(text_data.encode("utf-8"))
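The checkpoint-resume behavior described in point 1 above can be sketched as follows. This is an illustrative model of the logic only, not the actual oellm_ppl implementation; the ppl_ckpt.json field name used here (`next_window`) is hypothetical.

```python
import json
import os
import tempfile

CKPT_NAME = "ppl_ckpt.json"

def start_index(workdir, load_ckpt):
    """Return the window index to resume from (hypothetical ckpt schema)."""
    path = os.path.join(workdir, CKPT_NAME)
    if load_ckpt and os.path.exists(path):
        with open(path) as f:
            return json.load(f)["next_window"]
    return 0  # no checkpoint requested or none found: start from scratch

def save_checkpoint(workdir, next_window):
    """Record progress after each evaluated window."""
    with open(os.path.join(workdir, CKPT_NAME), "w") as f:
        json.dump({"next_window": next_window}, f)

# Simulate an interrupted run: 3 windows were completed before the interrupt.
workdir = tempfile.mkdtemp()
save_checkpoint(workdir, 3)
resumed = start_index(workdir, load_ckpt=True)   # picks up at window 3
fresh = start_index(workdir, load_ckpt=False)    # ignores the checkpoint
print(resumed, fresh)
```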