In this chapter, we will introduce the advanced development workflow of D-Robotics-LLM.
This workflow applies to the following scenarios:
- Cases where the model has already been fine-tuned and requires re-quantization.
- Simple single-turn conversations.
- Calculating the model's Perplexity (PPL) metric on edge devices.
For the above scenarios, we will use the InternLM2-1.8B model as an example to illustrate usage.
Please ensure you have correctly completed environment setup for both the development host and the development board as described in the Environment Deployment section.
Download the provided D-Robotics_LLM_{version}.tar.gz deployment package and extract it.
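For example, with the package in the current directory ({version} stands for the actual version string in the package name):

```shell
tar -xzf D-Robotics_LLM_{version}.tar.gz
cd D-Robotics_LLM_{version}
```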
Currently, only the InternLM2-1.8B model is supported.
Before downloading the model, make sure you understand its license terms, dependencies, and any other relevant requirements so that subsequent use proceeds smoothly.
You can obtain the InternLM2 series models from the Hugging Face platform. Below are the download links:
D-Robotics-LLM provides a command-line tool to quantize and compile models for edge-device deployment. Taking the InternLM2-1.8B model as an example, the reference command is as follows:
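As a sketch only — the flag names below are assumptions rather than the tool's confirmed interface (see the oellm_build Tool section for the actual options); the chunk_size and cache_len values mirror those used for the provided pre-compiled model:

```shell
# Hypothetical flags -- consult the oellm_build Tool section for the real interface.
oellm_build --model_path ./internlm2-1_8b \
            --chunk_size 256 \
            --cache_len 1024 \
            --output ./internlm2-1_8b.hbm
```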
For detailed usage instructions and important considerations regarding the oellm_build tool, please refer to the oellm_build Tool section.
If you obtain our pre-compiled .hbm model via the link provided in resolve_model.txt, you may skip this model quantization step.
The InternLM2 model provided in the resolve_model.txt file was compiled with chunk_size set to 256 and cache_len set to 1024.
In the directory D-Robotics_LLM_{version}/oellm_runtime/example, we have pre-prepared compiled executables in each subdirectory that can be directly run on the edge device. Alternatively, you can execute different build scripts to generate the required files yourself. Reference commands are as follows:
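For instance (the subdirectory and build-script names below are assumptions — check the example directory for the actual scripts):

```shell
cd D-Robotics_LLM_{version}/oellm_runtime/example
# Hypothetical script name -- each subdirectory ships its own build script.
cd <example_subdir> && bash build.sh
```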
Next, create a working directory on the edge device with the following command:
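For example, matching the path used later when configuring LD_LIBRARY_PATH:

```shell
# Create the working directory on the edge device.
mkdir -p /home/root/llm
```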
Before deploying to the board, ensure you have prepared the following:
- A functional development board for running programs on the edge device.
- A deployable model file (*.hbm), which is the output from Model Quantization.
- Executable files (oellm_run and oellm_ppl).
- Runtime dependency libraries. To reduce deployment overhead, you can directly use the contents within the D-Robotics-LLM package, including:
  - D-Robotics_LLM_{version}/oellm_runtime/set_performance_mode.sh
  - the D-Robotics_LLM_{version}/oellm_runtime/lib folder
  - the D-Robotics_LLM_{version}/oellm_runtime/config folder
  - the D-Robotics_LLM_{version}/oellm_runtime/example folder

After preparation, integrate the model file (*.hbm), executable files, and dependency libraries into a unified directory structure as shown below:
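One possible layout, assuming the items listed above (the .hbm file name and the placement of the executables are illustrative):

```
llm/
└── D-Robotics_LLM_{version}/
    └── oellm_runtime/
        ├── set_performance_mode.sh
        ├── lib/
        ├── config/
        ├── example/
        │   ├── oellm_run
        │   └── oellm_ppl
        └── internlm2-1_8b.hbm
```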
Copy the integrated folder from your development host to the edge device using the following command:
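For example, assuming the unified directory is named llm and the board is reachable at <board-ip> (both placeholders):

```shell
scp -r llm root@<board-ip>:/home/root/
```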
Finally, configure LD_LIBRARY_PATH under /home/root/llm/D-Robotics_LLM_{version}/oellm_runtime with the following commands:
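For example ({version} is written literally here as a placeholder — substitute your actual package version):

```shell
# Prepend the runtime lib directory to the dynamic linker search path.
export LD_LIBRARY_PATH=/home/root/llm/D-Robotics_LLM_{version}/oellm_runtime/lib:$LD_LIBRARY_PATH
```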
Reference command for running on the edge device:
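A sketch using the documented parameters (the model and tokenizer paths are placeholders):

```shell
./oellm_run --hbm_path ./internlm2-1_8b.hbm \
            --tokenizer_dir ./tokenizer \
            --model_type 4
```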
Runtime parameters are as follows:
| Parameter | Description | Optional/Required |
|---|---|---|
| -h, --help | Display help information. | / |
| --hbm_path | Specifies the path to the quantized model file (*.hbm). | Required |
| --tokenizer_dir | Specifies the tokenizer configuration directory. | Required |
| --model_type | Specifies the model type; currently, InternLM2 uses type 4. | Required |
Reference command for calculating the model's PPL on the edge device:
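For example (the configuration file name is a placeholder):

```shell
./oellm_ppl -c ./ppl_config.json
```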
Program input parameters are as follows:
| Parameter | Description | Optional/Required |
|---|---|---|
| -h, --help | Display help information. | / |
| -c, --config | Specifies the path to the JSON configuration file. | Required |
Example JSON configuration file:
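A sketch built from the documented parameters; every value here is a placeholder (max_length is chosen to match the cache_len of 1024 used for the provided model — adjust for your own build):

```json
{
  "hbm_path": "./internlm2-1_8b.hbm",
  "tokenizer_dir": "./tokenizer",
  "model_type": 4,
  "ppl_testcase": "./wikitext2_test.bin",
  "max_length": 1024,
  "stride": 512,
  "bpu_core": -1,
  "load_ckpt": false,
  "text_data_num": 0
}
```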
JSON configuration parameters are explained below:
| Parameter | Description | Optional/Required |
|---|---|---|
| hbm_path | Specifies the path to the quantized model file (*.hbm). | Required |
| tokenizer_dir | Specifies the tokenizer configuration directory. | Required |
| model_type | Specifies the model type; currently, InternLM2 uses type 4. | Required |
| ppl_testcase | Specifies the test file path; currently only bin format is supported. | Required |
| max_length | Specifies the sequence length fed into the model per iteration. | Required |
| stride | Specifies the testing stride. | Required |
| bpu_core | Specifies the BPU core to use. Default is -1 (any core). | Optional |
| load_ckpt | Whether to resume testing from the last interruption point. Default is false. | Optional |
| text_data_num | Truncates the input text to this length before testing; if text_data_num <= 0, no truncation occurs. Default is 0. | Optional |
Reference for simple dialogue testing:
After PPL evaluation completes, a {ppl_testcase}.json file containing the key run parameters is generated in the directory where the bin test file resides; the value of Perplexity in this file is the final PPL test result. See the example below:
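An illustrative example only — the field set and all values shown here are placeholders, not measured results:

```json
{
  "ppl_testcase": "./wikitext2_test.bin",
  "max_length": 1024,
  "stride": 512,
  "Perplexity": 12.34
}
```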
PPL supports resuming evaluation from checkpoints. During execution, the program generates a ppl_ckpt.json file in the working directory. When load_ckpt is set to true, the program reads this file and continues testing from where it left off.
The bin file specified by the ppl_testcase parameter can be converted from a parquet file. An example reference script convert_parquet_to_bin.py is provided below:
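As a rough sketch of what such a conversion can look like — this is not the shipped convert_parquet_to_bin.py, whose output layout is authoritative; the sketch assumes the .bin is simply the dataset's text column concatenated as raw UTF-8 bytes:

```python
"""Hypothetical parquet -> bin conversion sketch.

Assumption: the .bin file is the text column of the parquet
dataset, joined with newlines and encoded as UTF-8. The real
layout is defined by the provided convert_parquet_to_bin.py.
"""


def texts_to_bin(texts):
    """Join non-empty text rows with newlines and encode as UTF-8 bytes."""
    return "\n".join(t for t in texts if t).encode("utf-8")


def convert_parquet_to_bin(parquet_path, bin_path, column="text"):
    """Read `column` from a parquet file and dump it as a .bin file."""
    import pandas as pd  # requires pandas with a parquet engine (e.g. pyarrow)

    texts = pd.read_parquet(parquet_path)[column].tolist()
    with open(bin_path, "wb") as f:
        f.write(texts_to_bin(texts))


# Example: convert_parquet_to_bin("wikitext2_test.parquet", "wikitext2_test.bin")
```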