Advanced Development

In this chapter, we will introduce the advanced development workflow of D-Robotics-LLM.

This workflow applies to the following scenarios:

  1. Cases where the model has already been fine-tuned and requires re-quantization.

  2. Simple single-turn conversations.

  3. Calculating the model's Perplexity (PPL) metric on edge devices.

For the above scenarios, we will use the InternLM2-1.8B model as an example to illustrate usage.

Environment Setup

Please ensure you have correctly completed environment setup for both the development host and the development board as described in the Environment Deployment section.

Deployment Package Preparation

Download the provided D-Robotics_LLM_{version}.tar.gz deployment package and extract it.

Model Preparation

Note

Currently, only the InternLM2-1.8B model is supported.
Before downloading the model, please ensure you understand the model's license terms, dependency requirements, and other necessary information to guarantee proper subsequent usage.

You can obtain the InternLM2 series models from the Hugging Face platform.

Model Quantization

D-Robotics-LLM provides a command-line tool to quantize and compile models for edge-device deployment. Taking the InternLM2-1.8B model as an example, the reference command is as follows:

```shell
oellm_build \
  --model_name internlm2-1_8b \
  --input_model_path ./InternLM2-1.8B \
  --output_model_path ./output_hbm \
  --march nash-m \
  --chunk_size 256 \
  --cache_len 1024
```

Note

For detailed usage instructions and important considerations regarding the oellm_build tool, please refer to the oellm_build Tool section.

If you obtain our pre-compiled .hbm model via the link provided in resolve_model.txt, you may skip this model quantization step.

The InternLM2 model provided in the resolve_model.txt file was compiled with chunk_size set to 256 and cache_len set to 1024.
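
Before moving on to deployment, it can help to confirm that quantization actually produced a `.hbm` artifact. A minimal shell sketch (the `has_hbm` helper is our illustration, not part of the toolkit; `./output_hbm` matches the command above):

```shell
# Return success if the given directory contains at least one .hbm file.
has_hbm() {
  ls "$1"/*.hbm >/dev/null 2>&1
}

# Example: check the output directory produced by oellm_build.
if has_hbm ./output_hbm; then
  echo "quantized model found in ./output_hbm"
else
  echo "no .hbm file in ./output_hbm; re-run oellm_build" >&2
fi
```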

Edge Device Runtime Preparation

In the directory D-Robotics_LLM_{version}/oellm_runtime/example, each subdirectory contains a pre-compiled executable that can be run directly on the edge device. Alternatively, you can run the corresponding build scripts to generate the required files yourself. Reference commands are as follows:

```shell
# Simple conversation
sh build_oellm_run.sh
# PPL evaluation
sh build_oellm_ppl.sh
```

Next, create a working directory on the edge device with the following command:

```shell
# Running on S100/S100P
mkdir -p /home/root/llm
```

Before deploying to the board, ensure you have prepared the following:

  • A functional development board for running programs on the edge device.

  • A deployable model file (*.hbm), which is the output from Model Quantization.

  • Executable files (oellm_run and oellm_ppl).

  • Runtime dependency libraries. To reduce deployment overhead, you can directly use the contents within the D-Robotics-LLM package, including:

    • D-Robotics_LLM_{version}/oellm_runtime/set_performance_mode.sh
    • D-Robotics_LLM_{version}/oellm_runtime/lib folder
    • D-Robotics_LLM_{version}/oellm_runtime/config folder
    • D-Robotics_LLM_{version}/oellm_runtime/example folder

After preparation, integrate the model file (*.hbm), executable files, and dependency libraries into a unified directory structure as shown below:

```
root@ubuntu:/home/root/llm
.
|-- model
|   |-- resolve_model.txt
|   |-- InternLM2_1.8B_1024.hbm
|-- config
|   |-- InternLM2_1.8B_config
|-- example
|   |-- oellm_run
|   |   |-- oellm_run
|   |-- oellm_ppl
|   |   |-- oellm_ppl
|   |   |-- internlm2_ppl_config.json
|   |   |-- test-00000-of-00001.bin
|-- include
|-- lib
`-- set_performance_mode.sh
```
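
A minimal sketch of creating this skeleton on the development host before copying files in (directory names mirror the tree above; the actual model, config, and library contents come from the deployment package and the quantization output):

```shell
# Create the empty layout; real files are then copied in from the
# extracted D-Robotics_LLM package and the oellm_build output.
mkdir -p llm/model llm/config \
         llm/example/oellm_run llm/example/oellm_ppl \
         llm/include llm/lib
```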

Copy the integrated folder from your development host to the edge device using the following command:

```shell
scp -r llm/* root@{board_ip}:/home/root/llm
```

Finally, from the /home/root/llm directory on the edge device, switch the device into performance mode and configure LD_LIBRARY_PATH with the following commands:

```shell
# Modify hardware register values to switch the device into performance mode
sh set_performance_mode.sh
# Set environment variables
lib=/home/root/llm/lib
export LD_LIBRARY_PATH=${lib}:${LD_LIBRARY_PATH}
```

Edge Device Execution

Simple Conversation

Reference command for running on the edge device:

```shell
cd ./example/oellm_run
./oellm_run --hbm_path ../../model/InternLM2_1.8B_1024.hbm \
  --tokenizer_dir ../../config/InternLM2_1.8B_config/ \
  --model_type 4
```

Runtime parameters are as follows:

| Parameter | Description | Optional/Required |
| --- | --- | --- |
| `-h, --help` | Display help information. | / |
| `--hbm_path` | Specifies the path to the quantized model file (*.hbm). | Required |
| `--tokenizer_dir` | Specifies the tokenizer configuration directory. | Required |
| `--model_type` | Specifies the model type; currently, InternLM2 uses type 4. | Required |

PPL Evaluation

Reference command for calculating the model's PPL on the edge device:

```shell
cd ./example/oellm_ppl
./oellm_ppl -c ./internlm2_ppl_config.json
```

Program input parameters are as follows:

| Parameter | Description | Optional/Required |
| --- | --- | --- |
| `-h, --help` | Display help information. | / |
| `-c, --config` | Specifies the path to the JSON configuration file. | Required |

Example JSON configuration file:

internlm2_ppl_config.json

```json
{
  "hbm_path": "../../model/InternLM2_1.8B_1024.hbm",
  "tokenizer_dir": "../../config/InternLM2_1.8B_config/",
  "model_type": 4,
  "ppl_testcase": "test-00000-of-00001.bin",
  "load_ckpt": false,
  "text_data_num": 0,
  "max_length": 256,
  "stride": 100
}
```

JSON configuration parameters are explained below:

| Parameter | Description | Optional/Required |
| --- | --- | --- |
| `hbm_path` | Specifies the path to the quantized model file (*.hbm). | Required |
| `tokenizer_dir` | Specifies the tokenizer configuration directory. | Required |
| `model_type` | Specifies the model type; currently, InternLM2 uses type 4. | Required |
| `ppl_testcase` | Specifies the test file path; currently supports only bin format. | Required |
| `max_length` | Specifies the sequence length fed into the model per iteration. | Required |
| `stride` | Specifies the testing stride. | Required |
| `bpu_core` | Specifies the BPU core to use. Default is -1 (any core). | Optional |
| `load_ckpt` | Whether to resume testing from the last interruption point. Default is false. | Optional |
| `text_data_num` | Truncates input text to a specified length before testing. If text_data_num <= 0, no truncation occurs. Default is 0. | Optional |
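
The interaction of `max_length` and `stride` follows the usual sliding-window perplexity recipe: each window feeds up to `max_length` tokens to the model, advances by `stride` tokens, and scores only the tokens not covered by the previous window. A Python sketch of that accounting (our illustration of the standard procedure; the internals of `oellm_ppl` may differ):

```python
import math

def sliding_window_ppl(window_nll, n_tokens, max_length=256, stride=100):
    """Average NLL and perplexity over a token stream with a sliding window.

    window_nll(begin, end, target_begin) must return the summed negative
    log-likelihood of tokens [target_begin, end) given context [begin, end).
    """
    total_nll, total_tokens, prev_end = 0.0, 0, 0
    for begin in range(0, n_tokens, stride):
        end = min(begin + max_length, n_tokens)
        target_begin = max(begin, prev_end)  # score only tokens not yet scored
        total_nll += window_nll(begin, end, target_begin)
        total_tokens += end - target_begin
        prev_end = end
        if end == n_tokens:
            break
    avg_nll = total_nll / total_tokens
    return avg_nll, math.exp(avg_nll)
```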

Execution Results

Simple Conversation

Reference for simple dialogue testing:

```
[User] <<< Today's weather
[Assistant] >>> Very clear, sunny, and a gentle breeze on my face. Mom took me for a walk in the park. I saw a mother and her daughter feeding birds, and I asked in surprise, "Where did these birds come from?" The woman picked up a long bamboo pole, lifted a small green caterpillar from the ground, then used a long grass stalk to pick up the insect and placed it into the nest to feed the birds. When I walked over there, my mother kindly said, "Look how cute these little ants are!" My mom and I came to the edge of a large flower bed and saw some children playing joyfully amidst laughter. At that moment, an old man was selling watermelons there! As soon as he sat down, he looked around and suddenly widened his eyes—he saw someone approaching with a very large bag to buy melons. The old man immediately ran over, took out a big watermelon from the bag, handed it to the person, and said, "This is a birthday gift my daughter made for me!" After we returned home, Mom started telling me a story. This story taught me a lesson: No matter where you are, as long as you put in effort, you will reap rewards.
Performance
prefill: 1855.07 tokens/s
decode: 23.83 tokens/s
```

PPL Evaluation

After PPL evaluation completes, a {ppl_testcase}.json file will be generated in the same directory. The value corresponding to Perplexity in this file is the final PPL test result. See the example below:

```json
{
  "PPL Parameters": {
    "hbm_path": "InternLM2_1.8B_1024.hbm",
    "ppl_testcase": "test-00000-of-00001.bin",
    "text_data_num": 0,
    "max_length": 256,
    "stride": 100
  },
  "Average Negative Log-Likelihood": 2.56918,
  "Perplexity": 13.0552
}
```
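
The reported Perplexity is simply the exponential of the average negative log-likelihood, which you can verify against the sample values above with a few lines of Python:

```python
import json
import math

# Sample values taken from the oellm_ppl output shown above.
result = json.loads(
    '{"Average Negative Log-Likelihood": 2.56918, "Perplexity": 13.0552}'
)

# Perplexity = exp(average negative log-likelihood)
recomputed = math.exp(result["Average Negative Log-Likelihood"])
assert math.isclose(recomputed, result["Perplexity"], rel_tol=1e-4)
```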
Note
  1. PPL supports resuming evaluation from checkpoints. During execution, the program generates a ppl_ckpt.json file in the working directory. When load_ckpt is set to true, the program reads this file and continues testing from where it left off.

  2. After the PPL program finishes execution, a JSON file containing key parameters and the PPL calculation result of this test will be generated in the directory where the bin test file resides.

  3. The bin file specified by the ppl_testcase parameter can be converted from a parquet file. An example reference script convert_parquet_to_bin.py is provided below:

```python
import pandas as pd
from datasets import Dataset

df = pd.read_parquet("test-00000-of-00001.parquet")
dataset_test = Dataset.from_pandas(df)
text_data = "\n\n".join(dataset_test["text"])
with open("test-00000-of-00001.bin", "wb") as f:
    f.write(text_data.encode("utf-8"))
```
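
The resulting bin file is simply "\n\n"-joined UTF-8 text. A quick round-trip sanity check of that encoding (the sample strings and `sample.bin` filename are illustrative, not from the real dataset):

```python
# Illustrative sample texts; the real data comes from the parquet file.
texts = ["first document", "second document", "third document"]
blob = "\n\n".join(texts).encode("utf-8")

with open("sample.bin", "wb") as f:
    f.write(blob)

# Reading the file back and splitting on "\n\n" recovers the documents.
with open("sample.bin", "rb") as f:
    decoded = f.read().decode("utf-8")

assert decoded.split("\n\n") == texts
```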