The oellm_build tool is provided by D-Robotics to convert a floating-point model into a quantized model. It performs model quantization and compilation based on the original floating-point model, an optional JSON configuration file, and optional calibration data, and finally generates a deployable *.hbm model.
| Parameter Name | Parameter Description | Optional/Required |
| --- | --- | --- |
| --model_name | Specifies the model name. | Required |
| --input_model_path | Specifies the path to the floating-point model. | Required |
| --output_model_path | Specifies the path for saving the model generated after quantization and compilation. | Required |
| --march | Specifies the platform architecture for running the board-side deployable model. | Required |
| --calib_text_path | Specifies the path where the text calibration data is stored. A single JSON file path or a folder path is supported. | Optional |
| --calib_conversation_path | Specifies the path to the conversation calibration data. Only a single JSON file path or a folder path is supported. | Optional |
| --chunk_size | Specifies the input chunk size. | Optional |
| --cache_len | Specifies the KV cache length. | Optional |
| --device | Specifies the computing device to be used. | Optional |
The JSON configuration file for text calibration data; a sample is shown below:
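The exact schema of the text calibration file is defined by the tool; as an illustrative assumption only, a minimal file might hold a list of calibration prompts:

```json
[
  "Explain the difference between model quantization and compilation.",
  "The quick brown fox jumps over the lazy dog."
]
```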
Description of the JSON configuration file for the calibration data required by the Qwen2.5-Omni model, with a sample shown below:
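A minimal sketch of a conversation calibration entry, assembled from the rules described below. The role, content, type, and text fields follow those rules; the media source fields (here audio) and all values are illustrative assumptions:

```json
[
  {
    "role": "system",
    "content": [
      {"type": "text", "text": "You are a helpful assistant."}
    ]
  },
  {
    "role": "user",
    "content": [
      {"type": "audio", "audio": "/path/to/sample.wav"},
      {"type": "text", "text": "Transcribe this audio clip."}
    ]
  }
]
```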
Description of configuration file parameters:
(1) When "role" is "system", the first element in the content list must be a text element containing the text field; otherwise, an error will occur when the text field is accessed during template formatting.
Multiple system messages are supported within the same conversation, but the first one must be a text element containing the text field. System messages beyond the first can be of type text, audio, image, or video.
(2) When "role" is "user", the content list supports the types text, audio, image, and video. The specific rules are as follows:
The content list supports two message organization formats:
Messages of the same type: a single item or multiple items (e.g., multiple text messages, multiple image messages, multiple video messages, or multiple audio messages).
Messages of different types: multiple types can be combined (e.g., a combination of text + image + audio messages).
When the type in the content list is "text":
Format restrictions: no special format requirements; plain text, sentences with punctuation, short instructions, and long paragraphs are all supported.
Supported sources: no fixed source restrictions.
Example reference:
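As an illustrative sketch (the value is hypothetical), a text item in the content list could look like:

```json
{"type": "text", "text": "Please summarize the following paragraph."}
```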
When the type in the content list is "video":
Format restrictions: MP4 and MKV.
Supported sources: local video files, local file URLs, and web URLs.
Example reference:
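An illustrative sketch of a video item; the field name carrying the video source and the path are assumptions:

```json
{"type": "video", "video": "/path/to/clip.mp4"}
```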
When the type in the content list is "image":
Format restrictions: PNG, JPG, JPEG, and BMP.
Supported sources: local image files, local file URLs, web URLs, and Data URIs.
Example reference:
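An illustrative sketch of an image item using a web URL; the field name carrying the image source and the URL are assumptions:

```json
{"type": "image", "image": "https://example.com/sample.png"}
```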
When the type in the content list is "audio":
Format restrictions: WAV, MP3, and FLAC.
Supported sources: local audio files, local file URLs, web URLs, and Data URIs.
Example reference:
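An illustrative sketch of an audio item; the field name carrying the audio source and the path are assumptions:

```json
{"type": "audio", "audio": "./audio/speech.wav"}
```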
The DeepSeek-R1-Distill-Qwen model uses the oellm_build tool for model quantization. Refer to the following command:
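A hedged example invocation using only the parameters documented above — the paths, calibration directory, and the march value are illustrative placeholders, not values verified against the tool:

```shell
oellm_build \
  --model_name DeepSeek-R1-Distill-Qwen \
  --input_model_path ./DeepSeek-R1-Distill-Qwen \
  --output_model_path ./model_output \
  --march <target_march> \
  --calib_text_path ./calib_text
```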
The DeepSeek-R1-Distill-Qwen model uses the oellm_build tool for model quantization and performs consistency verification on the quantized HBM model. Refer to the following command:
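The parameter table above does not document a switch for consistency verification, so this hedged sketch covers only the quantization step (paths and the march value are placeholders); consult the tool's help output for the verification option:

```shell
oellm_build \
  --model_name DeepSeek-R1-Distill-Qwen \
  --input_model_path ./DeepSeek-R1-Distill-Qwen \
  --output_model_path ./model_output \
  --march <target_march> \
  --calib_text_path ./calib_text
```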
The InternLM2 model uses the oellm_build tool to perform model quantization. Refer to the following command:
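A hedged sketch of such a command; the paths, the march value, and the device value are illustrative assumptions:

```shell
oellm_build \
  --model_name InternLM2 \
  --input_model_path ./internlm2 \
  --output_model_path ./model_output \
  --march <target_march> \
  --calib_text_path ./calib_text \
  --device cpu
```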
The Qwen2.5 model uses the oellm_build tool to perform model quantization. Refer to the following command:
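A hedged sketch of such a command; the paths, the march value, and the chunk size and cache length numbers are illustrative assumptions, not recommended settings:

```shell
oellm_build \
  --model_name Qwen2.5 \
  --input_model_path ./Qwen2.5-1.5B-Instruct \
  --output_model_path ./model_output \
  --march <target_march> \
  --calib_text_path ./calib_text \
  --chunk_size 128 \
  --cache_len 2048
```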
The Qwen2.5-Omni model uses the oellm_build tool for model quantization. Refer to the following command:
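A hedged sketch of such a command; because Qwen2.5-Omni is multimodal, this example passes conversation calibration data. The paths and the march value are illustrative placeholders:

```shell
oellm_build \
  --model_name Qwen2.5-Omni \
  --input_model_path ./Qwen2.5-Omni \
  --output_model_path ./model_output \
  --march <target_march> \
  --calib_conversation_path ./calib_conversation
```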