The oellm_build Tool

The oellm_build tool, provided by D-Robotics, converts a floating-point model into a quantized model. Based on the original floating-point model, a JSON configuration file (optional), and calibration data (optional), it completes model quantization and compilation, and finally generates a deployable *.hbm model.

Usage

usage: oellm_build [-h] --model_name MODEL_NAME --march {nash-e,nash-m}
                   --input_model_path INPUT_MODEL_PATH
                   --output_model_path OUTPUT_MODEL_PATH
                   [--cache_len CACHE_LEN] [--chunk_size CHUNK_SIZE]
                   [--device DEVICE] [--calib_text_path CALIB_TEXT_PATH]
                   [--calib_conversation_path CALIB_CONVERSATION_PATH]

Compile a Large Language Model for deployment on hardware.

options:
  -h, --help            show this help message and exit
  --model_name MODEL_NAME
                        Model name. Supported models and their marches:
                        - deepseek-qwen-1_5b: nash-e, nash-m
                        - deepseek-qwen-7b: nash-e, nash-m
                        - qwen2_5-1_5b: nash-e, nash-m
                        - qwen2_5-7b: nash-e, nash-m
                        - internlm2-1_8b: nash-e, nash-m
                        - qwen2_5-omni-3b: nash-e, nash-m
  --march {nash-e,nash-m}
                        Target hardware architecture for compilation. (Required)
  --input_model_path INPUT_MODEL_PATH
                        Path to the source model directory. (Required)
  --output_model_path OUTPUT_MODEL_PATH
                        Path to save the compiled model. (Required)
  --cache_len CACHE_LEN
                        Maximum sequence length for the KV-cache. (default: 4096)
                        Note: cache_len must be an integer multiple of chunk_size.
  --chunk_size CHUNK_SIZE
                        Number of tokens per prefill chunk. (default: 256)
  --device DEVICE       Compute device: 'cpu' (default), 'cuda'/'cuda:0', or
                        'cuda:<index>'. Using CUDA on x86 can accelerate
                        calibration.
  --calib_text_path CALIB_TEXT_PATH
                        Path to the calibration JSON file or directory of JSON
                        files. (Optional)
  --calib_conversation_path CALIB_CONVERSATION_PATH
                        Path to the conversation for Qwen2.5-Omni. (Optional)

Parameters Introduction

Each parameter is listed below with its description and whether it is optional or required.
--model_name

DESCRIPTIONS: Specifies the model name.
PARAMETER TYPE: string.
RANGE: deepseek-qwen-1_5b, deepseek-qwen-7b, qwen2_5-1_5b, qwen2_5-7b, internlm2-1_8b, qwen2_5-omni-3b.
DEFAULT VALUE: None.

Required
--input_model_path

DESCRIPTIONS: Specifies the path to the floating-point model.
PARAMETER TYPE: string.
RANGE: None.
DEFAULT VALUE: None.

Required
--output_model_path

DESCRIPTIONS: Specifies the path for saving the model generated after quantization and compilation.
PARAMETER TYPE: string.
RANGE: None.
DEFAULT VALUE: None.

Required
--march

DESCRIPTIONS: Specifies the platform architecture on which the board-side deployable model will run.
PARAMETER TYPE: string.
RANGE: nash-e (S100), nash-m (S100P).
DEFAULT VALUE: None.

Required
--calib_text_path

DESCRIPTIONS: Sets the path where the text calibration data is stored. Either a single JSON file path or a folder path may be configured.
PARAMETER TYPE: string.
RANGE: None.
DEFAULT VALUE: None.
For descriptions and samples of the relevant configuration files, refer to the section JSON Configuration File Description.
Warning: If this parameter is not configured, the default dataset will be used for calibration.

Optional
When the model is Qwen2.5-Omni, this parameter has no effect and does not need to be set.

--calib_conversation_path

DESCRIPTIONS: Specifies the path of the conversation calibration data for the Qwen2.5-Omni model. Only a single JSON file path or a folder path is supported.
PARAMETER TYPE: string.
RANGE: None.
DEFAULT VALUE: None.
For related configuration file instructions and examples, refer to the JSON Configuration File Description section.
Warning: This parameter can be optionally configured when the model is a Qwen2.5-Omni model supported by the S100/S100P platform. If not set, the default dataset will be used for calibration.

Optional
--chunk_size

DESCRIPTIONS: Specifies the input chunk size, i.e., the number of tokens per prefill chunk.
PARAMETER TYPE: int.
RANGE: [128, 2048].
DEFAULT VALUE: 256.
Warning: When the model is Qwen2.5-Omni, chunk_size only supports 256.

Optional
--cache_len

DESCRIPTIONS: Sets the maximum sequence length of the KV cache.
PARAMETER TYPE: int.
RANGE: [256, 4096].
DEFAULT VALUE: 4096.
Warning:

  • When the model is Qwen2.5-Omni, cache_len only supports 2048.
  • The value of cache_len must be greater than chunk_size and must be an integer multiple of chunk_size.

Optional
--device

DESCRIPTIONS: Specifies the computing device to use.
PARAMETER TYPE: string.
RANGE: 'cpu', 'cuda' (equivalent to 'cuda:0'), 'cuda:<index>'.
DEFAULT VALUE: 'cpu'.
Example: To use GPU 1, set 'cuda:1'.

Optional
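The constraints between cache_len and chunk_size documented above can be checked before invoking the tool. A minimal sketch (the function name is hypothetical, not part of oellm_build):

```python
def validate_lengths(cache_len: int = 4096, chunk_size: int = 256) -> None:
    """Check the documented constraints between cache_len and chunk_size."""
    if not 128 <= chunk_size <= 2048:
        raise ValueError("chunk_size must be in [128, 2048]")
    if not 256 <= cache_len <= 4096:
        raise ValueError("cache_len must be in [256, 4096]")
    if cache_len <= chunk_size:
        raise ValueError("cache_len must be greater than chunk_size")
    if cache_len % chunk_size != 0:
        raise ValueError("cache_len must be an integer multiple of chunk_size")

# The defaults (4096, 256) satisfy the constraints: 4096 = 16 * 256
validate_lengths()
```

Note that for Qwen2.5-Omni the values are further restricted to chunk_size=256 and cache_len=2048, as stated in the warnings above.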

JSON Configuration File Description

  1. The JSON configuration file for text calibration data, with a sample shown below:

    [
      {"text": "<|User|>Please elaborate on the future development of artificial intelligence.<|Assistant|><think>Analyze in depth from the perspectives of technology, ethics, and industrial development, and predict possible future trends.</think><|Assistant|>"},
      {"text": "<|User|>Write a quatrain about spring.<|Assistant|><think>Construct verses around spring imagery such as wind, flowers, birds, and willows.</think><|Assistant|>"}
    ]
  2. The JSON configuration file for the calibration data required by the Qwen2.5-Omni model, with a sample shown below:

    [
      {
        "conversation": [
          {
            "role": "system",
            "content": [
              {"type": "text", "text": "You are Qwen, a virtual human developed by the Qwen Team, Alibaba Group, capable of perceiving auditory and visual inputs, as well as generating text and speech."}
            ]
          },
          {
            "role": "user",
            "content": [
              {"type": "text", "text": "Please answer me."},
              {"type": "video", "video": "./draw.mp4"},
              {"type": "image", "image": "./images/image4.jpg"},
              {"type": "audio", "audio": "./audios/audio2.mp3"}
            ]
          }
        ]
      }
    ]
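A text calibration file like the one in item 1 can also be generated programmatically. A minimal sketch (the output file name calib.json and the sample prompts are illustrative):

```python
import json

# Calibration samples in the {"text": ...} format expected by the tool
samples = [
    {"text": "<|User|>Please elaborate on the future development of artificial intelligence.<|Assistant|>"},
    {"text": "<|User|>Write a quatrain about spring.<|Assistant|>"},
]

# Write the list of objects as a single JSON file for --calib_text_path
with open("calib.json", "w", encoding="utf-8") as f:
    json.dump(samples, f, ensure_ascii=False, indent=2)
```

The resulting file (or a folder of such files) can then be passed via --calib_text_path.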

Description of configuration file parameters:

(1) When "role" is "system", the first element in the content list must be a text element containing the text field; otherwise, an error will occur when the text field is accessed during template formatting.

Annotation

Multiple system messages are supported within the same conversation, but the first message must be a text element containing the text field. System messages beyond the first can be of type text, audio, image, or video.
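The rule above can be expressed as a small check on a conversation entry before passing it to the tool. A sketch (the function name is hypothetical, not part of oellm_build):

```python
def check_system_messages(conversation):
    """Verify that the first system message begins with a text element."""
    system_msgs = [m for m in conversation if m.get("role") == "system"]
    if not system_msgs:
        return  # conversations without a system message are not affected
    first_content = system_msgs[0].get("content", [])
    # The first content element of the first system message must be a
    # text element that actually carries the "text" field.
    if (not first_content
            or first_content[0].get("type") != "text"
            or "text" not in first_content[0]):
        raise ValueError(
            "the first system message must begin with a text element "
            "containing the 'text' field"
        )
```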

(2) When "role" is "user", the content list supports the types text, audio, image, and video. The specific rules are as follows:

Annotation

The content list supports two message organization formats:

  1. Messages of the same type: can include a single item or multiple items (e.g., multiple text messages, multiple image messages, multiple video messages, multiple audio messages).

  2. Messages of different types: multiple types can be combined (e.g., a combination of text + image + audio messages).

  • When the type in the content list is "text":

    • Format restrictions: No special format requirements; plain text, sentences with punctuation, short instructions, and long paragraphs are all supported.

    • Supported sources: No fixed source restrictions.

    • Example reference:

    [
      {
        "conversation": [
          {
            "role": "user",
            "content": [
              {"type": "text", "text": "Please answer me."}
            ]
          }
        ]
      }
    ]
  • When the type in the content list is "video":

    • Format restrictions: MP4, MKV.

    • Supported sources: Local video files, local file URLs, and web URLs.

    • Example reference:

    [
      {
        "conversation": [
          {
            "role": "user",
            "content": [
              {"type": "video", "video": "./draw.mp4"},
              {"type": "video", "video": "file://server/audios/draw.mp4"},
              {"type": "video", "video": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen2.5-Omni/draw.mp4"}
            ]
          }
        ]
      }
    ]
  • When the type in the content list is "image":

    • Format restrictions: PNG, JPG, JPEG, BMP.

    • Supported sources: Local image files, local file URLs, web URLs, and Data URIs.

    • Example reference:

    [
      {
        "conversation": [
          {
            "role": "user",
            "content": [
              {"type": "image", "image": "./images/image4.jpg"},
              {"type": "image", "image": "file://server/audios/image1.jpg"},
              {"type": "image", "image": "https://huggingface.co/OpenGVLab/InternVL2-1B/blob/main/examples/image1.jpg"},
              {"type": "image", "image": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVQIHWP4//8/AwAI/AL+4J4fNQAAAABJRU5ErkJggg=="}
            ]
          }
        ]
      }
    ]
  • When the type in the content list is "audio":

    • Format restrictions: WAV, MP3, FLAC.

    • Supported sources: Local audio files, local file URLs, web URLs, and Data URIs.

    • Example reference:

    [
      {
        "conversation": [
          {
            "role": "user",
            "content": [
              {"type": "audio", "audio": "./audios/audio2.mp3"},
              {"type": "audio", "audio": "file://server/audios/image1.flac"},
              {"type": "audio", "audio": "https://huggingface.co/datasets/hf-internal-testing/dummy-flac-single-example/blob/main/example.flac"},
              {"type": "audio", "audio": "data:audio/wav;base64,xxxxx...AA"}
            ]
          }
        ]
      }
    ]
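The Data URI entries shown for images and audio can be produced from local files by base64-encoding them. A minimal sketch (the function name and file paths are illustrative):

```python
import base64
import mimetypes

def to_data_uri(path: str) -> str:
    """Base64-encode a local file into a Data URI usable in the content list."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        raise ValueError(f"cannot determine MIME type for {path}")
    with open(path, "rb") as f:
        payload = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{payload}"

# e.g. {"type": "image", "image": to_data_uri("./images/image4.jpg")}
```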

Usage Example

  • The DeepSeek-R1-Distill-Qwen model uses the oellm_build tool for model quantization. Refer to the following command:

    oellm_build \
        --model_name ${model_name} \
        --input_model_path ${input_model_path} \
        --output_model_path ${output_model_path} \
        --march ${march} \
        --calib_text_path ${calib_text_path} \
        --chunk_size ${chunk_size} \
        --cache_len ${cache_len}
  • The DeepSeek-R1-Distill-Qwen model uses the oellm_build tool for model quantization and performs consistency verification on the quantized HBM model. Refer to the following command:

    oellm_build \
        --model_name ${model_name} \
        --input_model_path ${input_model_path} \
        --output_model_path ${output_model_path} \
        --march ${march} \
        --calib_text_path ${calib_text_path} \
        --chunk_size ${chunk_size} \
        --cache_len ${cache_len} \
        --verifier \
        --remote_ip ${remote_ip}
  • The InternLM2 model uses the oellm_build tool to perform model quantization. Refer to the following command:

    oellm_build \
        --model_name ${model_name} \
        --input_model_path ${input_model_path} \
        --output_model_path ${output_model_path} \
        --march ${march} \
        --calib_text_path ${calib_text_path} \
        --chunk_size ${chunk_size} \
        --cache_len ${cache_len}
  • The Qwen2.5 model uses the oellm_build tool to perform model quantization. Refer to the following command:

    oellm_build \
        --model_name ${model_name} \
        --input_model_path ${input_model_path} \
        --output_model_path ${output_model_path} \
        --march ${march} \
        --calib_text_path ${calib_text_path} \
        --chunk_size ${chunk_size} \
        --cache_len ${cache_len}
  • The Qwen2.5-Omni model uses the oellm_build tool for model quantization. Refer to the following command:

    oellm_build \
        --model_name ${model_name} \
        --input_model_path ${input_model_path} \
        --output_model_path ${output_model_path} \
        --calib_conversation_path ${calib_conversation_path} \
        --march ${march} \
        --chunk_size ${chunk_size} \
        --cache_len ${cache_len}
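When scripting batch conversions, the invocations above can also be assembled programmatically and passed to subprocess.run. A sketch (all values are placeholders; the helper function is not part of the tool):

```python
import shlex

def build_command(model_name, input_model_path, output_model_path, march,
                  chunk_size=256, cache_len=4096, calib_text_path=None):
    """Assemble an oellm_build argument list mirroring the examples above."""
    cmd = [
        "oellm_build",
        "--model_name", model_name,
        "--input_model_path", input_model_path,
        "--output_model_path", output_model_path,
        "--march", march,
        "--chunk_size", str(chunk_size),
        "--cache_len", str(cache_len),
    ]
    if calib_text_path is not None:
        cmd += ["--calib_text_path", calib_text_path]
    return cmd

cmd = build_command("qwen2_5-1_5b", "./Qwen2.5-1.5B", "./out", "nash-e")
print(shlex.join(cmd))
# Pass cmd to subprocess.run(cmd, check=True) to execute the conversion.
```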