Test Board: S100P.
Performance Data Acquisition: Run a single prompt and record two metrics, TTFT (time to first token) and TPS (average tokens per second).
Python version: Python 3.10.
Runtime Environment: Linux.
| model | platform | dtype | seqlen | max context | TTFT (ms) | TPS (tokens/s) | memory (GB) |
|---|---|---|---|---|---|---|---|
| Qwen2.5-1.5B | S100P | q8 | 256 | 1024 | 130 | 24.04 | 1.8 |
| Qwen2.5-1.5B-Instruct | S100P | q8 | 256 | 1024 | 130 | 24.40 | 1.8 |
| Qwen2.5-7B | S100P | q8 | 256 | 1024 | 535 | 6.67 | 7.4 |
| Qwen2.5-7B-Instruct | S100P | q8 | 256 | 1024 | 534 | 6.75 | 7.4 |
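
As a rough illustration of how TTFT and TPS could be derived from a streamed generation, here is a minimal Python sketch. It is not the script used to produce the table above. The names `measure_stream`, `tokens`, and `clock` are hypothetical, and the sketch assumes that TPS is averaged over the decode phase only (tokens after the first one); if the reported TPS includes the first token or the prefill time, the formula would need adjusting.

```python
import time
from typing import Callable, Iterable, Tuple


def measure_stream(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, float, int]:
    """Return (ttft_ms, tps, n_tokens) for one streamed generation.

    `tokens` is assumed to be a lazy iterator that starts generating
    when iteration begins, so the timer started here covers prefill.
    """
    start = clock()
    first = None  # timestamp of the first token
    last = start  # timestamp of the most recent token
    count = 0
    for _ in tokens:
        last = clock()
        if first is None:
            first = last
        count += 1
    if first is None:
        raise ValueError("the stream produced no tokens")

    # TTFT: delay between starting the request and the first token.
    ttft_ms = (first - start) * 1000.0

    # Assumption: TPS is averaged over the decode phase only, so the
    # first token (dominated by prefill) is excluded from the count.
    decode_s = last - first
    tps = (count - 1) / decode_s if decode_s > 0 else 0.0
    return ttft_ms, tps, count
```

To use it, pass whatever token iterator the runtime exposes, for example `ttft_ms, tps, n = measure_stream(stream)`. The `clock` parameter exists only so that the timing logic can be checked with a fake clock; in a real measurement the default `time.perf_counter` is a reasonable choice.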