How to quantize a model


ESP-DL uses a proprietary format, .espdl, for model deployment. It is a quantized model format that supports 8-bit and 16-bit quantization. In this tutorial, we take quantize_sin_model as an example to show how to use ESP-PPQ to quantize a model and export it as a .espdl model. The quantization method used is Post Training Quantization (PTQ).

Preparation

Install ESP-PPQ
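A typical way to install ESP-PPQ is with pip, straight from its GitHub repository (a hedged example; check the esp-ppq repository for the authoritative installation instructions):

# esp-ppq replaces the upstream ppq package, so remove ppq first if it is installed
pip uninstall ppq
pip install git+https://github.com/espressif/esp-ppq.git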

Pre-trained model

python sin_model.py

Run sin_model.py. This script trains a simple PyTorch model to fit the sin function in the range [0, 2π]. After training, it saves the corresponding .pth weights and exports the ONNX model.
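A minimal sketch of what such a script looks like (the layer sizes, file names, and training loop below are illustrative, not the actual contents of sin_model.py):

import torch
import torch.nn as nn

class SinModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1, 32), nn.ReLU(),
            nn.Linear(32, 32), nn.ReLU(),
            nn.Linear(32, 1),
        )

    def forward(self, x):
        return self.net(x)

model = SinModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Train on samples drawn from [0, 2*pi]
for step in range(2000):
    x = torch.rand(256, 1) * 2 * torch.pi
    loss = loss_fn(model(x), torch.sin(x))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Save the weights and export an ONNX model for quantization
torch.save(model.state_dict(), "sin_model.pth")
torch.onnx.export(model, torch.rand(1, 1) * 2 * torch.pi, "sin_model.onnx",
                  input_names=["input"], output_names=["output"])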

Note

ESP-PPQ provides two interfaces, espdl_quantize_onnx and espdl_quantize_torch, which support ONNX models and PyTorch models respectively. Models from other deep learning frameworks, such as TensorFlow, PaddlePaddle, etc., need to be converted to ONNX first.
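For example, a TensorFlow SavedModel can be converted with the tf2onnx tool (a hedged example; the exact flags depend on your model format):

python -m tf2onnx.convert --saved-model my_saved_model_dir --output model.onnx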

Quantize and export .espdl

Refer to quantize_torch_model.py and quantize_onnx_model.py to learn how to use the espdl_quantize_torch and espdl_quantize_onnx interfaces to quantize the model and export it as a .espdl model.
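As an illustration, a call to espdl_quantize_torch might look like the following (a hedged sketch: the import path and parameter names follow the ESP-DL example scripts, but quantize_torch_model.py remains the authoritative reference):

import torch
from torch.utils.data import DataLoader
from ppq.api import espdl_quantize_torch  # import path per esp-ppq; verify locally

# Calibration data: samples drawn from the model's working range [0, 2*pi]
INPUT_SHAPE = [1]
calib_data = [torch.rand(INPUT_SHAPE) * 2 * torch.pi for _ in range(256)]
calib_dataloader = DataLoader(calib_data, batch_size=1)

model = SinModel()  # the trained model from sin_model.py
model.load_state_dict(torch.load("sin_model.pth"))

quanted_graph = espdl_quantize_torch(
    model=model,
    espdl_export_file="sin_model.espdl",
    calib_dataloader=calib_dataloader,
    calib_steps=32,
    input_shape=[1] + INPUT_SHAPE,   # batch_size must be 1
    target="esp32p4",                # or "esp32s3"
    num_of_bits=8,                   # 8-bit or 16-bit quantization
    device="cpu",
    export_test_values=True,         # see "Add test input/output" below
)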

After executing the script, three files will be exported:

  • **.espdl: ESPDL model binary file, which can be used directly for inference on the chip.

  • **.info: ESPDL model text file, used for debugging and for verifying that the .espdl model was exported correctly. It contains the model structure, quantized model weights, test inputs/outputs, and other information.

  • **.json: Quantization information file, used to save and load quantization information.

Note

  1. The .espdl models generated for different platforms cannot be used interchangeably, otherwise the inference results will be incorrect. The rounding strategy used by ESP32S3 is ROUND_HALF_UP, while ESP32P4 uses ROUND_HALF_EVEN.

  2. The quantization strategy currently used by ESP-DL is symmetric quantization + POWER OF TWO; see the sketch after this list.
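The following plain-Python sketch illustrates both notes: symmetric power-of-two quantization, and how the two rounding modes disagree on tie cases (for intuition only; this is not the ESP-PPQ implementation):

from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN

def quantize(x, exponent, rounding, bits=8):
    # Scale by a power-of-two step, round, then clamp to the symmetric range
    scaled = Decimal(str(x)) / (Decimal(2) ** exponent)
    q = int(scaled.quantize(Decimal("1"), rounding=rounding))
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return max(lo, min(hi, q))

x = 0.1015625  # == 6.5 * 2**-6, a tie case after scaling
print(quantize(x, -6, ROUND_HALF_UP))    # 7 (ESP32S3 rounding)
print(quantize(x, -6, ROUND_HALF_EVEN))  # 6 (ESP32P4 rounding)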

Add test input/output

To verify whether the inference results of the model on the board are correct, you first need to record a set of test inputs/outputs on the PC. By enabling the export_test_values option in the API, a set of test inputs/outputs is saved in the .espdl model. One of the input_shape and inputs parameters must be specified: input_shape generates a random test input, while inputs lets you supply a specific test input. The test input/output values can be viewed in the .info file; search for "test inputs value" and "test outputs value" to find them.
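For example, to record a specific test input rather than a random one (a hedged sketch reusing the names from the earlier quantization example):

import torch
from ppq.api import espdl_quantize_torch  # import path per esp-ppq; verify locally

test_input = torch.tensor([[1.0]])  # a specific point in [0, 2*pi]
quanted_graph = espdl_quantize_torch(
    model=model,                        # trained model from sin_model.py
    espdl_export_file="sin_model.espdl",
    calib_dataloader=calib_dataloader,  # calibration loader from above
    calib_steps=32,
    inputs=[test_input],                # specific test input, instead of input_shape
    export_test_values=True,            # embed the test input/output in the .espdl
    target="esp32p4",
    num_of_bits=8,
    device="cpu",
)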

Quantized model inference & accuracy evaluation

The espdl_quantize_onnx and espdl_quantize_torch APIs return a BaseGraph. Use the BaseGraph to build the corresponding TorchExecutor, which runs inference with the quantized model on the PC side.

from ppq.executor import TorchExecutor

executor = TorchExecutor(graph=quanted_graph, device=device)
output = executor(input)

The outputs obtained from quantized-model inference can be used to compute various accuracy metrics. Since the board-side ESP-DL inference results are aligned with ESP-PPQ, these metrics can be used directly to evaluate the accuracy of the quantized model.
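For the sin model, a minimal accuracy check might look like this (a hedged sketch; the mean-squared-error metric is just an example):

import torch
from ppq.executor import TorchExecutor  # import path per ppq; verify locally

executor = TorchExecutor(graph=quanted_graph, device="cpu")

x = torch.rand(1, 1) * 2 * torch.pi   # batch_size must be 1 (see note below)
quant_out = executor(x)[0]            # the executor returns a list of output tensors
ref_out = torch.sin(x)                # ground truth for the sin model

mse = torch.mean((quant_out - ref_out) ** 2)
print(f"MSE of quantized model vs. ground truth: {mse.item():.6f}")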

Note

  1. Currently ESP-DL only supports a batch_size of 1; multi-batch and dynamic batch are not supported.

  2. The test inputs/outputs and the quantized model weights in the .info file are all 16-byte aligned; data that does not fill a 16-byte boundary is padded with zeros.

Advanced Quantization Methods

If you want to further improve the performance of the quantized model, please try the following advanced quantization methods:

Post Training Quantization (PTQ)

Quantization Aware Training (QAT)