How to Quantize a Model
ESP-DL requires a proprietary format, .espdl, for model deployment. It is a quantized model format that supports 8-bit and 16-bit quantization. In this tutorial, we take quantize_sin_model as an example to show how to use ESP-PPQ to quantize a model and export it as a .espdl model. The quantization method used is Post-Training Quantization (PTQ).
Preparation
Pre-trained model
python sin_model.py
Run sin_model.py. This script trains a simple PyTorch model to fit the sine function in the range [0, 2π]. After training, the corresponding .pth weights are saved and the ONNX model is exported.
Note
ESP-PPQ provides two interfaces, espdl_quantize_onnx and espdl_quantize_torch, to support ONNX models and PyTorch models. Models from other deep learning frameworks, such as TensorFlow, PaddlePaddle, etc., need to be converted to ONNX first:
Convert TensorFlow to ONNX: tf2onnx
Convert TFLite to ONNX: tflite2onnx
Convert TFLite to TensorFlow: tflite2tensorflow
Convert PaddlePaddle to ONNX: paddle2onnx
Quantize and export .espdl
Refer to quantize_torch_model.py and quantize_onnx_model.py to learn how to use the espdl_quantize_torch and espdl_quantize_onnx interfaces to quantize and export the .espdl model.
After executing the script, three files will be exported:
**.espdl: the ESP-DL model binary file, which can be used directly for inference on the chip.
**.info: the ESP-DL model text file, used for debugging and for checking whether the .espdl model was exported correctly. It contains the model structure, the quantized model weights, the test input/output, and other information.
**.json: the quantization information file, used to save and load quantization information.
Note
The .espdl models of different platforms cannot be mixed, otherwise the inference results will be inaccurate. The ROUND strategy used by ESP32S3 is ROUND_HALF_UP, and the one used by ESP32P4 is ROUND_HALF_EVEN.
The quantization strategy currently used by ESP-DL is symmetric quantization + POWER OF TWO.
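As an illustration of why the rounding mode matters, here is a minimal sketch of symmetric power-of-two quantization with both rounding strategies. This is illustrative only and is not the actual ESP-PPQ implementation:

```python
import math

def quantize_symmetric_pot(values, bits=8, rounding="half_even"):
    """Sketch of symmetric power-of-two quantization (illustration only)."""
    qmax = 2 ** (bits - 1) - 1                      # e.g. 127 for 8-bit
    max_abs = max(abs(v) for v in values)
    # Pick a power-of-two scale 2**exp so that max_abs / 2**exp <= qmax.
    exp = math.ceil(math.log2(max_abs / qmax)) if max_abs > 0 else 0
    scale = 2.0 ** exp

    def rnd(x):
        if rounding == "half_up":                   # ESP32S3-style ROUND
            return math.floor(x + 0.5)
        return round(x)                             # Python rounds half to even (ESP32P4-style)

    q = [max(-qmax - 1, min(qmax, rnd(v / scale))) for v in values]
    return q, scale

q, scale = quantize_symmetric_pot([0.1, -0.5, 1.0, 2.5])
# The two rounding modes disagree exactly on ties such as 2.5:
half_up, _ = quantize_symmetric_pot([2.5], bits=3, rounding="half_up")
half_even, _ = quantize_symmetric_pot([2.5], bits=3, rounding="half_even")
```

Because the scale is constrained to a power of two, dequantization on the chip reduces to a bit shift; the tie-breaking difference between the two rounding modes is why models quantized for one target cannot be reused on the other.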
Add test input/output
To verify whether the inference results of the model on the board are correct, you first need to record a set of test inputs/outputs on the PC. By enabling the export_test_values option in the API, a set of test inputs/outputs can be saved in the .espdl model. One of the input_shape and inputs parameters must be specified: input_shape uses a random test input, while inputs accepts a specific test input. The test input/output values can be viewed in the .info file; search for test inputs value and test outputs value to find them.
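A quick way to pull those lines out of the .info file on the PC is a simple text scan. The file content below is a stand-in; a real .info file is produced by ESP-PPQ during export:

```python
# Placeholder .info content; a real file comes from the ESP-PPQ export step.
info_text = """\
model structure ...
test inputs value: [0.10, 0.20, 0.30]
test outputs value: [0.45]
"""

matches = [line for line in info_text.splitlines()
           if "test inputs value" in line or "test outputs value" in line]
for line in matches:
    print(line)
```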
Quantized model inference & accuracy evaluation
The espdl_quantize_onnx and espdl_quantize_torch APIs return a BaseGraph. Use the BaseGraph to build the corresponding TorchExecutor, which runs the quantized model for inference on the PC side:
executor = TorchExecutor(graph=quanted_graph, device=device)
output = executor(input)
The outputs obtained from quantized-model inference can be used to calculate various accuracy metrics. Since the board-side esp-dl inference results are aligned with esp-ppq, these metrics can be used directly to evaluate the accuracy of the quantized model.
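For example, two common metrics are the mean squared error and the cosine similarity between the float and quantized outputs. The tensors below are placeholders; in practice float_out comes from the original model and quant_out from the TorchExecutor running the quantized graph:

```python
import torch

# Placeholder outputs standing in for the float model and TorchExecutor results.
float_out = torch.tensor([0.00, 0.50, 0.87, 1.00])
quant_out = torch.tensor([0.00, 0.50, 0.88, 1.00])

mse = torch.mean((float_out - quant_out) ** 2)                        # quantization error power
cos = torch.nn.functional.cosine_similarity(float_out, quant_out, dim=0)  # shape similarity
```

A cosine similarity close to 1 and a small MSE indicate that quantization preserved the model's behavior.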
Note
Currently esp-dl only supports a batch size of 1; multi-batch and dynamic batch are not supported.
The test input/output and the quantized model weights in the .info file are all 16-byte aligned. If the length is less than a multiple of 16 bytes, it is padded with zeros.
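The alignment rule can be sketched as follows; this is an illustration of the padding behavior, not code from ESP-DL itself:

```python
def pad_to_16(data: bytes) -> bytes:
    """Zero-pad a byte string to the next 16-byte boundary (illustrative)."""
    remainder = len(data) % 16
    if remainder == 0:
        return data
    return data + b"\x00" * (16 - remainder)

padded = pad_to_16(b"\x01\x02\x03")  # 3 bytes padded up to 16 bytes
```

This matters when parsing the .info file by hand: a weight blob's stored length can be larger than the number of meaningful bytes.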
Advanced Quantization Methods
If you want to further improve the performance of the quantized model, please try the following advanced quantization methods: