How to deploy YOLO11n
This tutorial introduces how to quantize a pre-trained YOLO11n model using ESP-PPQ and deploy the quantized model using ESP-DL.
Preparation
Model quantization
Pre-trained Model
You can download the pre-trained YOLO11n model from the Ultralytics release page.
Currently, ESP-PPQ supports ONNX, PyTorch, and TensorFlow models. During the quantization process, PyTorch and TensorFlow models are first converted to ONNX models, so the pre-trained YOLO11n model needs to be converted to an ONNX model.
Specifically, refer to the script export_onnx.py to convert the pre-trained YOLO11n model to an ONNX model.
In the script, we have overridden the forward method of the Detect class, which offers the following advantages (a minimal export sketch follows the list):

1. Faster inference. Compared to the original YOLO11n model, the operations related to decoding bounding boxes in the Detect head are moved from the inference pass to the post-processing phase, resulting in a significant reduction in inference latency. On one hand, operations like Conv, Transpose, Slice, Split, and Concat are time-consuming when applied during the inference pass. On the other hand, the inference outputs are first filtered with a score threshold before the boxes are decoded in the post-processing pass, which significantly reduces the number of calculations and thereby accelerates overall inference.

2. Lower quantization error. The Concat and Add operators adopt joint quantization in ESP-PPQ. To reduce quantization error, the box and score are output by separate branches rather than being concatenated, due to the significant difference in their ranges. Similarly, since the ranges of the two inputs of Add and Sub differ significantly, these calculations are performed in the post-processing phase to avoid quantization error.
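For reference, a plain Ultralytics export (without the Detect.forward override that export_onnx.py applies) looks roughly like the sketch below; the image size and opset values are illustrative assumptions:

# Minimal export sketch using the Ultralytics API; the tutorial's export_onnx.py
# additionally overrides Detect.forward as described above.
from ultralytics import YOLO

model = YOLO("yolo11n.pt")  # pre-trained YOLO11n weights from the Ultralytics release
model.export(format="onnx", imgsz=640, opset=13)  # writes yolo11n.onnx next to the weights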
Calibration Dataset
The calibration dataset needs to match the input format of the model, and it should cover all expected input scenarios so that the model can be quantized well. The calibration dataset used in this example is calib_yolo11n.
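If you assemble such a dataset yourself, the sketch below shows one hypothetical way to wrap pre-resized calibration images in a dataloader for ESP-PPQ; the directory name, .npy storage format, and 640x640 CHW layout are assumptions for illustration:

import os
import numpy as np
import torch
from torch.utils.data import DataLoader

def load_calibration_set(calib_dir):
    # Each .npy file is assumed to hold one float32 CHW image already
    # resized to the model input shape (3, 640, 640).
    samples = []
    for name in sorted(os.listdir(calib_dir)):
        if name.endswith(".npy"):
            samples.append(torch.from_numpy(np.load(os.path.join(calib_dir, name))))
    return samples

calib_dataloader = DataLoader(load_calibration_set("calib_yolo11n"), batch_size=32, shuffle=False)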
8-bit default configuration quantization
Quantization settings
target="esp32p4"
num_of_bits=8
batch_size=32
quant_setting = QuantizationSettingFactory.espdl_setting() # default setting
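With the setting in place, quantization is typically driven by ESP-PPQ's espdl_quantize_onnx. The call below is a sketch using the dataloader from the earlier snippet; check the exact argument list against your ESP-PPQ version:

from ppq.api import espdl_quantize_onnx

# Quantize the ONNX model with the default 8-bit setting and export the
# .espdl artifact for deployment (argument names follow the ESP-PPQ API;
# treat them as assumptions).
quantized_graph = espdl_quantize_onnx(
    onnx_import_file="yolo11n.onnx",
    espdl_export_file="yolo11n.espdl",
    calib_dataloader=calib_dataloader,
    calib_steps=32,
    input_shape=[1, 3, 640, 640],
    target=target,
    num_of_bits=num_of_bits,
    setting=quant_setting,
    device="cpu",
)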
Quantization results
Graphwise error (cumulative error per layer):
Layer | NOISE:SIGNAL POWER RATIO
/model.10/m/m.0/ffn/ffn.1/conv/Conv: | ████████████████████ | 36.042%
/model.10/m/m.0/attn/proj/conv/Conv: | ████████████████ | 28.761%
/model.23/cv3.2/cv3.2.0/cv3.2.0.0/conv/Conv: | █████████████ | 22.876%
/model.23/cv2.2/cv2.2.0/conv/Conv: | ████████████ | 21.570%
/model.23/cv3.2/cv3.2.1/cv3.2.1.1/conv/Conv: | ████████████ | 21.467%
/model.23/cv3.2/cv3.2.0/cv3.2.0.1/conv/Conv: | ████████████ | 21.021%
/model.23/cv2.2/cv2.2.1/conv/Conv: | ████████████ | 20.973%
/model.23/cv3.1/cv3.1.1/cv3.1.1.1/conv/Conv: | ███████████ | 19.432%
/model.22/m.0/cv2/conv/Conv: | ███████████ | 19.320%
/model.23/cv3.0/cv3.0.1/cv3.0.1.1/conv/Conv: | ███████████ | 19.243%
/model.22/m.0/cv3/conv/Conv: | ███████████ | 19.029%
/model.22/cv2/conv/Conv: | ██████████ | 18.488%
/model.22/m.0/m/m.1/cv2/conv/Conv: | ██████████ | 18.222%
/model.23/cv2.1/cv2.1.1/conv/Conv: | ██████████ | 17.400%
/model.8/m.0/cv2/conv/Conv: | █████████ | 16.189%
/model.23/cv2.0/cv2.0.1/conv/Conv: | █████████ | 15.585%
/model.10/m/m.0/attn/pe/conv/Conv: | ████████ | 14.687%
/model.10/m/m.0/attn/qkv/conv/Conv: | ████████ | 14.601%
/model.23/cv2.1/cv2.1.0/conv/Conv: | ████████ | 14.154%
/model.22/cv1/conv/Conv: | ████████ | 14.102%
/model.10/m/m.0/attn/MatMul_1: | ████████ | 13.998%
/model.10/cv1/conv/Conv: | ███████ | 13.560%
/model.23/cv3.1/cv3.1.0/cv3.1.0.1/conv/Conv: | ██████ | 11.771%
/model.19/m.0/cv2/conv/Conv: | ██████ | 11.216%
/model.22/m.0/m/m.0/cv2/conv/Conv: | ██████ | 11.140%
/model.23/cv3.2/cv3.2.1/cv3.2.1.0/conv/Conv: | ██████ | 11.057%
/model.13/m.0/cv2/conv/Conv: | ██████ | 10.881%
/model.20/conv/Conv: | ██████ | 10.692%
/model.23/cv2.2/cv2.2.2/Conv: | █████ | 9.888%
/model.10/cv2/conv/Conv: | █████ | 9.788%
/model.8/cv2/conv/Conv: | █████ | 9.477%
/model.8/m.0/cv1/conv/Conv: | █████ | 9.422%
/model.19/cv2/conv/Conv: | █████ | 9.102%
/model.8/cv1/conv/Conv: | █████ | 9.101%
/model.8/m.0/cv3/conv/Conv: | █████ | 9.068%
/model.23/cv3.0/cv3.0.0/cv3.0.0.1/conv/Conv: | █████ | 9.014%
/model.22/m.0/m/m.0/cv1/conv/Conv: | █████ | 8.996%
/model.6/m.0/cv2/conv/Conv: | █████ | 8.882%
/model.22/m.0/m/m.1/cv1/conv/Conv: | █████ | 8.637%
/model.13/cv2/conv/Conv: | █████ | 8.556%
/model.8/m.0/m/m.0/cv1/conv/Conv: | █████ | 8.461%
/model.8/m.0/m/m.0/cv2/conv/Conv: | █████ | 8.362%
/model.19/cv1/conv/Conv: | ████ | 8.194%
/model.8/m.0/m/m.1/cv1/conv/Conv: | ████ | 8.021%
/model.13/cv1/conv/Conv: | ████ | 7.910%
/model.10/m/m.0/attn/MatMul: | ████ | 7.861%
/model.19/m.0/cv1/conv/Conv: | ████ | 7.520%
/model.22/m.0/cv1/conv/Conv: | ████ | 7.239%
/model.8/m.0/m/m.1/cv2/conv/Conv: | ████ | 7.054%
/model.23/cv2.0/cv2.0.0/conv/Conv: | ████ | 7.042%
/model.13/m.0/cv1/conv/Conv: | ████ | 6.987%
/model.23/cv2.0/cv2.0.2/Conv: | ████ | 6.739%
/model.23/cv2.1/cv2.1.2/Conv: | ████ | 6.734%
/model.23/cv3.1/cv3.1.1/cv3.1.1.0/conv/Conv: | ████ | 6.660%
/model.17/conv/Conv: | ███ | 6.025%
/model.16/m.0/cv2/conv/Conv: | ███ | 5.897%
/model.6/cv2/conv/Conv: | ███ | 5.815%
/model.6/m.0/cv3/conv/Conv: | ███ | 5.814%
/model.6/cv1/conv/Conv: | ███ | 5.693%
/model.7/conv/Conv: | ███ | 5.570%
/model.9/cv2/conv/Conv: | ███ | 5.382%
/model.10/m/m.0/ffn/ffn.0/conv/Conv: | ███ | 5.173%
/model.6/m.0/m/m.0/cv1/conv/Conv: | ███ | 5.168%
/model.16/m.0/cv1/conv/Conv: | ███ | 5.087%
/model.23/cv3.1/cv3.1.0/cv3.1.0.0/conv/Conv: | ███ | 5.010%
/model.16/cv2/conv/Conv: | ███ | 4.991%
/model.2/cv2/conv/Conv: | ██ | 4.552%
/model.6/m.0/m/m.0/cv2/conv/Conv: | ██ | 4.443%
/model.3/conv/Conv: | ██ | 4.318%
/model.23/cv3.0/cv3.0.1/cv3.0.1.0/conv/Conv: | ██ | 4.304%
/model.6/m.0/m/m.1/cv1/conv/Conv: | ██ | 3.968%
/model.5/conv/Conv: | ██ | 3.948%
/model.6/m.0/cv1/conv/Conv: | ██ | 3.863%
/model.4/cv1/conv/Conv: | ██ | 3.720%
/model.2/cv1/conv/Conv: | ██ | 3.565%
/model.4/cv2/conv/Conv: | ██ | 3.538%
/model.16/cv1/conv/Conv: | ██ | 3.110%
/model.2/m.0/cv2/conv/Conv: | █ | 2.844%
/model.6/m.0/m/m.1/cv2/conv/Conv: | █ | 2.762%
/model.4/m.0/cv1/conv/Conv: | █ | 2.532%
/model.9/cv1/conv/Conv: | █ | 2.015%
/model.4/m.0/cv2/conv/Conv: | █ | 1.761%
/model.23/cv3.0/cv3.0.0/cv3.0.0.0/conv/Conv: | █ | 1.317%
/model.1/conv/Conv: | █ | 1.315%
/model.23/cv3.2/cv3.2.2/Conv: | █ | 1.114%
/model.2/m.0/cv1/conv/Conv: | | 0.731%
/model.23/cv3.1/cv3.1.2/Conv: | | 0.491%
/model.23/cv3.0/cv3.0.2/Conv: | | 0.282%
/model.0/conv/Conv: | | 0.159%
Analysing Layerwise quantization error:: 100%|██| 89/89 [07:46<00:00, 5.24s/it]
Layerwise error:
Layer | NOISE:SIGNAL POWER RATIO
/model.1/conv/Conv: | ████████████████████ | 0.384%
/model.22/cv1/conv/Conv: | █████████████ | 0.247%
/model.4/cv2/conv/Conv: | ████████████ | 0.233%
/model.2/cv2/conv/Conv: | ██████████ | 0.201%
/model.0/conv/Conv: | ██████████ | 0.192%
/model.9/cv2/conv/Conv: | ████████ | 0.156%
/model.10/cv1/conv/Conv: | ███████ | 0.132%
/model.3/conv/Conv: | ██████ | 0.108%
/model.4/cv1/conv/Conv: | ████ | 0.074%
/model.16/cv1/conv/Conv: | ███ | 0.066%
/model.2/cv1/conv/Conv: | ███ | 0.060%
/model.23/cv2.0/cv2.0.0/conv/Conv: | ███ | 0.052%
/model.2/m.0/cv1/conv/Conv: | ██ | 0.044%
/model.6/cv1/conv/Conv: | ██ | 0.033%
/model.10/m/m.0/attn/pe/conv/Conv: | ██ | 0.029%
/model.2/m.0/cv2/conv/Conv: | █ | 0.028%
/model.22/m.0/m/m.0/cv1/conv/Conv: | █ | 0.023%
/model.16/cv2/conv/Conv: | █ | 0.021%
/model.16/m.0/cv2/conv/Conv: | █ | 0.020%
/model.19/m.0/cv1/conv/Conv: | █ | 0.020%
/model.4/m.0/cv1/conv/Conv: | █ | 0.018%
/model.19/cv2/conv/Conv: | █ | 0.017%
/model.4/m.0/cv2/conv/Conv: | █ | 0.016%
/model.10/m/m.0/attn/qkv/conv/Conv: | █ | 0.016%
/model.19/cv1/conv/Conv: | █ | 0.015%
/model.13/cv2/conv/Conv: | █ | 0.015%
/model.8/cv1/conv/Conv: | █ | 0.013%
/model.23/cv2.1/cv2.1.0/conv/Conv: | █ | 0.013%
/model.23/cv2.2/cv2.2.1/conv/Conv: | █ | 0.012%
/model.13/cv1/conv/Conv: | █ | 0.012%
/model.10/cv2/conv/Conv: | █ | 0.011%
/model.13/m.0/cv1/conv/Conv: | █ | 0.011%
/model.6/cv2/conv/Conv: | █ | 0.011%
/model.13/m.0/cv2/conv/Conv: | █ | 0.010%
/model.5/conv/Conv: | | 0.010%
/model.19/m.0/cv2/conv/Conv: | | 0.009%
/model.6/m.0/m/m.1/cv1/conv/Conv: | | 0.009%
/model.23/cv3.0/cv3.0.0/cv3.0.0.1/conv/Conv: | | 0.008%
/model.23/cv2.2/cv2.2.0/conv/Conv: | | 0.008%
/model.23/cv2.1/cv2.1.1/conv/Conv: | | 0.008%
/model.9/cv1/conv/Conv: | | 0.008%
/model.23/cv2.0/cv2.0.1/conv/Conv: | | 0.007%
/model.16/m.0/cv1/conv/Conv: | | 0.007%
/model.17/conv/Conv: | | 0.007%
/model.23/cv3.1/cv3.1.1/cv3.1.1.0/conv/Conv: | | 0.007%
/model.10/m/m.0/ffn/ffn.1/conv/Conv: | | 0.007%
/model.23/cv2.0/cv2.0.2/Conv: | | 0.006%
/model.8/m.0/cv1/conv/Conv: | | 0.006%
/model.23/cv2.2/cv2.2.2/Conv: | | 0.005%
/model.23/cv2.1/cv2.1.2/Conv: | | 0.005%
/model.22/m.0/cv3/conv/Conv: | | 0.005%
/model.23/cv3.1/cv3.1.0/cv3.1.0.1/conv/Conv: | | 0.005%
/model.7/conv/Conv: | | 0.005%
/model.8/cv2/conv/Conv: | | 0.004%
/model.22/cv2/conv/Conv: | | 0.004%
/model.6/m.0/cv3/conv/Conv: | | 0.004%
/model.10/m/m.0/ffn/ffn.0/conv/Conv: | | 0.004%
/model.8/m.0/m/m.1/cv2/conv/Conv: | | 0.004%
/model.22/m.0/m/m.1/cv1/conv/Conv: | | 0.004%
/model.8/m.0/m/m.1/cv1/conv/Conv: | | 0.004%
/model.23/cv3.1/cv3.1.1/cv3.1.1.1/conv/Conv: | | 0.003%
/model.10/m/m.0/attn/proj/conv/Conv: | | 0.003%
/model.22/m.0/m/m.0/cv2/conv/Conv: | | 0.003%
/model.22/m.0/cv1/conv/Conv: | | 0.003%
/model.8/m.0/cv3/conv/Conv: | | 0.003%
/model.6/m.0/m/m.0/cv1/conv/Conv: | | 0.003%
/model.23/cv3.0/cv3.0.0/cv3.0.0.0/conv/Conv: | | 0.003%
/model.23/cv3.2/cv3.2.1/cv3.2.1.0/conv/Conv: | | 0.002%
/model.6/m.0/m/m.1/cv2/conv/Conv: | | 0.002%
/model.8/m.0/m/m.0/cv2/conv/Conv: | | 0.002%
/model.23/cv3.2/cv3.2.1/cv3.2.1.1/conv/Conv: | | 0.002%
/model.10/m/m.0/attn/MatMul_1: | | 0.002%
/model.22/m.0/m/m.1/cv2/conv/Conv: | | 0.001%
/model.6/m.0/m/m.0/cv2/conv/Conv: | | 0.001%
/model.23/cv3.0/cv3.0.1/cv3.0.1.0/conv/Conv: | | 0.001%
/model.8/m.0/m/m.0/cv1/conv/Conv: | | 0.001%
/model.23/cv3.2/cv3.2.0/cv3.2.0.1/conv/Conv: | | 0.001%
/model.23/cv3.0/cv3.0.1/cv3.0.1.1/conv/Conv: | | 0.001%
/model.6/m.0/cv1/conv/Conv: | | 0.001%
/model.23/cv3.2/cv3.2.2/Conv: | | 0.001%
/model.20/conv/Conv: | | 0.001%
/model.23/cv3.1/cv3.1.2/Conv: | | 0.001%
/model.23/cv3.2/cv3.2.0/cv3.2.0.0/conv/Conv: | | 0.001%
/model.6/m.0/cv2/conv/Conv: | | 0.001%
/model.23/cv3.0/cv3.0.2/Conv: | | 0.000%
/model.10/m/m.0/attn/MatMul: | | 0.000%
/model.23/cv3.1/cv3.1.0/cv3.1.0.0/conv/Conv: | | 0.000%
/model.8/m.0/cv2/conv/Conv: | | 0.000%
/model.22/m.0/cv2/conv/Conv: | | 0.000%
Quantization error analysis
With the same inputs, the mAP50:95 on COCO val2017 after quantization is only 30.8%, which is lower than that of the float model. The accuracy loss is analyzed below:
Graphwise Error
The output layers of the model are /model.23/cv3.2/cv3.2.2/Conv, /model.23/cv2.2/cv2.2.2/Conv, /model.23/cv3.1/cv3.1.2/Conv, /model.23/cv2.1/cv2.1.2/Conv, /model.23/cv3.0/cv3.0.2/Conv, and /model.23/cv2.0/cv2.0.2/Conv. The cumulative errors for these layers are 1.114%, 9.888%, 0.491%, 6.734%, 0.282%, and 6.739%, respectively. Generally, if the cumulative error of an output layer is less than 10%, the accuracy loss of the quantized model is minimal.
Layerwise error
Observing the layerwise error, we find that the errors for all layers are below 1%, indicating that the quantization error of each individual layer is small.
We noticed that although the layer-wise errors for all layers are small, the cumulative errors in some layers are relatively large. This may be related to the complex CSP structure in the YOLO11n model, where the inputs to the Concat or Add layers may have different distributions or scales. We can choose to quantize certain layers using int16 and optimize the quantization with the horizontal layer split pass. For more details, please refer to the mixed-precision + horizontal layer split pass quantization test.
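For reference, the graphwise and layerwise tables above are the kind of report produced by PPQ's analysis utilities. A sketch of generating them, reusing the graph and dataloader from the earlier snippets, is shown below (import path and argument names assumed from PPQ):

from ppq.quantization.analyse import graphwise_error_analyse, layerwise_error_analyse

# Cumulative (graphwise) error: how quantization noise accumulates through the graph.
graphwise_error_analyse(
    graph=quantized_graph, running_device="cpu",
    dataloader=calib_dataloader, steps=32,
)

# Per-layer (layerwise) error: each layer's error measured in isolation.
layerwise_error_analyse(
    graph=quantized_graph, running_device="cpu",
    dataloader=calib_dataloader, steps=32,
)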
Mixed-Precision + Horizontal Layer Split Pass Quantization
Quantization settings
from ppq.api import QuantizationSettingFactory, get_target_platform

target = "esp32p4"
num_of_bits = 8
batch_size = 32

# Quantize the following layers with 16 bits
quant_setting = QuantizationSettingFactory.espdl_setting()
quant_setting.dispatching_table.append("/model.2/cv2/conv/Conv", get_target_platform(target, 16))
quant_setting.dispatching_table.append("/model.3/conv/Conv", get_target_platform(target, 16))
quant_setting.dispatching_table.append("/model.4/cv2/conv/Conv", get_target_platform(target, 16))

# Horizontal Layer Split Pass
quant_setting.weight_split = True
quant_setting.weight_split_setting.method = 'balance'
quant_setting.weight_split_setting.value_threshold = 1.5
quant_setting.weight_split_setting.interested_layers = ['/model.0/conv/Conv', '/model.1/conv/Conv']
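The horizontal layer split pass splits each interested layer into two parallel layers whose outputs are recombined, narrowing the weight range each half has to cover and reducing the impact of weight outliers. The PPQ_Operation_0 and PPQ_Operation_2 entries in the results below correspond to the operators created by splitting /model.0/conv/Conv and /model.1/conv/Conv.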
Quantization results
Graphwise error (cumulative error per layer):
Layer | NOISE:SIGNAL POWER RATIO
/model.10/m/m.0/ffn/ffn.1/conv/Conv: | ████████████████████ | 24.841%
/model.10/m/m.0/attn/proj/conv/Conv: | ███████████████ | 19.061%
/model.23/cv2.2/cv2.2.1/conv/Conv: | ██████████████ | 17.927%
/model.23/cv3.2/cv3.2.0/cv3.2.0.0/conv/Conv: | ██████████████ | 17.396%
/model.23/cv2.2/cv2.2.0/conv/Conv: | ██████████████ | 17.061%
/model.22/m.0/cv3/conv/Conv: | ████████████ | 15.563%
/model.23/cv3.2/cv3.2.0/cv3.2.0.1/conv/Conv: | ████████████ | 15.427%
/model.23/cv3.0/cv3.0.1/cv3.0.1.1/conv/Conv: | ████████████ | 14.890%
/model.22/m.0/m/m.1/cv2/conv/Conv: | ████████████ | 14.784%
/model.23/cv3.2/cv3.2.1/cv3.2.1.1/conv/Conv: | ███████████ | 14.243%
/model.22/cv2/conv/Conv: | ███████████ | 14.098%
/model.22/m.0/cv2/conv/Conv: | ███████████ | 13.945%
/model.23/cv3.1/cv3.1.1/cv3.1.1.1/conv/Conv: | ███████████ | 13.489%
/model.23/cv2.1/cv2.1.1/conv/Conv: | █████████ | 10.919%
/model.23/cv2.0/cv2.0.1/conv/Conv: | ████████ | 10.073%
/model.23/cv2.1/cv2.1.0/conv/Conv: | ████████ | 9.819%
/model.22/cv1/conv/Conv: | ███████ | 9.093%
/model.10/m/m.0/attn/MatMul_1: | ███████ | 8.414%
/model.22/m.0/m/m.0/cv2/conv/Conv: | ███████ | 8.245%
/model.23/cv2.2/cv2.2.2/Conv: | ███████ | 8.208%
/model.23/cv3.1/cv3.1.0/cv3.1.0.1/conv/Conv: | ██████ | 8.031%
/model.10/m/m.0/attn/qkv/conv/Conv: | ██████ | 7.818%
/model.13/m.0/cv2/conv/Conv: | ██████ | 7.717%
/model.19/m.0/cv2/conv/Conv: | ██████ | 7.404%
/model.20/conv/Conv: | ██████ | 7.161%
/model.23/cv3.2/cv3.2.1/cv3.2.1.0/conv/Conv: | ██████ | 7.080%
/model.10/m/m.0/attn/pe/conv/Conv: | █████ | 6.814%
/model.23/cv3.0/cv3.0.0/cv3.0.0.1/conv/Conv: | █████ | 6.764%
/model.22/m.0/m/m.1/cv1/conv/Conv: | █████ | 6.539%
/model.22/m.0/m/m.0/cv1/conv/Conv: | █████ | 6.418%
/model.19/cv2/conv/Conv: | █████ | 6.206%
/model.13/cv2/conv/Conv: | █████ | 5.894%
/model.10/cv1/conv/Conv: | █████ | 5.757%
/model.10/cv2/conv/Conv: | █████ | 5.716%
/model.19/cv1/conv/Conv: | ████ | 5.279%
/model.22/m.0/cv1/conv/Conv: | ████ | 5.072%
/model.19/m.0/cv1/conv/Conv: | ████ | 5.036%
/model.23/cv3.1/cv3.1.1/cv3.1.1.0/conv/Conv: | ████ | 4.979%
/model.8/m.0/cv2/conv/Conv: | ████ | 4.862%
/model.10/m/m.0/attn/MatMul: | ████ | 4.670%
/model.13/cv1/conv/Conv: | ████ | 4.594%
/model.23/cv2.0/cv2.0.0/conv/Conv: | ████ | 4.441%
/model.23/cv2.0/cv2.0.2/Conv: | ███ | 4.308%
/model.13/m.0/cv1/conv/Conv: | ███ | 4.278%
/model.23/cv2.1/cv2.1.2/Conv: | ███ | 4.214%
/model.6/m.0/cv2/conv/Conv: | ███ | 4.031%
/model.17/conv/Conv: | ███ | 3.760%
/model.16/m.0/cv2/conv/Conv: | ███ | 3.521%
/model.8/m.0/cv1/conv/Conv: | ███ | 3.227%
/model.16/m.0/cv1/conv/Conv: | ██ | 3.185%
/model.23/cv3.1/cv3.1.0/cv3.1.0.0/conv/Conv: | ██ | 3.178%
/model.23/cv3.0/cv3.0.1/cv3.0.1.0/conv/Conv: | ██ | 3.150%
/model.8/cv2/conv/Conv: | ██ | 3.067%
/model.8/m.0/cv3/conv/Conv: | ██ | 3.067%
/model.16/cv2/conv/Conv: | ██ | 3.054%
/model.2/cv2/conv/Conv: | ██ | 3.053%
/model.8/m.0/m/m.1/cv1/conv/Conv: | ██ | 3.049%
/model.6/m.0/cv3/conv/Conv: | ██ | 3.049%
/model.8/cv1/conv/Conv: | ██ | 2.984%
/model.8/m.0/m/m.0/cv2/conv/Conv: | ██ | 2.934%
/model.10/m/m.0/ffn/ffn.0/conv/Conv: | ██ | 2.794%
/model.6/cv1/conv/Conv: | ██ | 2.783%
/model.8/m.0/m/m.0/cv1/conv/Conv: | ██ | 2.753%
/model.2/cv1/conv/Conv: | ██ | 2.697%
/model.6/cv2/conv/Conv: | ██ | 2.616%
/model.8/m.0/m/m.1/cv2/conv/Conv: | ██ | 2.596%
/model.9/cv2/conv/Conv: | ██ | 2.500%
/model.3/conv/Conv: | ██ | 2.499%
/model.2/m.0/cv2/conv/Conv: | ██ | 2.469%
/model.6/m.0/m/m.0/cv2/conv/Conv: | ██ | 2.235%
/model.6/m.0/m/m.0/cv1/conv/Conv: | ██ | 2.233%
/model.4/cv2/conv/Conv: | ██ | 2.150%
/model.7/conv/Conv: | ██ | 2.075%
/model.6/m.0/m/m.1/cv1/conv/Conv: | ██ | 2.069%
/model.5/conv/Conv: | ██ | 1.998%
/model.16/cv1/conv/Conv: | █ | 1.899%
/model.4/cv1/conv/Conv: | █ | 1.808%
/model.4/m.0/cv1/conv/Conv: | █ | 1.741%
/model.6/m.0/cv1/conv/Conv: | █ | 1.734%
/model.6/m.0/m/m.1/cv2/conv/Conv: | █ | 1.523%
/model.4/m.0/cv2/conv/Conv: | █ | 1.248%
/model.23/cv3.0/cv3.0.0/cv3.0.0.0/conv/Conv: | █ | 0.875%
/model.23/cv3.2/cv3.2.2/Conv: | █ | 0.784%
/model.1/conv/Conv: | █ | 0.781%
PPQ_Operation_2: | | 0.698%
/model.9/cv1/conv/Conv: | | 0.680%
/model.2/m.0/cv1/conv/Conv: | | 0.508%
/model.23/cv3.1/cv3.1.2/Conv: | | 0.360%
/model.23/cv3.0/cv3.0.2/Conv: | | 0.189%
PPQ_Operation_0: | | 0.110%
/model.0/conv/Conv: | | 0.099%
Analysing Layerwise quantization error:: 100%|██| 91/91 [12:32<00:00, 8.27s/it]
Layerwise error:
Layer | NOISE:SIGNAL POWER RATIO
/model.22/cv1/conv/Conv: | ████████████████████ | 0.244%
/model.9/cv2/conv/Conv: | █████████████ | 0.156%
/model.10/cv1/conv/Conv: | ███████████ | 0.132%
/model.1/conv/Conv: | ██████ | 0.077%
/model.4/cv1/conv/Conv: | ██████ | 0.074%
/model.16/cv1/conv/Conv: | █████ | 0.066%
/model.0/conv/Conv: | █████ | 0.061%
/model.2/cv1/conv/Conv: | █████ | 0.060%
/model.23/cv2.0/cv2.0.0/conv/Conv: | ████ | 0.052%
PPQ_Operation_0: | ████ | 0.047%
/model.2/m.0/cv1/conv/Conv: | ████ | 0.045%
/model.10/m/m.0/attn/pe/conv/Conv: | ██ | 0.029%
/model.2/m.0/cv2/conv/Conv: | ██ | 0.029%
/model.6/cv1/conv/Conv: | ██ | 0.025%
/model.22/m.0/m/m.0/cv1/conv/Conv: | ██ | 0.023%
/model.16/cv2/conv/Conv: | ██ | 0.021%
/model.16/m.0/cv2/conv/Conv: | ██ | 0.020%
/model.19/m.0/cv1/conv/Conv: | ██ | 0.020%
/model.4/m.0/cv1/conv/Conv: | █ | 0.018%
/model.19/cv2/conv/Conv: | █ | 0.017%
/model.4/m.0/cv2/conv/Conv: | █ | 0.016%
/model.10/m/m.0/attn/qkv/conv/Conv: | █ | 0.016%
/model.19/cv1/conv/Conv: | █ | 0.015%
/model.13/cv2/conv/Conv: | █ | 0.015%
/model.23/cv2.1/cv2.1.0/conv/Conv: | █ | 0.013%
/model.23/cv2.2/cv2.2.1/conv/Conv: | █ | 0.012%
/model.13/cv1/conv/Conv: | █ | 0.012%
/model.6/cv2/conv/Conv: | █ | 0.011%
/model.13/m.0/cv1/conv/Conv: | █ | 0.011%
/model.8/cv1/conv/Conv: | █ | 0.010%
/model.13/m.0/cv2/conv/Conv: | █ | 0.010%
/model.5/conv/Conv: | █ | 0.010%
/model.6/m.0/m/m.1/cv1/conv/Conv: | █ | 0.009%
/model.23/cv3.0/cv3.0.0/cv3.0.0.1/conv/Conv: | █ | 0.008%
/model.23/cv2.2/cv2.2.0/conv/Conv: | █ | 0.008%
/model.23/cv2.1/cv2.1.1/conv/Conv: | █ | 0.008%
/model.19/m.0/cv2/conv/Conv: | █ | 0.008%
/model.8/cv2/conv/Conv: | █ | 0.008%
/model.9/cv1/conv/Conv: | █ | 0.008%
/model.23/cv2.0/cv2.0.1/conv/Conv: | █ | 0.007%
/model.16/m.0/cv1/conv/Conv: | █ | 0.007%
/model.17/conv/Conv: | █ | 0.007%
/model.23/cv3.1/cv3.1.1/cv3.1.1.0/conv/Conv: | █ | 0.007%
/model.10/m/m.0/ffn/ffn.1/conv/Conv: | █ | 0.007%
/model.22/m.0/cv1/conv/Conv: | | 0.006%
/model.10/cv2/conv/Conv: | | 0.006%
/model.23/cv2.0/cv2.0.2/Conv: | | 0.006%
/model.23/cv2.2/cv2.2.2/Conv: | | 0.005%
/model.23/cv2.1/cv2.1.2/Conv: | | 0.005%
/model.22/m.0/cv3/conv/Conv: | | 0.005%
/model.23/cv3.1/cv3.1.0/cv3.1.0.1/conv/Conv: | | 0.005%
/model.22/cv2/conv/Conv: | | 0.005%
/model.7/conv/Conv: | | 0.004%
/model.6/m.0/cv3/conv/Conv: | | 0.004%
/model.10/m/m.0/ffn/ffn.0/conv/Conv: | | 0.004%
/model.8/m.0/m/m.1/cv2/conv/Conv: | | 0.004%
/model.22/m.0/m/m.1/cv1/conv/Conv: | | 0.004%
/model.8/m.0/m/m.1/cv1/conv/Conv: | | 0.004%
/model.23/cv3.1/cv3.1.1/cv3.1.1.1/conv/Conv: | | 0.003%
/model.8/m.0/cv1/conv/Conv: | | 0.003%
/model.10/m/m.0/attn/proj/conv/Conv: | | 0.003%
/model.22/m.0/m/m.0/cv2/conv/Conv: | | 0.003%
PPQ_Operation_2: | | 0.003%
/model.8/m.0/cv3/conv/Conv: | | 0.003%
/model.6/m.0/m/m.0/cv1/conv/Conv: | | 0.003%
/model.23/cv3.2/cv3.2.1/cv3.2.1.0/conv/Conv: | | 0.002%
/model.6/m.0/m/m.1/cv2/conv/Conv: | | 0.002%
/model.8/m.0/m/m.0/cv2/conv/Conv: | | 0.002%
/model.23/cv3.0/cv3.0.0/cv3.0.0.0/conv/Conv: | | 0.002%
/model.23/cv3.2/cv3.2.1/cv3.2.1.1/conv/Conv: | | 0.002%
/model.10/m/m.0/attn/MatMul_1: | | 0.002%
/model.22/m.0/m/m.1/cv2/conv/Conv: | | 0.001%
/model.6/m.0/m/m.0/cv2/conv/Conv: | | 0.001%
/model.8/m.0/m/m.0/cv1/conv/Conv: | | 0.001%
/model.23/cv3.0/cv3.0.1/cv3.0.1.0/conv/Conv: | | 0.001%
/model.23/cv3.2/cv3.2.0/cv3.2.0.1/conv/Conv: | | 0.001%
/model.2/cv2/conv/Conv: | | 0.001%
/model.23/cv3.0/cv3.0.1/cv3.0.1.1/conv/Conv: | | 0.001%
/model.6/m.0/cv1/conv/Conv: | | 0.001%
/model.23/cv3.2/cv3.2.2/Conv: | | 0.001%
/model.20/conv/Conv: | | 0.001%
/model.23/cv3.1/cv3.1.2/Conv: | | 0.001%
/model.23/cv3.2/cv3.2.0/cv3.2.0.0/conv/Conv: | | 0.001%
/model.6/m.0/cv2/conv/Conv: | | 0.001%
/model.23/cv3.0/cv3.0.2/Conv: | | 0.000%
/model.10/m/m.0/attn/MatMul: | | 0.000%
/model.23/cv3.1/cv3.1.0/cv3.1.0.0/conv/Conv: | | 0.000%
/model.8/m.0/cv2/conv/Conv: | | 0.000%
/model.22/m.0/cv2/conv/Conv: | | 0.000%
/model.3/conv/Conv: | | 0.000%
/model.4/cv2/conv/Conv: | | 0.000%
Quantization error analysis
After applying 16-bit quantization to the layers with higher layer-wise error and employing the horizontal layer split pass, the quantized model's mAP50:95 on COCO val2017 improves to 33.4% with the same inputs. Additionally, a noticeable decrease in the cumulative error of the output layers can be observed.
The graphwise errors for the output layers of the model, /model.23/cv3.2/cv3.2.2/Conv, /model.23/cv2.2/cv2.2.2/Conv, /model.23/cv3.1/cv3.1.2/Conv, /model.23/cv2.1/cv2.1.2/Conv, /model.23/cv3.0/cv3.0.2/Conv, and /model.23/cv2.0/cv2.0.2/Conv, are 0.784%, 8.208%, 0.360%, 4.214%, 0.189%, and 4.308%, respectively.
Quantization-Aware Training
To further improve the accuracy of the quantized model, we adopt the quantization-aware training (QAT) strategy. Here, QAT is performed on top of 8-bit quantization.
Quantization settings
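The exact QAT recipe is not reproduced here. As an illustrative sketch only, one training-based option exposed by the PPQ quantization setting is the LSQ (learned step size) optimization pass; the attribute names and hyperparameter values below are assumptions, not necessarily the settings used to produce the results that follow:

quant_setting = QuantizationSettingFactory.espdl_setting()
# Enable PPQ's LSQ-based fine-tuning of quantization parameters (illustrative).
quant_setting.lsq_optimization = True
quant_setting.lsq_optimization_setting.steps = 500       # training iterations (assumed value)
quant_setting.lsq_optimization_setting.lr = 1e-5         # learning rate (assumed value)
quant_setting.lsq_optimization_setting.collecting_device = "cuda"  # where calibration data is cached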
Quantization results
Graphwise error (cumulative error per layer):
Layer | NOISE:SIGNAL POWER RATIO
/model.10/m/m.0/ffn/ffn.1/conv/Conv: | ████████████████████ | 23.754%
/model.10/m/m.0/attn/proj/conv/Conv: | ██████████████ | 16.118%
/model.23/cv3.2/cv3.2.0/cv3.2.0.1/conv/Conv: | █████████ | 10.878%
/model.8/m.0/cv2/conv/Conv: | █████████ | 10.527%
/model.22/m.0/cv3/conv/Conv: | █████████ | 10.298%
/model.23/cv3.2/cv3.2.1/cv3.2.1.1/conv/Conv: | █████████ | 10.188%
/model.10/m/m.0/attn/pe/conv/Conv: | ████████ | 10.093%
/model.22/m.0/m/m.1/cv2/conv/Conv: | ████████ | 9.891%
/model.23/cv3.2/cv3.2.0/cv3.2.0.0/conv/Conv: | ████████ | 9.839%
/model.23/cv3.1/cv3.1.1/cv3.1.1.1/conv/Conv: | ████████ | 9.827%
/model.23/cv2.2/cv2.2.0/conv/Conv: | ████████ | 9.658%
/model.23/cv3.0/cv3.0.1/cv3.0.1.1/conv/Conv: | ████████ | 9.168%
/model.22/m.0/cv2/conv/Conv: | ███████ | 8.604%
/model.10/m/m.0/attn/MatMul_1: | ███████ | 8.596%
/model.10/m/m.0/attn/qkv/conv/Conv: | ███████ | 8.541%
/model.23/cv2.2/cv2.2.1/conv/Conv: | ███████ | 8.528%
/model.22/cv2/conv/Conv: | ███████ | 8.442%
/model.23/cv2.1/cv2.1.1/conv/Conv: | ███████ | 8.306%
/model.23/cv2.0/cv2.0.1/conv/Conv: | ███████ | 8.015%
/model.10/cv1/conv/Conv: | ███████ | 7.998%
/model.22/cv1/conv/Conv: | ██████ | 7.307%
/model.8/cv1/conv/Conv: | ██████ | 7.265%
/model.23/cv2.1/cv2.1.0/conv/Conv: | ██████ | 6.989%
/model.23/cv3.1/cv3.1.0/cv3.1.0.1/conv/Conv: | ██████ | 6.716%
/model.6/m.0/cv2/conv/Conv: | █████ | 6.595%
/model.2/cv2/conv/Conv: | █████ | 6.131%
/model.22/m.0/m/m.0/cv2/conv/Conv: | █████ | 6.078%
/model.10/m/m.0/attn/MatMul: | █████ | 6.055%
/model.19/m.0/cv2/conv/Conv: | █████ | 5.999%
/model.8/m.0/cv1/conv/Conv: | █████ | 5.919%
/model.13/m.0/cv2/conv/Conv: | █████ | 5.863%
/model.20/conv/Conv: | █████ | 5.638%
/model.8/cv2/conv/Conv: | █████ | 5.616%
/model.10/cv2/conv/Conv: | █████ | 5.464%
/model.23/cv3.0/cv3.0.0/cv3.0.0.1/conv/Conv: | █████ | 5.443%
/model.2/m.0/cv2/conv/Conv: | ████ | 5.426%
/model.8/m.0/m/m.0/cv1/conv/Conv: | ████ | 5.390%
/model.13/cv2/conv/Conv: | ████ | 5.256%
/model.19/cv2/conv/Conv: | ████ | 5.231%
/model.13/cv1/conv/Conv: | ████ | 5.131%
/model.23/cv3.2/cv3.2.1/cv3.2.1.0/conv/Conv: | ████ | 5.122%
/model.6/cv1/conv/Conv: | ████ | 5.049%
/model.6/cv2/conv/Conv: | ████ | 4.788%
/model.8/m.0/m/m.0/cv2/conv/Conv: | ████ | 4.706%
/model.19/cv1/conv/Conv: | ████ | 4.586%
/model.7/conv/Conv: | ████ | 4.586%
/model.8/m.0/m/m.1/cv1/conv/Conv: | ████ | 4.541%
/model.8/m.0/cv3/conv/Conv: | ████ | 4.529%
/model.3/conv/Conv: | ████ | 4.361%
/model.13/m.0/cv1/conv/Conv: | ████ | 4.359%
/model.22/m.0/m/m.1/cv1/conv/Conv: | ████ | 4.328%
/model.6/m.0/cv3/conv/Conv: | ███ | 4.156%
/model.22/m.0/m/m.0/cv1/conv/Conv: | ███ | 4.083%
/model.23/cv2.0/cv2.0.0/conv/Conv: | ███ | 3.998%
/model.19/m.0/cv1/conv/Conv: | ███ | 3.974%
/model.23/cv2.2/cv2.2.2/Conv: | ███ | 3.817%
/model.16/m.0/cv1/conv/Conv: | ███ | 3.797%
/model.16/m.0/cv2/conv/Conv: | ███ | 3.654%
/model.4/cv1/conv/Conv: | ███ | 3.544%
/model.4/cv2/conv/Conv: | ███ | 3.488%
/model.22/m.0/cv1/conv/Conv: | ███ | 3.423%
/model.8/m.0/m/m.1/cv2/conv/Conv: | ███ | 3.382%
/model.23/cv3.0/cv3.0.1/cv3.0.1.0/conv/Conv: | ███ | 3.299%
/model.17/conv/Conv: | ███ | 3.296%
/model.6/m.0/m/m.0/cv1/conv/Conv: | ███ | 3.267%
/model.5/conv/Conv: | ███ | 3.147%
/model.23/cv2.1/cv2.1.2/Conv: | ███ | 3.102%
/model.16/cv2/conv/Conv: | ███ | 3.091%
/model.6/m.0/m/m.0/cv2/conv/Conv: | ███ | 3.080%
/model.23/cv2.0/cv2.0.2/Conv: | ██ | 3.056%
/model.23/cv3.1/cv3.1.1/cv3.1.1.0/conv/Conv: | ██ | 2.989%
/model.2/cv1/conv/Conv: | ██ | 2.874%
/model.23/cv3.1/cv3.1.0/cv3.1.0.0/conv/Conv: | ██ | 2.843%
/model.6/m.0/cv1/conv/Conv: | ██ | 2.819%
/model.9/cv2/conv/Conv: | ██ | 2.662%
/model.6/m.0/m/m.1/cv1/conv/Conv: | ██ | 2.633%
/model.10/m/m.0/ffn/ffn.0/conv/Conv: | ██ | 2.581%
/model.4/m.0/cv1/conv/Conv: | ██ | 2.545%
/model.16/cv1/conv/Conv: | ██ | 2.171%
/model.4/m.0/cv2/conv/Conv: | ██ | 1.942%
/model.6/m.0/m/m.1/cv2/conv/Conv: | ██ | 1.925%
/model.2/m.0/cv1/conv/Conv: | █ | 1.721%
/model.9/cv1/conv/Conv: | █ | 1.140%
/model.1/conv/Conv: | █ | 1.117%
/model.23/cv3.0/cv3.0.0/cv3.0.0.0/conv/Conv: | █ | 0.831%
/model.23/cv3.2/cv3.2.2/Conv: | | 0.443%
/model.23/cv3.1/cv3.1.2/Conv: | | 0.247%
/model.0/conv/Conv: | | 0.150%
/model.23/cv3.0/cv3.0.2/Conv: | | 0.119%
Analysing Layerwise quantization error:: 100%|██████████| 89/89 [04:44<00:00, 3.20s/it]
Layerwise error:
Layer | NOISE:SIGNAL POWER RATIO
/model.2/cv2/conv/Conv: | ████████████████████ | 1.462%
/model.3/conv/Conv: | ██████████ | 0.764%
/model.4/cv2/conv/Conv: | ██████████ | 0.763%
/model.10/cv2/conv/Conv: | ███████ | 0.535%
/model.9/cv2/conv/Conv: | ██████ | 0.439%
/model.2/cv1/conv/Conv: | █████ | 0.395%
/model.4/cv1/conv/Conv: | █████ | 0.361%
/model.1/conv/Conv: | █████ | 0.347%
/model.2/m.0/cv1/conv/Conv: | ███ | 0.192%
/model.4/m.0/cv2/conv/Conv: | ███ | 0.184%
/model.22/cv1/conv/Conv: | ██ | 0.179%
/model.5/conv/Conv: | ██ | 0.161%
/model.16/cv1/conv/Conv: | ██ | 0.154%
/model.10/cv1/conv/Conv: | ██ | 0.145%
/model.16/m.0/cv2/conv/Conv: | ██ | 0.142%
/model.16/m.0/cv1/conv/Conv: | ██ | 0.113%
/model.4/m.0/cv1/conv/Conv: | █ | 0.107%
/model.0/conv/Conv: | █ | 0.100%
/model.10/m/m.0/attn/pe/conv/Conv: | █ | 0.095%
/model.6/cv1/conv/Conv: | █ | 0.082%
/model.23/cv2.2/cv2.2.2/Conv: | █ | 0.082%
/model.16/cv2/conv/Conv: | █ | 0.076%
/model.6/cv2/conv/Conv: | █ | 0.066%
/model.22/m.0/cv1/conv/Conv: | █ | 0.060%
/model.13/cv2/conv/Conv: | █ | 0.056%
/model.19/cv2/conv/Conv: | █ | 0.041%
/model.10/m/m.0/attn/qkv/conv/Conv: | | 0.034%
/model.7/conv/Conv: | | 0.033%
/model.13/cv1/conv/Conv: | | 0.033%
/model.23/cv2.2/cv2.2.0/conv/Conv: | | 0.032%
/model.10/m/m.0/ffn/ffn.0/conv/Conv: | | 0.032%
/model.23/cv2.0/cv2.0.0/conv/Conv: | | 0.029%
/model.13/m.0/cv1/conv/Conv: | | 0.029%
/model.2/m.0/cv2/conv/Conv: | | 0.026%
/model.19/cv1/conv/Conv: | | 0.025%
/model.6/m.0/cv3/conv/Conv: | | 0.024%
/model.19/m.0/cv2/conv/Conv: | | 0.024%
/model.17/conv/Conv: | | 0.023%
/model.23/cv2.0/cv2.0.2/Conv: | | 0.021%
/model.19/m.0/cv1/conv/Conv: | | 0.019%
/model.23/cv3.2/cv3.2.2/Conv: | | 0.019%
/model.9/cv1/conv/Conv: | | 0.017%
/model.23/cv2.1/cv2.1.0/conv/Conv: | | 0.015%
/model.8/cv1/conv/Conv: | | 0.014%
/model.22/m.0/cv3/conv/Conv: | | 0.014%
/model.13/m.0/cv2/conv/Conv: | | 0.014%
/model.8/m.0/cv3/conv/Conv: | | 0.012%
/model.23/cv2.2/cv2.2.1/conv/Conv: | | 0.011%
/model.23/cv2.1/cv2.1.2/Conv: | | 0.011%
/model.22/m.0/m/m.1/cv1/conv/Conv: | | 0.010%
/model.22/m.0/m/m.0/cv1/conv/Conv: | | 0.009%
/model.20/conv/Conv: | | 0.009%
/model.8/cv2/conv/Conv: | | 0.009%
/model.6/m.0/m/m.1/cv1/conv/Conv: | | 0.008%
/model.10/m/m.0/ffn/ffn.1/conv/Conv: | | 0.008%
/model.23/cv3.1/cv3.1.0/cv3.1.0.1/conv/Conv: | | 0.008%
/model.23/cv2.1/cv2.1.1/conv/Conv: | | 0.008%
/model.23/cv2.0/cv2.0.1/conv/Conv: | | 0.007%
/model.23/cv3.0/cv3.0.0/cv3.0.0.1/conv/Conv: | | 0.007%
/model.10/m/m.0/attn/proj/conv/Conv: | | 0.007%
/model.8/m.0/m/m.1/cv1/conv/Conv: | | 0.007%
/model.8/m.0/cv1/conv/Conv: | | 0.007%
/model.23/cv3.1/cv3.1.1/cv3.1.1.0/conv/Conv: | | 0.006%
/model.23/cv3.2/cv3.2.0/cv3.2.0.1/conv/Conv: | | 0.005%
/model.22/cv2/conv/Conv: | | 0.005%
/model.6/m.0/m/m.0/cv1/conv/Conv: | | 0.004%
/model.22/m.0/m/m.0/cv2/conv/Conv: | | 0.004%
/model.23/cv3.1/cv3.1.1/cv3.1.1.1/conv/Conv: | | 0.003%
/model.6/m.0/cv1/conv/Conv: | | 0.003%
/model.8/m.0/m/m.0/cv1/conv/Conv: | | 0.003%
/model.8/m.0/m/m.1/cv2/conv/Conv: | | 0.003%
/model.8/m.0/m/m.0/cv2/conv/Conv: | | 0.003%
/model.6/m.0/m/m.1/cv2/conv/Conv: | | 0.003%
/model.23/cv3.2/cv3.2.1/cv3.2.1.0/conv/Conv: | | 0.002%
/model.23/cv3.1/cv3.1.2/Conv: | | 0.002%
/model.23/cv3.0/cv3.0.0/cv3.0.0.0/conv/Conv: | | 0.002%
/model.23/cv3.2/cv3.2.1/cv3.2.1.1/conv/Conv: | | 0.002%
/model.22/m.0/m/m.1/cv2/conv/Conv: | | 0.002%
/model.6/m.0/m/m.0/cv2/conv/Conv: | | 0.002%
/model.10/m/m.0/attn/MatMul_1: | | 0.002%
/model.23/cv3.0/cv3.0.2/Conv: | | 0.001%
/model.23/cv3.0/cv3.0.1/cv3.0.1.0/conv/Conv: | | 0.001%
/model.23/cv3.0/cv3.0.1/cv3.0.1.1/conv/Conv: | | 0.001%
/model.23/cv3.2/cv3.2.0/cv3.2.0.0/conv/Conv: | | 0.001%
/model.6/m.0/cv2/conv/Conv: | | 0.000%
/model.23/cv3.1/cv3.1.0/cv3.1.0.0/conv/Conv: | | 0.000%
/model.10/m/m.0/attn/MatMul: | | 0.000%
/model.8/m.0/cv2/conv/Conv: | | 0.000%
/model.22/m.0/cv2/conv/Conv: | | 0.000%
Quantization error analysis
After applying QAT on top of 8-bit quantization, the quantized model's mAP50:95 on COCO val2017 improves to 35.5% with the same inputs, while the cumulative errors of the output layers are significantly reduced. Compared with the other two quantization methods, the 8-bit QAT model achieves the highest quantization accuracy with the lowest inference latency.
The graphwise errors for the output layers of the model, /model.23/cv3.2/cv3.2.2/Conv, /model.23/cv2.2/cv2.2.2/Conv, /model.23/cv3.1/cv3.1.2/Conv, /model.23/cv2.1/cv2.1.2/Conv, /model.23/cv3.0/cv3.0.2/Conv, and /model.23/cv2.0/cv2.0.2/Conv, are 0.443%, 3.817%, 0.247%, 3.102%, 0.119%, and 3.056%, respectively.
Model deployment
Object detection base class
Pre-process
The ImagePreprocessor class contains the common pre-process pipeline: color conversion, crop, resize, normalization, and quantize.
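To make the pipeline concrete, here is a rough Python equivalent of those steps; on device they run in C++ inside ESP-DL, and the nearest-neighbor resize, [0, 1] normalization, and quantization exponent below are illustrative placeholders rather than the deployed model's actual parameters:

import numpy as np

def preprocess(img_bgr, exponent=-7):
    # Color conversion: BGR -> RGB.
    img = img_bgr[..., ::-1]
    # Center crop to a square region.
    h, w = img.shape[:2]
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    img = img[top:top + side, left:left + side]
    # Nearest-neighbor resize to the 640x640 model input.
    idx = np.arange(640) * side // 640
    img = img[idx][:, idx]
    # Normalization to [0, 1].
    img = img.astype(np.float32) / 255.0
    # Quantize to int8 with a power-of-two scale, as ESP-DL tensors use.
    scale = 2.0 ** exponent
    return np.clip(np.round(img / scale), -128, 127).astype(np.int8)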