Quantization Toolkit API
Calibrator Class
Initialization
Calibrator(quantization_bit, granularity='per-tensor', calib_method='minmax')
Arguments
quantization_bit (string):
‘int8’ for full int8 quantization.
‘int16’ for full int16 quantization.
granularity (string):
If granularity = ‘per-tensor’ (default), there will be one exponent for the entire tensor.
If granularity = ‘per-channel’, there will be one exponent for each channel of a convolution layer.
calib_method (string):
If calib_method = ‘minmax’ (default), the threshold is derived from the minimum and maximum values of the layer outputs on the calibration dataset.
If calib_method = ‘entropy’, the threshold is derived from Kullback-Leibler divergence (KL divergence).
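To make the granularity and ‘minmax’ options concrete, here is a minimal sketch. It is not the toolkit's implementation. It assumes a power-of-two scale (value ≈ integer × 2^exponent) and picks the smallest exponent at which the largest observed magnitude still fits the signed integer range. All helper names are hypothetical.

```python
import math

def minmax_exponent(values, bits=8):
    """Smallest exponent e such that max(|v|) / 2**e fits a signed `bits`-bit integer.

    Assumption: power-of-two scaling; the toolkit's exact rule may differ.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return 0
    qmax = 2 ** (bits - 1) - 1
    return math.ceil(math.log2(max_abs / qmax))

def per_channel_exponents(channels, bits=8):
    """One exponent per channel instead of one for the whole tensor."""
    return [minmax_exponent(channel, bits) for channel in channels]

def quantize(values, exponent, bits=8):
    """Round to the nearest integer step of size 2**exponent and clip to range."""
    qmax = 2 ** (bits - 1) - 1
    qmin = -qmax - 1
    scale = 2.0 ** exponent
    return [max(qmin, min(qmax, round(v / scale))) for v in values]

def dequantize(ints, exponent):
    """Map integers back to floating-point values."""
    return [q * 2.0 ** exponent for q in ints]

# Two channels with very different value ranges (made-up numbers).
channels = [[1.0, -0.5], [0.01, 0.02]]
flat = [v for channel in channels for v in channel]

per_tensor = minmax_exponent(flat)             # one exponent shared by all values
per_channel = per_channel_exponents(channels)  # one exponent for each channel
```

In this sketch the channel with small values receives a much finer exponent under ‘per-channel’ than it would share under ‘per-tensor’, which is the motivation for the finer granularity.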
check_model method
Calibrator.check_model(model_proto)
Checks the compatibility of your model.
Argument
model_proto (ModelProto): An FP32 ONNX model.
Return
-1: The model is incompatible.
set_method method
Calibrator.set_method(granularity, calib_method)
Configures quantization.
Arguments
granularity (string):
If granularity = ‘per-tensor’, there will be one exponent for the entire tensor.
If granularity = ‘per-channel’, there will be one exponent for each channel of a convolution layer.
calib_method (string):
If calib_method = ‘minmax’, the threshold is derived from the minimum and maximum values of the layer outputs on the calibration dataset.
If calib_method = ‘entropy’, the threshold is derived from Kullback-Leibler divergence (KL divergence).
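For readers unfamiliar with the quantity behind the ‘entropy’ option, the sketch below computes the KL divergence between two discrete distributions. It only illustrates the measure itself. How the toolkit builds its histograms and searches for a threshold is not described here, so the function names and sample histograms are assumptions.

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) for two discrete distributions given as equal-length lists."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0:
            if q_i == 0:
                return math.inf  # Q assigns zero mass where P does not
            total += p_i * math.log(p_i / q_i)
    return total

def normalize(hist):
    """Turn raw histogram counts into a probability distribution."""
    total = sum(hist)
    return [count / total for count in hist]

# Made-up histograms: `close` resembles `reference`, `far` does not.
reference = normalize([1, 4, 10, 4, 1])
close = normalize([1, 5, 9, 4, 1])
far = normalize([8, 1, 1, 1, 9])

kl_close = kl_divergence(reference, close)
kl_far = kl_divergence(reference, far)
```

A lower divergence means the candidate distribution loses less information relative to the reference, so an entropy-based calibrator would typically prefer the candidate with the smaller value.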
set_providers method
Calibrator.set_providers(providers)
Configures the execution provider of ONNX Runtime.
Argument
providers (list of strings): The execution providers to use, for example ‘CPUExecutionProvider’ or ‘CUDAExecutionProvider’.
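One way to build the providers list is to filter a preference order against what the installed ONNX Runtime reports. The helper below is a hypothetical sketch and takes the available providers as a plain list; in practice that list could come from onnxruntime.get_available_providers().

```python
def pick_providers(available,
                   preferred=("CUDAExecutionProvider", "CPUExecutionProvider")):
    """Return the preferred providers that are actually available, in order."""
    chosen = [name for name in preferred if name in available]
    if not chosen:
        raise ValueError("none of the preferred execution providers is available")
    return chosen

# Example with a CPU-only installation (the list is written by hand here).
providers = pick_providers(["CPUExecutionProvider"])
```

The resulting list can then be passed to set_providers.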
generate_quantization_table method
Calibrator.generate_quantization_table(model_proto, calib_dataset, pickle_file_path)
Generates the quantization table.
Arguments
model_proto (ModelProto): An FP32 ONNX model.
calib_dataset (ndarray): The calibration dataset used to compute the threshold. The larger the dataset, the longer it takes to generate the quantization table.
pickle_file_path (string): Path of the pickle file that stores the dictionary of quantization parameters.
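The pickle file is an ordinary Python pickle holding a dictionary, so it can be inspected with the standard library. The sketch below writes and reads back such a file. The key names and exponent values are hypothetical, since the exact layout the toolkit writes is not documented here.

```python
import os
import pickle
import tempfile

# Hypothetical table: tensor names mapped to made-up exponents.
table = {"input": -7, "conv1_output": -5}

with tempfile.TemporaryDirectory() as tmp_dir:
    pickle_file_path = os.path.join(tmp_dir, "quantization_table.pickle")

    # Write the dictionary to disk.
    with open(pickle_file_path, "wb") as f:
        pickle.dump(table, f)

    # Read it back for inspection.
    with open(pickle_file_path, "rb") as f:
        loaded = pickle.load(f)
```

Only unpickle files from a trusted source, because loading a pickle can execute arbitrary code.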
export_coefficient_to_cpp method
Calibrator.export_coefficient_to_cpp(model_proto, pickle_file_path, target_chip, output_path, file_name, print_model_info=False)
Exports the quantized model coefficients, such as weights, for deployment on ESP SoCs.
Arguments
model_proto (ModelProto): An FP32 ONNX model.
pickle_file_path (string): Path of the pickle file that stores the dictionary of quantization parameters.
target_chip (string): The supported values are currently ‘esp32’, ‘esp32s2’, ‘esp32c3’ and ‘esp32s3’.
output_path (string): Path of output files.
file_name (string): Name of output files.
print_model_info (bool):
False (default): No log will be printed.
True: Information of the model will be printed.
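The Calibrator methods above are typically called in sequence. The sketch below shows one plausible order using only the signatures documented in this section. The class is passed in as a parameter so that the snippet does not assume an import path, and RecordingCalibrator is a hypothetical stand-in that only records calls so the example runs without the toolkit installed.

```python
def calibrate_and_export(calibrator_cls, model_proto, calib_dataset,
                         pickle_file_path, output_path, file_name,
                         target_chip="esp32s3"):
    """Check the model, build the quantization table, then export coefficients."""
    calib = calibrator_cls("int8", "per-tensor", "minmax")
    if calib.check_model(model_proto) == -1:
        raise ValueError("the model is incompatible with the toolkit")
    calib.set_providers(["CPUExecutionProvider"])
    calib.generate_quantization_table(model_proto, calib_dataset, pickle_file_path)
    calib.export_coefficient_to_cpp(model_proto, pickle_file_path, target_chip,
                                    output_path, file_name, True)
    return calib

class RecordingCalibrator:
    """Stand-in for illustration only; replace with the toolkit's Calibrator."""

    def __init__(self, quantization_bit, granularity="per-tensor",
                 calib_method="minmax"):
        self.calls = ["init"]

    def check_model(self, model_proto):
        self.calls.append("check_model")
        return None  # assumption: anything other than -1 means compatible

    def set_providers(self, providers):
        self.calls.append("set_providers")

    def generate_quantization_table(self, model_proto, calib_dataset,
                                    pickle_file_path):
        self.calls.append("generate_quantization_table")

    def export_coefficient_to_cpp(self, model_proto, pickle_file_path,
                                  target_chip, output_path, file_name,
                                  print_model_info=False):
        self.calls.append("export_coefficient_to_cpp")

# Placeholder arguments; a real run needs an ONNX ModelProto and an ndarray.
calib = calibrate_and_export(RecordingCalibrator, object(), [],
                             "table.pickle", "out", "model_coefficient")
```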
Evaluator Class
Initialization
Evaluator(quantization_bit, granularity, target_chip)
Arguments
quantization_bit (string):
‘int8’ for full int8 quantization.
‘int16’ for full int16 quantization.
granularity (string):
If granularity = ‘per-tensor’, there will be one exponent for the entire tensor.
If granularity = ‘per-channel’, there will be one exponent for each channel of a convolution layer.
target_chip (string): ‘esp32s3’ by default.
check_model method
Evaluator.check_model(model_proto)
Checks the compatibility of your model.
Argument
model_proto (ModelProto): An FP32 ONNX model.
Return
-1: The model is incompatible.
set_target_chip method
Evaluator.set_target_chip(target_chip)
Configures the chip environment to simulate.
Argument
target_chip (string): Currently, only ‘esp32s3’ is supported.
set_providers method
Evaluator.set_providers(providers)
Configures the execution provider of ONNX Runtime.
Argument
providers (list of strings): The execution providers to use, for example ‘CPUExecutionProvider’ or ‘CUDAExecutionProvider’.
generate_quantized_model method
Evaluator.generate_quantized_model(model_proto, pickle_file_path)
Generates the quantized model.
Arguments
model_proto (ModelProto): An FP32 ONNX model.
pickle_file_path (string): Path of the pickle file that stores all quantization parameters for the FP32 ONNX model. This pickle file must contain a dictionary of quantization parameters for all input and output nodes in the model graph.
evaluate_quantized_model method
Evaluator.evaluate_quantized_model(batch_fp_input, to_float=False)
Obtains outputs of the quantized model.
Arguments
batch_fp_input (ndarray): Batch of floating-point inputs.
to_float (bool):
False (default): Outputs will be returned directly.
True: Outputs will be converted to floating-point values.
Returns
A tuple of outputs and output_names:
outputs (list of ndarray): Outputs of the quantized model.
output_names (list of strings): Names of outputs.
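Because evaluate_quantized_model returns a tuple of outputs and output_names, it is convenient to pair them by name. The sketch below does this for several batches using only the method signatures documented above. The evaluator is passed in already constructed, and FakeEvaluator is a hypothetical stand-in that echoes its input so the example runs without the toolkit.

```python
def run_quantized(evaluator, model_proto, pickle_file_path, batches,
                  to_float=True):
    """Build the quantized model once, then evaluate each batch."""
    evaluator.set_providers(["CPUExecutionProvider"])
    evaluator.generate_quantized_model(model_proto, pickle_file_path)
    results = []
    for batch in batches:
        outputs, output_names = evaluator.evaluate_quantized_model(batch, to_float)
        # Pair each output with its name for easier lookup.
        results.append(dict(zip(output_names, outputs)))
    return results

class FakeEvaluator:
    """Stand-in for illustration only; replace with the toolkit's Evaluator."""

    def set_providers(self, providers):
        self.providers = providers

    def generate_quantized_model(self, model_proto, pickle_file_path):
        self.ready = True

    def evaluate_quantized_model(self, batch_fp_input, to_float=False):
        # Echo the input as a single output named "output" (made-up name).
        return [batch_fp_input], ["output"]

# Placeholder inputs; real batches would be floating-point ndarrays.
results = run_quantized(FakeEvaluator(), object(), "table.pickle",
                        [[1.0, 2.0], [3.0, 4.0]])
```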