How to load, test, and profile a model


In this tutorial, we will show you how to load, test, and profile an .espdl model. See the accompanying example for a complete reference.

Preparation

  1. Install ESP-IDF

  2. Quantize your model by following how_to_quantize_model

Load model from rodata

  1. Add model file in CMakeLists.txt

    If you want to put the .espdl model file into the .rodata section of FLASH, add the following code to your CMakeLists.txt. Everything except the last line must be placed before idf_component_register(), and the target_add_aligned_binary_data() call must be placed after it (a filled-in example follows the snippet).

    idf_build_get_property(component_targets __COMPONENT_TARGETS)
    if ("___idf_espressif__esp-dl" IN_LIST component_targets)
        idf_component_get_property(espdl_dir espressif__esp-dl COMPONENT_DIR)
    elseif("___idf_esp-dl" IN_LIST component_targets)
        idf_component_get_property(espdl_dir esp-dl COMPONENT_DIR)
    endif()
    set(cmake_dir ${espdl_dir}/fbs_loader/cmake)
    include(${cmake_dir}/utilities.cmake)
    set(embed_files your_model_path/model_name.espdl)

    idf_component_register(...)

    target_add_aligned_binary_data(${COMPONENT_LIB} ${embed_files} BINARY)
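
    For illustration, a filled-in CMakeLists.txt for a hypothetical main component could look as follows; the source file name and model path are placeholders, not part of esp-dl:

    # Locate esp-dl and include its helper before registering the component.
    idf_build_get_property(component_targets __COMPONENT_TARGETS)
    if ("___idf_espressif__esp-dl" IN_LIST component_targets)
        idf_component_get_property(espdl_dir espressif__esp-dl COMPONENT_DIR)
    elseif("___idf_esp-dl" IN_LIST component_targets)
        idf_component_get_property(espdl_dir esp-dl COMPONENT_DIR)
    endif()
    include(${espdl_dir}/fbs_loader/cmake/utilities.cmake)
    set(embed_files models/model.espdl)

    idf_component_register(SRCS "app_main.cpp"
                           INCLUDE_DIRS ".")

    # Embedding must come after registration, once ${COMPONENT_LIB} exists.
    target_add_aligned_binary_data(${COMPONENT_LIB} ${embed_files} BINARY)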
    
  2. Load the model in the program

    // "_binary_model_espdl_start" is composed of three parts: the prefix "binary", the filename "model_espdl", and the suffix "_start".
    extern const uint8_t model_espdl[] asm("_binary_model_espdl_start");
    
    dl::Model *model = new dl::Model((const char *)model_espdl, fbs::MODEL_LOCATION_IN_FLASH_RODATA);
    
    // Keep parameters in FLASH; saves PSRAM/internal RAM, but lowers inference performance.
    // dl::Model *model = new dl::Model((const char *)model_espdl, fbs::MODEL_LOCATION_IN_FLASH_RODATA, 0, dl::MEMORY_MANAGER_GREEDY, nullptr, false);
    

Note

  1. When using Load model from rodata, the model resides in the .rodata section of the app partition, so it is re-flashed every time the code changes. If the model file is large, you may also need to enlarge the app partition (see the sketch after this note). Using Load model from partition or Load model from sdcard avoids repeatedly flashing the model, which reduces flashing time.

  2. When using Load model from rodata or Load model from partition, turning off the param_copy option in the Model constructor avoids copying the model weights from FLASH into PSRAM or internal RAM, which reduces PSRAM/internal RAM usage. However, since PSRAM and internal RAM are faster than FLASH, the model's inference performance will decrease.
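
Regarding note 1: if the app image with the embedded model no longer fits, enlarge the app partition in partition.csv. A hypothetical entry could look like the following (8000K is only an illustrative value and must fit within your flash size):

# Name, Type, SubType, Offset, Size, Flags
factory, app, factory, 0x010000, 8000K,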

Load model from partition

  1. Add model information in partition.csv

For details about partition.csv, please refer to the partition table documentation.

# Name, Type, SubType, Offset, Size, Flags
factory, app, factory, 0x010000, 4000K,
model, data, spiffs, , 4000K,

The Name field of the model partition can be any meaningful name, but it must not exceed 16 bytes including one null terminator (anything beyond that is truncated). The SubType field must be spiffs. The Offset field can be left blank when the partition follows the other partitions; it will be calculated automatically. Size must be larger than the model file.

  2. Add model flashing information in CMakeLists.txt

idf_component_register(...)
set(image_file your_model_path/model_name.espdl)
esptool_py_flash_to_partition(flash "model" "${image_file}")

The second parameter in esptool_py_flash_to_partition must be consistent with the Name field in partition.csv.

  3. Load the model in the program

dl::Model *model = new dl::Model("model", fbs::MODEL_LOCATION_IN_FLASH_PARTITION);

// Keep parameters in FLASH; saves PSRAM/internal RAM, but lowers inference performance.
// dl::Model *model = new dl::Model("model", fbs::MODEL_LOCATION_IN_FLASH_PARTITION, 0, dl::MEMORY_MANAGER_GREEDY,
// nullptr, false);

The first parameter of the constructor must be consistent with the Name field in partition.csv.

Note

Use idf.py app-flash instead of idf.py flash to flash only the app partition without re-flashing the model partition; this reduces flashing time.

Load model from sdcard

  1. Check if sdcard is in the correct format

    Back up the data on the sdcard first, then try to mount it on the board. If the sdcard is not in the correct format, it will be formatted automatically.

  • If using a BSP (Board Support Package)

    Enable the CONFIG_BSP_SD_FORMAT_ON_MOUNT_FAIL option in menuconfig.

    ESP_ERROR_CHECK(bsp_sdcard_mount());
    
  • If not using a BSP (Board Support Package)

    Set format_if_mount_failed in the esp_vfs_fat_sdmmc_mount_config_t structure to true. A complete mount sketch is shown after this list.

    esp_vfs_fat_sdmmc_mount_config_t mount_config = {
          .format_if_mount_failed = true,
          .max_files = 5,
          .allocation_unit_size = 16 * 1024
    };
    // Mount sdcard.
    
  2. Copy model to sdcard

    Copy the .espdl model file to the sdcard.

  3. Load the model in the program

  • If using a BSP (Board Support Package)

    ESP_ERROR_CHECK(bsp_sdcard_mount());
    const char *model_path = "/your_sdcard_mount_point/your_model_path/model_name.espdl";
    dl::Model *model = new dl::Model(model_path, fbs::MODEL_LOCATION_IN_SDCARD);
    
  • If not using a BSP (Board Support Package)

    // Mount sdcard.
    const char *model_path = "/your_sdcard_mount_point/your_model_path/model_name.espdl";
    dl::Model *model = new dl::Model(model_path, fbs::MODEL_LOCATION_IN_SDCARD);
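
The "// Mount sdcard." placeholders above can be filled in with ESP-IDF's FATFS helpers. Below is a minimal sketch assuming an SDMMC-connected card mounted at /sdcard with the default host and slot configuration; for an SPI-connected card use esp_vfs_fat_sdspi_mount() instead, and adjust the mount point and model path to your project.

#include "esp_vfs_fat.h"
#include "driver/sdmmc_host.h"
#include "sdmmc_cmd.h"

// Mount the sdcard over SDMMC, then load the model from it.
esp_vfs_fat_sdmmc_mount_config_t mount_config = {
    .format_if_mount_failed = true,   // format the card if mounting fails
    .max_files = 5,
    .allocation_unit_size = 16 * 1024,
};
sdmmc_host_t host = SDMMC_HOST_DEFAULT();                       // default SDMMC host controller
sdmmc_slot_config_t slot_config = SDMMC_SLOT_CONFIG_DEFAULT();  // default slot pins and bus width
sdmmc_card_t *card = NULL;

ESP_ERROR_CHECK(esp_vfs_fat_sdmmc_mount("/sdcard", &host, &slot_config, &mount_config, &card));

const char *model_path = "/sdcard/your_model_path/model_name.espdl";
dl::Model *model = new dl::Model(model_path, fbs::MODEL_LOCATION_IN_SDCARD);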
    

Note

When loading the model from the sdcard, the model loading process takes longer because the model data has to be copied from the sdcard into PSRAM or internal RAM. This method is useful if FLASH space is tight.

Test whether on-board model inference is correct

To test whether on-board model inference is correct, the .espdl model must be exported with test input/output. For actual deployment, you can export a version without test input/output to reduce the model file size.

ESP_ERROR_CHECK(model->test());

Profile model memory usage

model->profile_memory();

The following fields are printed:

  • fbs_model: FlatBuffers model size. Contains the model parameters, test input/output, parameter shapes, model structure, etc.

  • param_in_fbs: Model parameter size (inside the FlatBuffers model).

  • param_out_fbs: Model parameter size (outside the FlatBuffers model).

  • mem_manager: Memory allocated by the memory manager. Model input/output and intermediate calculation results use this space.

  • others: Space required for class member variables, plus the extra alignment overhead of heap_caps_aligned_alloc / heap_caps_aligned_calloc (very small).

Profile model inference latency

model->profile_module();

By default, the model modules are printed in ONNX topological order. To sort by the latency of each module instead, set the parameter of profile_module to true.

model->profile_module(true);
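
Putting it together, here is a minimal sketch that loads a model embedded in .rodata (symbol name as in the rodata section above), verifies on-board inference, and prints the memory and latency profiles. It assumes the model was exported with test input/output; the function name check_model is just for illustration, and headers are omitted as in the snippets above.

extern const uint8_t model_espdl[] asm("_binary_model_espdl_start");

void check_model()
{
    dl::Model *model = new dl::Model((const char *)model_espdl, fbs::MODEL_LOCATION_IN_FLASH_RODATA);

    // Compare on-board inference results against the test input/output embedded at export time.
    ESP_ERROR_CHECK(model->test());

    // Print memory usage and per-module latency, sorted by latency.
    model->profile_memory();
    model->profile_module(true);

    delete model;
}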