Model API Reference
This section covers model loading and static memory planning, allowing users to directly load and run ESP-DL models.
Header File
Macros
- DL_LOG_INFER_LATENCY_INIT_WITH_SIZE(size)
- DL_LOG_INFER_LATENCY_INIT()
- DL_LOG_INFER_LATENCY_START()
- DL_LOG_INFER_LATENCY_END()
- DL_LOG_INFER_LATENCY_PRINT(prefix, key)
- DL_LOG_INFER_LATENCY_END_PRINT(prefix, key)
- DL_LOG_INFER_LATENCY_ARRAY_INIT_WITH_SIZE(n, size)
- DL_LOG_INFER_LATENCY_ARRAY_INIT(n)
- DL_LOG_INFER_LATENCY_ARRAY_START(i)
- DL_LOG_INFER_LATENCY_ARRAY_END(i)
- DL_LOG_INFER_LATENCY_ARRAY_PRINT(i, prefix, key)
- DL_LOG_INFER_LATENCY_ARRAY_END_PRINT(i, prefix, key)
Classes
-
class Model
Neural Network Model.
Public Functions
-
Model(const char *rodata_address_or_partition_label_or_path, fbs::model_location_type_t location = fbs::MODEL_LOCATION_IN_FLASH_RODATA, int max_internal_size = 0, memory_manager_t mm_type = MEMORY_MANAGER_GREEDY, const uint8_t *key = nullptr, bool param_copy = true)
Create the Model object from a rodata address, partition label, or file path.
- Parameters
rodata_address_or_partition_label_or_path – The address of the model data when location is MODEL_LOCATION_IN_FLASH_RODATA, the partition label when location is MODEL_LOCATION_IN_FLASH_PARTITION, or the model file path when location is MODEL_LOCATION_IN_SDCARD.
location – The model location.
max_internal_size – In bytes. Limits the maximum internal RAM usage. Only takes effect when PSRAM is present and you want to allocate memory in internal RAM first.
mm_type – Type of memory manager.
key – The key of the encrypted model.
param_copy – Set to false to avoid copying model parameters from FLASH to PSRAM. Only set this to false when PSRAM is very tight; it saves PSRAM but degrades inference performance, because PSRAM runs at a higher frequency than FLASH. Only takes effect with MODEL_LOCATION_IN_FLASH_RODATA (CONFIG_SPIRAM_RODATA not set) or MODEL_LOCATION_IN_FLASH_PARTITION.
-
Model(const char *rodata_address_or_partition_label_or_path, int model_index, fbs::model_location_type_t location = fbs::MODEL_LOCATION_IN_FLASH_RODATA, int max_internal_size = 0, memory_manager_t mm_type = MEMORY_MANAGER_GREEDY, const uint8_t *key = nullptr, bool param_copy = true)
Create the Model object from a rodata address, partition label, or file path.
- Parameters
rodata_address_or_partition_label_or_path – The address of the model data when location is MODEL_LOCATION_IN_FLASH_RODATA, the partition label when location is MODEL_LOCATION_IN_FLASH_PARTITION, or the model file path when location is MODEL_LOCATION_IN_SDCARD.
model_index – The model index within packed models.
location – The model location.
max_internal_size – In bytes. Limits the maximum internal RAM usage. Only takes effect when PSRAM is present and you want to allocate memory in internal RAM first.
mm_type – Type of memory manager.
key – The key of the encrypted model.
param_copy – Set to false to avoid copying model parameters from FLASH to PSRAM. Only set this to false when PSRAM is very tight; it saves PSRAM but degrades inference performance, because PSRAM runs at a higher frequency than FLASH. Only takes effect with MODEL_LOCATION_IN_FLASH_RODATA (CONFIG_SPIRAM_RODATA not set) or MODEL_LOCATION_IN_FLASH_PARTITION.
-
Model(const char *rodata_address_or_partition_label_or_path, const char *model_name, fbs::model_location_type_t location = fbs::MODEL_LOCATION_IN_FLASH_RODATA, int max_internal_size = 0, memory_manager_t mm_type = MEMORY_MANAGER_GREEDY, const uint8_t *key = nullptr, bool param_copy = true)
Create the Model object from a rodata address, partition label, or file path.
- Parameters
rodata_address_or_partition_label_or_path – The address of the model data when location is MODEL_LOCATION_IN_FLASH_RODATA, the partition label when location is MODEL_LOCATION_IN_FLASH_PARTITION, or the model file path when location is MODEL_LOCATION_IN_SDCARD.
model_name – The model name within packed models.
location – The model location.
max_internal_size – In bytes. Limits the maximum internal RAM usage. Only takes effect when PSRAM is present and you want to allocate memory in internal RAM first.
mm_type – Type of memory manager.
key – The key of the encrypted model.
param_copy – Set to false to avoid copying model parameters from FLASH to PSRAM. Only set this to false when PSRAM is very tight; it saves PSRAM but degrades inference performance, because PSRAM runs at a higher frequency than FLASH. Only takes effect with MODEL_LOCATION_IN_FLASH_RODATA (CONFIG_SPIRAM_RODATA not set) or MODEL_LOCATION_IN_FLASH_PARTITION.
-
Model(fbs::FbsModel *fbs_model, int internal_size = 0, memory_manager_t mm_type = MEMORY_MANAGER_GREEDY)
Create the Model object by fbs_model.
- Parameters
fbs_model – The FlatBuffers model.
internal_size – Internal RAM size, in bytes.
mm_type – Type of memory manager.
-
virtual esp_err_t load(const char *rodata_address_or_partition_label_or_path, fbs::model_location_type_t location = fbs::MODEL_LOCATION_IN_FLASH_RODATA, const uint8_t *key = nullptr, bool param_copy = true)
Load the model graph and parameters from FLASH or an SD card.
- Parameters
rodata_address_or_partition_label_or_path – The address of the model data when location is MODEL_LOCATION_IN_FLASH_RODATA, the partition label when location is MODEL_LOCATION_IN_FLASH_PARTITION, or the model file path when location is MODEL_LOCATION_IN_SDCARD.
location – The model location.
key – The key of the encrypted model.
param_copy – Set to false to avoid copying model parameters from FLASH to PSRAM. Only set this to false when PSRAM is very tight; it saves PSRAM but degrades inference performance, because PSRAM runs at a higher frequency than FLASH. Only takes effect with MODEL_LOCATION_IN_FLASH_RODATA (CONFIG_SPIRAM_RODATA not set) or MODEL_LOCATION_IN_FLASH_PARTITION.
- Returns
ESP_OK Success
ESP_FAIL Failed
-
virtual esp_err_t load(const char *rodata_address_or_partition_label_or_path, fbs::model_location_type_t location = fbs::MODEL_LOCATION_IN_FLASH_RODATA, int model_index = 0, const uint8_t *key = nullptr, bool param_copy = true)
Load the model graph and parameters from FLASH or an SD card.
- Parameters
rodata_address_or_partition_label_or_path – The address of the model data when location is MODEL_LOCATION_IN_FLASH_RODATA, the partition label when location is MODEL_LOCATION_IN_FLASH_PARTITION, or the model file path when location is MODEL_LOCATION_IN_SDCARD.
location – The model location.
model_index – The model index within packed models.
key – The key of the encrypted model.
param_copy – Set to false to avoid copying model parameters from FLASH to PSRAM. Only set this to false when PSRAM is very tight; it saves PSRAM but degrades inference performance, because PSRAM runs at a higher frequency than FLASH. Only takes effect with MODEL_LOCATION_IN_FLASH_RODATA (CONFIG_SPIRAM_RODATA not set) or MODEL_LOCATION_IN_FLASH_PARTITION.
- Returns
ESP_OK Success
ESP_FAIL Failed
-
virtual esp_err_t load(const char *rodata_address_or_partition_label_or_path, fbs::model_location_type_t location = fbs::MODEL_LOCATION_IN_FLASH_RODATA, const char *model_name = nullptr, const uint8_t *key = nullptr, bool param_copy = true)
Load the model graph and parameters from FLASH or an SD card.
- Parameters
rodata_address_or_partition_label_or_path – The address of the model data when location is MODEL_LOCATION_IN_FLASH_RODATA, the partition label when location is MODEL_LOCATION_IN_FLASH_PARTITION, or the model file path when location is MODEL_LOCATION_IN_SDCARD.
location – The model location.
model_name – The model name within packed models.
key – The key of the encrypted model.
param_copy – Set to false to avoid copying model parameters from FLASH to PSRAM. Only set this to false when PSRAM is very tight; it saves PSRAM but degrades inference performance, because PSRAM runs at a higher frequency than FLASH. Only takes effect with MODEL_LOCATION_IN_FLASH_RODATA (CONFIG_SPIRAM_RODATA not set) or MODEL_LOCATION_IN_FLASH_PARTITION.
- Returns
ESP_OK Success
ESP_FAIL Failed
-
virtual esp_err_t load(fbs::FbsModel *fbs_model)
Load the model graph and parameters from a FlatBuffers model.
- Parameters
fbs_model – The FlatBuffers model
- Returns
ESP_OK Success
ESP_FAIL Failed
-
virtual void build(size_t max_internal_size, memory_manager_t mm_type = MEMORY_MANAGER_GREEDY, bool preload = false)
Allocate memory for the model.
- Parameters
max_internal_size – In bytes. Limits the maximum internal RAM usage. Only takes effect when PSRAM is present and you want to allocate memory in internal RAM first.
mm_type – Type of memory manager.
preload – Whether to preload the model’s parameters to internal RAM (not implemented yet).
-
virtual void run(runtime_mode_t mode = RUNTIME_MODE_SINGLE_CORE)
Run the model module by module.
- Parameters
mode – Runtime mode.
-
virtual void run(TensorBase *input, runtime_mode_t mode = RUNTIME_MODE_SINGLE_CORE)
Run the model module by module.
- Parameters
input – The model input.
mode – Runtime mode.
-
virtual void run(std::map<std::string, TensorBase*> &user_inputs, runtime_mode_t mode = RUNTIME_MODE_SINGLE_CORE, std::map<std::string, TensorBase*> user_outputs = {})
Run the model module by module.
- Parameters
user_inputs – The model inputs.
mode – Runtime mode.
user_outputs – For debugging: specifies outputs of intermediate layers. Under normal use there is no need to pass this parameter; the default is the graph’s outputs, which can be obtained through Model::get_outputs().
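Putting the constructors, run() overloads, and tensor getters of this class together, a typical single-input inference flow looks like the sketch below. This is ESP-IDF target code and cannot run on a host; the header name, the "model" partition label, and the data-filling steps are illustrative assumptions.

```cpp
#include "dl_model_base.hpp"  // assumed ESP-DL header providing dl::Model

void infer_once()
{
    // Load a model from a FLASH partition labeled "model" (placeholder label).
    dl::Model model("model", fbs::MODEL_LOCATION_IN_FLASH_PARTITION);

    // Single-input models: fill the input tensor in place, then run.
    dl::TensorBase *input = model.get_input();
    // ... quantize and copy your data into the input tensor here ...

    model.run();

    dl::TensorBase *output = model.get_output();
    // ... dequantize and consume the output tensor here ...
}
```

For multi-input models, use `get_inputs()` and the `run(user_inputs, mode, user_outputs)` overload with a `std::map<std::string, TensorBase*>` keyed by tensor name.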
-
void minimize()
Minimize the model.
-
esp_err_t test()
Test whether the model inference result is correct. The model should contain test_inputs and test_outputs; enable the export_test_values option in esp-ppq to use this API.
- Returns
esp_err_t
-
std::map<std::string, mem_info_t> get_memory_info()
Get memory info.
- Returns
Memory usage statistics on internal and PSRAM.
-
std::map<std::string, module_info> get_module_info()
Get module info.
- Returns
Type and latency of each module.
-
void print_module_info(const std::map<std::string, module_info> &info, bool sort_module_by_latency = false)
Print the module info obtained by get_module_info function.
- Parameters
info – The module info map returned by get_module_info.
sort_module_by_latency – True: print modules in decreasing order of latency. False: print modules in ONNX topological order.
-
void profile_memory()
Print model memory summary.
-
void profile_module(bool sort_module_by_latency = false)
Print module info summary. (Name, Type, Latency)
- Parameters
sort_module_by_latency – True: print modules in decreasing order of latency. False: print modules in ONNX topological order.
-
void profile(bool sort_module_by_latency = false)
Combination of profile_memory & profile_module.
- Parameters
sort_module_by_latency – True: print modules in decreasing order of latency. False: print modules in ONNX topological order.
-
virtual std::map<std::string, TensorBase*> &get_inputs()
Get inputs of model.
- Returns
The map of model input’s name and TensorBase*
-
virtual TensorBase *get_input()
Get the only input of model.
- Returns
TensorBase*
-
virtual TensorBase *get_input(const std::string &name)
Get input of model by name.
- Parameters
name – input name
- Returns
TensorBase*
-
virtual TensorBase *get_intermediate(const std::string &name)
Get intermediate TensorBase of model.
Note
When using a memory manager, the content of the TensorBase’s data may be overwritten by the outputs of other operators.
- Parameters
name – The name of the intermediate tensor.
- Returns
The intermediate TensorBase*.
-
virtual std::map<std::string, TensorBase*> &get_outputs()
Get outputs of model.
- Returns
The map of model output’s name and TensorBase*
-
virtual TensorBase *get_output()
Get the only output of model.
- Returns
TensorBase*
-
virtual TensorBase *get_output(const std::string &name)
Get output of model by name.
- Parameters
name – output name
- Returns
TensorBase*
-
virtual void print()
Print the model.
Header File
Macros
-
CONTEXT_PARAMETER_OFFSET
Offset for parameter tensors
Classes
-
class ModelContext
Model Context class including variable tensors and parameters.
Public Functions
-
inline ModelContext()
Constructor for ModelContext. Initializes the PSRAM and internal root pointers to nullptr.
-
inline ~ModelContext()
Destructor for ModelContext. Clears all resources and tensors.
-
int add_tensor(const std::string name, bool is_paramter = false, TensorBase *tensor = nullptr)
Adds a tensor to the parameter or variable list.
- Parameters
name – The name of the tensor.
is_paramter – Whether the tensor is a parameter (default: false).
tensor – Pointer to the TensorBase object (default: nullptr).
- Returns
int Returns the index of the added tensor.
-
int push_back_tensor(TensorBase *tensor, bool is_paramter = false)
Push back a tensor.
- Parameters
tensor – Pointer to the TensorBase object.
is_paramter – Whether the tensor is a parameter (default: false).
- Returns
int Returns the index of the added tensor.
-
void update_tensor(int index, TensorBase *tensor)
Updates the tensor at the specified index.
- Parameters
index – The index of the tensor to update.
tensor – Pointer to the new TensorBase object.
-
TensorBase *get_tensor(int index)
Gets the tensor by its index.
- Parameters
index – The index of the tensor.
- Returns
TensorBase* Returns the pointer to the TensorBase object, or nullptr if the index is invalid.
-
TensorBase *get_tensor(const std::string &name)
Gets the tensor by its name.
- Parameters
name – The name of the tensor.
- Returns
TensorBase* Returns the pointer to the TensorBase object, or nullptr if the name is not found.
-
int get_tensor_index(const std::string &name)
Gets the tensor index by its name.
- Parameters
name – The name of the tensor.
- Returns
int Returns index if the name is found, else -1
-
int get_variable_index(const std::string &name)
Gets the variable tensor index by its name.
- Parameters
name – The name of the tensor.
- Returns
int Returns index if the name is found and is variable tensor, else -1
-
inline int get_variable_count()
Gets the count of variable tensors.
- Returns
int Returns the number of variable tensors.
-
inline int get_parameter_count()
Gets the count of parameter tensors.
- Returns
int Returns the number of parameter tensors.
-
bool root_alloc(size_t internal_size, size_t psram_size, int alignment = 16)
Allocates memory for PSRAM and internal roots.
- Parameters
internal_size – The size of the internal memory in bytes.
psram_size – The size of the PSRAM memory in bytes.
alignment – The alignment of the memory in bytes.
- Returns
bool Returns true if the allocation is successful, false otherwise.
-
inline void *get_psram_root()
Gets the pointer to the PSRAM root.
- Returns
void* Returns the pointer to the PSRAM root.
-
inline void *get_internal_root()
Gets the pointer to the internal root.
- Returns
void* Returns the pointer to the internal root.
-
size_t get_parameter_memory_size(mem_info_t &mem_info, bool copy)
Gets the size of the parameters in bytes.
- Parameters
mem_info – The size of the memory used by the parameters in bytes, filtered by copy option.
copy – Filter the parameters by auto_free.
- Returns
size_t Returns the total size of the parameters memory in bytes.
-
size_t get_variable_memory_size(mem_info_t &mem_info)
Gets the size of the variable memory in bytes.
- Parameters
mem_info – The size of the memory used by the variables in bytes.
- Returns
size_t Returns the total size of the variables memory in bytes.
-
inline void root_free()
Frees the memory allocated for PSRAM and internal roots. This function ensures proper cleanup of allocated memory.
-
inline void minimize()
Minimizes the context by clearing the name-to-index map. This is used to free unnecessary intermediate variables during the inference.
-
inline void clear()
Clears all resources and tensors in the context. This includes clearing variables, parameters, name-to-index map, and freeing memory.
Public Members
-
std::vector<TensorBase*> m_variables
Variable tensors of model, the first one is nullptr
-
std::vector<TensorBase*> m_parameters
Parameters of model, the first one is nullptr
Header File
Classes
-
class MemoryManagerBase
Memory manager base class; each model has its own memory manager. TODO: share the memory manager among different models.
Subclassed by dl::MemoryManagerGreedy
Public Functions
-
inline MemoryManagerBase(int alignment = 16)
Construct a new Memory Manager Base object.
- Parameters
alignment – Memory address alignment
-
inline virtual ~MemoryManagerBase()
Destroys the MemoryManager object and releases its resources.
-
virtual bool alloc(fbs::FbsModel *fbs_model, std::vector<dl::module::Module*> &execution_plan, ModelContext *context) = 0
Allocate memory for each tensor, including all input and output tensors.
Public Members
-
int alignment
The memory address alignment of the root pointer; must be a power of two.
-
class TensorInfo
Tensor info, including tensor name, shape, dtype, size, lifetime range and call times, used to plan model memory.
Public Functions
-
TensorInfo(std::string &name, int time_begin, int time_end, std::vector<int> shape, dtype_t dtype, int exponent, bool is_internal = false)
Construct a new Tensor Info object.
- Parameters
name – Tensor name
time_begin – Tensor lifetime begin
time_end – Tensor lifetime end
shape – Tensor shape
dtype – Tensor dtype
exponent – Tensor exponent
is_internal – Is tensor in internal RAM or not
-
inline ~TensorInfo()
Destroy the Tensor Info object.
-
void set_inplace_leader_tensor(TensorInfo *tensor)
Set the inplace leader tensor object.
- Parameters
tensor – Inplace leader tensor
-
inline void set_inplace_follower_tensor(TensorInfo *tensor)
Set the inplace follower tensor object.
- Parameters
tensor – Inplace follower tensor
-
inline TensorInfo *get_inplace_follower_tensor()
Get the inplace follower tensor object.
- Returns
TensorInfo* Inplace follower tensor
-
void update_time(int new_time)
Update Tensor lifetime.
- Parameters
new_time – new tensor lifetime
-
TensorBase *create_tensor(void *internal_root, void *psram_root)
Create a TensorBase object according to TensorInfo.
- Parameters
internal_root – Internal RAM root pointer
psram_root – PSRAM root pointer
- Returns
TensorBase*
-
inline bool is_inplaced()
Is inplaced or not.
- Returns
true if inplaced else false
-
inline uint32_t get_offset()
Get the tensor offset.
- Returns
uint32_t
-
inline void set_offset(uint32_t offset)
Set the tensor offset.
- Parameters
offset –
-
inline uint32_t get_internal_offset()
Get the internal offset.
- Returns
uint32_t
-
inline bool get_internal_state()
Get the internal state.
- Returns
true if is internal else false
-
inline void set_internal_state(bool is_internal)
Set the internal state.
- Parameters
is_internal –
-
inline void set_internal_offset(uint32_t offset)
Set the internal offset.
- Parameters
offset –
-
inline int get_time_end()
Get the lifetime end.
- Returns
int
-
inline int get_time_begin()
Get the lifetime begin.
- Returns
int
-
inline size_t get_size()
Get the tensor size.
- Returns
size_t
-
inline std::string get_name()
Get the tensor name.
- Returns
std::string
-
inline std::vector<int> get_shape()
Get the tensor shape.
- Returns
std::vector<int>
-
inline void print()
Print the tensor info.
-
class MemoryChunk
Memory chunk, including size, free flag, offset, alignment and tensor, used to simulate memory allocation.
Public Functions
-
MemoryChunk(size_t size, int is_free, int alignment = 16)
Construct a new Memory Chunk object.
- Parameters
size – Memory chunk size
is_free – Whether free or not
alignment – Memory chunk alignment
-
MemoryChunk(TensorInfo *tensor, int alignment = 16)
Construct a new Memory Chunk object.
- Parameters
tensor – TensorInfo
alignment – Memory chunk alignment
-
inline ~MemoryChunk()
Destroy the Memory Chunk object.
-
MemoryChunk *merge_free_chunk(MemoryChunk *chunk)
Merge contiguous free chunks.
- Parameters
chunk –
- Returns
MemoryChunk*
-
MemoryChunk *insert(TensorInfo *tensor)
Insert tensor into free chunk.
- Parameters
tensor –
- Returns
MemoryChunk*
-
MemoryChunk *extend(TensorInfo *tensor)
Extend free chunk and insert tensor.
- Parameters
tensor –
- Returns
MemoryChunk*
-
inline void free()
Free memory chunk, set is_free to true and set tensor to nullptr.
-
size_t get_aligned_size(size_t size)
Get the aligned size, rounded up to alignment bytes (16 by default).
- Parameters
size –
- Returns
size_t
Header File
Classes
-
class MemoryManagerGreedy : public dl::MemoryManagerBase
Greedy memory manager that allocates memory for tensors in execution order, prioritizing internal RAM allocation first.
Public Functions
-
inline MemoryManagerGreedy(int max_internal_size, int alignment = 16)
Constructs a greedy memory manager with specified constraints.
- Parameters
max_internal_size – Maximum allowed internal RAM usage in bytes
alignment – Memory address alignment requirement (default: 16 bytes)
-
inline ~MemoryManagerGreedy()
Destructor that releases all managed memory resources.
-
virtual bool alloc(fbs::FbsModel *fbs_model, std::vector<dl::module::Module*> &execution_plan, ModelContext *context)
Allocates memory for all network tensors following greedy strategy.
- Parameters
fbs_model – FlatBuffer model containing network architecture
execution_plan – Execution graph ordered by computation dependencies
context – The model context that holds the allocated tensors.
- Returns
bool True if successful allocation, false if memory insufficient
-
void free()
Releases all allocated memory including tensor buffers and memory pools.