Introduce a New Model


ESP-VISION supports two model runtime paths: ESP-DL .espdl models use the espdl – Model Inference module, and TensorFlow Lite .tflite models use the tflite – Model Inference module. Model files are not built into the firmware; they live on board storage and are loaded at runtime. This guide adds a new model and runs it.

1. Obtain or Convert the Model

Choose the model runtime first:

ESP-DL: get a ready .espdl from the ESP-DL model zoo, or convert your own model to the .espdl format with the ESP-DL quantization/export toolchain, matching the selected chip (ESP32-P4, ESP32-S3, or ESP32-S31).
TFLite Micro: use a TensorFlow Lite .tflite flatbuffer that fits TensorFlow Lite Micro and the enabled operator set. Quantized int8 models are usually the practical target on the current boards.
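
Because the file extension decides which runtime module loads the model, a script can derive the module name from the path. The following is a minimal sketch; `runtime_for` is a hypothetical helper written for this guide, not part of ESP-VISION.

```python
def runtime_for(path):
    """Return the name of the runtime module that matches a model file."""
    lower = path.lower()
    if lower.endswith(".espdl"):
        return "espdl"
    if lower.endswith(".tflite"):
        return "tflite"
    # Any other extension has no matching runtime in this guide.
    raise ValueError("unsupported model file: " + path)


print(runtime_for("/sdcard/my_model.espdl"))
print(runtime_for("/flash/my_model.tflite"))
```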

When adding shared assets to the repository, follow the existing directory layout under models/, mirroring models/espdet/ and models/tflite/.

2. Copy the Model to Board Storage

Place the .espdl or .tflite file on storage the firmware can read, such as /sdcard or /flash:

SD card: copy the file onto the card, which mounts at /sdcard.
On-flash FAT (ffat): the data partition is exposed over USB MSC, so you can drag the file onto the mass-storage drive; the firmware then sees the file under /flash.
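
Before loading, it can help to confirm that the file actually reached one of the mount points. The sketch below assumes the firmware's `os.stat` returns the usual stat tuple with the file size at index 6, as MicroPython and CPython do. `find_model` is a hypothetical helper, not an ESP-VISION API.

```python
import os


def find_model(filename, roots=("/sdcard", "/flash")):
    """Return (path, size_in_bytes) for the first root holding the file, else (None, 0)."""
    for root in roots:
        path = root + "/" + filename
        try:
            size = os.stat(path)[6]  # index 6 of the stat tuple is the file size
        except OSError:
            continue  # missing file or unmounted storage, try the next root
        return path, size
    return None, 0


path, size = find_model("my_model.espdl")
print("model:", path, "bytes:", size)
```

A size of 0 or a `None` path usually points to an incomplete copy or an unmounted card rather than a model problem.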

3. Pick the Right API

Choose the API that matches the runtime and task:

Task	API	Result
ESP-DL object detection (ESPDet)	`espdl.ESPDet`	`(x, y, w, h, score, category)`
ESP-DL object detection (YOLO11)	`espdl.YOLO11`	`(x, y, w, h, score, category)`
ESP-DL pose detection	`espdl.YOLO11nPose`	detection plus 17 COCO keypoints
ESP-DL image classification	`espdl.ImageNetCls`	`(label, score)`
ESP-DL with Python decoding	`espdl.Model`	raw output bytes with tensor metadata
Generic TFLite Micro execution	`tflite.Model`	raw output tensors, or a callback result

For ESP-DL wrappers, pass mean, std, score, nms, topk, or softmax to the constructor when the model needs different preprocessing or filtering. When you need to keep ESP-DL inference but decode outputs in Python, inspect espdl.Model.inputs() and espdl.Model.outputs(), then decode the RawTensor bytes returned by predict(). For TFLite Micro models, inspect input_shape, input_dtype, input_scale, input_zero_point, output_shape, output_dtype, output_scale, and output_zero_point and implement the model-specific preprocessing or post-processing in Python or helper code.

4. Run ESP-DL Inference

import sensor, image, espdl

sensor.reset()
sensor.set_pixformat(sensor.RGB565)
sensor.set_framesize(sensor.QVGA)

det = espdl.ESPDet("/sdcard/my_model.espdl", score=0.5, nms=0.45)

while True:
    img = sensor.snapshot()
    for x, y, w, h, score, category in det.detect(img):
        img.draw_rectangle(x, y, w, h, color=(255, 0, 0))
    img.flush()
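
Detections arrive as plain `(x, y, w, h, score, category)` tuples, so further filtering can be done in ordinary Python. As an illustration, the sketch below keeps only the strongest detection per category. `best_per_category` is a hypothetical helper, and the sample tuples are hand-written stand-ins for `det.detect(img)` results.

```python
def best_per_category(detections, min_score=0.5):
    """Keep the highest-scoring detection for each category at or above min_score."""
    best = {}
    for det in detections:
        x, y, w, h, score, category = det
        if score < min_score:
            continue
        # Index 4 of a stored tuple is its score.
        if category not in best or score > best[category][4]:
            best[category] = det
    return best


# Hand-written tuples standing in for det.detect(img) results.
sample = [
    (10, 20, 50, 60, 0.91, 0),
    (12, 22, 48, 58, 0.62, 0),
    (100, 40, 30, 30, 0.40, 1),
]
print(best_per_category(sample))
```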

5. Decode ESP-DL Outputs in Python

Use espdl.Model when the model can use ESP-DL image preprocessing and inference, but its output tensors need Python-side decoding.

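import sensor, espdl

# Initialize the camera before taking a snapshot.
sensor.reset()
sensor.set_pixformat(sensor.RGB565)
sensor.set_framesize(sensor.QVGA)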

model = espdl.Model("/sdcard/my_model.espdl", mean=(0, 0, 0), std=(255, 255, 255), letterbox=True)
try:
    print("inputs:", model.inputs())
    print("outputs:", model.outputs())
    img = sensor.snapshot()
    outputs = model.predict(img)
    for name, tensor in outputs.items():
        _, shape, dtype, exponent, raw = tensor
        print(name, shape, dtype, exponent, len(raw))
finally:
    model.deinit()

Each output tensor is returned as raw bytes plus shape, dtype, and ESP-DL exponent metadata. Decode the bytes according to the tensor type, apply the exponent scale, and then run model-specific post-processing such as sigmoid, box decode, NMS, classification top-k, or coordinate unletterboxing.
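
The first decode step, turning raw bytes into real values, can be sketched as follows. It assumes the tensor holds little-endian signed integers and that the exponent is a power-of-two scale, so the real value is the integer times 2 to the power of the exponent. How `dtype` is reported is not specified here, so the caller picks the `struct` format character. `dequantize_pow2` is a hypothetical helper.

```python
import struct


def dequantize_pow2(raw, exponent, fmt="b"):
    """Convert little-endian quantized integers to floats: value * 2**exponent.

    fmt is a struct format character: "b" for int8, "h" for int16.
    """
    size = struct.calcsize("<" + fmt)
    count = len(raw) // size
    # Ignore any trailing bytes that do not form a whole element.
    values = struct.unpack("<%d%s" % (count, fmt), raw[: count * size])
    scale = 2.0 ** exponent
    return [v * scale for v in values]


# Two int8 values, 4 and -4, with exponent -2 (scale 0.25).
print(dequantize_pow2(bytes([0x04, 0xFC]), -2))  # [1.0, -1.0]
```

The resulting list of floats is the input to the model-specific steps such as sigmoid, box decode, or NMS.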

6. Run TFLite Micro Inference

import tflite

def fill_input(buffer, shape, dtype_code):
    # Fill model-specific quantized input bytes.
    ...

model = tflite.Model("/sdcard/my_model.tflite")
try:
    print("input:", model.input_shape, model.input_dtype, model.input_scale, model.input_zero_point)
    print("output:", model.output_shape, model.output_dtype, model.output_scale, model.output_zero_point)
    outputs = model.predict([fill_input])
    raw = outputs[0]
finally:
    model.deinit()
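
For a quantized int8 model, the scale and zero-point metadata printed above map between real values and tensor integers using the standard TensorFlow Lite affine scheme, real = scale * (q - zero_point). The sketch below assumes a single scalar scale and zero point per tensor; check what the attributes actually return on your build. `quantize_int8` and `dequantize_affine` are hypothetical helpers.

```python
def quantize_int8(values, scale, zero_point, lo=-128, hi=127):
    """Map real values to int8: q = round(v / scale) + zero_point, clamped to [lo, hi]."""
    out = []
    for v in values:
        q = int(round(v / scale)) + zero_point
        out.append(max(lo, min(hi, q)))
    return out


def dequantize_affine(quantized, scale, zero_point):
    """Map quantized integers back to real values: v = scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in quantized]


# Illustrative numbers only; real models report their own scale and zero point.
q = quantize_int8([0.0, 1.0, -0.5], scale=0.25, zero_point=3)
print(q)
print(dequantize_affine(q, scale=0.25, zero_point=3))
```

An input callback such as fill_input would write the quantized integers into its buffer, and the same relation converts the raw output back to real values.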

See example/03-Machine-Learning/00-ESP-DL/ for ESP-DL scripts (espdet_pico.py, espdet_pico_python.py, yolo11.py, yolo11n_pose.py, imagenet_cls.py), and example/03-Machine-Learning/01-TFLite/ for TFLite Micro scripts (person_detection.py and sine.py).

7. Optional: Profiling and Validation

Use espdl.load_model() with profile=True to emit ESP-DL profiling output when verifying a new .espdl model’s performance. For TFLite Micro models, print model.len, model.ram, input metadata, and output metadata to verify flash size, arena size, tensor layout, and quantization before tuning post-processing.
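
Alongside the built-in profiling output, a coarse end-to-end timing is often enough to compare models. The sketch below assumes a MicroPython-style `time.ticks_ms` on the board and falls back to `time.perf_counter` when run on a host. `average_ms` is a hypothetical helper.

```python
import time


def average_ms(fn, runs=10):
    """Call fn() runs times and return the mean wall-clock time in milliseconds."""
    if hasattr(time, "ticks_ms"):
        # MicroPython: millisecond ticks with wrap-safe difference.
        start = time.ticks_ms()
        for _ in range(runs):
            fn()
        return time.ticks_diff(time.ticks_ms(), start) / runs
    # CPython fallback for host-side testing.
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) * 1000.0 / runs


print("avg ms:", average_ms(lambda: sum(range(1000))))
```

On the board, the callable would wrap one inference, for example `lambda: det.detect(img)` with the detector and image from section 4.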