Multimedia Technology Wiki: Application Solution

Note

This document is automatically translated using AI. Please excuse any detailed errors. The official English version is still in progress.

ESP-WebRTC Solution

Overview:

WebRTC is widely used for real-time, low-latency communication. It was initially designed for peer-to-peer (P2P) connections and now forms the basis of video conferencing, media streaming, and IoT applications. Compared with MQTT or WebSocket, WebRTC is better optimized for real-time media transmission and NAT traversal. ICE (Interactive Connectivity Establishment) is a key WebRTC mechanism that uses STUN and TURN servers to establish reliable connections across NATs and firewalls. The ESP-WebRTC solution supports ICE, which simplifies STUN/TURN configuration.
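To make the ICE step more concrete: each candidate address (a local host address, a server-reflexive address learned through STUN, or a relay address allocated through TURN) receives a priority, and higher-priority candidates are preferred. The sketch below computes that priority with the formula and recommended type preferences from RFC 8445. It is not the esp_peer API; the type and function names are hypothetical and serve only as an illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Candidate types, using the type preferences recommended by RFC 8445.
 * The enum and function names are illustrative, not part of esp_peer. */
typedef enum {
    ICE_CAND_HOST  = 126, /* address of a local interface */
    ICE_CAND_PRFLX = 110, /* peer-reflexive, learned during connectivity checks */
    ICE_CAND_SRFLX = 100, /* server-reflexive, learned through STUN */
    ICE_CAND_RELAY = 0    /* relayed, allocated through TURN */
} ice_cand_type_t;

/* priority = 2^24 * type_pref + 2^8 * local_pref + (256 - component_id) */
static uint32_t ice_candidate_priority(ice_cand_type_t type,
                                       uint16_t local_pref,
                                       uint16_t component_id)
{
    assert(component_id >= 1 && component_id <= 256);
    return ((uint32_t)type << 24)
         + ((uint32_t)local_pref << 8)
         + (uint32_t)(256 - component_id);
}
```

With a single network interface (local preference 65535) and component 1, a host candidate outranks a server-reflexive one, which in turn outranks a relay candidate, so a TURN relay is used only when more direct paths fail.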

Protocol Comparison

| Feature / Protocol | WebRTC | MQTT | WebSocket |
| --- | --- | --- | --- |
| Communication Mode | Peer-to-peer (established via STUN/TURN/ICE) | Broker-based publish/subscribe | Client-server |
| Media Support | ✅ Audio, video, data | ❌ Messages only (binary/text) | ❌ Messages only (binary/text) |
| Latency | Ultra-low latency (can be below 100 ms) | Low latency (depends on the broker, about 10 to 100 ms) | Low latency (about tens of milliseconds) |
| Reliability | Reliable + unreliable channels (SCTP) | QoS levels (0, 1, 2) | TCP reliable only |
| NAT Traversal | ✅ Built-in STUN/TURN support | ❌ Direct TCP only | ❌ Direct TCP only |
| Security | DTLS/SRTP (mandatory encryption) | TLS (optional, MQTTS) | TLS (optional, WSS) |
| Scalability | SFU/MCU needed for multiple parties | Scales well through the broker | Server clusters/load balancing |
| Best Use Case | Real-time calls, meetings, games, P2P file sharing | IoT telemetry, device control, sensor data | Chat, real-time updates, dashboards |

Application Fields:

WebRTC enables low-latency communication in embedded and IoT systems. It supports media transmission for applications such as IP cameras and video conferencing, and it can carry arbitrary data through the WebRTC data channel.

User Support:

  • Protocol Layer Users: Users who only need the connection layer can use the lightweight peer connection implementation provided by esp_peer. Its features include:

    • Full ICE support (STUN + TURN)

    • Optimized for fast connection establishment and short startup time

    • Minimal dependencies: only libsrtp is required

    • Low resource consumption (about 60 KB/connection)

    • Low latency (about 260 ms between ESP32 and mobile phone)

    • Core protocol implemented from scratch, which makes it easier to extend

    Detailed documentation: esp_peer

    Best practice reference: peer_demo

  • Application Layer Users: Building on the protocol layer, application-layer users can add audio and video capture, rendering, and signaling on top of the connection capabilities provided by ESP-WebRTC. This allows rapid prototyping, because only the signaling implementation needs to be replaced. The main components include:

    • esp_capture: Capture encoded audio/video from hardware

    • av_render: Audio and video playback

    • Signaling: Supports AppRTC, WHIP, OpenAI

Solutions and Demonstrations:

Other Resources:

ESP-Hosted-MCU Solution

Basic Functions

Overview:

ESP-Hosted-MCU is an open-source solution that allows Espressif chipsets and modules to be used as communication slaves. This solution provides wireless connectivity (Wi-Fi and Bluetooth) for the host microprocessor or microcontroller, enabling it to communicate with other devices.

For the architecture of the ESP-Hosted host and slave, please refer to Introduction

Slave Selection Guide:

| Model | SRAM | GPIO | Feature |
| --- | --- | --- | --- |
| ESP32 | 520 KB | 34 | 2.4 GHz Wi-Fi and Bluetooth |
| ESP32-C5 | 384 KB | 22 | 2.4 and 5 GHz dual-band Wi-Fi 6, Bluetooth LE 5, Zigbee 3.0, and Thread 1.3 |
| ESP32-C6 | 512 KB | 30 | 2.4 GHz Wi-Fi 4/Wi-Fi 6 and Bluetooth LE |

For more details, please refer to ESP-Hosted-MCU. The dependencies and related configurations of the host and slave depend on the communication interface used.

ESP-Hosted Loading Process

  1. Component initialization: esp_hosted_init()

  2. Call esp_wifi_init() in the main application

Typical Example: Host completes network configuration through slave

Network Split Function

Overview:

The Network Split function allows the host MCU and ESP32 slave to share an IP address and distribute traffic between them. When the host is in sleep mode, the slave can continue to handle selected network activities (such as MQTT, DNS).

  • Port-based data forwarding

  • Shared IP address

  • Packet filtering on specific ports

  • Wake-up on specific packets, such as an MQTT message containing “wakeup-host”

  • The host and slave can both call esp_wifi_xx() and related interfaces to set up the network connection

For more information, please refer to Network Split Feature for ESP-Hosted MCU
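Port-based forwarding with a shared IP address can be pictured as a lookup on the destination port of each incoming packet. The following is a minimal sketch of that idea, not the ESP-Hosted implementation: the type names, the function, and the port range are all hypothetical values chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { ROUTE_TO_HOST, ROUTE_TO_SLAVE } route_t;

/* Inclusive range of local ports whose traffic the slave handles itself. */
typedef struct {
    uint16_t slave_port_min;
    uint16_t slave_port_max;
} split_cfg_t;

/* Illustrative range only; real values come from the project configuration. */
static const split_cfg_t example_cfg = { 61440, 65535 };

/* Decide which side receives an incoming packet, based on its destination port. */
static route_t route_by_dest_port(const split_cfg_t *cfg, uint16_t dest_port)
{
    assert(cfg != NULL && cfg->slave_port_min <= cfg->slave_port_max);
    if (dest_port >= cfg->slave_port_min && dest_port <= cfg->slave_port_max) {
        return ROUTE_TO_SLAVE;
    }
    return ROUTE_TO_HOST;
}
```

Under these assumed values, a socket that the slave binds to local port 61440 or above keeps working while the host sleeps, and all other traffic is forwarded to the host.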

Host Deep Sleep with Slave Maintaining MQTT Keep-Alive Function

Overview:

This function allows the host MCU to enter a low-power state while maintaining the slave’s network connection, thereby improving the energy efficiency of battery-powered devices.

  • The slave maintains the network connection while the host is in deep sleep or powered off

  • The slave can wake up the host when it receives specific packets, or execute specific commands

  • Seamless handover of network packets across sleep and wake-up transitions

  • Must be used together with the Network Split function

For more information, please refer to Host Power Save (ESP-Hosted MCU)
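The wake-up decision on the slave can be sketched as a payload scan: while the host sleeps, the slave checks each message it handles for a wake-up token such as “wakeup-host”. The code below is an illustrative sketch with hypothetical function names; the actual matching rules used by ESP-Hosted may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if `token` occurs anywhere in the first `len` bytes of `payload`. */
static bool payload_contains(const unsigned char *payload, size_t len, const char *token)
{
    assert(token != NULL && (payload != NULL || len == 0));
    size_t tlen = strlen(token);
    if (tlen == 0 || tlen > len) {
        return false;
    }
    for (size_t i = 0; i + tlen <= len; i++) {
        if (memcmp(payload + i, token, tlen) == 0) {
            return true;
        }
    }
    return false;
}

/* Wake the host only if it is asleep and the message carries the wake-up token. */
static bool should_wake_host(bool host_asleep, const unsigned char *payload, size_t len)
{
    return host_asleep && payload_contains(payload, len, "wakeup-host");
}
```

Messages without the token, such as keep-alive traffic, leave the host asleep, which is what keeps the average power consumption low.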

FAQ

Q: Why is the communication between the host and slave failing?

A: You can troubleshoot as follows:

  • Check if the versions of the ESP-Hosted component on the host and slave are consistent

  • Check if the configurations of the host and slave are consistent, such as communication interfaces (SDIO or SPI), communication rates, etc.

  • Check if the hardware connection is normal, such as IO pins, power supply, etc. For more information, refer to Hardware Guide

Q: Why does the configured CONFIG_ESP_WIFI_STATIC_RX_BUFFER_NUM not take effect?

A: Starting with esp_wifi_remote 0.8.0, the configuration item has been renamed to CONFIG_WIFI_RMT_STATIC_RX_BUFFER_NUM. Please configure it in (Top) > Component config > Wi-Fi Remote > Wi-Fi configuration. The old CONFIG_ESP_WIFI_STATIC_RX_BUFFER_NUM configuration item is no longer supported.

Q: Why is there a compilation failure?

A: Some structure definitions in ESP-IDF have changed. Components that were updated to match the newer ESP-IDF are not compatible with older ESP-IDF versions. It is recommended to upgrade ESP-IDF to a compatible version, or to use a matching component version.

Q: What does the following log mean: === ESP-Hosted Version Warning ===?

A: This warning indicates that the versions of the ESP-Hosted component used by the host (such as ESP32-P4) and the slave (such as ESP32-C6) are inconsistent. It is strongly recommended that the host and slave use the same version to avoid communication abnormalities due to version differences.
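A version consistency check of this kind can be sketched as follows. The code parses two "major.minor.patch" strings and reports which side is older. It is an illustration only: the names are hypothetical, and the exact comparison that ESP-Hosted performs before printing the warning is an assumption here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef struct { int major, minor, patch; } fw_version_t;

typedef enum {
    VERSION_MATCH,
    VERSION_UPDATE_SLAVE, /* host is newer than slave */
    VERSION_UPDATE_HOST,  /* slave is newer than host */
    VERSION_INVALID       /* a version string could not be parsed */
} version_check_t;

/* Parse "major.minor.patch"; returns 1 on success, 0 otherwise. */
static int parse_version(const char *text, fw_version_t *out)
{
    assert(text != NULL && out != NULL);
    return sscanf(text, "%d.%d.%d", &out->major, &out->minor, &out->patch) == 3;
}

/* Compare the host and slave component versions field by field. */
static version_check_t check_versions(const char *host, const char *slave)
{
    fw_version_t h, s;
    if (!parse_version(host, &h) || !parse_version(slave, &s)) {
        return VERSION_INVALID;
    }
    int diff = (h.major != s.major) ? h.major - s.major
             : (h.minor != s.minor) ? h.minor - s.minor
             : h.patch - s.patch;
    if (diff == 0) {
        return VERSION_MATCH;
    }
    return diff > 0 ? VERSION_UPDATE_SLAVE : VERSION_UPDATE_HOST;
}
```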

Q: What does the following log mean: Identified slave [esp32c6] != Expected [esp32]?

A: This error indicates that the slave chip model actually detected (esp32c6) does not match the model specified in the configuration (esp32). Please select the correct slave chip model in (Top) > Component config > Wi-Fi Remote > choose slave target.

Q: Why does ESP32-P4 + ESP32-C5 report insufficient memory after turning on Wi-Fi and BLE?

A: The internal memory of ESP32-C5 is only 384 KB, and the memory resources are relatively tight, which can easily lead to memory allocation failure. The following configuration can be used in the slave to further optimize memory usage:

CONFIG_ESP_SDIO_RX_Q_SIZE=10
CONFIG_ESP_WIFI_IRAM_OPT=n
CONFIG_ESP_WIFI_EXTRA_IRAM_OPT=n
CONFIG_ESP_WIFI_RX_IRAM_OPT=n
CONFIG_ESP_WIFI_SLP_IRAM_OPT=n
CONFIG_LWIP_IRAM_OPTIMIZATION=n
CONFIG_LWIP_EXTRA_IRAM_OPTIMIZATION=n
CONFIG_FREERTOS_PLACE_FUNCTIONS_INTO_FLASH=y

Q: How to speed up IP acquisition?

A: Both the Host and Slave can use the following configuration:

CONFIG_COMPILER_OPTIMIZATION_PERF=y
CONFIG_BOOTLOADER_LOG_LEVEL_NONE=y
CONFIG_LOG_DEFAULT_LEVEL_ERROR=y
CONFIG_BOOTLOADER_SKIP_VALIDATE_ALWAYS=y
CONFIG_ESPTOOLPY_FLASHMODE_QIO=y
CONFIG_ESPTOOLPY_FLASHFREQ_80M=y
CONFIG_LWIP_DHCP_RESTORE_LAST_IP=y
CONFIG_SPIRAM_MEMTEST=n

  • Use bootloader_hooks or bootloader_override to reset the slave early, which reduces the time the host spends waiting for it.

  • By default, hosted is initialized during the bootloader stage. For a quick startup, you can disable this automatic esp_hosted_init() call and instead call esp_hosted_init() in the application, before esp_wifi_init().

Q: How do ESP-IDF versions relate to the hosted/remote component versions?

A: You can pull the components as follows, or use the latest components directly. If the host updates its component version and the version gap is large, it is recommended to update the slave as well.

espressif/esp_hosted:
  version: ^2.4
  rules:
  - if: target in [esp32p4]
espressif/esp_wifi_remote:
  version: ^1.0
  rules:
  - if: target in [esp32p4]

  • ESP32-C6: It is recommended to use esp_hosted ≥ 2.4.2, esp_wifi_remote ≥ 1.0.0, ESP-IDF ≥ v5.3.2

  • ESP32-C5: It is recommended to use esp_hosted ≥ 2.4.2, esp_wifi_remote ≥ 1.0.0, ESP-IDF ≥ v5.5
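The version constraints above can be read as range checks. A caret constraint such as ^2.4 accepts any version from 2.4.0 up to, but not including, 3.0.0, while a recommendation such as "≥ 2.4.2" is a plain lower bound. The sketch below models both checks; the type and function names are hypothetical, and the caret rule shown covers only major versions of 1 or higher.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct { int major, minor, patch; } semver_t;

static semver_t semver(int major, int minor, int patch)
{
    semver_t v = { major, minor, patch };
    return v;
}

/* Negative if a < b, zero if equal, positive if a > b. */
static int semver_cmp(semver_t a, semver_t b)
{
    if (a.major != b.major) return a.major - b.major;
    if (a.minor != b.minor) return a.minor - b.minor;
    return a.patch - b.patch;
}

/* Caret constraint for base.major >= 1: same major version, and v >= base. */
static bool satisfies_caret(semver_t v, semver_t base)
{
    assert(base.major >= 1);
    return v.major == base.major && semver_cmp(v, base) >= 0;
}

/* Plain lower bound, as in "esp_hosted >= 2.4.2". */
static bool at_least(semver_t v, semver_t min)
{
    return semver_cmp(v, min) >= 0;
}
```

For example, esp_hosted 2.4.2 satisfies both ^2.4 and the recommended minimum, whereas 2.4.1 satisfies ^2.4 but falls below the recommended 2.4.2.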

AI Agent Solution

Overview:

The AI Agent solution provides audio and video interactive application code for the ESP32 platform. The application is built on the ESP-GMF architecture and integrates device-side AI Agent development, giving developers a complete audio and video interaction solution.

Application Architecture:

The AI Agents application is based on the ESP-GMF architecture and consists mainly of the following two core modules:

  • Audio-processor module

    Mainly responsible for audio data processing, including:

    • Audio capture

    • Audio encoding and decoding

    • Audio enhancement

    • 3A algorithm (AEC, ANS, AGC)

    • Audio synthesis

  • Video-processor module

    Mainly responsible for video data processing, including:

    • Video capture

    • Video encoding and decoding

    • Video rendering

AI Agent Architecture Diagram

Functional Features:

The AI Agents application supports a variety of mainstream AI platforms and functions: