Multimedia Technology Wiki: Application Solution
Note
This document is automatically translated using AI. Please excuse any detailed errors. The official English version is still in progress.
ESP-WebRTC Solution
Overview:
WebRTC is widely used for real-time, low-latency communication. It was originally designed for peer-to-peer (P2P) communication and now forms the basis of video conferencing, streaming media, and IoT applications. Compared with MQTT or WebSocket, WebRTC is better optimized for real-time media transmission and NAT traversal. ICE (Interactive Connectivity Establishment) is a key WebRTC mechanism: it uses STUN and TURN servers to establish reliable connections across NATs and firewalls. With its ICE support, the ESP-WebRTC solution further simplifies STUN/TURN configuration.
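To make the ICE mechanism more concrete: every candidate address an endpoint gathers (a local host address, a server-reflexive address learned from a STUN server, or a relay address allocated on a TURN server) is assigned a priority, and higher-priority candidates are tried first. The sketch below computes that priority with the formula and recommended type preferences from RFC 8445 (section 5.1.2.1). It is a generic illustration of ICE, not code taken from ESP-WebRTC; the enum and function names are hypothetical.

```c
#include <stdint.h>

/* Candidate types defined by ICE (RFC 8445). */
typedef enum {
    CAND_HOST,             /* address of a local network interface */
    CAND_PEER_REFLEXIVE,   /* address learned during connectivity checks */
    CAND_SERVER_REFLEXIVE, /* public address reported by a STUN server */
    CAND_RELAYED           /* address allocated on a TURN server */
} cand_type_t;

/* Type preferences recommended by RFC 8445. */
static uint32_t type_preference(cand_type_t type)
{
    switch (type) {
    case CAND_HOST:             return 126;
    case CAND_PEER_REFLEXIVE:   return 110;
    case CAND_SERVER_REFLEXIVE: return 100;
    case CAND_RELAYED:          return 0;
    }
    return 0;
}

/* priority = 2^24 * type_pref + 2^8 * local_pref + (256 - component_id)
 * local_pref is in 0..65535 and component_id in 1..256 (1 = RTP). */
static uint32_t ice_candidate_priority(cand_type_t type, uint32_t local_pref,
                                       uint32_t component_id)
{
    return type_preference(type) * (1u << 24) +
           local_pref * (1u << 8) +
           (256u - component_id);
}
```

With a local preference of 65535 and component ID 1, a host candidate gets priority 2130706431, while a relayed (TURN) candidate gets 16777215. This ordering is why ICE tries direct paths before falling back to a TURN relay.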
| Feature / Protocol | WebRTC | MQTT | WebSocket |
|---|---|---|---|
| Communication Mode | Peer-to-peer (established via STUN/TURN/ICE) | Broker-based publish/subscribe | Client-server |
| Media Support | ✅ Audio, video, data | ❌ Messages only (binary/text) | ❌ Messages only (binary/text) |
| Latency | Ultra-low (possibly under 100 ms) | Low (depends on the broker, about 10 to 100 ms) | Low (about tens of milliseconds) |
| Reliability | Reliable and unreliable channels (SCTP) | QoS levels (0, 1, 2) | Reliable only (TCP) |
| NAT Traversal | ✅ Built-in STUN/TURN support | ❌ Direct TCP only | ❌ Direct TCP only |
| Security | DTLS/SRTP (mandatory encryption) | TLS (optional, MQTTS) | TLS (optional, WSS) |
| Scalability | SFU/MCU needed for multiple parties | Scales well through the broker | Server clusters/load balancing |
| Best Use Case | Real-time calls, meetings, games, P2P file sharing | IoT telemetry, device control, sensor data | Chat, real-time updates, dashboards |
Application Fields:
WebRTC enables low-latency communication in embedded and IoT systems. It supports media transmission for applications such as IP cameras and video conferencing, and its data channel can carry any type of data.
User Support:
Protocol Layer Users: Users who only need the connection layer can use esp_peer, a lightweight peer connection implementation. Its features include:
Full ICE support (STUN + TURN)
Optimized connection setup for fast startup
Minimal dependencies: only libsrtp is required
Low resource consumption (about 60 KB per connection)
Low latency (about 260 ms between an ESP32 and a mobile phone)
Core protocol implemented from scratch, making it easier to extend
Documentation: esp_peer
Best-practice reference: peer_demo
Application Layer Users: Building on the protocol layer, application layer users can implement audio and video capture, rendering, and signaling using the connection capabilities provided by ESP-WebRTC. This enables rapid prototyping, since only the signaling needs to be replaced. The main components include:
esp_capture: Capture encoded audio/video from hardware
av_render: Audio and video playback
Signaling: Supports AppRTC, WHIP, OpenAI
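The point that only the signaling needs replacing can be illustrated with a function-pointer interface: the capture, rendering, and peer-connection code talks to an abstract signaling object, so moving from one signaling service to another means supplying a different implementation. The struct and function names below (signaling_ops_t, apprtc_send_sdp, whip_send_sdp, start_session) are hypothetical and are not the ESP-WebRTC API. The stubs only format what a real implementation would send.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical signaling interface: only these operations differ between services. */
typedef struct {
    const char *name;
    /* Deliver the local SDP to the remote side; here it just writes a log line. */
    int (*send_sdp)(const char *sdp, char *out, size_t out_len);
} signaling_ops_t;

static int apprtc_send_sdp(const char *sdp, char *out, size_t out_len)
{
    /* A real implementation would send the SDP through the AppRTC room server. */
    return snprintf(out, out_len, "apprtc:%s", sdp);
}

static int whip_send_sdp(const char *sdp, char *out, size_t out_len)
{
    /* A real implementation would HTTP POST the SDP offer to the WHIP endpoint. */
    return snprintf(out, out_len, "whip:%s", sdp);
}

static const signaling_ops_t APPRTC_SIGNALING = { "AppRTC", apprtc_send_sdp };
static const signaling_ops_t WHIP_SIGNALING   = { "WHIP",   whip_send_sdp };

/* Application code: capture, encoding and rendering stay the same;
 * only the signaling_ops_t pointer passed in changes. */
static int start_session(const signaling_ops_t *sig, char *log, size_t log_len)
{
    const char *local_sdp = "v=0"; /* placeholder for a generated SDP offer */
    return sig->send_sdp(local_sdp, log, log_len);
}
```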
Solutions and Demonstrations:
openai_demo: ESP32 as a client for OpenAI services.
doorbell_demo: Smart doorbell using AppRTC signaling.
doorbell_local: Local doorbell supporting pedestrian AI detection, no signaling server required.
video_call: Video calling through data channels.
whip_demo: Push ESP32 media stream to WHIP server.
serverless_mqtt: Connection establishment without a server.
livekit client: Allows connection to LiveKit cloud.
openAI video assistant: Enable image analysis through voice control.
Other Resources:
Introduction to WebRTC - High Performance Browser Networking
WebRTC for the Curious - Deep understanding of WebRTC internals
ESP-Hosted-MCU Solution
Basic Functions
Overview:
ESP-Hosted-MCU is an open-source solution that allows Espressif chipsets and modules to act as communication slaves. It provides wireless connectivity (Wi-Fi and Bluetooth) to a host microprocessor or microcontroller, enabling the host to communicate with other devices.
For an overview of the ESP-Hosted host and slave architecture, please refer to Introduction
Slave Selection Guide:
For a comparison of communication interfaces and throughput, please refer to Decide the communication bus in between host and slave
Comparison of SDIO interface slave chip parameters
| Model | SRAM | GPIO | Features |
|---|---|---|---|
| ESP32 | 520 KB | 34 | 2.4 GHz Wi-Fi and Bluetooth |
| ESP32-C5 | 384 KB | 22 | 2.4 and 5 GHz dual-band Wi-Fi 6, Bluetooth LE 5, Zigbee 3.0 and Thread 1.3 |
| ESP32-C6 | 512 KB | 30 | 2.4 GHz Wi-Fi 4/Wi-Fi 6 and Bluetooth LE |
For more details, please refer to ESP-Hosted-MCU. The dependencies and configuration required on the host and slave depend on the communication interface used.
ESP-Hosted Loading Process
Component initialization: esp_hosted_init()
Add Wi-Fi remote channels: add_esp_wifi_remote_channels()
Initialize the SDIO driver and create the related tasks: bus_init_internal()
Initialize the RPC interfaces: rpc_core_init()
Call esp_wifi_init() in the main application
Perform remote initialization: esp_wifi_remote_init()
Perform RPC initialization: rpc_wifi_init()
Typical Example: Host completes network configuration through slave
Host Sending Flow
Wi-Fi Default Configuration: WIFI_INIT_CONFIG_DEFAULT
Wi-Fi Initialization: rpc_wifi_init()
RPC Task Handling: rpc_tx_thread()
Get Response Return Value: rpc_rsp_callback()
Register for and Receive Slave Events (such as the WIFI_EVENT_STA_START event): rpc_event_callback()
Slave Processing Flow
Receive and Execute RPC Commands: esp_rpc_command_dispatcher()
Configure Wi-Fi According to the Received Parameters: req_wifi_init()
Handle Wi-Fi Events and Send Them to the Message Return Queue: event_handler_wifi()
Task Handling and Event Callback Triggering: pserial_task()
Send Events to the Host: rpc_evt_handler()
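The slave-side flow above (receive an RPC command, dispatch it to a handler such as req_wifi_init(), return a status) follows a common command-dispatch pattern. The sketch below shows that pattern with a handler table. The message IDs, the rpc_req_t/rpc_resp_t structs, and the handler names are hypothetical simplifications; they do not reproduce the real ESP-Hosted RPC messages or the signature of esp_rpc_command_dispatcher().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request IDs; the real ESP-Hosted RPC IDs differ. */
typedef enum { RPC_ID_WIFI_INIT = 1, RPC_ID_WIFI_START = 2 } rpc_id_t;

typedef struct { rpc_id_t id; int32_t arg; } rpc_req_t;
typedef struct { rpc_id_t id; int32_t status; } rpc_resp_t;

typedef int32_t (*rpc_handler_t)(const rpc_req_t *req);

static int wifi_initialized = 0;

static int32_t handle_wifi_init(const rpc_req_t *req)
{
    (void)req;
    wifi_initialized = 1; /* stands in for configuring Wi-Fi from req */
    return 0;             /* 0 = success */
}

static int32_t handle_wifi_start(const rpc_req_t *req)
{
    (void)req;
    return wifi_initialized ? 0 : -1; /* refuse to start before init */
}

/* Handler table: one entry per supported command. */
static const struct { rpc_id_t id; rpc_handler_t fn; } rpc_table[] = {
    { RPC_ID_WIFI_INIT,  handle_wifi_init  },
    { RPC_ID_WIFI_START, handle_wifi_start },
};

/* Look up the handler for req->id, run it, and fill in the response
 * that would then be queued for transmission back to the host. */
static void rpc_dispatch(const rpc_req_t *req, rpc_resp_t *resp)
{
    resp->id = req->id;
    resp->status = -2; /* unknown command */
    for (size_t i = 0; i < sizeof rpc_table / sizeof rpc_table[0]; i++) {
        if (rpc_table[i].id == req->id) {
            resp->status = rpc_table[i].fn(req);
            break;
        }
    }
}
```

A table keeps the dispatcher itself unchanged as commands are added: supporting a new RPC only means writing a handler and adding one row.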
Network Split Function
Overview:
The Network Split function allows the host MCU and ESP32 slave to share an IP address and distribute traffic between them. When the host is in sleep mode, the slave can continue to handle selected network activities (such as MQTT, DNS).
Port-based data forwarding
Shared IP address
Support for specific port packet filtering
Support for specific packet wake-up, such as “wakeup-host” included in MQTT messages
Support for the host and slave to call esp_wifi_xx() and related interfaces simultaneously to establish the network connection
For more information, please refer to Network Split Feature for ESP-Hosted MCU
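Port-based forwarding can be pictured as a per-packet routing decision on the slave: packets addressed to ports the slave handles stay on the slave, and packets addressed to the host's ports are forwarded over the bus. The sketch below shows such a decision based on the TCP/UDP destination port. The port ranges and the rule that unmatched ports are delivered to both stacks are illustrative assumptions, not the actual ESP-Hosted Network Split defaults or configuration options.

```c
#include <stdint.h>

typedef enum { ROUTE_SLAVE, ROUTE_HOST, ROUTE_BOTH } route_t;

/* Hypothetical port split; these ranges are examples only. */
#define HOST_PORT_MIN  49152u
#define HOST_PORT_MAX  61439u
#define SLAVE_PORT_MIN 61440u /* up to 65535 */

/* Decide which network stack should receive a packet, based on its
 * destination port. */
static route_t route_by_dst_port(uint16_t dst_port)
{
    if (dst_port >= HOST_PORT_MIN && dst_port <= HOST_PORT_MAX) {
        return ROUTE_HOST;  /* forward to the host over SDIO/SPI */
    }
    if (dst_port >= SLAVE_PORT_MIN) {
        return ROUTE_SLAVE; /* handled locally, e.g. the slave's MQTT client */
    }
    /* Any other port (e.g. DHCP client port 68): in this sketch both
     * stacks receive a copy. */
    return ROUTE_BOTH;
}
```

Splitting the local port space this way is what lets both sides share one IP address: each stack opens its connections from its own range, so replies can be routed without inspecting payloads.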
Host Deep Sleep with Slave Maintaining MQTT Keep-Alive Function
Overview:
This function allows the host MCU to enter a low-power state while maintaining the slave’s network connection, thereby improving the energy efficiency of battery-powered devices.
The slave maintains the network connection while the host is in deep sleep or powered off
The slave can wake up the host on specific packets or execute specific commands
Seamless hand-over of network traffic between slave and host across sleep and wake-up
Must be used together with the Network Split function
For more information, please refer to Host Power Save (ESP-Hosted MCU)
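Waking the host on a specific packet (for example, an MQTT message containing "wakeup-host", as mentioned under Network Split) amounts to a payload match that the slave runs while the host sleeps. The sketch below is a minimal, self-contained version of such a matcher. The function names and the policy of matching a plain byte pattern anywhere in the payload are assumptions for illustration; the real ESP-Hosted filter may work differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if `pattern` occurs anywhere in the first `len` bytes of
 * `payload`. Payloads may be binary, so strstr() is not used: it would
 * stop at the first NUL byte. */
static bool payload_contains(const unsigned char *payload, size_t len,
                             const char *pattern)
{
    size_t plen = strlen(pattern);
    if (plen == 0 || plen > len) {
        return false;
    }
    for (size_t i = 0; i + plen <= len; i++) {
        if (memcmp(payload + i, pattern, plen) == 0) {
            return true;
        }
    }
    return false;
}

/* Hypothetical policy: wake the host only if it is asleep and the message
 * carries the wake-up keyword; otherwise the slave handles or drops it. */
static bool should_wake_host(bool host_asleep, const unsigned char *payload,
                             size_t len)
{
    return host_asleep && payload_contains(payload, len, "wakeup-host");
}
```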
FAQ
Q: Why is the communication between the host and slave failing?
A: You can troubleshoot as follows:
Check whether the ESP-Hosted component versions on the host and slave are the same
Check whether the host and slave configurations match, such as the communication interface (SDIO or SPI) and communication rate
Check whether the hardware connections are correct, such as IO pins and power supply. For more information, refer to Hardware Guide
Q: Why can’t the configured CONFIG_ESP_WIFI_STATIC_RX_BUFFER_NUM take effect?
A: Starting with esp_wifi_remote version 0.8.0, this configuration item has been renamed to CONFIG_WIFI_RMT_STATIC_RX_BUFFER_NUM. Please configure it in (Top) > Component config > Wi-Fi Remote > Wi-Fi configuration. The old CONFIG_ESP_WIFI_STATIC_RX_BUFFER_NUM item is no longer supported.
Q: Why does compilation fail?
A: Some structure definitions in ESP-IDF have changed. Newer component versions have been updated to match these definitions and are therefore not compatible with older ESP-IDF versions. It is recommended to upgrade ESP-IDF to a compatible version, or to use a component version that matches your ESP-IDF version.
Q: What does the following log mean: === ESP-Hosted Version Warning ===?
A: This warning indicates that the host (such as ESP32-P4) and the slave (such as ESP32-C6) use different versions of the ESP-Hosted component. It is strongly recommended to use the same version on both sides to avoid communication errors caused by version differences.
Q: What does the following log mean: Identified slave [esp32c6] != Expected [esp32]?
A: This error indicates that the detected slave chip (esp32c6) does not match the model selected in the configuration (esp32). Please select the correct slave chip model in (Top) > Component config > Wi-Fi Remote > choose slave target.
Q: Why does ESP32-P4 + ESP32-C5 report insufficient memory after turning on Wi-Fi and BLE?
A: The ESP32-C5 has only 384 KB of internal memory, so memory is tight and allocations can easily fail. The following slave configuration further reduces memory usage:
CONFIG_ESP_SDIO_RX_Q_SIZE=10
CONFIG_ESP_WIFI_IRAM_OPT=n
CONFIG_ESP_WIFI_EXTRA_IRAM_OPT=n
CONFIG_ESP_WIFI_RX_IRAM_OPT=n
CONFIG_ESP_WIFI_SLP_IRAM_OPT=n
CONFIG_LWIP_IRAM_OPTIMIZATION=n
CONFIG_LWIP_EXTRA_IRAM_OPTIMIZATION=n
CONFIG_FREERTOS_PLACE_FUNCTIONS_INTO_FLASH=y
Q: How to speed up IP acquisition?
A: Both the Host and Slave can use the following configuration:
CONFIG_COMPILER_OPTIMIZATION_PERF=y
CONFIG_BOOTLOADER_LOG_LEVEL_NONE=y
CONFIG_LOG_DEFAULT_LEVEL_ERROR=y
CONFIG_BOOTLOADER_SKIP_VALIDATE_ALWAYS=y
CONFIG_ESPTOOLPY_FLASHMODE_QIO=y
CONFIG_ESPTOOLPY_FLASHFREQ_80M=y
CONFIG_LWIP_DHCP_RESTORE_LAST_IP=y
CONFIG_SPIRAM_MEMTEST=n
Use bootloader_hooks or bootloader_override to reset the slave early, reducing the time the host spends waiting for it.
By default, ESP-Hosted is initialized automatically at boot. For faster startup, you can disable this automatic esp_hosted_init() call and instead call esp_hosted_init() in the application before esp_wifi_init().
Q: What is the relationship between the ESP-IDF version and the esp_hosted/esp_wifi_remote versions?
A: You can add the components to the dependencies of your idf_component.yml as shown below, or simply use the latest versions. If the host's component version is updated and differs significantly from the slave's, it is recommended to update the slave as well.
espressif/esp_hosted:
  version: ^2.4
  rules:
    - if: target in [esp32p4]
espressif/esp_wifi_remote:
  version: ^1.0
  rules:
    - if: target in [esp32p4]
ESP32-C6: It is recommended to use esp_hosted ≥ 2.4.2, esp_wifi_remote ≥ 1.0.0, ESP-IDF ≥ v5.3.2
ESP32-C5: It is recommended to use esp_hosted ≥ 2.4.2, esp_wifi_remote ≥ 1.0.0, ESP-IDF ≥ v5.5
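The recommended minimum versions above can also be checked programmatically, for example in a build-time helper tool. The sketch below compares dotted version strings against the minimums listed here. The table and function names are hypothetical; the sketch compares only the numeric major.minor.patch fields, accepts an ESP-IDF tag such as "v5.3.2" by skipping the leading "v", and ignores pre-release suffixes.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Parse "X.Y.Z" (optionally prefixed with 'v'); missing fields count as 0. */
static void parse_version(const char *s, int v[3])
{
    v[0] = v[1] = v[2] = 0;
    if (*s == 'v') {
        s++;
    }
    sscanf(s, "%d.%d.%d", &v[0], &v[1], &v[2]);
}

/* Return true if version `have` is greater than or equal to `min`. */
static bool version_at_least(const char *have, const char *min)
{
    int a[3], b[3];
    parse_version(have, a);
    parse_version(min, b);
    for (int i = 0; i < 3; i++) {
        if (a[i] != b[i]) {
            return a[i] > b[i];
        }
    }
    return true;
}

/* Recommended minimums from the list above. */
typedef struct {
    const char *slave;
    const char *esp_hosted;
    const char *esp_wifi_remote;
    const char *esp_idf;
} min_versions_t;

static const min_versions_t RECOMMENDED[] = {
    { "esp32c6", "2.4.2", "1.0.0", "v5.3.2" },
    { "esp32c5", "2.4.2", "1.0.0", "v5.5" },
};

/* Check the versions in use against the recommendation for a slave chip. */
static bool versions_ok(const char *slave, const char *hosted,
                        const char *wifi_remote, const char *idf)
{
    for (size_t i = 0; i < sizeof RECOMMENDED / sizeof RECOMMENDED[0]; i++) {
        if (strcmp(RECOMMENDED[i].slave, slave) == 0) {
            return version_at_least(hosted, RECOMMENDED[i].esp_hosted) &&
                   version_at_least(wifi_remote, RECOMMENDED[i].esp_wifi_remote) &&
                   version_at_least(idf, RECOMMENDED[i].esp_idf);
        }
    }
    return false; /* no recommendation recorded for this slave */
}
```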
AI Agent Solution
Overview:
The AI Agents solution provides audio and video interaction application code for the ESP32 platform. Built on the ESP-GMF architecture, it integrates device-side AI Agent development and gives developers a complete audio and video interaction solution.
Application Architecture:
The AI Agents application is based on the ESP-GMF architecture, mainly including the following two core modules:
Audio-processor module
Mainly responsible for audio data processing, including:
Audio capture
Audio encoding and decoding
Audio enhancement
3A algorithms (AEC, ANS, AGC)
Audio synthesis
Video-processor module
Mainly responsible for video data processing, including:
Video capture
Video encoding and decoding
Video rendering
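Of the 3A algorithms listed for the audio-processor module, AGC (automatic gain control) is the simplest to illustrate: measure the level of each audio frame and scale it toward a target level, limiting both the gain and the output samples. The sketch below is a simple peak-based gain control for 16-bit mono PCM frames. The target level, gain cap, silence threshold, and function name are assumptions. It is not the AGC used in the AI Agents pipeline, and production AGC implementations typically add smoothing over time, noise gating, and limiting that this sketch leaves out.

```c
#include <stddef.h>
#include <stdint.h>

#define AGC_TARGET_PEAK 16000 /* assumed target peak, in 16-bit sample units */
#define AGC_MAX_GAIN    8.0   /* cap amplification at 8x (about +18 dB) */
#define AGC_MIN_PEAK    64    /* quieter frames are treated as silence */

/* Scale one frame of 16-bit mono PCM so its peak approaches AGC_TARGET_PEAK.
 * Returns the gain that was applied. */
static double agc_process_frame(int16_t *pcm, size_t n)
{
    int peak = 0;
    for (size_t i = 0; i < n; i++) {
        int a = pcm[i] < 0 ? -(int)pcm[i] : pcm[i];
        if (a > peak) {
            peak = a;
        }
    }
    if (peak < AGC_MIN_PEAK) {
        return 1.0; /* silence or faint noise: do not amplify it */
    }
    double gain = (double)AGC_TARGET_PEAK / peak;
    if (gain > AGC_MAX_GAIN) {
        gain = AGC_MAX_GAIN;
    }
    for (size_t i = 0; i < n; i++) {
        double s = pcm[i] * gain;
        /* Saturate instead of wrapping around on overflow. */
        if (s > INT16_MAX) s = INT16_MAX;
        if (s < INT16_MIN) s = INT16_MIN;
        pcm[i] = (int16_t)(s >= 0 ? s + 0.5 : s - 0.5); /* round to nearest */
    }
    return gain;
}
```

Quiet frames are boosted (up to the gain cap) and loud frames are attenuated, so speech reaches the encoder at a roughly constant level.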
Functional Features:
The AI Agents application supports a variety of mainstream AI platforms and functions:
Volcano RTC voice call and visual processing: Volcano RTC Example
COZE voice interaction and visual processing: COZE Example
BRTC voice call, visual processing, and audio and video dialogue: BRTC Example
Tencent Cloud RTC voice call: Tencent Cloud RTC Example