ESP-WebRTC Solution
Note
This document was automatically translated using AI. Please excuse any minor errors. The official English version is still in progress.
Overview
WebRTC is widely used for real-time, low-latency communication. It was initially designed for peer-to-peer (P2P) communication and now forms the basis for video conferencing, streaming, and IoT applications. Compared to MQTT or WebSocket, WebRTC is better optimized for real-time media transmission and NAT traversal. ICE (Interactive Connectivity Establishment) is a key WebRTC mechanism that establishes reliable connections across NATs and firewalls with the help of STUN and TURN servers. With its ICE support, the ESP-WebRTC solution further simplifies STUN/TURN configuration.
| Feature / Protocol | WebRTC | MQTT | WebSocket |
|---|---|---|---|
| Communication Mode | Peer-to-peer (established via STUN/TURN/ICE) | Broker-based publish/subscribe | Client-server |
| Media Support | ✅ Audio, video, data | ❌ Messages only (binary/text) | ❌ Messages only (binary/text) |
| Latency | Ultra-low latency (can be below 100 ms) | Low latency (depends on the broker, about 10 to 100 ms) | Low latency (about tens of milliseconds) |
| Reliability | Reliable + unreliable channels (SCTP) | QoS levels (0, 1, 2) | TCP reliable only |
| NAT Traversal | ✅ Built-in STUN/TURN support | ❌ Direct TCP only | ❌ Direct TCP only |
| Security | DTLS/SRTP (mandatory encryption) | TLS (optional, MQTTS) | TLS (optional, WSS) |
| Scalability | Multi-party requires SFU/MCU | Scales well with brokers | Server clusters/load balancing |
| Best Use Case | Real-time calls, meetings, games, P2P file sharing | IoT telemetry, device control, sensor data | Chat, real-time updates, dashboards |
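As an illustration of the STUN step that ICE relies on for NAT traversal, the following C sketch builds the 20-byte header of a STUN Binding Request as laid out in RFC 5389. It is a standalone example and not part of the ESP-WebRTC API: the function name `stun_build_binding_request` and the macros are hypothetical, and the caller is assumed to supply a random 12-byte transaction ID.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values from RFC 5389; the names themselves are illustrative only. */
#define STUN_HEADER_LEN      20
#define STUN_BINDING_REQUEST 0x0001u
#define STUN_MAGIC_COOKIE    0x2112A442u

/* Write a STUN Binding Request header (no attributes) into buf.
 * txn_id points to 12 bytes chosen by the caller (random in practice).
 * Returns the number of bytes written, or 0 on invalid arguments. */
size_t stun_build_binding_request(uint8_t *buf, size_t buf_len,
                                  const uint8_t txn_id[12])
{
    if (buf == NULL || txn_id == NULL || buf_len < STUN_HEADER_LEN) {
        return 0;
    }
    /* Message type, big-endian */
    buf[0] = (uint8_t)((STUN_BINDING_REQUEST >> 8) & 0xFF);
    buf[1] = (uint8_t)(STUN_BINDING_REQUEST & 0xFF);
    /* Message length excludes the header; zero because no attributes follow */
    buf[2] = 0;
    buf[3] = 0;
    /* Magic cookie, big-endian */
    buf[4] = (uint8_t)((STUN_MAGIC_COOKIE >> 24) & 0xFF);
    buf[5] = (uint8_t)((STUN_MAGIC_COOKIE >> 16) & 0xFF);
    buf[6] = (uint8_t)((STUN_MAGIC_COOKIE >> 8) & 0xFF);
    buf[7] = (uint8_t)(STUN_MAGIC_COOKIE & 0xFF);
    /* 96-bit transaction ID used to match the response to this request */
    memcpy(buf + 8, txn_id, 12);
    return STUN_HEADER_LEN;
}
```

Sending these 20 bytes over UDP to a STUN server yields a Binding Response whose XOR-MAPPED-ADDRESS attribute carries the public address the server observed, which ICE can then offer as a server-reflexive candidate.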
Application Fields
WebRTC enables low-latency communication in embedded and IoT systems. It supports media transmission for applications such as IP cameras and video conferencing, and it can also carry arbitrary data over the WebRTC data channel.
User Support
Protocol Layer Users
Users who only need the connection layer can use the lightweight peer connection implementation provided by esp_peer. Its features include:
Full ICE support (STUN + TURN)
Optimized connection establishment for faster startup
Minimal dependencies, only requires libsrtp
Low resource consumption (about 60 KB/connection)
Low latency (about 260 ms between an ESP32 and a mobile phone)
Core protocol implemented from scratch, making it easier to extend
Detailed documentation: esp_peer
Best-practice reference: peer_demo
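To give a feel for what full ICE support involves, the sketch below computes an ICE candidate priority with the formula from RFC 8445 (section 5.1.2.1), using the type preferences that RFC recommends. It does not show how esp_peer ranks candidates internally: the function `ice_candidate_priority` and the enum names are hypothetical and exist only for this example.

```c
#include <stdint.h>

/* Type preferences recommended by RFC 8445; higher is preferred. */
enum {
    ICE_TYPE_PREF_HOST  = 126, /* address of a local interface */
    ICE_TYPE_PREF_PRFLX = 110, /* peer-reflexive, learned during checks */
    ICE_TYPE_PREF_SRFLX = 100, /* server-reflexive, learned via STUN */
    ICE_TYPE_PREF_RELAY = 0    /* relayed through a TURN server */
};

/* priority = 2^24 * type_pref + 2^8 * local_pref + (256 - component_id)
 * Returns 0 (never a valid priority) when an argument is out of range. */
uint32_t ice_candidate_priority(uint32_t type_pref, uint32_t local_pref,
                                uint32_t component_id)
{
    if (type_pref > 126 || local_pref > 65535 ||
        component_id < 1 || component_id > 256) {
        return 0;
    }
    return (type_pref << 24) + (local_pref << 8) + (256 - component_id);
}
```

With a local preference of 65535 and component ID 1, a host candidate gets priority 2130706431 and a relayed candidate 16777215, so a direct path is tried before falling back to TURN.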
Application Layer Users
Building on the protocol layer, application-layer users can implement audio and video capture, rendering, and signaling on top of the connection capabilities provided by ESP-WebRTC. This allows rapid prototyping: only the signaling implementation needs to be replaced. The main components include:
esp_capture: Capture encoded audio/video from hardware
av_render: Audio and video playback
Signaling: Supports AppRTC, WHIP, OpenAI
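Of the signaling options listed above, WHIP is the simplest to picture: the client sends its SDP offer in an HTTP POST with `Content-Type: application/sdp`, and the server replies `201 Created` with the SDP answer. The sketch below only formats such a request into a buffer. It is not the ESP-WebRTC signaling API: `whip_build_offer_request`, the host, the path, and the token are all hypothetical placeholders.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Format a WHIP offer request into out.
 * token may be NULL or empty when the endpoint needs no bearer token.
 * Returns the request length, or -1 on bad arguments or truncation. */
int whip_build_offer_request(char *out, size_t out_len,
                             const char *host, const char *path,
                             const char *token, const char *sdp_offer)
{
    char auth[256] = "";
    int n;

    if (out == NULL || host == NULL || path == NULL || sdp_offer == NULL) {
        return -1;
    }
    if (token != NULL && token[0] != '\0') {
        n = snprintf(auth, sizeof(auth), "Authorization: Bearer %s\r\n", token);
        if (n < 0 || (size_t)n >= sizeof(auth)) {
            return -1; /* token too long for the header buffer */
        }
    }
    n = snprintf(out, out_len,
                 "POST %s HTTP/1.1\r\n"
                 "Host: %s\r\n"
                 "Content-Type: application/sdp\r\n"
                 "%s"
                 "Content-Length: %zu\r\n"
                 "\r\n"
                 "%s",
                 path, host, auth, strlen(sdp_offer), sdp_offer);
    if (n < 0 || (size_t)n >= out_len) {
        return -1; /* output buffer too small */
    }
    return n;
}
```

In a real client the request would go out over TLS, and the `Location` header of the `201 Created` response identifies the session resource that is later deleted to end the stream.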
Solutions and Demonstrations
openai_demo: ESP32 as a client of the OpenAI service.
doorbell_demo: A smart doorbell using AppRTC signaling.
doorbell_local: A local doorbell that supports AI pedestrian detection, no signaling server required.
video_call: Video calls through the data channel.
whip_demo: Pushes the ESP32 media stream to a WHIP server.
serverless_mqtt: Connection establishment without a server.
livekit client: Allows connection to LiveKit cloud.
openAI video assistant: Enables image analysis through voice control.
Other Resources
Introduction to WebRTC - High Performance Browser Networking
WebRTC for the Curious - Deep understanding of the internal mechanisms of WebRTC