ESP-WebRTC Solution


Note

This document is automatically translated using AI. Please excuse any detailed errors. The official English version is still in progress.

Overview

WebRTC is widely used for real-time, low-latency communication. It was initially designed for peer-to-peer (P2P) use and now forms the basis for video conferencing, streaming, and IoT applications. Compared to MQTT or WebSocket, WebRTC is better optimized for real-time media transmission and NAT traversal. ICE (Interactive Connectivity Establishment) is a key WebRTC mechanism that establishes reliable connections across NATs and firewalls with the help of STUN and TURN servers. With its ICE support, the ESP-WebRTC solution further simplifies STUN/TURN configuration.
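To make the STUN step of ICE more concrete, the following is a minimal sketch of the 20-byte STUN Binding request header as defined in RFC 5389/8489. It is illustrative only and is not taken from the ESP-WebRTC sources; the function names `build_binding_request` and `parse_header` are hypothetical.

```python
import os
import struct
from typing import Optional, Tuple

STUN_MAGIC_COOKIE = 0x2112A442  # fixed value defined by the STUN specification
STUN_BINDING_REQUEST = 0x0001   # message type: Binding request


def build_binding_request(transaction_id: Optional[bytes] = None) -> bytes:
    """Return a 20-byte STUN Binding request that carries no attributes."""
    if transaction_id is None:
        transaction_id = os.urandom(12)  # 96-bit random transaction ID
    if len(transaction_id) != 12:
        raise ValueError("transaction ID must be 12 bytes")
    # Header layout: type (16 bits), attribute length (16 bits), magic cookie (32 bits)
    header = struct.pack("!HHI", STUN_BINDING_REQUEST, 0, STUN_MAGIC_COOKIE)
    return header + transaction_id


def parse_header(packet: bytes) -> Tuple[int, int, bytes]:
    """Return (message type, attribute length, transaction ID) of a STUN packet."""
    if len(packet) < 20:
        raise ValueError("a STUN header is 20 bytes long")
    msg_type, length, cookie = struct.unpack("!HHI", packet[:8])
    if cookie != STUN_MAGIC_COOKIE:
        raise ValueError("not a STUN packet (bad magic cookie)")
    return msg_type, length, packet[8:20]
```

A client sends such a request over UDP to a STUN server, and the server's response reports the public address and port it observed, which becomes a server-reflexive ICE candidate.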

Protocol Comparison

| Feature | WebRTC | MQTT | WebSocket |
|---|---|---|---|
| Communication Mode | Peer-to-peer (established via STUN/TURN/ICE) | Broker-based publish/subscribe | Client-server |
| Media Support | ✅ Audio, video, data | ❌ Messages only (binary/text) | ❌ Messages only (binary/text) |
| Latency | Ultra-low latency (possibly less than 100 ms) | Low latency (depends on the broker, about 10 to 100 ms) | Low latency (about tens of milliseconds) |
| Reliability | Reliable + unreliable channels (SCTP) | QoS levels (0, 1, 2) | Reliable only (TCP) |
| NAT Traversal | ✅ Built-in STUN/TURN support | ❌ Direct TCP only | ❌ Direct TCP only |
| Security | DTLS/SRTP (mandatory encryption) | TLS (optional, MQTTS) | TLS (optional, WSS) |
| Scalability | Multi-party requires SFU/MCU | Scales well through the broker | Server clusters/load balancing |
| Best Use Case | Real-time calls, meetings, games, P2P file sharing | IoT telemetry, device control, sensor data | Chat, real-time updates, dashboards |

Application Fields

WebRTC enables low-latency communication in embedded and IoT systems. It supports media transmission for applications such as IP cameras and video conferencing, and it can also carry arbitrary data over the WebRTC data channel.

User Support

Protocol Layer Users

Users who only need the connection layer can use the lightweight peer connection implementation provided by esp_peer. Its features include:

  • Full ICE support (STUN + TURN)

  • Optimized for fast connection establishment, which shortens startup time

  • Minimal dependencies: only libsrtp is required

  • Low resource consumption (about 60 KB/connection)

  • Low latency (about 260 ms between an ESP32 and a mobile phone)

  • Core protocol implemented from scratch, making it easier to extend

Detailed documentation: esp_peer

Best-practice reference: peer_demo
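As background for the "Full ICE support" item above: an ICE agent gathers host candidates, server-reflexive candidates (learned through STUN), and relayed candidates (allocated through TURN), then ranks them by priority. The sketch below applies the priority formula and recommended type preferences from RFC 8445. It illustrates the standard only; whether esp_peer uses exactly these values is an assumption, and the names `TYPE_PREFERENCE` and `candidate_priority` are hypothetical.

```python
# Recommended type preferences from RFC 8445, section 5.1.2.2
TYPE_PREFERENCE = {
    "host": 126,   # address of a local interface
    "prflx": 110,  # peer-reflexive, discovered during connectivity checks
    "srflx": 100,  # server-reflexive, learned from a STUN server
    "relay": 0,    # relayed through a TURN server
}


def candidate_priority(cand_type: str, local_pref: int = 65535, component_id: int = 1) -> int:
    """Compute priority = 2^24 * type_pref + 2^8 * local_pref + (256 - component_id)."""
    if not 0 <= local_pref <= 65535:
        raise ValueError("local preference must fit in 16 bits")
    if not 1 <= component_id <= 256:
        raise ValueError("component ID must be in the range 1..256")
    type_pref = TYPE_PREFERENCE[cand_type]
    return (type_pref << 24) + (local_pref << 8) + (256 - component_id)


if __name__ == "__main__":
    for kind in ("host", "srflx", "relay"):
        print(kind, candidate_priority(kind))
```

With these preferences a direct host path always outranks a STUN-derived path, which in turn outranks a TURN relay, so the relay is used only when no more direct path passes the connectivity checks.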

Application Layer Users

Building on the protocol layer, application layer users can implement audio and video capture, rendering, and signaling on top of the connection capabilities provided by ESP-WebRTC. This allows rapid prototyping: only the signaling implementation needs to be replaced. The main components include:

  • esp_capture: Capture encoded audio/video from hardware

  • av_render: Audio and video playback

  • Signaling: Supports AppRTC, WHIP, OpenAI
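To show what one of the listed signaling options involves, here is a minimal sketch of the opening step of WHIP (RFC 9725): the client sends its SDP offer in an HTTP POST with the content type `application/sdp`, optionally authenticated with a bearer token. This is a protocol-level illustration and not the ESP-WebRTC signaling API; the function name `build_whip_offer_request`, the endpoint URL, and the token are hypothetical.

```python
import urllib.request
from typing import Optional


def build_whip_offer_request(endpoint_url: str, sdp_offer: str,
                             bearer_token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) the HTTP POST that carries a WHIP SDP offer."""
    headers = {"Content-Type": "application/sdp"}
    if bearer_token:
        # WHIP endpoints commonly authenticate clients with a bearer token
        headers["Authorization"] = "Bearer " + bearer_token
    return urllib.request.Request(
        endpoint_url,
        data=sdp_offer.encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Passing the request to `urllib.request.urlopen` would send it. On success a WHIP server answers with `201 Created`, the SDP answer in the response body, and a `Location` header that identifies the session resource, which the client later deletes to end the session.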

Solutions and Demonstrations

Other Resources