ESP-Media-Protocols Component

Note

This document is automatically translated using AI. Please excuse any detailed errors. The official English version is still in progress.

Overview

Multimedia protocols are a family of communication protocols widely used for streaming media transmission, device control, and device-to-device communication. Typical applications include:

  • Most network cameras in security systems have a built-in RTSP server. The camera compresses the captured video and delivers it over RTP, so that monitoring platforms, NVRs, VLC players, and similar clients can pull the stream;

  • GB28181 (full name: GB/T 28181-2016), a national standard issued by the Chinese Ministry of Public Security, defines the technical requirements for information transmission, exchange, and control in networked public-safety video surveillance systems. It uses SIP for signaling such as device registration, heartbeat, and call control, uses SDP to describe media sessions, and uses RTP and RTCP to transmit and monitor media data in real time;

  • VoIP, video conferencing, and video intercom systems implement call setup and voice and video communication on top of SIP;

  • In broadcasting systems and live streaming platforms, devices capture media streams and push them to a server over RTMP, and multiple client devices pull the streams from the server over RTMP for playback.
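As an illustration of the stream-pulling scenario above, the following host-side sketch builds an RTSP `DESCRIBE` request and parses the status line of a reply, following the text format of RTSP/1.0. It is not part of the ESP-Media-Protocols API: the function names and the camera URL are hypothetical and chosen only for this example.

```python
from typing import Dict, Optional, Tuple


def build_rtsp_request(method: str, url: str, cseq: int,
                       headers: Optional[Dict[str, str]] = None) -> bytes:
    """Build an RTSP/1.0 request: request line, CSeq, extra headers, blank line."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # Header lines end with CRLF; an empty line terminates the header section.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def parse_rtsp_status(response: bytes) -> Tuple[int, str]:
    """Return (status code, reason phrase) from the first line of a response."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = status_line.split(" ", 2)
    if not version.startswith("RTSP/"):
        raise ValueError("not an RTSP response")
    return int(code), reason


# Hypothetical camera address, used only for illustration.
request = build_rtsp_request(
    "DESCRIBE", "rtsp://192.168.1.10:8554/live", 2,
    {"Accept": "application/sdp"},
)
```

A client would send `request` over a TCP connection to the camera and expect an SDP description in the body of a successful reply.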

ESP-Media-Protocols is a multimedia protocol library launched by Espressif Systems, providing support for basic and mainstream multimedia protocols.

| Protocol | Layer | Function | Common Application Scenarios |
|----------|-------|----------|------------------------------|
| RTP/RTCP | Transport Layer | Real-time transmission of audio and video streams, with transmission quality feedback | Network cameras; low-latency media transmission for real-time calls and conferences; RTCP provides transmission quality statistics |
| RTSP | Application Layer | As a server, serves streams to clients that pull them; as a client, pulls and pushes streams | Low-latency one-way media transmission from network cameras; playback control |
| SIP | Application Layer | Session endpoint: registers with a SIP server, initiates and receives sessions | Low-latency two-way media transmission between intercoms and telephones; session management for intercoms and conferences |
| RTMP | Application Layer | As a server, serves streams to clients that pull them and accepts pushed streams; as a client, pulls and pushes streams | Live streaming and backend distribution (device pushes streams to a live server or platform); live stream access |
| MRM | / | Multi-device master-slave synchronized music playback | Multi-room synchronized audio playback (smart speakers, multi-device home theater synchronization) |
| UPnP | / | Device interconnection, media and service sharing | Device discovery and media sharing within the home (a phone or PC discovers a TV or NAS and casts or plays media) |
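To make the RTP/RTCP entry above more concrete, the sketch below parses the 12-byte fixed RTP header as laid out in RFC 3550. It is a host-side illustration under stated assumptions, not the component's API: `parse_rtp_header` is a hypothetical helper, and CSRC entries and header extensions are deliberately ignored.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Parse the 12-byte fixed RTP header (RFC 3550, section 5.1)."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    # Network byte order: two flag bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit synchronization source identifier.
    b0, b1, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    version = b0 >> 6
    if version != 2:
        raise ValueError(f"unsupported RTP version {version}")
    return {
        "version": version,
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": b0 & 0x0F,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Illustrative packet: version 2, marker set, dynamic payload type 96.
sample = struct.pack("!BBHII", 0x80, 0x80 | 96, 1234, 90000, 0xDEADBEEF)
header = parse_rtp_header(sample)
```

The sequence number and timestamp fields are what a receiver uses to detect loss and reorder packets, and RTCP reports summarize the statistics derived from them.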

Performance Data

Protocol performance comparison

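| Protocol | Real-time | Data Stream | Control Stream | Device Discovery | TLS Encryption | Complexity |
|----------|-----------|-------------|----------------|------------------|----------------|------------|
| RTSP | High | Yes | Yes | Manual | No | Medium |
| SIP | High | Yes | Yes | Manual | Yes | Medium |
| RTMP | Medium | Yes | Basic | Manual | Yes | Medium |
| MRM | High | Yes | Yes | Automatic | No | Low |
| UPnP | Low | Yes | Yes | Automatic | No | Medium |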

You can easily identify the protocol to use through the following flowchart:

Real-time

  • Low latency: about 20 ms for control or command data.

  • Low latency: about 300 ms for audio, video, or other media streams.

  • Medium latency: about 2 seconds for RTMP-based live streams.

Security

  • TLS (optional)

  • MD5 digest authentication (mandatory for SIP)
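SIP reuses the HTTP digest authentication scheme. As a rough illustration of what MD5 digest authentication computes, the sketch below derives the `response` value as described in RFC 2617. It is a host-side sketch, not the component's implementation: the function names are hypothetical, and the credentials in the usage line are made up.

```python
import hashlib
from typing import Optional


def _md5_hex(text: str) -> str:
    """Return the lowercase hex MD5 digest of a UTF-8 string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username: str, realm: str, password: str,
                    method: str, uri: str, nonce: str,
                    qop: Optional[str] = None,
                    nc: Optional[str] = None,
                    cnonce: Optional[str] = None) -> str:
    """Compute the digest 'response' value for an Authorization header."""
    ha1 = _md5_hex(f"{username}:{realm}:{password}")  # hash of the credentials
    ha2 = _md5_hex(f"{method}:{uri}")                  # hash of the request
    if qop:
        if nc is None or cnonce is None:
            raise ValueError("qop requires both nc and cnonce")
        return _md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")
    # Form without qop: only the server nonce is mixed in.
    return _md5_hex(f"{ha1}:{nonce}:{ha2}")


# Made-up credentials and nonce for a REGISTER request, for illustration only.
register_response = digest_response(
    "alice", "example.com", "secret",
    "REGISTER", "sip:example.com", "abc123nonce",
)
```

The password itself never appears on the wire; the server recomputes the same hash from its stored credentials and the nonce it issued, then compares the results.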

Scalability

  • Customizable protocol header and body

  • Supports subscription and notification; services can be registered

Concurrency

  • Supports multiple client connections (RTMP)

Compatibility

  • SIP is compatible with Linphone, Asterisk FreePBX, FreeSWITCH, and Kamailio

  • RTSP is compatible with FFmpeg, VLC, LIVE555, and MediaMTX

  • RTMP is compatible with FFmpeg and VLC

  • UPnP is compatible with NetEase Cloud Music

Media support

Memory consumption data

How to use

The ESP-Media-Protocols component is hosted on GitHub. To add it to your project, run the following command in the project directory:

idf.py add-dependency "espressif/esp_media_protocols^0.5.1"

Before using the ESP-Media-Protocols component, it is recommended to build and debug the following example projects to become familiar with the APIs and with how the protocol stack is applied in practice.

FAQ

Q: Does ESP-Media-Protocols support all protocols and features?

A: ESP-Media-Protocols currently supports the basic protocols and features most widely used in embedded applications. Some unsupported protocols, such as SRTP and HLS, are available in other components or repositories. The supported protocol specifications will continue to be iterated and expanded, and extensions will be considered according to customer needs. Support for new, feature-rich protocols is also planned.

Q: Some protocol features overlap. How should I choose between them?

A: Analyze the functional requirements, latency requirements, and network environment of your application scenario. For example, if real-time requirements are high and playback control (pause, fast forward, rewind, seek) is needed, RTSP is usually used; if real-time requirements are high and real-time interaction is needed, SIP can be used to create sessions; for large-scale live broadcasts in browsers, where stability and compatibility matter more than latency, RTMP can be considered.
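The guidance in this answer, together with the scenarios listed for MRM and UPnP, can be written down as a small decision helper. This is only a sketch: the function name, the parameter names, and the order in which the requirements are checked are assumptions made for illustration, and it does not replace a case-by-case analysis.

```python
def suggest_protocol(*, synchronized_playback: bool = False,
                     device_discovery: bool = False,
                     two_way_session: bool = False,
                     playback_control: bool = False,
                     large_scale_live: bool = False) -> str:
    """Map a set of requirements to a candidate protocol name."""
    if synchronized_playback:
        return "MRM"    # multi-device synchronized music playback
    if device_discovery:
        return "UPnP"   # automatic discovery and media sharing
    if two_way_session:
        return "SIP"    # high real-time requirements plus interaction
    if playback_control:
        return "RTSP"   # high real-time requirements plus pause/seek control
    if large_scale_live:
        return "RTMP"   # live distribution where latency is less critical
    raise ValueError("no requirement selected; analyze the scenario manually")


# Example: an intercom needs a two-way session.
intercom_choice = suggest_protocol(two_way_session=True)
```

When several requirements apply at once, the first matching check wins, so a real project would still weigh latency and network conditions before settling on a protocol.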