General Steps

Note

This document is automatically translated using AI. Please excuse any detailed errors. The official English version is still in progress.

This document summarizes the general process of using the MQTT protocol in ESP-IDF, covering the basics of the protocol and how the client configuration structure is organized.

After reading it, developers should understand the key logic of the protocol and have a common reference for the examples that follow.

MQTT Protocol

Overview

MQTT (Message Queuing Telemetry Transport) is a lightweight messaging protocol based on the publish/subscribe model, designed for the Internet of Things (IoT) and resource-constrained devices. The protocol involves three roles:

The Publisher sends messages to a topic;

The Subscriber subscribes to a topic and receives the messages published to it;

The Broker forwards messages from publishers to subscribers, decoupling the two.
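The three roles can be illustrated with a toy in-memory broker. This is only a sketch to show the decoupling: the names toy_broker_t, toy_subscribe, and toy_publish are hypothetical, topics are matched by exact string comparison, and nothing here touches the network or the ESP-MQTT API.

```c
#include <stdio.h>
#include <string.h>

#define MAX_SUBS 8

/* Callback a subscriber registers to receive messages. */
typedef void (*on_message_fn)(const char *topic, const char *payload);

typedef struct {
    const char *topic;
    on_message_fn handler;
} subscription_t;

/* Hypothetical toy broker: just a table of subscriptions. */
typedef struct {
    subscription_t subs[MAX_SUBS];
    int count;
} toy_broker_t;

/* Subscriber side: register interest in an exact topic name. */
static int toy_subscribe(toy_broker_t *b, const char *topic, on_message_fn handler)
{
    if (b->count >= MAX_SUBS) {
        return -1;
    }
    b->subs[b->count].topic = topic;
    b->subs[b->count].handler = handler;
    b->count++;
    return 0;
}

/* Publisher side: hand a message to the broker, which forwards it to every
 * subscriber of that topic. Returns the number of deliveries. */
static int toy_publish(const toy_broker_t *b, const char *topic, const char *payload)
{
    int delivered = 0;
    for (int i = 0; i < b->count; i++) {
        if (strcmp(b->subs[i].topic, topic) == 0) {
            b->subs[i].handler(topic, payload);
            delivered++;
        }
    }
    return delivered;
}

/* Example subscriber callback: counts and prints what it receives. */
static int g_received = 0;
static void print_handler(const char *topic, const char *payload)
{
    g_received++;
    printf("received on %s: %s\n", topic, payload);
}
```

The publisher only knows the topic name and the subscriber only knows its callback; neither refers to the other, which is the decoupling the Broker provides.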

MQTT provides retained messages and will messages for client disconnection notification, and supports three Quality of Service (QoS) levels for reliable data transmission:

QoS 0: The sender sends the message only once and does not retry, so there is no guarantee that the Broker or subscriber receives it. Messages may be lost. This level is suitable for scenarios that need low latency and can tolerate occasional loss, such as periodic sensor data reporting or status broadcasting.

QoS 1: The sender retries until the message is acknowledged, so the Broker or subscriber receives it at least once. Messages are not lost but may be duplicated. This level is suitable for scenarios that must guarantee delivery and can tolerate duplicates, such as command control or alarm notification.

QoS 2: A four-step handshake ensures that the message is delivered exactly once, neither lost nor duplicated. This level has the highest reliability and also the highest overhead. It is suitable for scenarios that require extremely high message reliability, such as financial transactions, critical control commands, or data collection.
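The difference in overhead between the three levels can be made concrete by listing the control packets exchanged for one message. The sketch below is a lookup table only; the packet names come from the MQTT 3.1.1 specification rather than from the text above, and qos_packet_flow is a hypothetical helper name.

```c
#include <stddef.h>

/* Control packets exchanged for a single application message, per QoS level. */
static const char *const k_qos0[] = { "PUBLISH" };
static const char *const k_qos1[] = { "PUBLISH", "PUBACK" };
static const char *const k_qos2[] = { "PUBLISH", "PUBREC", "PUBREL", "PUBCOMP" };

/* Returns the packet sequence for a QoS level and stores its length in *count.
 * Returns NULL (and a count of 0) for an invalid level. */
static const char *const *qos_packet_flow(int qos, size_t *count)
{
    switch (qos) {
    case 0:
        *count = sizeof(k_qos0) / sizeof(k_qos0[0]);
        return k_qos0;
    case 1:
        *count = sizeof(k_qos1) / sizeof(k_qos1[0]);
        return k_qos1;
    case 2:
        *count = sizeof(k_qos2) / sizeof(k_qos2[0]);
        return k_qos2;
    default:
        *count = 0;
        return NULL;
    }
}
```

QoS 2 needs four packets for every message where QoS 0 needs one, which is why it is usually reserved for messages that must not be duplicated.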

MQTT uses long-lived connections, consumes little bandwidth, and has a small resource footprint, which makes it well suited to unstable network environments. Typical applications include IoT sensor data upload, smart home control, mobile message push, industrial monitoring and remote control, connected vehicles, and other real-time data communication scenarios.

Transport Layer Protocol

MQTT is an application layer protocol. It defines message formats, the topic mechanism, and the publish and subscribe rules, but it is not itself responsible for reliable delivery of the underlying data. In actual communication, it relies on a transport layer protocol for end-to-end data transfer, so that data is reliably transmitted from a process on one host to a process on another host.

MQTT can run on top of several transport layer protocols, which differ in security, configuration complexity, and applicable scenarios.

TCP is the most basic transport. It sends data in plaintext with no encryption or identity verification, and is suitable for intranet communication or debugging. The URI prefix is mqtt://.
TLS/SSL adds encryption and server identity verification on top of TCP, with optional two-way certificate authentication. It is suitable for communication over public networks and IoT applications with higher security requirements. The URI prefix is mqtts://.
TLS-PSK is a variant of TLS that encrypts and authenticates using a pre-shared key (PSK). It reduces the complexity of certificate management and is suitable for resource-constrained devices or small-scale IoT networks.
TLS + Digital Signature Module (ssl_ds) keeps the private key in a hardware security module, which performs the signature operation. This strengthens protection of the private key and is suitable for high-security industrial or financial scenarios.
TLS Two-way Authentication (ssl_mutual_auth) has the client and the server verify each other through their certificates. It provides strong identity assurance and is suitable for systems that require strict identity authentication.
WebSocket carries MQTT over a two-way connection established through an HTTP upgrade. It is suitable for browser clients but does not provide encryption. The URI prefix is ws://.
WebSocket Secure runs WebSocket over TLS, combining browser access with data security. It is commonly used to reach an MQTT Broker securely from an HTTPS environment. The URI prefix is wss://.
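The four URI prefixes listed above can be summarized as a small lookup, sketched here in standalone C. The type transport_info_t and the function transport_from_uri are hypothetical names, and the default ports (1883, 8883, 80, 443) are the conventional ones for these schemes, not values stated in this document.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Properties implied by a Broker URI scheme. */
typedef struct {
    const char *scheme;
    bool encrypted;     /* runs over TLS */
    bool websocket;     /* MQTT carried inside WebSocket frames */
    int default_port;   /* conventional port, an assumption of this sketch */
} transport_info_t;

static const transport_info_t k_transports[] = {
    { "mqtt",  false, false, 1883 },
    { "mqtts", true,  false, 8883 },
    { "ws",    false, true,  80   },
    { "wss",   true,  true,  443  },
};

/* Returns the entry matching the scheme of `uri`, or NULL if the URI has no
 * "://" separator or uses a scheme that is not in the table. */
static const transport_info_t *transport_from_uri(const char *uri)
{
    const char *sep = strstr(uri, "://");
    if (sep == NULL) {
        return NULL;
    }
    size_t len = (size_t)(sep - uri);
    for (size_t i = 0; i < sizeof(k_transports) / sizeof(k_transports[0]); i++) {
        const char *scheme = k_transports[i].scheme;
        if (strlen(scheme) == len && strncmp(uri, scheme, len) == 0) {
            return &k_transports[i];
        }
    }
    return NULL;
}
```

TLS-PSK, ssl_ds, and ssl_mutual_auth do not have their own prefix in the list above; they are TLS variants, so in this sketch they would fall under the encrypted entries and differ only in which credentials are configured.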

The choice of transport directly affects the complexity and security of the MQTT configuration. For example, TCP and ws transmit in plaintext, so no certificates or server verification need to be configured and setup is simple. TLS, wss, PSK, digital signature, and two-way authentication require the verification fields to be configured, including the server CA, client certificate, or PSK, to ensure secure communication.

MQTT Structure Configuration

In ESP-IDF, the MQTT client is configured through the esp_mqtt_client_config_t structure, which groups the parameters for the Broker connection, authentication, session management, network, task, and message buffering. For the specific structure members, refer to ESP-MQTT.

Broker Configuration

Address Configuration specifies the hostname, port, transport protocol, and URI path of the Broker, so that the client can locate it and establish a connection. If a complete URI is provided, the client uses the URI to establish the connection and the other address fields can be left unset; if the URI is not set, at least the hostname, port, and transport protocol must be configured.

Verification Configuration defines how the server is authenticated, such as whether to use a CA certificate, TLS-PSK, or ALPN, and whether to verify the common name of the server certificate. This secures communication and prevents man-in-the-middle attacks. For plaintext TCP or WebSocket connections, verification can be omitted, but with encrypted transports such as TLS or WSS, verification parameters must be configured so that the server can be trusted.
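The two ways of addressing the Broker might look as follows. This is a configuration fragment, not a complete application: it assumes the nested field layout of esp_mqtt_client_config_t used in ESP-IDF v5.x (check the ESP-MQTT reference for your version), and broker.example.com is a placeholder host.

```c
#include "mqtt_client.h"

/* Variant 1: a single URI carries the scheme, host, and port. */
static esp_mqtt_client_handle_t init_with_uri(void)
{
    const esp_mqtt_client_config_t cfg = {
        .broker.address.uri = "mqtt://broker.example.com:1883", /* placeholder */
    };
    return esp_mqtt_client_init(&cfg);
}

/* Variant 2: no URI, so hostname, port, and transport are set one by one. */
static esp_mqtt_client_handle_t init_with_fields(void)
{
    const esp_mqtt_client_config_t cfg = {
        .broker.address.hostname = "broker.example.com",        /* placeholder */
        .broker.address.port = 1883,
        .broker.address.transport = MQTT_TRANSPORT_OVER_TCP,
    };
    return esp_mqtt_client_init(&cfg);
}
```

Either handle would then be passed to esp_mqtt_client_start() to open the connection.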
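As a sketch of server verification over TLS, the fragment below passes a CA certificate to the client. It assumes the ESP-IDF v5.x field names and a NUL-terminated PEM string supplied by the caller; how the certificate is stored in firmware is outside the scope of this example, and the broker URI is a placeholder.

```c
#include "mqtt_client.h"

/* ca_pem: NUL-terminated PEM text of the CA that signed the Broker certificate. */
static esp_mqtt_client_handle_t init_with_server_verification(const char *ca_pem)
{
    const esp_mqtt_client_config_t cfg = {
        .broker.address.uri = "mqtts://broker.example.com:8883", /* placeholder */
        .broker.verification.certificate = ca_pem,
        /* Keep the common name check enabled so that a certificate issued
         * for a different host is rejected. */
        .broker.verification.skip_cert_common_name_check = false,
    };
    return esp_mqtt_client_init(&cfg);
}
```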

Client Configuration

Credential Configuration sets the client ID (the unique identifier of the client instance), the username, and the password. The client ID lets the Broker distinguish between multiple devices. When the Broker has access control or authentication enabled, the username and password must be configured so that the client can authenticate with the Broker.

Client Authentication Parameters include certificates, private keys, digital signature modules, or secure element handles. They only need to be configured when enabling TLS two-way authentication or when using a hardware security module for authentication, so that the identity of the client can be trusted and communication is encrypted.
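The credential and client authentication fields could be filled in as sketched below. The field names assume the ESP-IDF v5.x layout; the client ID, username, password, and broker URI are placeholder values, and the certificate and key arguments are assumed to be NUL-terminated PEM strings provided by the caller.

```c
#include "mqtt_client.h"

static esp_mqtt_client_handle_t init_with_credentials(const char *ca_pem,
                                                      const char *client_cert_pem,
                                                      const char *client_key_pem)
{
    const esp_mqtt_client_config_t cfg = {
        .broker.address.uri = "mqtts://broker.example.com:8883", /* placeholder */
        .broker.verification.certificate = ca_pem,

        /* Identity and username/password authentication. */
        .credentials.client_id = "device-001",                   /* placeholder */
        .credentials.username = "device-user",                   /* placeholder */
        .credentials.authentication.password = "device-pass",    /* placeholder */

        /* Only needed for TLS two-way authentication. */
        .credentials.authentication.certificate = client_cert_pem,
        .credentials.authentication.key = client_key_pem,
    };
    return esp_mqtt_client_init(&cfg);
}
```

For a Broker that only checks username and password, the last two fields would simply be left unset.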

Session Configuration

Session configuration controls the interaction between the client and the Broker, including the clean session flag, heartbeat (keep-alive) interval, will message, and message retransmission timeout, to keep the connection reliable and message delivery stable.

Clean Session Flag determines whether subscription information and unsent messages are retained after disconnection;

Heartbeat Interval keeps the long-lived connection active and allows abnormal disconnections to be detected promptly;

Will Message is automatically published by the Broker when the client disconnects abnormally, notifying subscribers that the client is offline. It is used for fault monitoring, alarms, or status synchronization;

Message Retransmission Timeout ensures that, at QoS level 1 or 2, messages are resent when the network fails, so that delivery remains reliable.

The timeouts in the session configuration control the behavior of the MQTT protocol layer and are distinct from the network layer operation timeout. They ensure the reliability of messages and heartbeats.
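The four session settings described above map onto the session group of the configuration structure. The fragment below is a sketch assuming the ESP-IDF v5.x field names; the topic, payload, and timing values are illustrative, not recommendations.

```c
#include "mqtt_client.h"

static esp_mqtt_client_handle_t init_with_session(void)
{
    const esp_mqtt_client_config_t cfg = {
        .broker.address.uri = "mqtt://broker.example.com:1883", /* placeholder */

        /* false keeps the default clean session behavior; set to true to ask
         * the Broker to keep session state across disconnections. */
        .session.disable_clean_session = false,

        /* Heartbeat interval, in seconds. */
        .session.keepalive = 60,

        /* Will message published by the Broker if this client vanishes. */
        .session.last_will.topic = "devices/device-001/status", /* placeholder */
        .session.last_will.msg = "offline",
        .session.last_will.qos = 1,
        .session.last_will.retain = 1,

        /* Resend unacknowledged QoS 1/2 messages after this many milliseconds. */
        .session.message_retransmit_timeout = 2000,
    };
    return esp_mqtt_client_init(&cfg);
}
```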

Network Configuration

Network configuration manages the underlying network interaction between the client and the Broker, including the automatic reconnection strategy, network operation timeout, interface selection, and a custom transport handle.

Network Operation Timeout limits how long TCP/TLS/WebSocket connection, send, and receive operations may take, so that a stalled network does not block the client.

Automatic Reconnection Interval specifies how long the client waits after a disconnection before trying to reestablish the connection.

The timeouts in the network configuration are transport layer timeouts. They ensure that the client can reconnect and restore communication promptly when the network is unstable or the connection fails.
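The transport layer timeouts might be set as in the following fragment. It assumes the ESP-IDF v5.x field names in the network group; the values are illustrative and the broker URI is a placeholder.

```c
#include "mqtt_client.h"

static esp_mqtt_client_handle_t init_with_network(void)
{
    const esp_mqtt_client_config_t cfg = {
        .broker.address.uri = "mqtt://broker.example.com:1883", /* placeholder */

        /* Abort a connect, send, or receive that takes longer than 5 s. */
        .network.timeout_ms = 5000,

        /* After a disconnection, wait 10 s before trying to reconnect. */
        .network.reconnect_timeout_ms = 10000,

        /* Leave automatic reconnection enabled. */
        .network.disable_auto_reconnect = false,
    };
    return esp_mqtt_client_init(&cfg);
}
```

Note that these values are independent of the keep-alive and retransmission settings in the session group, which act at the MQTT protocol layer.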

Task Configuration

Task configuration defines the priority and stack size of the FreeRTOS task that runs the MQTT client, so that it runs stably and is not starved by other tasks in the system.

Buffer Configuration

Buffer configuration controls the size of the input and output buffers to accommodate different message volumes and QoS levels and to prevent messages from being blocked or lost. Adjust it according to the message frequency and load of the application.

Outbox Configuration

The outbox stores messages that are waiting to be sent or acknowledged. Outbox configuration helps ensure that messages are not lost in high-reliability scenarios or critical data transmission, such as industrial monitoring, remote control, or critical IoT data collection applications.
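The task, buffer, and outbox settings from the last three sections can be combined in one fragment. This is a sketch assuming the ESP-IDF v5.x field names; all numbers are illustrative and would need to be tuned to the message sizes and load of the actual application.

```c
#include "mqtt_client.h"

static esp_mqtt_client_handle_t init_with_resources(void)
{
    const esp_mqtt_client_config_t cfg = {
        .broker.address.uri = "mqtt://broker.example.com:1883", /* placeholder */

        /* FreeRTOS task that runs the MQTT client. */
        .task.priority = 5,
        .task.stack_size = 6144,      /* bytes */

        /* Input and output buffers. */
        .buffer.size = 2048,          /* bytes */
        .buffer.out_size = 1024,      /* bytes */

        /* Upper bound on memory used by messages waiting in the outbox. */
        .outbox.limit = 16 * 1024,    /* bytes */
    };
    return esp_mqtt_client_init(&cfg);
}
```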