Xiaozhi AI Chatbot

Supported chips

ESP32-S3

Xiaozhi is a bidirectional streaming dialogue component that connects to the xiaozhi.me service. It supports real-time voice/text interaction with AI agents using large language models like Qwen and DeepSeek.

This component is ideal for use cases such as voice assistants and intelligent voice Q&A systems. It features low latency and a lightweight design, making it suitable for applications running on embedded devices such as the ESP32.

Features

Bidirectional Streaming: Real-time voice and text interaction with AI agents
Multiple Communication Protocols: Supports WebSocket and MQTT+UDP protocols
Audio Codec Support: OPUS, G.711, and PCM audio formats
MCP Integration: Device-side MCP for device control (speaker, LED, servo, GPIO, etc.)
Multi-language Support: Chinese and English
Offline Wake Word: Provides an API for reporting a detected wake word (e.g. esp_xiaozhi_chat_send_wake_word); wake word detection itself, for example with ESP-SR, is left to the application

Architecture

Xiaozhi uses a streaming ASR (Automatic Speech Recognition) + LLM (Large Language Model) + TTS (Text-to-Speech) architecture for voice interaction:

Audio Input: Captures audio from microphone
ASR: Converts speech to text in real-time
LLM: Processes text and generates responses
TTS: Converts text responses to speech
Audio Output: Plays audio through speaker

The component integrates with the MCP (Model Context Protocol) to enable device control capabilities.

Examples

Xiaozhi App Example: ai/xiaozhi_chat. A complete voice assistant application demonstrating:

Voice interaction with AI agents
Device control via MCP
Multi-language support
Display support

API Reference

The following sections are generated from the public headers under components/esp_xiaozhi/include/.

Chat

Header File

components/esp_xiaozhi/include/esp_xiaozhi_chat.h

Functions

esp_err_t esp_xiaozhi_chat_init(esp_xiaozhi_chat_config_t *config, esp_xiaozhi_chat_handle_t *chat_hd)

Create an instance of the chat module.

The current implementation supports only one chat instance at a time.

Parameters

config – [in] Pointer to the chat configuration structure
chat_hd – [out] Pointer to the chat handle

Returns

ESP_OK On success
ESP_ERR_NO_MEM Out of memory
ESP_ERR_INVALID_ARG Invalid arguments
ESP_ERR_INVALID_STATE Another chat instance is already active

esp_err_t esp_xiaozhi_chat_deinit(esp_xiaozhi_chat_handle_t chat_hd)

Deinitialize the chat module.

This function releases chat-owned resources. If the chat session is still running, its runtime resources are stopped first. The MCP engine is destroyed only when config.owns_mcp_engine was set to true during init.

Parameters

chat_hd – [in] Handle to the chat instance

Returns

ESP_OK On success
ESP_ERR_INVALID_ARG Invalid handle

esp_err_t esp_xiaozhi_chat_start(esp_xiaozhi_chat_handle_t chat_hd)

Start the chat session.

Parameters

chat_hd – [in] Handle to the chat instance

Returns

ESP_OK On success
ESP_ERR_INVALID_ARG Invalid handle
ESP_ERR_NOT_FOUND Required transport configuration is missing
Other Error from transport start (MQTT or WebSocket)

esp_err_t esp_xiaozhi_chat_stop(esp_xiaozhi_chat_handle_t chat_hd)

Stop the chat session.

Stops the active chat runtime, including audio channel and MCP manager resources, but does not destroy the configured MCP engine.

Parameters

chat_hd – [in] Handle to the chat instance

Returns

ESP_OK On success
ESP_ERR_INVALID_ARG Invalid handle

esp_err_t esp_xiaozhi_chat_open_audio_channel(esp_xiaozhi_chat_handle_t chat_hd, const esp_xiaozhi_chat_audio_t *audio, char *message, size_t message_len)

Open audio channel.

Parameters

chat_hd – [in] Handle to the chat instance
audio – [in] Optional audio params for the generated hello (format, sample_rate, channels, frame_duration). Used only when message is NULL. NULL or zero fields mean defaults: “opus”, 16000, 1, 60. Non-zero values must be within valid protocol ranges.
message – [in] Optional message to send when opening the channel. If NULL, a default hello message will be generated
message_len – [in] Length of the message buffer. If message is NULL and message_len is 0, a default hello is generated. If message is non-NULL, message_len must be greater than 0; otherwise ESP_ERR_INVALID_ARG is returned

Returns

ESP_OK On success
ESP_ERR_INVALID_ARG Invalid handle, invalid audio params, or invalid message/message_len combination
ESP_ERR_NO_MEM Failed to allocate hello message buffer
ESP_ERR_INVALID_SIZE Hello message buffer too small
Other Error from get_hello_message, transport_send_text, or audio_open
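
The defaulting and range rules for the audio parameter can be sketched off-target as follows. This is a minimal illustration, not the component's implementation: hello_audio_params_t and hello_audio_params_resolve() are hypothetical names that mirror only the hello-related fields of esp_xiaozhi_chat_audio_t, and the ranges are the ones documented for that structure (8000-48000 Hz, 1-2 channels, 10-120 ms).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the hello-related fields of
 * esp_xiaozhi_chat_audio_t (format, sample_rate, channels, frame_duration). */
typedef struct {
    const char *format;
    int sample_rate;
    int channels;
    int frame_duration;
} hello_audio_params_t;

/* Fill in documented defaults for NULL/zero fields, then range-check.
 * Returns true and writes *out when the resolved values are acceptable. */
static bool hello_audio_params_resolve(const hello_audio_params_t *in,
                                       hello_audio_params_t *out)
{
    hello_audio_params_t p = {0};

    if (out == NULL) {
        return false;
    }
    if (in != NULL) {
        p = *in;
    }

    /* NULL or zero means "use the default". */
    if (p.format == NULL) {
        p.format = "opus";
    }
    if (p.sample_rate == 0) {
        p.sample_rate = 16000;
    }
    if (p.channels == 0) {
        p.channels = 1;
    }
    if (p.frame_duration == 0) {
        p.frame_duration = 60;
    }

    /* Non-zero values must fall inside the documented ranges. */
    if (p.sample_rate < 8000 || p.sample_rate > 48000) {
        return false;
    }
    if (p.channels < 1 || p.channels > 2) {
        return false;
    }
    if (p.frame_duration < 10 || p.frame_duration > 120) {
        return false;
    }

    *out = p;
    return true;
}
```

An application might run a check like this before calling esp_xiaozhi_chat_open_audio_channel(), so that an out-of-range value is caught locally instead of surfacing as ESP_ERR_INVALID_ARG.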

esp_err_t esp_xiaozhi_chat_close_audio_channel(esp_xiaozhi_chat_handle_t chat_hd)

Close audio channel.

Parameters

chat_hd – [in] Handle to the chat instance

Returns

ESP_OK On success
ESP_ERR_INVALID_ARG Invalid handle

esp_err_t esp_xiaozhi_chat_send_audio_data(esp_xiaozhi_chat_handle_t chat_hd, const char *data, size_t data_len)

Send audio data to the chat session.

Parameters

chat_hd – [in] Handle to the chat instance
data – [in] Pointer to the audio data buffer
data_len – [in] Length of the audio data in bytes

Returns

ESP_OK On success
ESP_ERR_INVALID_ARG Invalid handle or data, or data_len is 0
ESP_ERR_INVALID_STATE Transport not ready for binary (e.g. audio channel not open)
Other Error from transport_send_binary

esp_err_t esp_xiaozhi_chat_send_wake_word(esp_xiaozhi_chat_handle_t chat_hd, const char *wake_word)

Notify the server that a wake word was detected.

Parameters

chat_hd – [in] Handle to the chat instance
wake_word – [in] Pointer to the wake word (non-empty string)

Returns

ESP_OK On success
ESP_ERR_INVALID_ARG Invalid handle or wake_word
ESP_ERR_INVALID_STATE No session (open audio channel first)
ESP_ERR_NO_MEM Failed to create JSON
Other Error from transport_send_text

esp_err_t esp_xiaozhi_chat_send_start_listening(esp_xiaozhi_chat_handle_t chat_hd, int mode)

Send a start-listening message.

Parameters

chat_hd – [in] Handle to the chat instance
mode – [in] Listening mode

Returns

ESP_OK On success
ESP_ERR_INVALID_ARG Invalid handle
ESP_ERR_INVALID_STATE No session (open audio channel first)
ESP_ERR_NO_MEM Failed to create JSON
Other Error from transport_send_text

esp_err_t esp_xiaozhi_chat_send_stop_listening(esp_xiaozhi_chat_handle_t chat_hd)

Send a stop-listening message.

Parameters

chat_hd – [in] Handle to the chat instance

Returns

ESP_OK On success
ESP_ERR_INVALID_ARG Invalid handle
ESP_ERR_INVALID_STATE No session (open audio channel first)
ESP_ERR_NO_MEM Failed to create JSON
Other Error from transport_send_text

esp_err_t esp_xiaozhi_chat_send_abort_speaking(esp_xiaozhi_chat_handle_t chat_hd, esp_xiaozhi_chat_abort_speaking_reason_t reason)

Send an abort-speaking message.

Parameters

chat_hd – [in] Handle to the chat instance
reason – [in] Reason for aborting speaking

Returns

ESP_OK On success
ESP_ERR_INVALID_ARG Invalid handle
ESP_ERR_INVALID_STATE No session (open audio channel first)
ESP_ERR_NO_MEM Failed to create JSON
Other Error from transport_send_text

Structures

struct esp_xiaozhi_chat_audio_t

Audio packet for Xiaozhi chat; also used as audio params when passed to esp_xiaozhi_chat_open_audio_channel().

For packet use: set sample_rate, frame_duration, timestamp, payload, and payload_size (format and channels are ignored).

For open_audio_channel use: set format (NULL = “opus”), sample_rate (0 = 16000, otherwise 8000-48000), channels (0 = 1, otherwise 1-2), and frame_duration (0 = 60, otherwise 10-120); payload, timestamp, and payload_size are ignored.

Public Members

const char *format: Audio format for hello, e.g. “opus”, “pcm”. NULL means “opus”. Packet use: ignore

int sample_rate: Sample rate (Hz). For hello: 0 means 16000

int channels: Channel count for hello. 0 means 1. Packet use: ignore

int frame_duration: Frame duration (ms). For hello: 0 means 60

uint32_t timestamp: Timestamp (packet use only)

uint8_t *payload: Payload (packet use only)

size_t payload_size: Payload size (packet use only)

struct esp_xiaozhi_chat_tts_state_t

TTS state payload for ESP_XIAOZHI_CHAT_EVENT_CHAT_TTS_STATE. Pointers are valid only during the event callback.

Public Members

esp_xiaozhi_chat_tts_state_kind_t state: TTS state kind (start / stop / sentence_start)

const char *text: Non-NULL only when state is SENTENCE_START

struct esp_xiaozhi_chat_error_info_t

Error info for ESP_XIAOZHI_CHAT_EVENT_CHAT_ERROR (protocol layer only). Pointers are valid only during the event callback.

Public Members

esp_err_t code: Error code

const char *source: Hint e.g. “transport”, “hello_timeout”, “udp”

struct esp_xiaozhi_chat_text_data_t

Text data structure for chat messages.

Public Members

esp_xiaozhi_chat_text_role_t role: Role of the message (user or assistant)

const char *text: Text content of the message

struct esp_xiaozhi_chat_config_t

Configuration structure for initializing a Xiaozhi chat session.

Public Members

esp_xiaozhi_chat_audio_type_t audio_type: Type of audio input/output to use

esp_xiaozhi_chat_audio_callback_t audio_callback: Callback function for handling audio data

esp_xiaozhi_chat_event_callback_t event_callback: Callback function for handling Xiaozhi events

void *audio_callback_ctx: Context pointer passed to the audio callback

void *event_callback_ctx: Context pointer passed to the event callback

esp_mcp_t *mcp_engine: MCP engine instance provided by the caller

bool owns_mcp_engine: Whether chat takes ownership of mcp_engine and destroys it in deinit

bool has_mqtt_config: True if server provides MQTT config (from get_info). When both MQTT and WebSocket supported, prefer MQTT

bool has_websocket_config: True if server provides WebSocket config (from get_info)

Macros

ESP_XIAOZHI_CHAT_EVENT_CONNECTED: Event bits for ESP event system (app may register for these). These are the only event bits exposed to the app; do not add internal sync flags here.

ESP_XIAOZHI_CHAT_EVENT_DISCONNECTED

ESP_XIAOZHI_CHAT_EVENT_AUDIO_CHANNEL_OPENED

ESP_XIAOZHI_CHAT_EVENT_AUDIO_CHANNEL_CLOSED

ESP_XIAOZHI_CHAT_EVENT_AUDIO_DATA_INCOMING

ESP_XIAOZHI_CHAT_EVENT_SERVER_GOODBYE

ESP_XIAOZHI_CHAT_DEFAULT_CONFIG(): Default configuration initializer for esp_xiaozhi_chat_config_t.

Type Definitions

typedef uint32_t esp_xiaozhi_chat_handle_t: Handle for a Xiaozhi chat session.

typedef void (*esp_xiaozhi_chat_audio_callback_t)(const uint8_t *data, int len, void *ctx)

Callback for receiving audio data during chat.

The data buffer is owned by the chat module and is only valid for the duration of this callback. Implementations must consume or copy the data before returning and must not store the pointer for asynchronous use.

Param data: Pointer to the audio data buffer, valid only during this callback
Param len: Length of the audio data in bytes
Param ctx: User-defined context passed to the callback
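
Because the data pointer is only valid inside the callback, a handler has to copy what it wants to keep. The sketch below shows one way to do that with a function whose signature matches esp_xiaozhi_chat_audio_callback_t. The rx_store_t type, the RX_BUF_SIZE value, and the drop-on-overflow policy are illustrative assumptions; a real application would more likely push each packet into a queue that preserves packet boundaries for the decoder.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RX_BUF_SIZE 4096  /* illustrative capacity, not a component constant */

/* Application-owned storage, passed to the callback through ctx. */
typedef struct {
    uint8_t buf[RX_BUF_SIZE];
    size_t used;     /* bytes currently stored */
    size_t dropped;  /* bytes discarded because the buffer was full */
} rx_store_t;

/* Same signature as esp_xiaozhi_chat_audio_callback_t. */
static void on_audio(const uint8_t *data, int len, void *ctx)
{
    rx_store_t *store = (rx_store_t *)ctx;

    if (store == NULL || data == NULL || len <= 0) {
        return;
    }

    size_t n = (size_t)len;
    if (n > RX_BUF_SIZE - store->used) {
        /* Not enough room: drop the whole packet rather than a fragment. */
        store->dropped += n;
        return;
    }

    /* Copy before returning; the pointer must not be kept. */
    memcpy(store->buf + store->used, data, n);
    store->used += n;
}
```

The store would be registered through audio_callback_ctx in esp_xiaozhi_chat_config_t and drained by a separate playback task.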

typedef void (*esp_xiaozhi_chat_event_callback_t)(esp_xiaozhi_chat_event_t event, void *event_data, void *ctx)

Callback for receiving chat events.

Param event: Chat event type
Param event_data: Optional output data associated with the event
Param ctx: User-defined context passed to the callback

Enumerations

enum esp_xiaozhi_chat_tts_state_kind_t

TTS state kind for protocol-layer notification (app decides device state)

Values:

enumerator ESP_XIAOZHI_CHAT_TTS_STATE_START: TTS playback started

enumerator ESP_XIAOZHI_CHAT_TTS_STATE_STOP: TTS playback stopped

enumerator ESP_XIAOZHI_CHAT_TTS_STATE_SENTENCE_START: TTS sentence started; text is valid

enum esp_xiaozhi_chat_event_t

Events that can occur during a Xiaozhi chat session (minimal protocol API)

Component only reports protocol facts; app handles state machine, UI, and system commands.

Values:

enumerator ESP_XIAOZHI_CHAT_EVENT_CHAT_SPEECH_STARTED: Emitted on TTS start; prefer CHAT_TTS_STATE for new code

enumerator ESP_XIAOZHI_CHAT_EVENT_CHAT_SPEECH_STOPPED: Emitted on TTS stop; prefer CHAT_TTS_STATE for new code

enumerator ESP_XIAOZHI_CHAT_EVENT_CHAT_ERROR: event_data = esp_xiaozhi_chat_error_info_t *

enumerator ESP_XIAOZHI_CHAT_EVENT_CHAT_TEXT: event_data = esp_xiaozhi_chat_text_data_t * (STT/TTS sentence)

enumerator ESP_XIAOZHI_CHAT_EVENT_CHAT_EMOJI: event_data = const char * (LLM emotion)

enumerator ESP_XIAOZHI_CHAT_EVENT_CHAT_TTS_STATE: event_data = esp_xiaozhi_chat_tts_state_t * (protocol TTS state)

enumerator ESP_XIAOZHI_CHAT_EVENT_CHAT_SYSTEM_CMD: event_data = const char * (e.g. “reboot”); app decides whether to execute
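
As an illustration of handling ESP_XIAOZHI_CHAT_EVENT_CHAT_TTS_STATE, the sketch below shows what the corresponding branch of an event callback might do. So that it compiles off-target, it uses stand-in types (demo_tts_state_kind_t, demo_tts_state_t) that mirror the documented esp_xiaozhi_chat_tts_state_kind_t and esp_xiaozhi_chat_tts_state_t; app_tts_view_t and handle_tts_state() are hypothetical application code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-ins for esp_xiaozhi_chat_tts_state_kind_t / esp_xiaozhi_chat_tts_state_t. */
typedef enum {
    DEMO_TTS_STATE_START,
    DEMO_TTS_STATE_STOP,
    DEMO_TTS_STATE_SENTENCE_START,
} demo_tts_state_kind_t;

typedef struct {
    demo_tts_state_kind_t state;
    const char *text;  /* non-NULL only for SENTENCE_START */
} demo_tts_state_t;

/* Application-owned state that outlives the callback. */
typedef struct {
    bool speaking;
    char sentence[128];
} app_tts_view_t;

/* Body of the CHAT_TTS_STATE branch of an event callback. */
static void handle_tts_state(const demo_tts_state_t *tts, app_tts_view_t *view)
{
    if (tts == NULL || view == NULL) {
        return;
    }

    switch (tts->state) {
    case DEMO_TTS_STATE_START:
        view->speaking = true;
        break;
    case DEMO_TTS_STATE_STOP:
        view->speaking = false;
        break;
    case DEMO_TTS_STATE_SENTENCE_START:
        if (tts->text != NULL) {
            /* Copy the text: the pointer is only valid during the callback.
             * snprintf truncates safely if the sentence is too long. */
            snprintf(view->sentence, sizeof(view->sentence), "%s", tts->text);
        }
        break;
    }
}
```

Keeping the device state (speaking or not) in application code matches the note that the component only reports protocol facts and leaves the state machine to the app.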

enum esp_xiaozhi_chat_audio_type_t

Supported audio formats for Xiaozhi chat.

Values:

enumerator ESP_XIAOZHI_CHAT_AUDIO_TYPE_OPUS: OPUS compressed audio format

enum esp_xiaozhi_chat_device_state_t

Device state for Xiaozhi chat.

Values:

enumerator ESP_XIAOZHI_CHAT_DEVICE_STATE_UNKNOWN

enumerator ESP_XIAOZHI_CHAT_DEVICE_STATE_STARTING

enumerator ESP_XIAOZHI_CHAT_DEVICE_STATE_WIFI_CONFIGURING

enumerator ESP_XIAOZHI_CHAT_DEVICE_STATE_IDLE

enumerator ESP_XIAOZHI_CHAT_DEVICE_STATE_CONNECTING

enumerator ESP_XIAOZHI_CHAT_DEVICE_STATE_LISTENING

enumerator ESP_XIAOZHI_CHAT_DEVICE_STATE_SPEAKING

enumerator ESP_XIAOZHI_CHAT_DEVICE_STATE_UPGRADING

enumerator ESP_XIAOZHI_CHAT_DEVICE_STATE_ACTIVATING

enumerator ESP_XIAOZHI_CHAT_DEVICE_STATE_AUDIO_TESTING

enumerator ESP_XIAOZHI_CHAT_DEVICE_STATE_FATAL_ERROR

enum esp_xiaozhi_chat_listening_mode_t

Listening mode for Xiaozhi chat.

Values:

enumerator ESP_XIAOZHI_CHAT_LISTENING_MODE_REALTIME

enumerator ESP_XIAOZHI_CHAT_LISTENING_MODE_AUTO

enumerator ESP_XIAOZHI_CHAT_LISTENING_MODE_MANUAL

enumerator ESP_XIAOZHI_CHAT_LISTENING_MODE_AUTO_STOP

enumerator ESP_XIAOZHI_CHAT_LISTENING_MODE_MANUAL_STOP

enumerator ESP_XIAOZHI_CHAT_LISTENING_MODE_UNKNOWN

enum esp_xiaozhi_chat_abort_speaking_reason_t

Reasons for aborting speaking.

Values:

enumerator ESP_XIAOZHI_CHAT_ABORT_SPEAKING_REASON_WAKE_WORD_DETECTED

enumerator ESP_XIAOZHI_CHAT_ABORT_SPEAKING_REASON_STOP_LISTENING

enumerator ESP_XIAOZHI_CHAT_ABORT_SPEAKING_REASON_UNKNOWN

enum esp_xiaozhi_chat_text_role_t

Text role enumeration for chat messages.

Values:

enumerator ESP_XIAOZHI_CHAT_TEXT_ROLE_USER: User message role

enumerator ESP_XIAOZHI_CHAT_TEXT_ROLE_ASSISTANT: Assistant message role

Device information

Header File

components/esp_xiaozhi/include/esp_xiaozhi_info.h

Functions

esp_err_t esp_xiaozhi_chat_get_info(esp_xiaozhi_chat_info_t *info)

Get Xiaozhi Chat Information from the HTTP server.

The function posts board information to the configured service endpoint, parses the response, updates the output structure, and persists MQTT/WebSocket settings to NVS when present in the server response.

Parameters

info – [inout] Pointer to the information structure

Returns

ESP_OK On success
ESP_ERR_INVALID_ARG Invalid info pointer
ESP_ERR_NO_MEM Failed to allocate working buffers or HTTP resources
ESP_ERR_INVALID_RESPONSE Server response is malformed or missing a valid body
Other Error from board info collection, HTTP client, JSON parsing, or keystore persistence

esp_err_t esp_xiaozhi_chat_free_info(esp_xiaozhi_chat_info_t *info)

Free Xiaozhi Chat Information.

Releases dynamically allocated string fields owned by info. This function does not zero the full structure or reset the boolean flags.

Parameters

info – [inout] Pointer to the information structure

Returns

ESP_OK On success
ESP_ERR_INVALID_ARG Invalid info pointer

Structures

struct esp_xiaozhi_chat_info_t

Information for Xiaozhi chat.

Public Members

char *current_version: Current version of the firmware

char *firmware_version: Firmware version

char *firmware_url: Firmware URL

char *serial_number: Serial number

char *activation_code: Activation code

char *activation_challenge: Activation challenge

char *activation_message: Activation message

int activation_timeout_ms: Activation timeout in milliseconds

bool has_serial_number: Has serial number

bool has_new_version: Has new version

bool has_activation_code: Has activation code

bool has_activation_challenge: Has activation challenge

bool has_mqtt_config: Has MQTT config

bool has_websocket_config: Has WebSocket config

bool has_server_time: Has server time

NVS operations

Optional hooks to redirect keystore NVS access (e.g. run NVS on an SRAM task). See esp_xiaozhi_nvs_ops.h.

Header File

components/esp_xiaozhi/include/esp_xiaozhi_nvs_ops.h

Functions

void esp_xiaozhi_nvs_ops_register(const esp_xiaozhi_nvs_ops_t *ops)

Register NVS operations (e.g. to delegate to an NVS service running on an SRAM task). Pass NULL to use the default direct NVS access.

Parameters: ops – [in] Pointer to ops struct, or NULL for default

Structures

struct esp_xiaozhi_nvs_ops_s

NVS operations callbacks for keystore. When set via esp_xiaozhi_nvs_ops_register(), all NVS access goes through these callbacks (e.g. to run NVS on an SRAM task and avoid PSRAM stack issues). NULL means use default direct nvs_* calls.

Public Members

esp_err_t (*nvs_open)(const char *name_space, bool read_write, nvs_ops_handle_t *out_handle): Same semantics as IDF nvs_open()

void (*nvs_close)(nvs_ops_handle_t handle): Same semantics as IDF nvs_close()

esp_err_t (*nvs_commit)(nvs_ops_handle_t handle): Same semantics as IDF nvs_commit()

esp_err_t (*nvs_get_str)(nvs_ops_handle_t handle, const char *key, char *out_value, size_t *length): Same semantics as IDF nvs_get_str()

esp_err_t (*nvs_set_str)(nvs_ops_handle_t handle, const char *key, const char *value): Same semantics as IDF nvs_set_str()

esp_err_t (*nvs_get_i32)(nvs_ops_handle_t handle, const char *key, int32_t *out_value): Same semantics as IDF nvs_get_i32()

esp_err_t (*nvs_set_i32)(nvs_ops_handle_t handle, const char *key, int32_t value): Same semantics as IDF nvs_set_i32()

esp_err_t (*nvs_erase_key)(nvs_ops_handle_t handle, const char *key): Same semantics as IDF nvs_erase_key()

esp_err_t (*nvs_erase_all)(nvs_ops_handle_t handle): Same semantics as IDF nvs_erase_all()
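
The hook pattern behind this structure, a table of function pointers where NULL selects a built-in default, can be illustrated off-target with a reduced model. Everything in the sketch is hypothetical: demo_nvs_ops_t carries only two of the nine callbacks and drops the handle argument, the "default" backend is a single RAM slot rather than real nvs_* calls, and demo_nvs_ops_register() only imitates the documented NULL behavior of esp_xiaozhi_nvs_ops_register(). It is not the component's implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef int demo_err_t;            /* stand-in for esp_err_t */
#define DEMO_OK            0
#define DEMO_ERR_NOT_FOUND 1

/* Reduced stand-in for esp_xiaozhi_nvs_ops_t. */
typedef struct {
    demo_err_t (*set_i32)(const char *key, int32_t value);
    demo_err_t (*get_i32)(const char *key, int32_t *out_value);
} demo_nvs_ops_t;

/* Default backend: one RAM slot standing in for direct nvs_* calls. */
static char s_key[16];
static int32_t s_value;

static demo_err_t default_set_i32(const char *key, int32_t value)
{
    snprintf(s_key, sizeof(s_key), "%s", key);
    s_value = value;
    return DEMO_OK;
}

static demo_err_t default_get_i32(const char *key, int32_t *out_value)
{
    if (strcmp(key, s_key) != 0) {
        return DEMO_ERR_NOT_FOUND;
    }
    *out_value = s_value;
    return DEMO_OK;
}

/* Redirected backend: counts calls, then forwards to the default one.
 * A real hook would hand the request to a task with an internal-RAM stack. */
static int s_redirect_calls;

static demo_err_t redirected_set_i32(const char *key, int32_t value)
{
    s_redirect_calls++;
    return default_set_i32(key, value);
}

static demo_err_t redirected_get_i32(const char *key, int32_t *out_value)
{
    s_redirect_calls++;
    return default_get_i32(key, out_value);
}

static const demo_nvs_ops_t s_default_ops = { default_set_i32, default_get_i32 };
static const demo_nvs_ops_t s_redirect_ops = { redirected_set_i32, redirected_get_i32 };
static const demo_nvs_ops_t *s_ops = &s_default_ops;

/* NULL restores the default backend, as documented for the real API. */
static void demo_nvs_ops_register(const demo_nvs_ops_t *ops)
{
    s_ops = (ops != NULL) ? ops : &s_default_ops;
}

/* Keystore code always goes through the currently registered table. */
static demo_err_t keystore_set_i32(const char *key, int32_t value)
{
    return s_ops->set_i32(key, value);
}

static demo_err_t keystore_get_i32(const char *key, int32_t *out_value)
{
    return s_ops->get_i32(key, out_value);
}
```

With the real component, the application would fill an esp_xiaozhi_nvs_ops_t with all nine callbacks and register it before any keystore access takes place.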

Type Definitions

typedef uint32_t nvs_ops_handle_t: Opaque NVS handle passed through registered NVS ops (same width as IDF nvs_handle_t).

typedef struct esp_xiaozhi_nvs_ops_s esp_xiaozhi_nvs_ops_t: NVS operations callbacks for keystore. When set via esp_xiaozhi_nvs_ops_register(), all NVS access goes through these callbacks (e.g. to run NVS on an SRAM task and avoid PSRAM stack issues). NULL means use default direct nvs_* calls.

Camera explain

Header File

components/esp_xiaozhi/include/esp_xiaozhi_camera.h

Functions

esp_err_t esp_xiaozhi_camera_create(const esp_xiaozhi_camera_config_t *config, esp_xiaozhi_camera_handle_t **out_handle)

Create a camera explain client handle.

This API does not initialize or control the camera. The application is responsible for capturing frames and passing them to esp_xiaozhi_camera_explain().

Parameters

config – [in] Pointer to the explain client configuration, can be NULL
out_handle – [out] Pointer to the output handle

Returns

ESP_OK On success
ESP_ERR_NO_MEM Out of memory
ESP_ERR_INVALID_ARG Invalid output handle

esp_err_t esp_xiaozhi_camera_destroy(esp_xiaozhi_camera_handle_t *handle)

Destroy the camera explain client handle.

Parameters

handle – [in] Handle to the explain client

Returns

ESP_OK On success
ESP_ERR_INVALID_ARG Invalid handle

esp_err_t esp_xiaozhi_camera_set_explain_url(esp_xiaozhi_camera_handle_t *handle, const char *url, const char *token)

Configure the vision explain endpoint and optional bearer token.

Parameters

handle – [in] Handle to the explain client
url – [in] Explain service URL, or NULL to clear it
token – [in] Optional bearer token, or NULL to clear it

Returns

ESP_OK On success
ESP_ERR_INVALID_ARG Invalid handle

esp_err_t esp_xiaozhi_camera_explain(esp_xiaozhi_camera_handle_t *handle, const esp_xiaozhi_camera_frame_t *frame, const char *question, char *out_buf, size_t buf_size, size_t *out_len)

Upload an application-provided frame to the explain service as JPEG.

Parameters

handle – [in] Handle to the explain client
frame – [in] JPEG image prepared by the application
question – [in] Multipart “question” field sent to the server
out_buf – [out] Optional response buffer
buf_size – [in] Size of out_buf in bytes
out_len – [out] Optional response length written to out_buf

Returns

ESP_OK On success
ESP_ERR_INVALID_ARG Invalid argument
ESP_ERR_INVALID_STATE Explain URL is not configured
ESP_ERR_NO_MEM Out of memory
Other Error from HTTP upload
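
Since the frame must already be JPEG when it is passed in, an application may want a cheap sanity check before spending an HTTP upload on it. The sketch below tests for the JPEG start-of-image marker (bytes 0xFF 0xD8). demo_frame_t is a stand-in with the same two fields as esp_xiaozhi_camera_frame_t, and frame_looks_like_jpeg() is a hypothetical helper; whether the component performs any validation of its own is not stated in the headers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in with the same fields as esp_xiaozhi_camera_frame_t. */
typedef struct {
    const uint8_t *data;
    size_t len;
} demo_frame_t;

/* Returns true when the buffer starts with the JPEG SOI marker.
 * This is a plausibility check only; it does not decode the image. */
static bool frame_looks_like_jpeg(const demo_frame_t *frame)
{
    /* SOI (2 bytes) plus EOI (2 bytes) is the smallest conceivable stream. */
    if (frame == NULL || frame->data == NULL || frame->len < 4) {
        return false;
    }
    return frame->data[0] == 0xFF && frame->data[1] == 0xD8;
}
```

A caller could skip the upload and report a local error when the check fails, for example when a camera driver has been left in a raw pixel format.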

Structures

struct esp_xiaozhi_camera_frame_t

JPEG image provided by the application.

Public Members

const uint8_t *data: JPEG buffer provided by the application

size_t len: JPEG length in bytes

struct esp_xiaozhi_camera_config_t

Configuration for the camera explain client.

Public Members

const char *explain_url: Explain service URL

const char *explain_token: Optional bearer token

Type Definitions

typedef struct esp_xiaozhi_camera_ctx esp_xiaozhi_camera_handle_t: Handle for the camera explain client.

Video explain

Header File

components/esp_xiaozhi/include/esp_xiaozhi_video.h

Functions

esp_err_t esp_xiaozhi_video_create(const esp_xiaozhi_video_config_t *config, esp_xiaozhi_video_handle_t **out_handle)

Create a video explain client handle.

Parameters

config – [in] Pointer to the explain client configuration, can be NULL
out_handle – [out] Pointer to the output handle

Returns

ESP_OK On success
ESP_ERR_NO_MEM Out of memory
ESP_ERR_INVALID_ARG Invalid output handle

esp_err_t esp_xiaozhi_video_destroy(esp_xiaozhi_video_handle_t *handle)

Destroy the video explain client handle.

Parameters

handle – [in] Handle to the explain client

Returns

ESP_OK On success
ESP_ERR_INVALID_ARG Invalid handle

esp_err_t esp_xiaozhi_video_set_explain_url(esp_xiaozhi_video_handle_t *handle, const char *url, const char *token)

Configure the vision explain endpoint and optional bearer token.

Parameters

handle – [in] Handle to the explain client
url – [in] Explain service URL, or NULL to clear it
token – [in] Optional bearer token, or NULL to clear it

Returns

ESP_OK On success
ESP_ERR_INVALID_ARG Invalid handle

esp_err_t esp_xiaozhi_video_explain(esp_xiaozhi_video_handle_t *handle, const esp_xiaozhi_video_frame_t *frame, const char *question, char *out_buf, size_t buf_size, size_t *out_len)

Upload an application-provided frame to the explain service as JPEG.

Parameters

handle – [in] Handle to the explain client
frame – [in] JPEG image prepared by the application
question – [in] Multipart “question” field sent to the server
out_buf – [out] Optional response buffer
buf_size – [in] Size of out_buf in bytes
out_len – [out] Optional response length written to out_buf

Returns

ESP_OK On success
ESP_ERR_INVALID_ARG Invalid argument
ESP_ERR_INVALID_STATE Explain URL is not configured
ESP_ERR_NO_MEM Out of memory
Other Error from HTTP upload

Structures

struct esp_xiaozhi_video_frame_t

JPEG image provided by the application.

Public Members

const uint8_t *data: JPEG buffer provided by the application

size_t len: JPEG length in bytes

struct esp_xiaozhi_video_config_t

Configuration for the video explain client.

Public Members

const char *explain_url: Explain service URL

const char *explain_token: Optional bearer token

Type Definitions

typedef struct esp_xiaozhi_video_ctx esp_xiaozhi_video_handle_t: Handle for the video explain client.
