ESP-Brookesia Programming Guide

Overview

ESP-Brookesia is a human-machine interaction development framework for AIoT devices. It is designed to simplify application development and AI capability integration. Built on ESP-IDF and a component-based architecture, it provides full-stack support from hardware abstraction and system services to AI agents, helping developers accelerate the development and time-to-market of HMI and AI products.

Note

“Brookesia” is a genus of chameleons known for camouflage and adaptability, which closely aligns with the goals of ESP-Brookesia. The framework is designed to provide a flexible and scalable solution that can adapt to different hardware devices and application requirements, much like the Brookesia chameleon itself.

The main features of ESP-Brookesia include:

  • Native ESP-IDF support: Developed in C/C++ with deep integration into the ESP-IDF development workflow and the ESP Registry component catalog, fully leveraging Espressif’s open-source component ecosystem.

  • Extensible hardware abstraction: Defines unified hardware interfaces for audio, display, touch, storage, and more, and provides a board adaptation layer for fast porting across hardware platforms.

  • Rich system services: Offers ready-to-use system-level services such as Wi-Fi connectivity and audio/video processing. The services use a Manager + Helper architecture for decoupling and extensibility, and they also support the Agent CLI.

  • Multi-LLM backend integration: Includes built-in adapters for mainstream AI platforms such as OpenAI, Coze, and XiaoZhi, with unified agent management and lifecycle control.

  • MCP protocol support: Exposes device service capabilities to large language models through Function Calling / MCP, enabling unified communication between LLMs and system services.

  • AI expression capabilities: Supports emote sets, animation sets, and other visual AI expressions to provide rich visual feedback for anthropomorphic interaction.

Functional Architecture

ESP-Brookesia adopts a layered architecture. From bottom to top, it consists of Environment & Dependencies, Service & Framework, and Application Layer, as shown below:

ESP-Brookesia framework overview

Environment & Dependencies

The runtime foundation of the framework. ESP-IDF provides the build toolchain, real-time operating system, and peripheral drivers, while ESP Registry centrally manages the distribution and versioning of framework components and their third-party dependencies.

Service & Framework

The core layer of the framework. It connects downward to the environment and dependencies and provides standardized service interfaces upward to applications and AI agents, covering utilities, hardware abstraction, system services, AI agents, and expression modules.

  • Utils: Provides common foundational capabilities for upper-layer modules. General Utils includes the logging system, error checking, state machine, task scheduler, plugin manager, and memory/thread/time profilers. MCP Utils acts as the bridge between ESP-Brookesia services and the MCP engine, exposing registered service functions as standard MCP tools so large language models can call device capabilities.

  • HAL: Defines unified hardware access interfaces and provides board-level adaptation. Interface defines standardized hardware APIs for audio playback/recording, display panels and touch, status LEDs, and storage file systems. Adaptor provides implementations for specific development boards and completes hardware resource initialization and mapping. Boards provides board-level YAML configuration that describes the peripheral topology, pin assignments, and driver parameters of each board.

  • General Service: Provides system-level foundational services, including Wi-Fi connection management, audio capture and playback, video codec processing, NVS non-volatile storage, SNTP network time synchronization, and an extension mechanism for custom services. All services use the Manager + Helper architecture and support both local calls and RPC-based remote communication.

  • AI Agent Framework: Provides a unified management framework for AI agents, with built-in adapters for mainstream AI platforms such as Coze, OpenAI, and XiaoZhi. Through Function Calling and the MCP protocol, it enables bidirectional communication between large language models and system services, allowing LLMs to perceive and invoke device capabilities.

  • AI Expression: Provides visual expression capabilities for AI interaction scenarios, including Emote sets and animation control, delivering rich visual feedback for anthropomorphic interaction.

  • System (planned): Provides GUI, system management, and application framework support for different product forms such as mobile devices, speakers, and robots.

  • Runtime (planned): Provides runtime support for WebAssembly, Python, Lua, and more, enabling dynamic application loading and execution.

Application Layer

The final products and projects built on top of the layers above:

  • General Projects: Project templates that integrate the framework components and can serve directly as a starting point for product development.

  • System Apps (planned): A collection of system-level applications for products, including settings, AI assistants, app stores, and more, which can be selectively integrated as needed.