Espressif General Multimedia Framework Guide
Overview
ESP-GMF (Espressif General Multimedia Framework) is a lightweight, general-purpose software framework built by Espressif for IoT multimedia applications. The framework uses modular elements as its basic building blocks, decomposing streaming data processing into work units that can be developed independently and composed freely. At runtime, a pipeline chains these elements together while a task schedules their execution. GMF-Core as a whole requires approximately 7 KB of runtime RAM on IoT chips. Beyond audio, the framework also handles image, video, and arbitrary streaming data processing.
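The element / pipeline / task split described above can be illustrated with a small, self-contained C sketch. It is a conceptual model only, not the ESP-GMF API: every name in it (`element_t`, `pipeline_t`, `pipeline_add`, `task_run_once`, and the two example elements) is hypothetical and chosen for illustration. It assumes the simplest possible case, where each element processes a buffer of 16-bit samples in place and a single scheduling step pushes one buffer through all elements in order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration only; this is NOT the ESP-GMF API. */

#define MAX_ELEMENTS 8

/* An element processes a sample buffer in place and returns the new length. */
typedef size_t (*element_process_fn)(int16_t *buf, size_t len, void *ctx);

typedef struct {
    const char        *name;     /* label for debugging */
    element_process_fn process;  /* the element's work function */
    void              *ctx;      /* per-element configuration, may be NULL */
} element_t;

/* A pipeline is an ordered chain of elements. */
typedef struct {
    element_t elements[MAX_ELEMENTS];
    size_t    count;
} pipeline_t;

static void pipeline_init(pipeline_t *p)
{
    assert(p != NULL);
    p->count = 0;
}

/* Appends an element to the chain. Returns 0 on success, -1 if the chain is full. */
static int pipeline_add(pipeline_t *p, const char *name,
                        element_process_fn fn, void *ctx)
{
    assert(p != NULL && fn != NULL);
    if (p->count >= MAX_ELEMENTS) {
        return -1;
    }
    p->elements[p->count++] = (element_t){ name, fn, ctx };
    return 0;
}

/* The "task": one scheduling step that runs every element in chain order. */
static size_t task_run_once(const pipeline_t *p, int16_t *buf, size_t len)
{
    assert(p != NULL && buf != NULL);
    for (size_t i = 0; i < p->count; i++) {
        len = p->elements[i].process(buf, len, p->elements[i].ctx);
    }
    return len;
}

/* Example element: multiplies each sample by an integer gain, with saturation. */
static size_t gain_process(int16_t *buf, size_t len, void *ctx)
{
    int gain = *(int *)ctx;
    for (size_t i = 0; i < len; i++) {
        int32_t v = (int32_t)buf[i] * gain;
        if (v > INT16_MAX) v = INT16_MAX;
        if (v < INT16_MIN) v = INT16_MIN;
        buf[i] = (int16_t)v;
    }
    return len;
}

/* Example element: keeps every second sample, halving the buffer length. */
static size_t decimate2_process(int16_t *buf, size_t len, void *ctx)
{
    (void)ctx;
    size_t out = 0;
    for (size_t i = 0; i < len; i += 2) {
        buf[out++] = buf[i];
    }
    return out;
}
```

In this sketch, an application would call `pipeline_init`, register `gain_process` and `decimate2_process` with `pipeline_add`, and then call `task_run_once` for each incoming buffer. The elements know nothing about each other; only the pipeline fixes their order, which is the property that makes elements independently developable and freely composable.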
Key Features
Lightweight for IoT: Designed for resource-constrained ESP32 series chips; GMF-Core runtime RAM footprint is approximately 7 KB
Multi-domain coverage: Unified handling of audio, video, image, and generic streaming data within a single framework
Modular composition: Assemble functionality using elements as needed, or extend with custom elements and IO components
Easy development: Scales from simple playback pipelines to complex scenarios such as multi-channel mixing, AI speech, and video compositing
Rich components: Built-in advanced application components including audio codecs, video codecs, and AI front-end processing
Ecosystem friendly: Built on ESP-IDF and the Espressif Component Registry, compatible with the existing component ecosystem
System Modules
ESP-GMF consists of four main modules, listed bottom-up by dependency level:
GMF-Core: The framework foundation, providing pipeline management, task scheduling, data flow control, and other infrastructure. Contains basic objects including element, pipeline, task, data bus, and pool. Most applications use GMF-Core indirectly through higher-level components; direct GMF-Core programming is needed when extending the framework or writing custom elements
Elements: Mid-layer components that implement specific functionality on top of GMF-Core, including the following sub-modules:
gmf_audio: Audio encoding, decoding, and effects
gmf_video: Video encoding, decoding, and image effects
gmf_io: File / network / flash / codec device IO
gmf_ai_audio: AI front-end processing algorithms including wake word, command word, and AEC
gmf_misc: Miscellaneous utilities
Packages: High-level encapsulations targeting specific application scenarios, assembling multiple elements into common multimedia workflows, including the following sub-modules:
esp_audio_simple_player: Simple audio player
esp_capture: Multimedia capture
esp_audio_render: Multi-channel mixing renderer
esp_video_render: Video compositing and display / multi-channel video / UI compositing
esp_bt_audio: Classic Bluetooth and LE Audio
esp_asrc: Audio sample rate / bit depth / channel conversion
esp_board_manager: Board-level management
gmf_loader: GMF loader
gmf_app_utils: Application utility tools
gmf_fft: FFT computation component
GMF-Examples: Demonstration projects showing how to use ESP-GMF, covering audio and video playback and recording, Bluetooth audio, AI Agent, and more
Relationship with ESP-ADF
ESP-GMF evolved from the audio_pipeline module in ESP-ADF (Espressif Audio Development Framework), extracting the pipeline architecture and extending it to video, image, and generic streaming data. The distinction between the two:
ESP-ADF is a feature-oriented repository for multimedia applications; its master branch serves as the continuing mainline, providing components based on ESP-GMF and offering product-level solution examples for customers
ESP-GMF provides the pipeline architecture, with flexible support for audio, video, image, and arbitrary streaming data


