GMF-AI-Audio
gmf_ai_audio is the AI voice front-end component of ESP-GMF, wrapping the esp-sr speech algorithm library (wake word, command word, AEC, NS, VAD, DOA) into elements that can be connected to a pipeline. The component provides six elements (ai_afe / ai_aec / ai_wn / ai_ns / ai_vad / ai_doa) and an internal manager (esp_gmf_afe_manager). Among them, ai_afe is the comprehensive interface encapsulating full voice front-end capabilities, suitable for direct use as a unified entry point; ai_aec / ai_wn / ai_ns / ai_vad / ai_doa are for individual capabilities. All six elements follow the unified element interface and can be chained and combined in any order in the same pipeline. This document covers the principles, configuration, and event system in the order of manager, comprehensive element, and standalone algorithm elements; for the element base class and runtime method mechanism, see GMF Elements; for the data path, see Data Flow.
Feature List
ai_afe: full voice front-end element; after connecting to codec_dev IO, outputs 16-bit mono PCM and reports wakeup, VAD, and command word events via event callbacks
ai_aec: standalone echo cancellation element; performs AEC on input PCM and outputs 16-bit mono PCM; suitable for scenarios requiring only echo-cancelled microphone signal without wakeup / VAD / command word
ai_wn: standalone wake word detection element; does not create feed / fetch tasks; synchronously detects in the element process and transparently passes the input PCM to the output port
ai_ns: standalone noise suppression element; input format is 16 kHz, 16-bit mono PCM; supports NSNet2 model or WebRTC NS backend
ai_vad: standalone voice activity detection element; supports WebRTC VAD and VADNet backends; reports VAD state via callback and can pass through input PCM
ai_doa: standalone direction of arrival estimation element; requires input format with two microphone channels; outputs angle estimation results via callback
esp_gmf_afe_manager: internal manager that encapsulates the feed and fetch tasks, responsible for esp-sr AFE data input, result retrieval, feature toggling, and pause / resume
NS / VAD / SE: ai_afe uses esp-sr’s noise suppression (NS), voice activity detection (VAD), and speech enhancement (SE) capabilities via the AFE manager; whether each is enabled is controlled by afe_config_t and runtime feature switches
Channel format convention: the input channel arrangement is described by a string; M for microphone, R for speaker reference, N for unused channel; e.g., MMNR means the first two channels are microphones, plus one unused and one reference channel
Wakeup and VAD state machine: supports three combinations - “wakeup only”, “VAD only”, “wakeup + VAD”; automatically maintains IDLE / WAKEUP / SPEECHING / WAIT_FOR_SLEEP states
Command word detection (VCMD): based on MultiNet; independent of the wakeup state machine; started by the application calling esp_gmf_afe_vcmd_detection_begin
Manual wakeup control: esp_gmf_afe_keep_awake keeps awake state; esp_gmf_afe_trigger_wakeup / ..._trigger_sleep switches manually; suitable for button wakeup and other non-voice triggers
Event system: 6 event types covering wakeup start / end, VAD start / end, command word detection, and command word timeout
Technical Details
Component Hierarchy
The relationship between the six elements and the manager is shown below. ai_afe is the upper-level wrapper of the manager, adapting manager callbacks to the GMF element interface; ai_aec, ai_wn, ai_ns, ai_vad, and ai_doa call the corresponding esp-sr algorithms directly, without going through the manager.
classDiagram
direction TB
class esp_gmf_audio_element_t
class ai_afe {
feed/fetch task
wakeup state machine
command word detection
event callback
}
class ai_aec {
standalone AEC
reference + microphone channels
}
class ai_wn {
standalone WakeNet
wake word detection
}
class ai_ns {
standalone NS
single-channel noise suppression
}
class ai_vad {
standalone VAD
state callback
}
class ai_doa {
standalone DOA
sound source localization
}
class esp_gmf_afe_manager {
feed_task
fetch_task
feature control
}
esp_gmf_audio_element_t <|-- ai_afe
esp_gmf_audio_element_t <|-- ai_aec
esp_gmf_audio_element_t <|-- ai_wn
esp_gmf_audio_element_t <|-- ai_ns
esp_gmf_audio_element_t <|-- ai_vad
esp_gmf_audio_element_t <|-- ai_doa
ai_afe ..> esp_gmf_afe_manager : uses
When element behavior is needed, choose ai_afe / ai_aec / ai_wn / ai_ns / ai_vad / ai_doa to connect to the pipeline based on the scenario; when bypassing the GMF framework to use esp-sr directly, esp_gmf_afe_manager can be used standalone.
input_format Channel String
Elements using multi-channel input use the input_format string to describe the role of each channel: M for microphone capture, R for speaker reference (used as AEC reference), N for unused channel. For detailed rules of the channel string, see esp-sr AFE Input Channel Description.
For example, "MMNR" means the input is 4-channel interleaved PCM, channels 1/2 are microphones, channel 3 is unused, and channel 4 is reference. The component automatically extracts the required channels from the input PCM and feeds them to the underlying algorithm. The ai_afe input sample rate is fixed at 16 kHz, 16-bit; ai_aec additionally supports 8 kHz when using AFE_TYPE_VC_8K; ai_doa requires exactly two M channels in the input format.
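The channel string convention can be illustrated with a small, self-contained sketch. It is not the component's internal implementation; the helper names count_role and extract_channel are hypothetical and only show how a format such as "MMNR" maps onto interleaved 16-bit PCM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: count how many channels in the format string have
 * the given role ('M' = microphone, 'R' = reference, 'N' = unused). */
static size_t count_role(const char *format, char role)
{
    assert(format != NULL);
    size_t n = 0;
    for (const char *p = format; *p != '\0'; ++p) {
        if (*p == role) {
            ++n;
        }
    }
    return n;
}

/* Hypothetical helper: copy one channel out of interleaved 16-bit PCM.
 * in:     interleaved samples, frames * strlen(format) values
 * ch_idx: zero-based channel index inside each frame
 * out:    receives `frames` samples of that channel */
static void extract_channel(const int16_t *in, size_t frames, const char *format,
                            size_t ch_idx, int16_t *out)
{
    assert(in != NULL && out != NULL && format != NULL);
    size_t total_ch = strlen(format);
    assert(ch_idx < total_ch);
    for (size_t i = 0; i < frames; ++i) {
        out[i] = in[i * total_ch + ch_idx];
    }
}
```

With "MMNR", count_role(format, 'M') yields 2, and the reference signal is the channel at zero-based index 3 of every frame.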
AFE Manager
esp_gmf_afe_manager wraps the esp-sr AFE interface into a callback model of “data input → algorithm processing → result output”, automatically creating feed and fetch tasks.
flowchart LR
ReadCb[("read_cb<br/>(provided by app)")] --> Feed["feed_task"]
Feed --> Core["AFE processing module<br/>esp-sr"]
Core --> Fetch["fetch_task"]
Fetch --> ResultCb[("result_cb<br/>(received by app)")]
The application provides read_cb and result_cb via esp_gmf_afe_manager_cfg_t; feed_task periodically calls read_cb to get one frame of multi-channel PCM and feeds it to the esp-sr AFE; fetch_task retrieves the processed result (noise-suppressed / AEC-processed mono PCM + wakeup / VAD / command word events) and calls result_cb. The two tasks default to different cores (core 0 / core 1), stack 3 KiB, priority 5; DEFAULT_GMF_AFE_MANAGER_CFG provides default values.
Individual features can be toggled at runtime:
esp_gmf_afe_manager_enable_features(mgr, ESP_AFE_FEATURE_AEC, true);
esp_gmf_afe_manager_enable_features(mgr, ESP_AFE_FEATURE_VAD, false);
Calling esp_gmf_afe_manager_suspend() with the suspend flag set to true suspends both the feed and fetch tasks, which suits low-power scenarios; passing false resumes them. After initialization, esp_gmf_afe_manager_get_chunk_size() and esp_gmf_afe_manager_get_input_ch_num() return the number of samples per channel in each processing chunk and the total number of input channels, which helps the application size its IO buffer.
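As a rough sketch of the buffer sizing mentioned above: assuming the chunk size is counted in samples per channel and the input is interleaved 16-bit PCM, the byte size of one feed chunk follows directly from the two queried values. The helper name feed_buffer_bytes and the figures used below (512 samples, 4 channels) are illustrative assumptions, not values documented for the component.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes needed for one feed chunk of interleaved
 * 16-bit PCM.
 * chunk_samples: samples per channel in one chunk
 * ch_num:        total input channels (every character of the format string) */
static size_t feed_buffer_bytes(size_t chunk_samples, uint8_t ch_num)
{
    assert(ch_num > 0);
    return chunk_samples * ch_num * sizeof(int16_t);
}
```

For example, a chunk of 512 samples per channel with four input channels would need 4096 bytes.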
ai_afe: Full Voice Front-End
ai_afe wraps esp_gmf_afe_manager into an element that can be connected to a pipeline: feeds multi-channel PCM read from codec_dev IO into the manager, writes the mono PCM output by fetch to the output port, and converts wakeup / VAD / command word into event callbacks to send to the application.
The output of ai_afe is 16-bit mono PCM. Depending on the esp-sr AFE configuration, the output audio may incorporate AEC, NS, SE, AGC processing results; wakeup, VAD, and command word detection results are reported via event callbacks and are not placed in audio payloads.
Configuration. esp_gmf_afe_cfg_t requires at least afe_manager, models (loaded esp-sr models), and event_cb. The underlying AFE type is determined by the type parameter of afe_config_init; common types in the latest esp-sr include AFE_TYPE_SR (speech recognition), AFE_TYPE_VC / AFE_TYPE_VC_8K (voice communication), and AFE_TYPE_FD (full duplex, suitable for voice interaction scenarios with simultaneous playback and capture):
afe_config_t *afe_cfg = afe_config_init("MMNR", models, AFE_TYPE_FD, AFE_MODE_LOW_COST);
afe_cfg->wakenet_init = true;
afe_cfg->vad_init = true;
afe_cfg->aec_init = true;
esp_gmf_afe_manager_cfg_t mgr_cfg = DEFAULT_GMF_AFE_MANAGER_CFG(afe_cfg,
my_read_cb, &io_ctx, NULL, NULL);
esp_gmf_afe_manager_handle_t mgr = NULL;
esp_gmf_afe_manager_create(&mgr_cfg, &mgr);
esp_gmf_afe_cfg_t afe_el_cfg = DEFAULT_GMF_AFE_CFG(mgr, my_event_cb, &app_ctx, models);
afe_el_cfg.vcmd_detect_en = true;
esp_gmf_element_handle_t ai_afe = NULL;
esp_gmf_afe_init(&afe_el_cfg, &ai_afe);
Four timing parameters (default values are given in esp_gmf_afe.h) control state machine behavior:
| Parameter | Default | Description |
|---|---|---|
| wakeup_time | 10000 ms | How long after wakeup without any VAD event before WAKEUP_END is reported |
| wakeup_end | 2000 ms | How long after VAD ends with silence before WAKEUP_END is reported |
| vcmd_timeout | 5760 ms | Command word detection timeout; begin must be called again after timeout |
| delay_samples | 2048 samples | Output PCM delay to compensate for VAD detection lag; converted to time, should be greater than … |
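To compare the 2048-sample output delay with a time-based setting, it has to be converted to milliseconds. The sketch below assumes the 16 kHz sample rate that ai_afe uses; samples_to_ms is a hypothetical helper, not part of the component API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert a sample count to whole milliseconds
 * at the given sample rate (rounds down). */
static uint32_t samples_to_ms(uint32_t samples, uint32_t sample_rate_hz)
{
    assert(sample_rate_hz > 0);
    return (uint32_t)(((uint64_t)samples * 1000u) / sample_rate_hz);
}
```

At 16 kHz, 2048 samples correspond to 2048 × 1000 / 16000 = 128 ms of audio.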
Wakeup and VAD State Machine. Three combinations switch automatically. Wakeup only:
stateDiagram-v2
[*] --> IDLE
IDLE --> WAKEUP : wake word / WAKEUP_START
WAKEUP --> IDLE : wakeup_time timeout / WAKEUP_END
VAD only:
stateDiagram-v2
[*] --> IDLE
IDLE --> SPEECHING : voice detected / VAD_START
SPEECHING --> IDLE : silence / VAD_END
Wakeup + VAD combined:
stateDiagram-v2
[*] --> IDLE
IDLE --> WAKEUP : wake word / WAKEUP_START
WAKEUP --> SPEECHING : voice / VAD_START
WAKEUP --> IDLE : wakeup_time timeout / WAKEUP_END
SPEECHING --> WAIT_FOR_SLEEP : silence / VAD_END
WAIT_FOR_SLEEP --> SPEECHING : voice / VAD_START
WAIT_FOR_SLEEP --> IDLE : wakeup_end timeout / WAKEUP_END
The combined mode is suitable for the interaction flow of “speak wake word → voice input → return to standby after silence”, avoiding frequent VAD events being triggered outside the wakeup interval.
Manual wakeup control. In addition to the automatic state machine, three APIs provide non-voice triggering:
esp_gmf_afe_trigger_wakeup(): simulates a wake word hit, immediately entering WAKEUP state and broadcasting WAKEUP_START; used for button wakeup and external sensor triggers
esp_gmf_afe_trigger_sleep(): manually switches to IDLE
esp_gmf_afe_keep_awake(): once enabled, disables the automatic sleep timer (wakeup_time and wakeup_end). After setting, the element will not automatically exit WAKEUP due to timeout; esp_gmf_afe_trigger_sleep() must be called to return to IDLE
Command word detection (VCMD): independent of the wakeup state machine. The typical flow is to call esp_gmf_afe_vcmd_detection_begin() after receiving WAKEUP_START; upon detection, the event callback provides esp_gmf_afe_vcmd_info_t (containing phrase_id, prob, and the command string); call begin again after detection or timeout to continue. vcmd_detection_cancel clears the current detection state while preserving the feature enable flag, so begin can be called again later.
Event System
ai_afe reports six event types via esp_gmf_afe_event_cb_t; event enum values can be positive or negative, allowing command word IDs to be placed directly in the enum:
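| Event | Value | Payload |
|---|---|---|
| ESP_GMF_AFE_EVT_WAKEUP_START | -100 | esp_gmf_afe_wakeup_info_t |
| ESP_GMF_AFE_EVT_WAKEUP_END | -99 | NULL |
| ESP_GMF_AFE_EVT_VAD_START | -98 | NULL |
| ESP_GMF_AFE_EVT_VAD_END | -97 | NULL |
| ESP_GMF_AFE_EVT_VCMD_DECT_TIMEOUT | -96 | NULL |
| ESP_GMF_AFE_EVT_VCMD_DECTECTED | >= 0 (command word ID) | esp_gmf_afe_vcmd_info_t |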
The callback executes in the fetch_task context, so the application should do only lightweight dispatching there (update state, enqueue a message); time-consuming logic should run in the main thread or an independent task.
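One way to keep the callback lightweight is to push the event type into a small queue and let an application task drain it. The sketch below is a plain, single-threaded ring buffer used only to illustrate the pattern; the demo_* names are hypothetical. On a device the two sides run in different tasks, so a thread-safe primitive such as a FreeRTOS queue would normally take the place of this structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_QUEUE_LEN 8

typedef struct {
    int32_t items[DEMO_QUEUE_LEN];
    size_t  head;   /* index of the oldest queued item */
    size_t  count;  /* number of queued items */
} demo_evt_queue_t;

/* Called from the event callback: store the event type and return at once. */
static bool demo_queue_push(demo_evt_queue_t *q, int32_t evt_type)
{
    assert(q != NULL);
    if (q->count == DEMO_QUEUE_LEN) {
        return false;  /* queue full: drop rather than block the caller */
    }
    q->items[(q->head + q->count) % DEMO_QUEUE_LEN] = evt_type;
    q->count++;
    return true;
}

/* Called from the application task: take the oldest event, if any. */
static bool demo_queue_pop(demo_evt_queue_t *q, int32_t *evt_type)
{
    assert(q != NULL && evt_type != NULL);
    if (q->count == 0) {
        return false;
    }
    *evt_type = q->items[q->head];
    q->head = (q->head + 1) % DEMO_QUEUE_LEN;
    q->count--;
    return true;
}
```

The callback side only calls demo_queue_push with the event type; everything slow, such as starting playback or network requests, happens after demo_queue_pop in the consumer.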
ai_aec: Standalone Echo Cancellation
ai_aec only performs echo cancellation: extracts microphone + reference channels from multi-channel input PCM according to input_format, processes them through the AEC algorithm, and outputs mono PCM. No model partition is required; it consumes fewer resources than ai_afe and is suitable for recording pipelines that only need echo cancellation without wakeup / VAD / command word.
Three tuning fields in esp_gmf_aec_cfg_t:
filter_len: filter length; recommended 4 for ESP32-S3 / P4, 2 for ESP32-C5; higher values consume more CPU
type: AFE_TYPE_VC (voice communication) or AFE_TYPE_SR (speech recognition)
mode: AFE_MODE_LOW_COST or AFE_MODE_HIGH_PERF
esp_gmf_aec_cfg_t cfg = {
.filter_len = 4,
.type = AFE_TYPE_SR,
.mode = AFE_MODE_HIGH_PERF,
.input_format = "MMNR",
};
esp_gmf_obj_handle_t aec = NULL;
esp_gmf_aec_init(&cfg, &aec);
ai_aec internally maintains a synchronization buffer for reference and microphone signals: each process accumulates one frame of aligned data before calling the underlying AEC, outputting 16-bit mono PCM. The input sample rate is typically 16 kHz; when configured with AFE_TYPE_VC_8K, the input sample rate is 8 kHz. The bit depth must be 16-bit PCM; any mismatch is rejected at the open stage.
ai_wn: Standalone Wake Word Detection
ai_wn is a lightweight wrapper of WakeNet: process synchronously runs detection on the input PCM, calls the user’s detect_cb on a hit and passes the current frame through to the output port; on a miss, it also passes through, leaving the downstream to decide how to handle it.
Differences from ai_afe:
Does not create feed / fetch tasks; processing occurs directly in the GMF task context
Does not depend on the AFE manager or the full model set; only loads the WakeNet model
Lower resource usage; suitable for memory-constrained scenarios or those requiring only wake word detection
esp_gmf_wn_cfg_t cfg = {
.models = models,
.det_mode = DET_MODE_2CH_90,
.input_format = "MMNR",
.detect_cb = my_wakeup_cb,
.user_ctx = &ctx,
};
esp_gmf_element_handle_t wn = NULL;
esp_gmf_wn_init(&cfg, &wn);
ai_wn accepts 16-bit PCM at a sample rate of 8 kHz or 16 kHz. The number of channels is determined by det_mode when WakeNet is initialized (e.g., DET_MODE_90), and the number of M channels in input_format must match it; otherwise the model refuses to run.
ai_ns: Standalone Noise Suppression
ai_ns performs noise suppression on single-channel PCM and outputs PCM in the same format. It is suitable for recording or voice pre-processing pipelines that only need noise suppression without the full AFE state machine.
Main fields of esp_gmf_ns_cfg_t:
sample_rate: sample rate; currently supports 16 kHz
channel: channel count; currently only mono is supported
frame_ms: WebRTC NS frame duration; supports 10 / 20 / 30 ms
model_name and partition_label: NSNet2 model name and model partition label
esp_gmf_ns_cfg_t cfg = ESP_GMF_NS_CFG_DEFAULT();
esp_gmf_obj_handle_t ns = NULL;
esp_gmf_ns_init(&cfg, &ns);
When CONFIG_SR_NSN_NSNET2 is enabled, ai_ns loads the NSNet2 model from the partition specified by partition_label; when the WebRTC NS backend is enabled, model-related fields are not used.
ai_vad: Standalone Voice Activity Detection
ai_vad performs voice activity detection on single-channel PCM and reports via callback when the VAD state changes. The element can copy the input PCM to the output port for subsequent pipeline consumption of the original audio.
Main fields of esp_gmf_vad_cfg_t:
sample_rate: sample rate; WebRTC VAD supports 8 kHz / 16 kHz / 32 kHz
frame_ms: WebRTC VAD frame duration; supports 10 / 20 / 30 ms
vad_mode: VAD sensitivity mode
result_callback: state change callback, returns the underlying vad_state_t
model_name and partition_label: VADNet model name and model partition label
static void vad_cb(vad_state_t state, void *ctx)
{
/* Update application logic based on VAD state */
}
esp_gmf_vad_cfg_t cfg = ESP_GMF_VAD_CFG_DEFAULT();
cfg.result_callback = vad_cb;
esp_gmf_obj_handle_t vad = NULL;
esp_gmf_vad_init(&cfg, &vad);
When the VADNet backend is selected, the element loads the VADNet model from the model partition and uses the frame length required by the model; when the WebRTC backend is selected, frame_ms controls the processing duration per call.
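For the WebRTC backend, frame_ms and the sample rate together fix how much PCM each call consumes. A minimal sketch of that arithmetic for 16-bit mono input follows; the helper names are hypothetical and the element may size its internal buffers differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: samples contained in one frame of frame_ms milliseconds. */
static size_t frame_samples(uint32_t sample_rate_hz, uint32_t frame_ms)
{
    assert(sample_rate_hz > 0 && frame_ms > 0);
    return (size_t)(((uint64_t)sample_rate_hz * frame_ms) / 1000u);
}

/* Hypothetical helper: bytes in one frame of 16-bit mono PCM. */
static size_t frame_bytes_mono16(uint32_t sample_rate_hz, uint32_t frame_ms)
{
    return frame_samples(sample_rate_hz, frame_ms) * sizeof(int16_t);
}
```

A 30 ms frame at 16 kHz is 480 samples, or 960 bytes of 16-bit mono PCM; a 10 ms frame at the same rate is 160 samples.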
ai_doa: Standalone Direction of Arrival Estimation
ai_doa estimates the direction of the sound source based on two microphone signals; the processing result is returned as an angle value via callback without outputting new PCM data. It is suitable for applications where a microphone array needs to sense the direction of the sound source.
Main fields of esp_gmf_doa_cfg_t:
sample_rate: sample rate; default 16 kHz
resolution: direction estimation resolution
d_mics: physical distance between the two microphones in meters
frame_ms: audio duration required to produce one DOA result; default 64 ms
input_format: input channel arrangement; must contain exactly two M channels
result_callback: direction estimation result callback
static void doa_cb(float angle, void *ctx)
{
/* angle is the direction of arrival estimation result */
}
esp_gmf_doa_cfg_t cfg = ESP_GMF_DOA_CFG_DEFAULT();
cfg.result_callback = doa_cb;
esp_gmf_obj_handle_t doa = NULL;
esp_gmf_doa_init(&cfg, &doa);
Performance
The bottleneck of AI Audio elements is concentrated in the underlying esp-sr algorithm; the GMF layer overhead is mainly acquire-release and callback dispatch. Optimization recommendations:
| Module | Main Bottleneck | Optimization Direction |
|---|---|---|
| ai_afe | CPU when wake model + AEC + NS run simultaneously | Assign feed / fetch to different cores (default 0 / 1); use … |
| ai_aec | Filter length | Use … |
| ai_wn | WakeNet model inference | Choose 1-channel version (…) |
| ai_ns | NS model or WebRTC NS computation | Use mono input; choose NSNet2 or WebRTC backend based on actual noise conditions |
| ai_vad | VAD model or WebRTC VAD computation | Use shorter frame length for WebRTC backend to reduce latency; ensure correct model partition for VADNet backend |
| ai_doa | DOA algorithm and dual-microphone channel extraction | Reduce … |
| AFE Manager | feed / fetch queue length and ringbuffer size | Watch … |
Application Examples
elements/gmf_ai_audio/examples/wwe: Complete wake word detection project, covering ai_afe + manager creation, event callback handling, and command word triggering
elements/gmf_ai_audio/examples/aec_rec: AEC recording project, demonstrating ai_aec connected to a pipeline and outputting echo-cancelled PCM
elements/gmf_ai_audio/examples/wwe/README_CN.md and elements/gmf_ai_audio/examples/aec_rec/README_CN.md: Board wiring, Kconfig options, and run instructions for each project
Use idf.py create-project-from-example "espressif/gmf_ai_audio=<version>:wwe" to generate a compilable project directly based on this component.
Debugging Tools
ESP Audio Analyzer is Espressif’s audio testing solution, combining a device-side test project with a web-based analysis interface. Over a WebSocket connection, it runs standardized tests on microphones, speakers, AEC, and related capabilities, and outputs metrics such as THD and SNR along with structured test reports. After the device joins the network, connect from the web page to start testing.
The test project is built on gmf_ai_audio: the recording pipeline uses ai_afe with AEC enabled in the AFE by default. When tuning AEC performance, you can verify echo cancellation in full-duplex play-and-record scenarios without manually capturing PCM or writing playback scripts. The web UI adjusts MIC gain, playback volume, and channel format (M / R / N layout, e.g. MMNR) in real time to match hardware reference wiring and observe AEC changes. Exported raw recordings and before/after comparisons in reports help troubleshoot echo residual and similar issues.
Covers 11 standardized audio tests across microphone, speaker, and AEC modules
Test project enables AEC inside ai_afe by default, consistent with the element configuration in this document
Web UI supports MIC gain, playback volume, and channel format adjustment for AEC comparison
Supports raw recording export and structured test reports
Companion test project: esp_audio_analyzer_app
SoC Compatibility
Different elements depend on different esp-sr models and hardware acceleration capabilities; the support matrix is as follows:
| Element | ESP32 | ESP32-S3 | ESP32-S31 | ESP32-C3 | ESP32-C5 | ESP32-P4 |
|---|---|---|---|---|---|---|
| ai_afe | Supported | Supported | Supported | Not supported | Not supported | Supported |
| ai_aec | Supported | Supported | Supported | Not supported | Supported | Supported |
| ai_wn | Supported | Supported | Supported | Supported | Supported | Supported |
| ai_ns | Supported | Supported | Supported | Not supported | Supported | Supported |
| ai_vad | Supported | Supported | Supported | Supported | Supported | Supported |
| ai_doa | Not supported | Supported | Supported | Not supported | Not supported | Not supported |
Both ai_afe and ai_wn depend on the esp-sr model data partition; the application must reserve a model partition in the partition table and flash the corresponding model. For model preparation and flashing steps, refer to the esp-sr documentation and the model configuration instructions in elements/gmf_ai_audio/examples/wwe/README_CN.md.
FAQ
Q: Wake word detection sensitivity is insufficient or events are not reported. How to troubleshoot?
Check in order: whether afe_config_t.wakenet_init is true, whether the model partition is correctly flashed, whether the number of M channels in input_format matches the hardware microphone wiring, and whether the microphone sampling level is too low (use an oscilloscope or esp_gmf_afe_wakeup_info_t.data_volume to back-calculate). The wwe example’s README.md provides a complete hardware checklist.
Q: feed_task triggered a task watchdog timeout?
AFE inference has high CPU usage; feed_task and fetch_task should be assigned to different cores. On single-core ESP32 chips, feed_task easily times out when competing with other high-load application tasks; it is recommended to increase fetch_task_setting.prio or use a dedicated timer task to write input data to the AFE.
Q: ai_aec output has noticeable echo residue?
Confirm four things: whether the reference signal (R channel) is connected to the speaker output reference, whether the sample rate is 16 kHz, whether there is a timing offset between the microphone and reference, and whether filter_len is too small (recommended 4 for ESP32-S3 / P4). For specific debugging methods, see the header comments in esp_gmf_aec.c and the esp-sr AEC documentation.
Q: No event after command word detection begin?
Check whether vcmd_detect_en is set to true in esp_gmf_afe_cfg_t, whether mn_language matches the model language (cn / en), and whether a command word was input within vcmd_timeout. After timeout, ESP_GMF_AFE_EVT_VCMD_DECT_TIMEOUT is returned; begin must be called again.
Q: How to choose between ai_wn and ai_afe?
Use ai_wn for lightweight wake-word-only scenarios (Bluetooth speakers, sensor nodes); use ai_afe when full voice interaction is needed (wakeup + VAD + command word / AEC / NS). Both process raw multi-channel PCM and cannot be chained in the same pipeline.
Q: How to use esp_gmf_afe_manager standalone without connecting to a GMF pipeline?
esp_gmf_afe_manager_create() does not require the caller to be a GMF element; both read_cb and result_cb are ordinary callbacks. The manager can therefore be used standalone in non-GMF scenarios with self-managed input/output loops; in that mode the acquire-release protocol and pipeline control capabilities are not available.
API Reference
Header files for this component:
esp_gmf_afe_manager.h: AFE manager configuration, feature toggling, pause / resume
esp_gmf_afe.h: ai_afe element initialization, command word control, manual wakeup, event callbacks
esp_gmf_aec.h: ai_aec element configuration
esp_gmf_wn.h: ai_wn element configuration and detection callbacks
esp_gmf_ns.h: ai_ns element configuration
esp_gmf_vad.h: ai_vad element configuration and result callbacks
esp_gmf_doa.h: ai_doa element configuration and direction estimation callbacks
esp_gmf_ai_audio_methods.h: runtime method name macros
Header File
Functions
-
esp_gmf_err_t esp_gmf_afe_manager_create(esp_gmf_afe_manager_cfg_t *cfg, esp_gmf_afe_manager_handle_t *handle)
Create an AFE Manager instance.
- Parameters:
cfg – [in] Pointer to the AFE manager configuration structure
handle – [out] Pointer to the created AFE manager handle
- Returns:
ESP_GMF_ERR_OK Success
ESP_GMF_ERR_FAIL Failed to create the AFE manager
ESP_GMF_ERR_MEMORY_LACK Insufficient memory allocation
-
esp_gmf_err_t esp_gmf_afe_manager_destroy(esp_gmf_afe_manager_handle_t handle)
Destroy an AFE Manager instance.
- Parameters:
handle – [in] AFE manager handle to be destroyed
- Returns:
ESP_GMF_ERR_OK Success
ESP_GMF_ERR_INVALID_ARG Invalid handle
-
esp_gmf_err_t esp_gmf_afe_manager_set_read_cb(esp_gmf_afe_manager_handle_t handle, esp_gmf_afe_manager_read_cb_t read_cb, void *read_ctx)
Set the audio input read callback for the AFE Manager.
Note
If the read callback is set to NULL, the AFE Manager will be suspended.
- Parameters:
handle – [in] AFE manager handle
read_cb – [in] Function pointer to the read callback
read_ctx – [in] User-defined context to be passed to the callback
- Returns:
ESP_GMF_ERR_OK Success
ESP_GMF_ERR_INVALID_ARG Invalid arguments
-
esp_gmf_err_t esp_gmf_afe_manager_set_result_cb(esp_gmf_afe_manager_handle_t handle, esp_gmf_afe_manager_result_cb_t proc_cb, void *user_ctx)
Register a processing result callback for the AFE Manager.
- Parameters:
handle – [in] AFE manager handle
proc_cb – [in] Function pointer to the result callback
user_ctx – [in] User-defined context to be passed to the callback
- Returns:
ESP_GMF_ERR_OK Success
ESP_GMF_ERR_INVALID_ARG Invalid arguments
-
esp_gmf_err_t esp_gmf_afe_manager_suspend(esp_gmf_afe_manager_handle_t handle, bool suspend)
Suspend or resume the AFE Manager.
- Parameters:
handle – [in] AFE manager handle
suspend – [in] true to suspend, false to resume
- Returns:
ESP_GMF_ERR_OK Success
ESP_GMF_ERR_INVALID_ARG Invalid arguments
-
esp_gmf_err_t esp_gmf_afe_manager_enable_features(esp_gmf_afe_manager_handle_t handle, esp_gmf_afe_feature_t feature, bool enable)
Enable or disable specific features in the AFE Manager.
- Parameters:
handle – [in] AFE manager handle
feature – [in] Feature to be configured (see esp_gmf_afe_feature_t)
enable – [in] true to enable, false to disable
- Returns:
ESP_GMF_ERR_OK Success
ESP_GMF_ERR_INVALID_ARG Invalid arguments
-
esp_gmf_err_t esp_gmf_afe_manager_get_features(esp_gmf_afe_manager_handle_t handle, esp_gmf_afe_manager_features_t *features)
Retrieve the current feature enable states of the AFE Manager.
- Parameters:
handle – [in] AFE manager handle
features – [out] Pointer to a structure to store the feature states
- Returns:
ESP_GMF_ERR_OK Success
ESP_GMF_ERR_INVALID_ARG Invalid arguments
-
esp_gmf_err_t esp_gmf_afe_manager_get_chunk_size(esp_gmf_afe_manager_handle_t handle, size_t *size)
Get the processing chunk size for the AFE Manager.
Note
The chunk size represents the number of audio samples per channel. The AFE Manager processes data in fixed-size chunks.
- Parameters:
handle – [in] AFE manager handle
size – [out] Pointer to store the chunk size (unit: samples)
- Returns:
ESP_GMF_ERR_OK Success
ESP_GMF_ERR_INVALID_ARG Invalid arguments
-
esp_gmf_err_t esp_gmf_afe_manager_get_input_ch_num(esp_gmf_afe_manager_handle_t handle, uint8_t *ch_num)
Retrieve the number of input channels for the AFE Manager.
- Parameters:
handle – [in] AFE manager handle
ch_num – [out] Pointer to store the number of channels
- Returns:
ESP_GMF_ERR_OK Success
ESP_GMF_ERR_INVALID_ARG Invalid arguments
Structures
-
struct esp_gmf_afe_manager_task_setting_t
Configuration structure for the task setting.
-
struct esp_gmf_afe_manager_cfg_t
Configuration structure for the AFE manager.
Public Members
-
afe_config_t *afe_cfg
Configuration of ESP AFE
-
esp_gmf_afe_manager_task_setting_t feed_task_setting
Feed task setting
-
esp_gmf_afe_manager_task_setting_t fetch_task_setting
Fetch task setting
-
esp_gmf_afe_manager_read_cb_t read_cb
Callback function for reading audio data
-
void *read_ctx
Context for the read callback function
-
esp_gmf_afe_manager_result_cb_t result_cb
Callback function for processing AFE results
-
void *result_ctx
Context for the result callback function
-
struct esp_gmf_afe_manager_features_t
GMF AFE Manager Feature Configuration.
This structure defines the feature enable states for the AFE manager. A value of `true` indicates that the feature is enabled, while `false` indicates it is disabled.
Macros
-
ESP_AFE_MANAGER_FEED_TASK_CORE
The AFE Manager aims to provide users with a simple interface for managing AFE (audio front-end) functions, including WakeNet, VAD, AEC, SE, and more. This component automatically creates the feed and fetch tasks; users only need to provide a data read callback and a result processing callback. Users can configure AFE functions through the afe_config_t structure. The data fed into AFE must be 16-bit PCM with a sampling rate of 16 kHz; the number of channels and the channel arrangement are determined by the configuration passed to afe_config_init. For details, refer to the description of afe_config_init provided by esp-sr.
-
ESP_AFE_MANAGER_FEED_TASK_PRIO
-
ESP_AFE_MANAGER_FEED_TASK_STACK
-
ESP_AFE_MANAGER_FETCH_TASK_CORE
-
ESP_AFE_MANAGER_FETCH_TASK_PRIO
-
ESP_AFE_MANAGER_FETCH_TASK_STACK
-
DEFAULT_GMF_AFE_MANAGER_CFG(_afe_cfg, _read_cb, _read_ctx, _result_cb, _result_ctx)
Type Definitions
-
typedef void *esp_gmf_afe_manager_handle_t
Handle for the AFE manager.
-
typedef void (*esp_gmf_afe_manager_result_cb_t)(afe_fetch_result_t *result, void *user_ctx)
Callback function type for processing AFE results.
- Param result:
[in] Pointer to the result structure
- Param user_ctx:
[in] User context to be passed to the callback function
-
typedef int32_t (*esp_gmf_afe_manager_read_cb_t)(void *buffer, int buf_sz, void *user_ctx, uint32_t ticks)
Callback type for reading data.
- Param buffer:
[in] Pointer to the buffer to read data into
- Param buf_sz:
[in] Size of the buffer
- Param user_ctx:
[in] User context to be passed to the callback function
- Param ticks:
[in] Number of ticks to wait for data
- Return:
Enumerations
Header File
Functions
-
esp_gmf_err_t esp_gmf_afe_init(void *config, esp_gmf_obj_handle_t *handle)
Initialize the GMF AFE.
- Parameters:
config – [in] Pointer to the configuration structure
handle – [out] Pointer to the handle to be created
- Returns:
ESP_GMF_ERR_OK Success
ESP_GMF_ERR_MEMORY_LACK Memory allocation failed
ESP_GMF_ERR_INVALID_ARG Invalid argument
-
esp_gmf_err_t esp_gmf_afe_vcmd_detection_begin(esp_gmf_element_handle_t handle)
Begin voice command detection.
- Parameters:
handle – [in] Handle to the GMF object
- Returns:
ESP_GMF_ERR_OK Success
ESP_GMF_ERR_INVALID_ARG Invalid argument
ESP_GMF_ERR_INVALID_STATE Voice command not enabled
-
esp_gmf_err_t esp_gmf_afe_vcmd_detection_cancel(esp_gmf_element_handle_t handle)
Cancel voice command detection.
Note
This function clears the state of the voice command detection process; voice command detection stays enabled, and the user can call esp_gmf_afe_vcmd_detection_begin to start the detection again.
- Parameters:
handle – [in] Handle to the GMF object
- Returns:
ESP_GMF_ERR_OK Success
ESP_GMF_ERR_INVALID_ARG Invalid argument
ESP_GMF_ERR_INVALID_STATE Voice command not enabled
-
esp_gmf_err_t esp_gmf_afe_set_event_cb(esp_gmf_element_handle_t handle, esp_gmf_afe_event_cb_t cb, void *ctx)
Set the event callback for the AFE (Audio Front-End) element.
This function registers a callback function to handle events generated by the AFE element. The callback will be invoked with the specified context whenever an event occurs
- Parameters:
handle – The handle to the AFE element
cb – The callback function to handle AFE events
ctx – User-defined context to be passed to the callback function
- Returns:
ESP_GMF_ERR_OK Success
ESP_GMF_ERR_INVALID_ARG Invalid argument
ESP_GMF_ERR_INVALID_STATE Config not exist
-
esp_gmf_err_t esp_gmf_afe_keep_awake(esp_gmf_element_handle_t handle, bool enable)
Enable or disable keep-awake mode.
When keep-awake mode is enabled, the system remains in the wake state and wakeup_end events are not triggered automatically. This is useful for scenarios where you want to keep the system active without an automatic timeout.
- Parameters:
handle – The handle to the AFE element
enable – True to enable keep-awake mode, false to disable
- Returns:
ESP_GMF_ERR_OK Success
ESP_GMF_ERR_INVALID_ARG Invalid argument
ESP_GMF_ERR_INVALID_STATE Configuration does not exist
ESP_GMF_ERR_TIMEOUT Command send timeout
-
esp_gmf_err_t esp_gmf_afe_trigger_wakeup(esp_gmf_element_handle_t handle)
Manually trigger wakeup state.
This function allows manual activation of the wakeup state without waiting for automatic wake word detection. It is useful in the following scenarios:
1. Button-triggered activation: when users press a physical button to activate voice interaction, bypassing the need for wake words.
2. External event-driven activation: when the system needs to enter the wakeup state based on external triggers (sensors, timers, network events).
After calling this function, the AFE enters the wakeup state and begins listening for voice commands (if voice command detection is enabled). The system generates an ESP_GMF_AFE_EVT_WAKEUP_START event and remains active according to the configured wakeup_time duration.
- Parameters:
handle – [in] Handle to the GMF object
- Returns:
ESP_GMF_ERR_OK Success
ESP_GMF_ERR_INVALID_ARG Invalid argument
ESP_GMF_ERR_INVALID_STATE Element not opened
ESP_GMF_ERR_TIMEOUT Command send timeout
-
esp_gmf_err_t esp_gmf_afe_trigger_sleep(esp_gmf_element_handle_t handle)
Manually end the wakeup state and return the AFE to sleep.
- Parameters:
handle – [in] Handle to the GMF object
- Returns:
ESP_GMF_ERR_OK Success
ESP_GMF_ERR_INVALID_ARG Invalid argument
ESP_GMF_ERR_INVALID_STATE Element not opened
ESP_GMF_ERR_TIMEOUT Command send timeout
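esp_gmf_afe_trigger_wakeup and esp_gmf_afe_trigger_sleep share the same signature and can both return ESP_GMF_ERR_TIMEOUT when the command cannot be sent in time. The sketch below shows one possible way to wrap such a call in a bounded retry. It is an illustration only: the typedefs and error-code values are stand-ins so the code builds off-target, and afe_cmd_fn_t, afe_cmd_with_retry, and fake_trigger are hypothetical names that are not part of the component. Whether retrying is appropriate depends on the application.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins so the sketch builds off-target; on target these come from the
 * GMF headers. The numeric error-code values here are placeholders. */
typedef void *esp_gmf_element_handle_t;
typedef int esp_gmf_err_t;
#define ESP_GMF_ERR_OK      0
#define ESP_GMF_ERR_TIMEOUT (-1)

/* Hypothetical: any AFE command that takes only the element handle. */
typedef esp_gmf_err_t (*afe_cmd_fn_t)(esp_gmf_element_handle_t handle);

/* Call cmd up to max_attempts times, stopping at the first result that is
 * not a timeout. Returns the result of the last attempt. */
esp_gmf_err_t afe_cmd_with_retry(afe_cmd_fn_t cmd, esp_gmf_element_handle_t handle,
                                 int max_attempts)
{
    assert(cmd != NULL && max_attempts > 0);
    esp_gmf_err_t ret = ESP_GMF_ERR_TIMEOUT;
    for (int i = 0; i < max_attempts && ret == ESP_GMF_ERR_TIMEOUT; ++i) {
        ret = cmd(handle);
    }
    return ret;
}

/* Test double standing in for a trigger function: the handle points to a
 * counter of timeouts to report before succeeding. */
esp_gmf_err_t fake_trigger(esp_gmf_element_handle_t handle)
{
    int *timeouts_left = (int *)handle;
    if (*timeouts_left > 0) {
        (*timeouts_left)--;
        return ESP_GMF_ERR_TIMEOUT;
    }
    return ESP_GMF_ERR_OK;
}
```

On target, a button handler could pass esp_gmf_afe_trigger_wakeup and the AFE element handle in place of the test double.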
Structures
-
struct esp_gmf_afe_wakeup_info_t
Information reported when the wakeup state is detected; event data for ESP_GMF_AFE_EVT_WAKEUP_START.
-
struct esp_gmf_afe_vcmd_info_t
Information reported when a voice command is detected; event data for ESP_GMF_AFE_EVT_VCMD_DECTECTED.
-
struct esp_gmf_afe_evt_t
Event structure for GMF AFE.
Public Members
-
esp_gmf_afe_event_t type
Event type
-
void *event_data
Event data
-
size_t data_len
Length of event data
-
struct esp_gmf_afe_cfg_t
Configuration structure for GMF AFE wrapper.
Public Members
-
esp_gmf_afe_manager_handle_t afe_manager
AFE Manager handle
-
uint32_t delay_samples
Number of samples to delay. Note: if the user wants to use the AFE output only after detecting the VAD start event, the time corresponding to this value should not be less than the vad_min_speech_ms in the afe_config_t used when creating the afe_manager; otherwise, a small portion of the data at the beginning of the speech may be lost
-
void *models
List of models
-
uint32_t wakeup_time
Unit: ms. The duration for which the wakeup state is held when VAD is not triggered
-
uint32_t wakeup_end
Unit: ms. When the silence after the AUDIO_REC_VAD_END state lasts longer than this value, the state is treated as AUDIO_REC_WAKEUP_END
-
bool vcmd_detect_en
Enable voice command detection
-
uint32_t vcmd_timeout
Timeout for voice command detection. Unit: ms
-
const char *mn_language
Language for the MultiNet model, cn or en
-
esp_gmf_afe_event_cb_t event_cb
Callback function for AI audio events
-
void *event_ctx
User context to be passed to the callback function
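The note on delay_samples ties a sample count to a duration (vad_min_speech_ms). A minimal sketch of that conversion follows, assuming the delay is counted in samples of a single channel at a known sample rate; the helper name delay_samples_for_ms and the example rate of 16 kHz are illustrative assumptions, not part of the component.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: smallest sample count whose duration is at least
 * min_speech_ms at the given sample rate. */
uint32_t delay_samples_for_ms(uint32_t min_speech_ms, uint32_t sample_rate_hz)
{
    assert(sample_rate_hz > 0);
    /* 64-bit intermediate avoids overflow; +999 rounds up to whole samples. */
    uint64_t n = ((uint64_t)min_speech_ms * sample_rate_hz + 999u) / 1000u;
    assert(n <= UINT32_MAX);
    return (uint32_t)n;
}
```

For instance, with a 16 kHz stream and a vad_min_speech_ms of 128, the helper returns 2048, so a delay_samples value of at least 2048 would cover the VAD minimum speech window under these assumptions.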
Macros
-
ESP_GMF_AFE_VCMD_MAX_LEN
-
ESP_GMF_AFE_DEFAULT_DELAY_SAMPLES
-
ESP_GMF_AFE_DEFAULT_WAKEUP_TIME_MS
-
ESP_GMF_AFE_DEFAULT_WAKEUP_END_MS
-
ESP_GMF_AFE_DEFAULT_VCMD_TIMEOUT_MS
-
DEFAULT_GMF_AFE_CFG(__afe_manager, __event_cb, __event_ctx, __models)
Type Definitions
-
typedef void (*esp_gmf_afe_event_cb_t)(esp_gmf_element_handle_t el, esp_gmf_afe_evt_t *event, void *user_data)
Callback type for GMF AFE events.
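A callback of this type usually dispatches on the event type. The sketch below is one way to do that, not the component's own code. The declarations at the top are stand-ins that mirror this reference so the code builds off-target; the negative enum values are placeholders, and the assumption that voice command IDs count up from ESP_GMF_AFE_EVT_VCMD_DECTECTED (taken to be 0) is a reading of the enumerator description. app_state_t and on_afe_event are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in declarations mirroring this reference so the sketch builds
 * off-target; on target, include the component header instead. */
typedef void *esp_gmf_element_handle_t;

typedef enum {
    /* Negative values are placeholders chosen for this sketch. */
    ESP_GMF_AFE_EVT_WAKEUP_START = -5,
    ESP_GMF_AFE_EVT_WAKEUP_END,
    ESP_GMF_AFE_EVT_VAD_START,
    ESP_GMF_AFE_EVT_VAD_END,
    ESP_GMF_AFE_EVT_VCMD_DECT_TIMEOUT,
    /* Assumed to be 0, with voice command IDs counting up from it. */
    ESP_GMF_AFE_EVT_VCMD_DECTECTED = 0
} esp_gmf_afe_event_t;

typedef struct {
    esp_gmf_afe_event_t type;
    void *event_data;
    size_t data_len;
} esp_gmf_afe_evt_t;

/* Hypothetical application state, passed as the user context. */
typedef struct {
    int awake;       /* 1 between WAKEUP_START and WAKEUP_END */
    int speaking;    /* 1 between VAD_START and VAD_END */
    int last_cmd_id; /* last detected voice command ID, -1 if none */
} app_state_t;

void on_afe_event(esp_gmf_element_handle_t el, esp_gmf_afe_evt_t *event, void *user_data)
{
    (void)el;
    assert(event != NULL && user_data != NULL);
    app_state_t *st = (app_state_t *)user_data;

    switch (event->type) {
    case ESP_GMF_AFE_EVT_WAKEUP_START:      st->awake = 1;        break;
    case ESP_GMF_AFE_EVT_WAKEUP_END:        st->awake = 0;        break;
    case ESP_GMF_AFE_EVT_VAD_START:         st->speaking = 1;     break;
    case ESP_GMF_AFE_EVT_VAD_END:           st->speaking = 0;     break;
    case ESP_GMF_AFE_EVT_VCMD_DECT_TIMEOUT: st->last_cmd_id = -1; break;
    default:
        /* Non-negative types are treated as voice command IDs. */
        if ((int)event->type >= (int)ESP_GMF_AFE_EVT_VCMD_DECTECTED) {
            st->last_cmd_id = (int)event->type;
        }
        break;
    }
}
```

On target, such a callback could be supplied through the event_cb and event_ctx fields of esp_gmf_afe_cfg_t or registered with esp_gmf_afe_set_event_cb, passing a pointer to the state as the context.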
Enumerations
-
enum esp_gmf_afe_event_t
AFE manager event type.
Values:
-
enumerator ESP_GMF_AFE_EVT_WAKEUP_START
Wakeup start
-
enumerator ESP_GMF_AFE_EVT_WAKEUP_END
Wakeup stop
-
enumerator ESP_GMF_AFE_EVT_VAD_START
VAD start
-
enumerator ESP_GMF_AFE_EVT_VAD_END
VAD stop
-
enumerator ESP_GMF_AFE_EVT_VCMD_DECT_TIMEOUT
Voice command detect timeout
-
enumerator ESP_GMF_AFE_EVT_VCMD_DECTECTED
Values from 0 upward are the IDs of the voice commands detected by MultiNet
Header File
Functions
-
esp_gmf_err_t esp_gmf_aec_init(esp_gmf_aec_cfg_t *cfg, esp_gmf_obj_handle_t *out_handle)
Initialize the Espressif AEC element.
- Parameters:
cfg – [in] Pointer to the configuration structure
out_handle – [out] Pointer to the handle to be created
- Returns:
ESP_GMF_ERR_OK Success
ESP_GMF_ERR_MEMORY_LACK Memory allocation failed
ESP_GMF_ERR_INVALID_ARG Invalid argument
Structures
-
struct esp_gmf_aec_cfg_t
Configuration structure for AEC.
Note
The input format follows the same convention as the AFE configuration: M represents a microphone channel, R represents the playback reference channel, and N represents an unknown or unused channel. For example, input_format="MMNR" indicates that the input data consists of four channels: two microphone channels, an unused channel, and the playback reference channel.
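The channel-letter convention in the note above can be checked in application code before a format string is handed to an element. The following is a minimal sketch under that convention; input_format_info_t and parse_input_format are hypothetical helpers, not part of the component, and the sketch makes no claim about which channel combinations a given element accepts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of an input format string such as "MMNR". */
typedef struct {
    int total;  /* all channels in one interleaved frame */
    int mic;    /* 'M': microphone channels */
    int ref;    /* 'R': playback reference channels */
    int unused; /* 'N': unknown or unused channels */
} input_format_info_t;

/* Count the channel kinds in fmt. Returns 0 on success, or -1 if fmt is
 * NULL, empty, or contains a letter other than M, R, or N. */
int parse_input_format(const char *fmt, input_format_info_t *out)
{
    assert(out != NULL);
    if (fmt == NULL || fmt[0] == '\0') {
        return -1;
    }
    input_format_info_t info = {0, 0, 0, 0};
    for (const char *p = fmt; *p != '\0'; ++p) {
        switch (*p) {
        case 'M': info.mic++;    break;
        case 'R': info.ref++;    break;
        case 'N': info.unused++; break;
        default:  return -1;     /* out is left untouched on error */
        }
        info.total++;
    }
    *out = info;
    return 0;
}
```

For "MMNR" the helper reports four channels in total: two microphone channels, one reference channel, and one unused channel.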
Header File
Functions
-
esp_gmf_err_t esp_gmf_wn_init(esp_gmf_wn_cfg_t *config, esp_gmf_element_handle_t *handle)
Initialize the WakeNet element.
- Parameters:
config – [in] Pointer to the configuration structure
handle – [out] Pointer to the handle to be initialized
- Returns:
ESP_GMF_ERR_OK Success
ESP_GMF_ERR_INVALID_ARG Invalid argument
ESP_GMF_ERR_MEMORY_LACK Memory allocation failed
ESP_GMF_ERR_FAIL Other failures
-
esp_gmf_err_t esp_gmf_wn_set_detect_cb(esp_gmf_element_handle_t handle, esp_wn_detect_cb_t detect_cb, void *ctx)
Set the voice trigger detection callback for WakeNet. This function registers a user-defined callback that is invoked when WakeNet detects a wake word.
- Parameters:
handle – [in] Handle to the WakeNet element
detect_cb – [in] Callback function to be called on wake word detection
ctx – [in] User-defined context to be passed to the callback
- Returns:
ESP_GMF_ERR_OK Success
ESP_GMF_ERR_INVALID_ARG Invalid argument
Structures
-
struct esp_gmf_wn_cfg_t
Configuration structure for WakeNet.
Note
The input format follows the same convention as the AFE configuration: M represents a microphone channel, R represents the playback reference channel, and N represents an unknown or unused channel. For example, input_format="MMNR" indicates that the input data consists of four channels: two microphone channels, an unused channel, and the playback reference channel.
Type Definitions
-
typedef void (*esp_wn_detect_cb_t)(esp_gmf_element_handle_t handle, int32_t trigger_ch, void *user_ctx)
Callback type for WakeNet detection.
- Param handle:
[in] Handle to the WakeNet object
- Param trigger_ch:
[in] The microphone channel that triggered the detection
- Param user_ctx:
[in] User context passed during initialization
Header File
Macros
-
ESP_GMF_METHOD_AFE_START_VCMD_DET
-
ESP_GMF_METHOD_AFE_START_VCMD_DET_ARG_EN