GMF-AI-Audio Component

Note

This document was automatically translated using AI. Please excuse any errors in the details. The official English version is still in progress.

Overview

GMF-AI-Audio is a voice interaction component built on the GMF framework. It wraps ESP-SR to provide the complete interaction flow from voice wake-up to command recognition. The component integrates wake word detection, Voice Activity Detection (VAD), voice command recognition, and Acoustic Echo Cancellation (AEC), supporting efficient, natural voice interaction in smart speakers, smart home devices, and similar products.

Supported Scenarios

| Method | Corresponding Scenario |
|---|---|
| Start uploading voice data immediately after wake-up; stop uploading at the Wakeup End stage | VAD implemented in the cloud; RTC scenarios |
| After wake-up, wait for VAD to trigger before uploading; stop uploading when VAD ends | Traditional interaction method of smart hardware |
| No wake-up; wait for VAD to trigger before uploading; stop uploading when VAD ends | New cloud processing logic |
| Start uploading voice data immediately when the button is pressed; stop when it is released | Devices with limited computing power that implement voice functions by interacting with the cloud |
| After the button is pressed, wait for VAD to trigger before uploading; stop uploading when VAD ends | Avoids the excessive data volume caused by relying solely on VAD |
| Detect command words after wake-up | Default usage logic |
| No wake-up; wait for VAD to trigger before detecting command words | Can be applied to some in-vehicle systems |
| Detect command words after the button is pressed | Toys |
| Continuous command word recognition | Home control |
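
The five upload scenarios above differ only in which event opens the upload window and whether VAD additionally gates it. The following is a minimal host-side sketch of that gating logic in Python, intended only to make the differences concrete. It is not the component's API: the names `Event`, `Mode`, and `UploadGate` are hypothetical, and the sketch assumes that the end of a wake-up session or a button release always stops an upload in progress.

```python
from enum import Enum, auto


class Event(Enum):
    """Hypothetical event names; the real component may name them differently."""
    WAKEUP_START = auto()
    WAKEUP_END = auto()
    VAD_START = auto()
    VAD_END = auto()
    BUTTON_DOWN = auto()
    BUTTON_UP = auto()


class Mode(Enum):
    """One member per upload scenario listed in the table."""
    WAKEUP_STREAM = auto()    # upload from wake-up until Wakeup End
    WAKEUP_THEN_VAD = auto()  # wake-up arms the gate, VAD opens and closes it
    VAD_ONLY = auto()         # no wake-up, VAD alone opens and closes it
    BUTTON_STREAM = auto()    # upload while the button is held
    BUTTON_THEN_VAD = auto()  # button arms the gate, VAD opens and closes it


# mode -> (event that arms, event that disarms, whether VAD gates the upload)
_MODE_TABLE = {
    Mode.WAKEUP_STREAM: (Event.WAKEUP_START, Event.WAKEUP_END, False),
    Mode.WAKEUP_THEN_VAD: (Event.WAKEUP_START, Event.WAKEUP_END, True),
    Mode.VAD_ONLY: (None, None, True),
    Mode.BUTTON_STREAM: (Event.BUTTON_DOWN, Event.BUTTON_UP, False),
    Mode.BUTTON_THEN_VAD: (Event.BUTTON_DOWN, Event.BUTTON_UP, True),
}


class UploadGate:
    """Tracks whether audio should currently be uploaded."""

    def __init__(self, mode):
        self.arm_event, self.disarm_event, self.vad_gated = _MODE_TABLE[mode]
        # A mode without an arming event (VAD only) is armed from the start.
        self.armed = self.arm_event is None
        self.uploading = False

    def on_event(self, event):
        """Apply one event and return the resulting upload state."""
        if event is self.arm_event:
            self.armed = True
            if not self.vad_gated:
                self.uploading = True  # streaming modes upload right away
        elif event is self.disarm_event:
            # Assumption: ending the session also stops an active upload.
            self.armed = False
            self.uploading = False
        elif self.vad_gated and event is Event.VAD_START and self.armed:
            self.uploading = True
        elif self.vad_gated and event is Event.VAD_END:
            self.uploading = False
        return self.uploading


if __name__ == "__main__":
    gate = UploadGate(Mode.WAKEUP_THEN_VAD)
    for ev in (Event.WAKEUP_START, Event.VAD_START,
               Event.VAD_END, Event.WAKEUP_END):
        state = "uploading" if gate.on_event(ev) else "idle"
        print(ev.name, "->", state)
```

Keeping the per-mode behavior in a lookup table rather than in branching code means a new scenario can be modeled by adding one row. The command-word scenarios could be modeled the same way by replacing the upload flag with a flag that enables command detection.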