AI Agent Solution

Note

This document was translated automatically using AI and may contain minor errors. The official English version is still in progress.

Overview

AI Agents is an audio and video interaction application for the ESP32 platform. It is built on the ESP-GMF architecture and integrates device-side AI Agent development, giving developers a complete audio and video interaction solution.

Application Architecture

Built on the ESP-GMF architecture, the AI Agents application consists of two core modules:

Audio-processor module

Handles audio data processing and includes the following components:

Playback
- Supports local audio file playback
- Supports network audio playback
- Supports decoding of various audio formats
- Can be used as a source of background music or prompt sounds
Feeder (streaming playback)
- Plays real-time streaming audio data (such as WebSocket, HTTP stream, memory buffer)
- Commonly used in TTS, real-time voice delivery, online audio playback, etc.
- Can be combined with Mixer for mixed audio output
Recorder
- Captures (records) audio input
- Supports 3A algorithm processing (AEC, ANS, AGC)
- Supports encoded output (PCM, AMR, OPUS, WAV, etc.)
- Can be used for intelligent voice interaction, voice upload, etc.
Mixer (mixing)
- Mixes the Playback and Feeder streams into a single audio output
- Can be extended to multiple input channels
- Suitable for background music + real-time voice, prompt sound overlay, etc.
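
To illustrate what the Mixer does conceptually, the following is a minimal sketch of mixing two 16-bit PCM streams (for example, background music from Playback and real-time voice from Feeder) with per-stream gain and saturation. It is not the ESP-GMF implementation: the function names `clamp_to_int16` and `mix_pcm16`, the Q8 gain format, and the assumption that both inputs share the same sample rate, channel layout, and length are all hypothetical choices made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an ESP-GMF API): clamp a 32-bit value to the int16 range
 * so that loud passages saturate instead of wrapping around. */
static int16_t clamp_to_int16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Hypothetical mixer sketch: combine two 16-bit PCM buffers sample by sample.
 * Assumes both buffers hold `samples` samples in the same format.
 * Gains are Q8 fixed point: 256 = unity gain, 128 = half volume.
 * Gains are expected to stay in a small range (roughly 0..1024) so the
 * intermediate 32-bit sum cannot overflow. */
static void mix_pcm16(const int16_t *playback, const int16_t *feeder,
                      int16_t *out, size_t samples,
                      int32_t playback_gain_q8, int32_t feeder_gain_q8)
{
    for (size_t i = 0; i < samples; i++) {
        int32_t sum = ((int32_t)playback[i] * playback_gain_q8 +
                       (int32_t)feeder[i] * feeder_gain_q8) / 256;
        out[i] = clamp_to_int16(sum);
    }
}
```

In a scenario such as background music plus real-time voice, a caller might pass a reduced gain (for example 128) for the music stream and unity gain (256) for the voice stream, so that speech stays intelligible over the music.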

Video-processor module

Handles video data processing and includes the following functions:

Video capture
Video encoding and decoding
Video rendering

Features

The table below lists the mainstream AI platforms supported by the AI Agents application and the features available on each platform:

Platform Feature Comparison
Platform	Voice Call	Voice Interaction	Visual Processing	Audio and Video Dialogue	Example Link
Volcano RTC	✓	✓	✓	✓	Volcano RTC Example
COZE		✓			COZE Example
BRTC	✓	✓	✓	✓	BRTC Example
Tencent Cloud RTC		✓			Tencent Cloud RTC Example
Tongyi	✓	✓	✓	✓	To be released