AI Agent Solution


Note

This document was automatically translated using AI and may contain minor errors. The official English version is still in progress.

Overview

The AI Agents application implements audio and video interaction on the ESP32 platform. It is built on the ESP-GMF architecture and integrates device-side AI Agent development, giving developers a complete audio and video interaction solution.

Application Architecture

The AI Agents application is based on the ESP-GMF architecture and mainly includes the following two core modules:

Audio-processor module

Handles audio data processing and includes the following components:

  • Playback

    • Supports local audio file playback

    • Supports network audio playback

    • Supports decoding of various audio formats

    • Can be used as a source of background music or prompt sounds

  • Feeder (streaming playback)

    • Plays real-time streaming audio data from sources such as WebSocket, HTTP streams, or memory buffers

    • Commonly used in TTS, real-time voice delivery, online audio playback, etc.

    • Can be combined with Mixer for mixed audio output

  • Recorder

    • Captures audio input

    • Supports 3A algorithm processing: acoustic echo cancellation (AEC), automatic noise suppression (ANS), and automatic gain control (AGC)

    • Supports encoded output (PCM, AMR, OPUS, WAV, etc.)

    • Can be used for intelligent voice interaction, voice upload, etc.

  • Mixer (mixing)

    • Mixes the Playback and Feeder streams into a single audio output

    • Can be extended to support multiple input channels

    • Suitable for background music + real-time voice, prompt sound overlay, etc.
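
To make the Mixer's role concrete, the following is a minimal sketch of how two 16-bit PCM streams (for example, background music from Playback and TTS voice from Feeder) might be combined into one output buffer. It is not the ESP-GMF implementation or API: the helper names `clamp_s16` and `mix_pcm_s16`, the Q8 fixed-point gain convention, and the choice to treat missing samples as silence are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp a 32-bit intermediate value into the signed 16-bit PCM range. */
static int16_t clamp_s16(int32_t v)
{
    if (v > INT16_MAX) {
        return INT16_MAX;
    }
    if (v < INT16_MIN) {
        return INT16_MIN;
    }
    return (int16_t)v;
}

/*
 * Hypothetical helper (not an ESP-GMF API): mix a playback stream and a
 * feeder stream, sample by sample, into `out`.
 *
 * Gains are Q8 fixed-point values (256 == 1.0, 128 == 0.5).
 * If an input holds fewer than `out_len` samples, the missing samples are
 * treated as silence, so a feeder underrun does not stall the output.
 */
static void mix_pcm_s16(const int16_t *playback, size_t playback_len,
                        const int16_t *feeder, size_t feeder_len,
                        int32_t gain_playback_q8, int32_t gain_feeder_q8,
                        int16_t *out, size_t out_len)
{
    for (size_t i = 0; i < out_len; i++) {
        int32_t a = (i < playback_len) ? playback[i] : 0;
        int32_t b = (i < feeder_len) ? feeder[i] : 0;

        /* Weighted sum in 32-bit, then scale back from Q8 and saturate. */
        int32_t sum = (a * gain_playback_q8 + b * gain_feeder_q8) / 256;
        out[i] = clamp_s16(sum);
    }
}
```

In this sketch, calling `mix_pcm_s16` with a playback gain of 128 and a feeder gain of 256 halves the background music while keeping the real-time voice at full level, which matches the "background music + real-time voice" scenario. Saturating the sum avoids wrap-around distortion when both inputs are loud at the same time.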

Video-processor module

Handles video data processing and includes the following functions:

  • Video capture

  • Video encoding and decoding

  • Video rendering

AI Agent Architecture Diagram

Features

The table below lists the mainstream AI platforms that the AI Agents application supports and the features available on each platform:

Platform Feature Comparison

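Platform            Voice Call   Voice Interaction   Visual Processing   Audio and Video Dialogue   Example Link
-----------------   ----------   -----------------   -----------------   ------------------------   -------------------------
Volcano RTC                                                                                         Volcano RTC Example
COZE                                                                                                COZE Example
BRTC                                                                                                BRTC Example
Tencent Cloud RTC                                                                                   Tencent Cloud RTC Example
Tongyi                                                                                              To be released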