Benchmark

AFE

Resource Consumption

AFE configuration and pipeline

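===================  ====================================================================================
Config               Pipeline
===================  ====================================================================================
MR, SR, LOW_COST     |AEC(SR_LOW_COST)| -> |VAD(vadnet1_medium)| -> |WakeNet(wn9_hilexin,)|
MR, SR, HIGH_PERF    |AEC(SR_HIGH_PERF)| -> |VAD(vadnet1_medium)| -> |WakeNet(wn9_hilexin,)|
MR, VC, LOW_COST     |AEC(VOIP_LOW_COST)| -> |NS(nsnet2)| -> |VAD(vadnet1_medium)|
MR, VC, HIGH_PERF    |AEC(VOIP_HIGH_PERF)| -> |NS(nsnet2)| -> |VAD(vadnet1_medium)|
MMNR, SR, LOW_COST   |AEC(SR_LOW_COST)| -> |SE(BSS)| -> |VAD(vadnet1_medium)| -> |WakeNet(wn9_hilexin,)|
MMNR, SR, HIGH_PERF  |AEC(SR_HIGH_PERF)| -> |SE(BSS)| -> |VAD(vadnet1_medium)| -> |WakeNet(wn9_hilexin,)|
===================  ====================================================================================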

Note

  • MR: one microphone channel and one playback channel

  • MMNR: two microphone channels and one playback channel

  • Models: nsnet2, vadnet1_medium, wn9_hilexin

AFE configuration and performance

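===================  =================  ==========  ==========================  ===========================
Config               Internal RAM (KB)  PSRAM (KB)  Feed CPU usage (1 core, %)  Fetch CPU usage (1 core, %)
===================  =================  ==========  ==========================  ===========================
MR, SR, LOW_COST     73.6               733.2       10.6                        11.2
MR, SR, HIGH_PERF    73.3               733.2       10.6                        11.2
MR, VC, LOW_COST     74.4               821.3       40.2                        5.7
MR, VC, HIGH_PERF    116.7              823.9       42.4                        5.7
MMNR, SR, LOW_COST   78.0               1173.0      28.2                        24.8
MMNR, SR, HIGH_PERF  78.0               1173.0      28.2                        24.8
===================  =================  ==========  ==========================  ===========================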
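When sizing a product, it can help to collapse each row of the table above into one memory figure and one CPU figure. The following is a minimal Python sketch of that arithmetic, not part of the library. The `AFE_BENCHMARK` dictionary and the `summarize` helper are hypothetical names introduced here. Adding the feed and fetch percentages assumes both tasks run on the same core, which may not match your task layout.

```python
# Rows copied from the AFE performance table above.
# Tuple layout: (internal RAM KB, PSRAM KB, feed CPU %, fetch CPU %)
AFE_BENCHMARK = {
    "MR, SR, LOW_COST": (73.6, 733.2, 10.6, 11.2),
    "MR, SR, HIGH_PERF": (73.3, 733.2, 10.6, 11.2),
    "MR, VC, LOW_COST": (74.4, 821.3, 40.2, 5.7),
    "MR, VC, HIGH_PERF": (116.7, 823.9, 42.4, 5.7),
    "MMNR, SR, LOW_COST": (78.0, 1173.0, 28.2, 24.8),
    "MMNR, SR, HIGH_PERF": (78.0, 1173.0, 28.2, 24.8),
}


def summarize(config: str) -> dict:
    """Return combined RAM (KB) and combined CPU load (%) for one config."""
    internal_kb, psram_kb, feed_pct, fetch_pct = AFE_BENCHMARK[config]
    return {
        "total_ram_kb": round(internal_kb + psram_kb, 1),
        # Assumption: feed and fetch share one core, so their loads add up.
        "total_cpu_pct": round(feed_pct + fetch_pct, 1),
    }


for name in AFE_BENCHMARK:
    summary = summarize(name)
    print(f"{name}: {summary['total_ram_kb']} KB RAM, "
          f"{summary['total_cpu_pct']}% of one core")
```

If the two tasks are pinned to different cores, read the feed and fetch columns separately instead of summing them.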

WakeNet

Resource Consumption

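==============================  =====  ======  ==============================  ============
Model Type                      RAM    PSRAM   Average Running Time per Frame  Frame Length
==============================  =====  ======  ==============================  ============
Quantised WakeNet9 @ 2 channel  16 KB  324 KB  2.6 ms                          32 ms
Quantised WakeNet9 @ 3 channel  20 KB  347 KB  3.1 ms                          32 ms
==============================  =====  ======  ==============================  ============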
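The running time per frame can be read as a share of the frame period. Below is a rough Python sketch of that conversion, assuming the average running time was measured on a single core and that one frame is processed per 32 ms period. The helper name `frame_load_percent` is hypothetical.

```python
# Figures copied from the WakeNet resource consumption table above.
FRAME_MS = 32.0
WAKENET9_RUN_MS = {"2 channel": 2.6, "3 channel": 3.1}


def frame_load_percent(run_ms: float, frame_ms: float) -> float:
    """Share of each frame period spent running the model, in percent."""
    if frame_ms <= 0:
        raise ValueError("frame_ms must be positive")
    return 100.0 * run_ms / frame_ms


loads = {
    name: frame_load_percent(run_ms, FRAME_MS)
    for name, run_ms in WAKENET9_RUN_MS.items()
}
for name, load in loads.items():
    print(f"WakeNet9 @ {name}: about {load:.1f}% of one frame period")
```

The same ratio applies to any model whose per-frame running time and frame length are known.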

Performance Test

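========  =====  =============================  =========================  =========================
Distance  Quiet  Stationary Noise (SNR = 4 dB)  Speech Noise (SNR = 4 dB)  AEC Interruption (-10 dB)
========  =====  =============================  =========================  =========================
1 m       98%    96%                            94%                        96%
3 m       98%    96%                            94%                        94%
========  =====  =============================  =========================  =========================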

False triggering rate: once in 12 hours

Note

The above test results are based on the ESP32-S3-Korvo V4.0 development board and the WakeNet9 (Alexa) model.

MultiNet

Resource Consumption

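==========  ============  =======  ==============================  ============
Model Type  Internal RAM  PSRAM    Average Running Time per Frame  Frame Length
==========  ============  =======  ==============================  ============
MultiNet 7  18 KB         2920 KB  8 ms                            32 ms
==========  ============  =======  ==============================  ============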

Word Error Rate Performance Test

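============  ======================  ======================
Model Type    LibriSpeech test-clean  LibriSpeech test-other
============  ======================  ======================
MultiNet5-en  16.5%                   41.4%
MultiNet6-en  9.0%                    21.3%
MultiNet7-en  8.5%                    21.3%
============  ======================  ======================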
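To compare model generations, the absolute word error rates above can be turned into relative reductions. This is a small Python sketch of that calculation. The `WER` dictionary and the `relative_reduction` helper are hypothetical names used only for illustration.

```python
# Word error rates (%) copied from the table above.
WER = {
    "MultiNet5-en": {"test-clean": 16.5, "test-other": 41.4},
    "MultiNet6-en": {"test-clean": 9.0, "test-other": 21.3},
    "MultiNet7-en": {"test-clean": 8.5, "test-other": 21.3},
}


def relative_reduction(baseline: float, new: float) -> float:
    """Relative WER reduction of `new` against `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - new) / baseline


for subset in ("test-clean", "test-other"):
    gain = relative_reduction(WER["MultiNet5-en"][subset],
                              WER["MultiNet7-en"][subset])
    print(f"MultiNet5-en -> MultiNet7-en on {subset}: "
          f"{gain:.1f}% relative WER reduction")
```

A result of 0% means the two models have the same error rate on that subset, as with MultiNet6-en and MultiNet7-en on test-other.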

Speech Commands Performance Test

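=============  ========  =====  ================================  ============================
Model Type     Distance  Quiet  Stationary Noise (SNR = 5~10 dB)  Speech Noise (SNR = 5~10 dB)
=============  ========  =====  ================================  ============================
MultiNet 5_en  3 m       95.4%  85.9%                             82.7%
MultiNet 6_en  3 m       96.8%  87.9%                             85.5%
MultiNet 7_en  3 m       97.2%  92.3%                             90.6%
=============  ========  =====  ================================  ============================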

TTS

Resource Consumption

Flash image size: 2.2 MB

RAM runtime: 20 KB

Performance Test

CPU load test (ESP32 @ 240 MHz):

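===========================  ===  ===  ===  ===  ===  ===
Speech Rate                  0    1    2    3    4    5
===========================  ===  ===  ===  ===  ===  ===
Times faster than real time  4.5  3.2  2.9  2.5  2.2  1.8
===========================  ===  ===  ===  ===  ===  ===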
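The "times faster than real time" figures can be converted into an estimated synthesis time for a given clip. The Python sketch below assumes the factor means audio duration divided by synthesis time. The names `REALTIME_FACTOR`, `synthesis_seconds`, and `cpu_share_percent` are hypothetical.

```python
# Speech rate setting -> times faster than real time, from the table above.
REALTIME_FACTOR = {0: 4.5, 1: 3.2, 2: 2.9, 3: 2.5, 4: 2.2, 5: 1.8}


def synthesis_seconds(audio_seconds: float, speech_rate: int) -> float:
    """Estimated time to synthesize `audio_seconds` of speech."""
    return audio_seconds / REALTIME_FACTOR[speech_rate]


def cpu_share_percent(speech_rate: int) -> float:
    """Approximate CPU share needed to keep up with playback, in percent."""
    return 100.0 / REALTIME_FACTOR[speech_rate]


for rate in sorted(REALTIME_FACTOR):
    print(f"speech rate {rate}: 10 s of audio in about "
          f"{synthesis_seconds(10.0, rate):.2f} s, "
          f"roughly {cpu_share_percent(rate):.0f}% CPU while streaming")
```

Every factor in the table is above 1, so under this reading synthesis stays ahead of playback at all six speech rate settings.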