Benchmark


AFE

Resource Consumption

AFE configuration and pipeline

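===================  ====================================================================================
Config               Pipeline
===================  ====================================================================================
MR, SR, LOW_COST     |AEC(SR_LOW_COST)| -> |VAD(vadnet1_medium)| -> |WakeNet(wn9_hilexin,)|
MR, SR, HIGH_PERF    |AEC(SR_HIGH_PERF)| -> |VAD(vadnet1_medium)| -> |WakeNet(wn9_hilexin,)|
MR, VC, LOW_COST     |AEC(VOIP_LOW_COST)| -> |NS(nsnet2)| -> |VAD(vadnet1_medium)|
MR, VC, HIGH_PERF    |AEC(VOIP_HIGH_PERF)| -> |NS(nsnet2)| -> |VAD(vadnet1_medium)|
MMNR, SR, LOW_COST   |AEC(SR_LOW_COST)| -> |SE(BSS)| -> |VAD(vadnet1_medium)| -> |WakeNet(wn9_hilexin,)|
MMNR, SR, HIGH_PERF  |AEC(SR_HIGH_PERF)| -> |SE(BSS)| -> |VAD(vadnet1_medium)| -> |WakeNet(wn9_hilexin,)|
===================  ====================================================================================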

Note

  • MR: one microphone channel and one playback channel

  • MMNR: two microphone channels and one playback channel

  • Models: nsnet2, vadnet1_medium, wn9_hilexin

AFE configuration and Performance

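===================  =================  ==========  ==========================  ===========================
Config               Internal RAM (KB)  PSRAM (KB)  Feed CPU usage (1 core, %)  Fetch CPU usage (1 core, %)
===================  =================  ==========  ==========================  ===========================
MR, SR, LOW_COST     72.3               732.7       8.4                         15.0
MR, SR, HIGH_PERF    78.0               734.7       9.4                         14.9
MR, VC, LOW_COST     50.3               821.4       60.0                        8.2
MR, VC, HIGH_PERF    93.7               824.0       64.0                        8.2
MMNR, SR, LOW_COST   76.6               1173.9      36.6                        30.0
MMNR, SR, HIGH_PERF  99.0               1173.7      38.8                        30.0
===================  =================  ==========  ==========================  ===========================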
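When budgeting a design against the figures above, it can help to look at the combined footprint of each configuration. The following is a minimal Python sketch, not part of ESP-SR: the dictionary and the `summarize` helper are hypothetical names, the numbers are copied from the table above, and adding the feed and fetch percentages assumes both tasks are pinned to the same core.

```python
# Figures copied from the AFE performance table above.
# Tuple layout: (internal RAM KB, PSRAM KB, feed CPU %, fetch CPU %)
AFE_BENCHMARKS = {
    "MR, SR, LOW_COST": (72.3, 732.7, 8.4, 15.0),
    "MR, SR, HIGH_PERF": (78.0, 734.7, 9.4, 14.9),
    "MR, VC, LOW_COST": (50.3, 821.4, 60.0, 8.2),
    "MR, VC, HIGH_PERF": (93.7, 824.0, 64.0, 8.2),
    "MMNR, SR, LOW_COST": (76.6, 1173.9, 36.6, 30.0),
    "MMNR, SR, HIGH_PERF": (99.0, 1173.7, 38.8, 30.0),
}


def summarize(config_name):
    """Return total memory and combined CPU load for one configuration.

    Assumption: feed and fetch run on the same core, so their
    single-core percentages can simply be added.
    """
    ram_kb, psram_kb, feed_pct, fetch_pct = AFE_BENCHMARKS[config_name]
    return {
        "total_memory_kb": round(ram_kb + psram_kb, 1),
        "combined_cpu_pct": round(feed_pct + fetch_pct, 1),
    }


if __name__ == "__main__":
    for name in AFE_BENCHMARKS:
        summary = summarize(name)
        print(name, summary["total_memory_kb"], summary["combined_cpu_pct"])
```

If feed and fetch are placed on different cores, the two percentages should be read separately rather than summed.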

WakeNet

Resource Consumption

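==============================  =====  =======  ==============================  ============
Model Type                      RAM    PSRAM    Average Running Time per Frame  Frame Length
==============================  =====  =======  ==============================  ============
Quantised WakeNet8 @ 2 channel  50 KB  1640 KB  10.0 ms                         32 ms
Quantised WakeNet9 @ 2 channel  16 KB  324 KB   3.0 ms                          32 ms
Quantised WakeNet9 @ 3 channel  20 KB  347 KB   4.3 ms                          32 ms
==============================  =====  =======  ==============================  ============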
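The running time and frame length columns can be combined into an approximate processor load: if a model needs 3.0 ms of compute for every 32 ms of audio, it occupies roughly 9.4% of the core it runs on. The sketch below illustrates that arithmetic in Python. The function name is hypothetical, and the calculation assumes exactly one inference per frame on a single core.

```python
def frame_load_pct(run_ms, frame_ms):
    """Share of one core used when each frame of audio (frame_ms long)
    takes run_ms of compute. Assumes one inference per frame."""
    if frame_ms <= 0:
        raise ValueError("frame_ms must be positive")
    return 100.0 * run_ms / frame_ms


if __name__ == "__main__":
    # (model, average running time per frame in ms), copied from the table above
    models = [
        ("Quantised WakeNet8 @ 2 channel", 10.0),
        ("Quantised WakeNet9 @ 2 channel", 3.0),
        ("Quantised WakeNet9 @ 3 channel", 4.3),
    ]
    for name, run_ms in models:
        print(f"{name}: {frame_load_pct(run_ms, 32.0):.1f}% of one core")
```

A value above 100% would mean the model cannot keep up with incoming audio in real time.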

Performance Test

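========  =====  =============================  =========================  =========================
Distance  Quiet  Stationary Noise (SNR = 4 dB)  Speech Noise (SNR = 4 dB)  AEC Interruption (-10 dB)
========  =====  =============================  =========================  =========================
1 m       98%    96%                            94%                        96%
3 m       98%    96%                            94%                        94%
========  =====  =============================  =========================  =========================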

False triggering rate: once in 12 hours

Note

This test used the ESP32-S3-Korvo V4.0 development board and the WakeNet9 (Alexa) model.

MultiNet

Resource Consumption

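=============  ============  =======  ==============================  ============
Model Type     Internal RAM  PSRAM    Average Running Time per Frame  Frame Length
=============  ============  =======  ==============================  ============
MultiNet 4     16.8 KB       1866 KB  18 ms                           32 ms
MultiNet 4 Q8  10.5 KB       1009 KB  11 ms                           32 ms
MultiNet 5 Q8  16 KB         2310 KB  12 ms                           32 ms
MultiNet 6     32 KB         4100 KB  12 ms                           32 ms
MultiNet 7     18 KB         2920 KB  11 ms                           32 ms
=============  ============  =======  ==============================  ============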

Word Error Rate Performance Test

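============  ======================  ======================
Model Type    LibriSpeech test-clean  LibriSpeech test-other
============  ======================  ======================
MultiNet5-en  16.5%                   41.4%
MultiNet6-en  9.0%                    21.3%
MultiNet7-en  8.5%                    21.3%
============  ======================  ======================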
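For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance (substitutions, deletions, and insertions) between the recognized text and the reference transcript, divided by the number of reference words. The Python sketch below shows that conventional definition only. The page does not describe the scoring procedure or text normalization used for the figures above, so this is an illustration rather than the method that produced them, and the function name is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far (minus the current one) and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("turn on the light", "turn off the light"))
```

Under this definition a lower percentage is better, and the value can exceed 100% when the hypothesis contains many inserted words.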

Speech Commands Performance Test

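=============  ========  =====  ================================  ============================
Model Type     Distance  Quiet  Stationary Noise (SNR = 5~10 dB)  Speech Noise (SNR = 5~10 dB)
=============  ========  =====  ================================  ============================
MultiNet 5_en  3 m       95.4%  85.9%                             82.7%
MultiNet 6_en  3 m       96.8%  87.9%                             85.5%
MultiNet 7_en  3 m       97.2%  92.3%                             90.6%
=============  ========  =====  ================================  ============================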

TTS

Resource Consumption

Flash image size: 2.2 MB

RAM runtime: 20 KB

Performance Test

CPU loading test (ESP32 @240 MHz):

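===========================  ===  ===  ===  ===  ===  ===
Speech Rate                  0    1    2    3    4    5
===========================  ===  ===  ===  ===  ===  ===
Times faster than real time  4.5  3.2  2.9  2.5  2.2  1.8
===========================  ===  ===  ===  ===  ===  ===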
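One way to read these figures: a factor of 4.5 means that 4.5 seconds of audio are synthesized per second of CPU time, so the synthesis time for a clip is its duration divided by the factor. The Python sketch below works through that arithmetic. The names are hypothetical, and reading the inverse of the factor as CPU load assumes synthesis runs continuously on one core.

```python
# "Times faster than real time" per speech rate setting,
# copied from the table above (ESP32 @ 240 MHz).
REALTIME_FACTOR = {0: 4.5, 1: 3.2, 2: 2.9, 3: 2.5, 4: 2.2, 5: 1.8}


def synthesis_time_s(audio_s, speech_rate):
    """Seconds of compute needed to synthesize audio_s seconds of speech."""
    return audio_s / REALTIME_FACTOR[speech_rate]


def cpu_load_pct(speech_rate):
    """Approximate share of one core used while streaming in real time.

    Assumption: load is the inverse of the real-time factor.
    """
    return 100.0 / REALTIME_FACTOR[speech_rate]


if __name__ == "__main__":
    for rate in sorted(REALTIME_FACTOR):
        print(
            f"rate {rate}: 10 s of audio in {synthesis_time_s(10.0, rate):.2f} s, "
            f"about {cpu_load_pct(rate):.0f}% of one core"
        )
```

At every listed speech rate the factor stays above 1, so synthesis keeps ahead of playback.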