Benchmark


AFE

Resource Consumption

| Algorithm Type | RAM | Average CPU Loading (computed with 2 cores) | Frame Length |
|---|---|---|---|
| AEC(LOW_COST) | 152.3 KB | 8% | 32 ms |
| AEC(HIGH_PERF) | 166 KB | 11% | 32 ms |
| BSS(LOW_COST) | 198.7 KB | 6% | 64 ms |
| BSS(HIGH_PERF) | 215.5 KB | 7% | 64 ms |
| NS | 27 KB | 5% | 10 ms |
| MISO | 56 KB | 8% | 16 ms |
| AFE Layer | 227 KB | | |

WakeNet

Resource Consumption

| Model Type | RAM | PSRAM | Average Running Time per Frame | Frame Length |
|---|---|---|---|---|
| Quantised WakeNet8 @ 2 channel | 50 KB | 1640 KB | 10.0 ms | 32 ms |
| Quantised WakeNet9 @ 2 channel | 16 KB | 324 KB | 3.0 ms | 32 ms |
| Quantised WakeNet9 @ 3 channel | 20 KB | 347 KB | 4.3 ms | 32 ms |
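Read together, the running time and frame length columns give the share of the real-time budget a model uses: 3.0 ms of processing for every 32 ms of audio is a little over 9% of one core. The sketch below shows that arithmetic with the values from the table. The helper name `frame_load` is hypothetical, and the calculation assumes the model runs once per frame on a single core, which the table itself does not state.

```python
def frame_load(run_ms: float, frame_ms: float) -> float:
    """Fraction of each frame period spent on inference.

    Assumption: one inference pass per frame, executed on a single core.
    """
    return run_ms / frame_ms


# (running time per frame, frame length) in milliseconds, copied from the table.
models = {
    "WakeNet8 @ 2 channel": (10.0, 32.0),
    "WakeNet9 @ 2 channel": (3.0, 32.0),
    "WakeNet9 @ 3 channel": (4.3, 32.0),
}

for name, (run_ms, frame_ms) in models.items():
    print(f"{name}: {frame_load(run_ms, frame_ms):.1%} of the frame budget")
```

The same ratio applies to any row that lists both a per-frame running time and a frame length.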

Performance Test

| Distance | Quiet | Stationary Noise (SNR = 4 dB) | Speech Noise (SNR = 4 dB) | AEC Interruption (-10 dB) |
|---|---|---|---|---|
| 1 m | 98% | 96% | 94% | 96% |
| 3 m | 98% | 96% | 94% | 94% |

False triggering rate: once in 12 hours

Note

This test used the ESP32-S3-Korvo V4.0 development board and the WakeNet9 (Alexa) model.

MultiNet

Resource Consumption

| Model Type | Internal RAM | PSRAM | Average Running Time per Frame | Frame Length |
|---|---|---|---|---|
| MultiNet 4 | 16.8 KB | 1866 KB | 18 ms | 32 ms |
| MultiNet 4 Q8 | 10.5 KB | 1009 KB | 11 ms | 32 ms |
| MultiNet 5 Q8 | 16 KB | 2310 KB | 12 ms | 32 ms |
| MultiNet 6 | 32 KB | 4100 KB | 12 ms | 32 ms |
| MultiNet 7 | 18 KB | 2920 KB | 11 ms | 32 ms |

Word Error Rate Performance Test

| Model Type | LibriSpeech test-clean | LibriSpeech test-other |
|---|---|---|
| MultiNet5-en | 16.5% | 41.4% |
| MultiNet6-en | 9.0% | 21.3% |
| MultiNet7-en | 8.5% | 21.3% |
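For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance between the reference transcript and the recognised text, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the evaluation script behind the figures above; the function name `word_error_rate` and the sample sentences are illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words with an empty hypothesis costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            ))
        prev = curr
    return prev[-1] / len(ref)


# One deletion ("on") and one substitution ("light" -> "lights") over 4 reference words.
print(word_error_rate("turn on the light", "turn the lights"))
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.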

Speech Commands Performance Test

| Model Type | Distance | Quiet | Stationary Noise (SNR = 5~10 dB) | Speech Noise (SNR = 5~10 dB) |
|---|---|---|---|---|
| MultiNet 5_en | 3 m | 95.4% | 85.9% | 82.7% |
| MultiNet 6_en | 3 m | 96.8% | 87.9% | 85.5% |
| MultiNet 7_en | 3 m | 97.2% | 92.3% | 90.6% |

TTS

Resource Consumption

Flash image size: 2.2 MB

RAM runtime: 20 KB

Performance Test

CPU loading test (ESP32 @240 MHz):

| Speech Rate | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| Times faster than real time | 4.5 | 3.2 | 2.9 | 2.5 | 2.2 | 1.8 |
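These factors can be turned into an estimated synthesis time. The sketch below assumes that "times faster than real time" means audio duration divided by synthesis time, which is the usual reading but is not spelled out here; the names `SPEEDUP_BY_RATE` and `synthesis_seconds` are hypothetical.

```python
# Speed-up factors copied from the table above (ESP32 @ 240 MHz), keyed by speech rate.
SPEEDUP_BY_RATE = {0: 4.5, 1: 3.2, 2: 2.9, 3: 2.5, 4: 2.2, 5: 1.8}


def synthesis_seconds(audio_seconds: float, speech_rate: int) -> float:
    """Estimated time to synthesise `audio_seconds` of speech.

    Assumption: speed-up factor = audio duration / synthesis time.
    """
    return audio_seconds / SPEEDUP_BY_RATE[speech_rate]


for rate in sorted(SPEEDUP_BY_RATE):
    print(f"speech rate {rate}: {synthesis_seconds(9.0, rate):.2f} s to synthesise 9 s of audio")
```

Under that assumption, every listed speech rate still synthesises faster than playback, since all factors are above 1.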