Benchmark
AFE
Resource Consumption
Config | Pipeline |
---|---|
MR, SR, LOW_COST | |
MR, SR, HIGH_PERF | |
MR, VC, LOW_COST | |
MR, VC, HIGH_PERF | |
MMNR, SR, LOW_COST | |
MMNR, SR, HIGH_PERF | |
Note
MR: one microphone channel and one playback channel
MMNR: two microphone channels and one playback channel
Models: nsnet2, vadnet1_medium, wn9_hilexin
Config | Internal RAM (KB) | PSRAM (KB) | Feed CPU usage (1 core, %) | Fetch CPU usage (1 core, %) |
---|---|---|---|---|
MR, SR, LOW_COST | 73.6 | 733.2 | 10.6 | 11.2 |
MR, SR, HIGH_PERF | 73.3 | 733.2 | 10.6 | 11.2 |
MR, VC, LOW_COST | 74.4 | 821.3 | 40.2 | 5.7 |
MR, VC, HIGH_PERF | 116.7 | 823.9 | 42.4 | 5.7 |
MMNR, SR, LOW_COST | 78.0 | 1173.0 | 28.2 | 24.8 |
MMNR, SR, HIGH_PERF | 78.0 | 1173.0 | 28.2 | 24.8 |
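The feed and fetch figures are reported separately, each as a percentage of one core. As a rough sketch, assuming both tasks are pinned to the same core and their loads simply add (an assumption, not something the benchmark states), the combined load per configuration can be estimated as follows. The names `AFE_CPU_USAGE` and `combined_cpu_usage` are illustrative only.

```python
# Feed and fetch CPU usage (% of one core), copied from the AFE table.
AFE_CPU_USAGE = {
    "MR, SR, LOW_COST": (10.6, 11.2),
    "MR, SR, HIGH_PERF": (10.6, 11.2),
    "MR, VC, LOW_COST": (40.2, 5.7),
    "MR, VC, HIGH_PERF": (42.4, 5.7),
    "MMNR, SR, LOW_COST": (28.2, 24.8),
    "MMNR, SR, HIGH_PERF": (28.2, 24.8),
}


def combined_cpu_usage(config: str) -> float:
    """Sum feed and fetch load, assuming both tasks share one core."""
    feed, fetch = AFE_CPU_USAGE[config]
    return round(feed + fetch, 1)


if __name__ == "__main__":
    for name in AFE_CPU_USAGE:
        print(f"{name}: {combined_cpu_usage(name)}% of one core")
```

If the two tasks run on different cores, the per-task figures in the table are the relevant ones and the sum is only an upper bound on a single core's load.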
WakeNet
Resource Consumption
Model Type | RAM | PSRAM | Average Running Time per Frame | Frame Length |
---|---|---|---|---|
Quantised WakeNet9 @ 2 channel | 16 KB | 324 KB | 2.6 ms | 32 ms |
Quantised WakeNet9 @ 3 channel | 20 KB | 347 KB | 3.1 ms | 32 ms |
Performance Test
Distance | Quiet | Stationary Noise (SNR = 4 dB) | Speech Noise (SNR = 4 dB) | AEC Interruption (-10 dB) |
---|---|---|---|---|
1 m | 98% | 96% | 94% | 96% |
3 m | 98% | 96% | 94% | 94% |
False triggering rate: once in 12 hours
Note
The above test results are based on the ESP32-S3-Korvo V4.0 development board and the WakeNet9 (Alexa) model.
MultiNet
Resource Consumption
Model Type | Internal RAM | PSRAM | Average Running Time per Frame | Frame Length |
---|---|---|---|---|
MultiNet 7 | 18 KB | 2920 KB | 8 ms | 32 ms |
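The running-time and frame-length columns in the WakeNet and MultiNet resource tables can be read together as a per-frame load: the share of each 32 ms frame that inference occupies. The sketch below works that ratio out from the published figures. It assumes the average running time was measured on a single core, and the helper name `frame_load_percent` is made up for illustration.

```python
def frame_load_percent(run_time_ms: float, frame_length_ms: float) -> float:
    """Share of one frame period spent on inference, in percent."""
    return 100.0 * run_time_ms / frame_length_ms


# (average running time per frame in ms, frame length in ms), from the tables.
MODELS = {
    "Quantised WakeNet9 @ 2 channel": (2.6, 32.0),
    "Quantised WakeNet9 @ 3 channel": (3.1, 32.0),
    "MultiNet 7": (8.0, 32.0),
}

if __name__ == "__main__":
    for name, (run_ms, frame_ms) in MODELS.items():
        load = frame_load_percent(run_ms, frame_ms)
        print(f"{name}: {load:.1f}% of the frame budget")
```

A value well below 100% means the model keeps up with real-time audio with headroom left for other tasks.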
Word Error Rate Performance Test
Model Type | LibriSpeech test-clean | LibriSpeech test-other |
---|---|---|
MultiNet5-en | 16.5% | 41.4% |
MultiNet6-en | 9.0% | 21.3% |
MultiNet7-en | 8.5% | 21.3% |
Speech Commands Performance Test
Model Type | Distance | Quiet | Stationary Noise (SNR = 5~10 dB) | Speech Noise (SNR = 5~10 dB) |
---|---|---|---|---|
MultiNet 5_en | 3 m | 95.4% | 85.9% | 82.7% |
MultiNet 6_en | 3 m | 96.8% | 87.9% | 85.5% |
MultiNet 7_en | 3 m | 97.2% | 92.3% | 90.6% |
TTS
Resource Consumption
Flash image size: 2.2 MB
RAM runtime: 20 KB
Performance Test
CPU load test (ESP32 @ 240 MHz):
Speech Rate | 0 | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|---|
Times faster than real time | 4.5 | 3.2 | 2.9 | 2.5 | 2.2 | 1.8 |
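To make the "times faster than real time" row concrete, the sketch below estimates how long synthesis takes for a clip of a given length. It assumes the figure is the ratio of output audio duration to synthesis time on the ESP32 at 240 MHz, which is the usual reading but is not spelled out in the table. The names `REAL_TIME_FACTOR` and `synthesis_time_s` are illustrative.

```python
# Speech rate -> times faster than real time, copied from the table.
REAL_TIME_FACTOR = {0: 4.5, 1: 3.2, 2: 2.9, 3: 2.5, 4: 2.2, 5: 1.8}


def synthesis_time_s(audio_duration_s: float, speech_rate: int) -> float:
    """Estimated synthesis time, assuming factor = audio time / compute time."""
    return audio_duration_s / REAL_TIME_FACTOR[speech_rate]


if __name__ == "__main__":
    for rate in sorted(REAL_TIME_FACTOR):
        seconds = synthesis_time_s(9.0, rate)
        print(f"speech rate {rate}: about {seconds:.2f} s to synthesise 9 s of audio")
```

Under this reading, every speech rate in the table stays above a factor of 1, so synthesis finishes before playback of the same clip would.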