Benchmark Results


AFE

Resource Consumption

AFE Configuration and Algorithm Pipeline

Config               Pipeline
-------------------  ---------------------------------------------------------------------------------
MR, SR, LOW_COST     |AEC(SR_LOW_COST)| -> |VAD(vadnet1_medium)| -> |WakeNet(wn9_hilexin,)|
MR, SR, HIGH_PERF    |AEC(SR_HIGH_PERF)| -> |VAD(vadnet1_medium)| -> |WakeNet(wn9_hilexin,)|
MR, VC, LOW_COST     |AEC(VOIP_LOW_COST)| -> |NS(nsnet2)| -> |VAD(vadnet1_medium)|
MR, VC, HIGH_PERF    |AEC(VOIP_HIGH_PERF)| -> |NS(nsnet2)| -> |VAD(vadnet1_medium)|
MMNR, SR, LOW_COST   |AEC(SR_LOW_COST)| -> |SE(BSS)| -> |VAD(vadnet1_medium)| -> |WakeNet(wn9_hilexin,)|
MMNR, SR, HIGH_PERF  |AEC(SR_HIGH_PERF)| -> |SE(BSS)| -> |VAD(vadnet1_medium)| -> |WakeNet(wn9_hilexin,)|
MR, FD, LOW_COST     |AEC(FD_LOW_COST)| -> |VAD(vadnet1_medium)| -> |WakeNet(wn9_hilexin,)|
MR, FD, HIGH_PERF    |AEC(FD_HIGH_PERF)| -> |VAD(vadnet1_medium)| -> |WakeNet(wn9_hilexin,)|
MMNR, FD, LOW_COST   |AEC(FD_LOW_COST)| -> |SE(BSS)| -> |VAD(vadnet1_medium)| -> |WakeNet(wn9_hilexin,)|
MMNR, FD, HIGH_PERF  |AEC(FD_HIGH_PERF)| -> |SE(BSS)| -> |VAD(vadnet1_medium)| -> |WakeNet(wn9_hilexin,)|

Note

  • MR: one microphone channel and one playback channel

  • MMNR: two microphone channels and one playback channel

  • Models: nsnet2, vadnet1_medium, wn9_hilexin

  • Setting: ESP32-P4 @ 400 MHz, CONFIG_CACHE_L2_CACHE_256KB=y, CONFIG_CACHE_L2_CACHE_LINE_128B=y.

AFE Configuration and Performance

Config               Internal RAM (KB)  PSRAM (KB)  Feed CPU usage (1 core, %)  Fetch CPU usage (1 core, %)
-------------------  -----------------  ----------  --------------------------  ---------------------------
MR, SR, LOW_COST     60.1               739.7       8.8                         9.8
MR, SR, HIGH_PERF    49.1               775.8       9.3                         9.8
MR, FD, LOW_COST     60.2               777.7       12.1                        9.8
MR, FD, HIGH_PERF    49.2               813.8       12.5                        9.8
MR, VC, LOW_COST     48.7               819.7       30.6                        4.7
MR, VC, HIGH_PERF    91.1               822.2       32.2                        4.7
MMNR, SR, LOW_COST   79.1               1153.7      23.7                        22.9
MMNR, SR, HIGH_PERF  68.1               1200.4      24.9                        22.9
MMNR, FD, LOW_COST   79.2               1191.7      29.4                        22.9
MMNR, FD, HIGH_PERF  68.1               1238.5      30.4                        22.9
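When choosing an AFE configuration, it can help to look at the combined footprint rather than each column separately. The following is a minimal Python sketch that copies the figures from the AFE table above and adds them up. The names `AFE_RESULTS`, `summarize`, and `fits` are hypothetical helpers for illustration, not part of ESP-SR. Summing the feed and fetch CPU figures assumes both tasks run on the same core, which the source does not state.

```python
# Figures copied from the AFE table above (ESP32-P4 @ 400 MHz).
# Each entry: (internal RAM KB, PSRAM KB, feed CPU %, fetch CPU %).
AFE_RESULTS = {
    "MR, SR, LOW_COST": (60.1, 739.7, 8.8, 9.8),
    "MR, SR, HIGH_PERF": (49.1, 775.8, 9.3, 9.8),
    "MR, FD, LOW_COST": (60.2, 777.7, 12.1, 9.8),
    "MR, FD, HIGH_PERF": (49.2, 813.8, 12.5, 9.8),
    "MR, VC, LOW_COST": (48.7, 819.7, 30.6, 4.7),
    "MR, VC, HIGH_PERF": (91.1, 822.2, 32.2, 4.7),
    "MMNR, SR, LOW_COST": (79.1, 1153.7, 23.7, 22.9),
    "MMNR, SR, HIGH_PERF": (68.1, 1200.4, 24.9, 22.9),
    "MMNR, FD, LOW_COST": (79.2, 1191.7, 29.4, 22.9),
    "MMNR, FD, HIGH_PERF": (68.1, 1238.5, 30.4, 22.9),
}


def summarize(config: str) -> dict:
    """Return total memory and total single-core CPU load for one config."""
    internal_kb, psram_kb, feed_pct, fetch_pct = AFE_RESULTS[config]
    return {
        "total_ram_kb": round(internal_kb + psram_kb, 1),
        # Assumption: feed and fetch tasks share one core, so loads add up.
        "total_cpu_pct": round(feed_pct + fetch_pct, 1),
    }


def fits(config: str, internal_kb: float, psram_kb: float, cpu_pct: float) -> bool:
    """Check a config against hypothetical budgets chosen by the integrator."""
    cfg_internal, cfg_psram, feed_pct, fetch_pct = AFE_RESULTS[config]
    return (
        cfg_internal <= internal_kb
        and cfg_psram <= psram_kb
        and feed_pct + fetch_pct <= cpu_pct
    )


if __name__ == "__main__":
    for name in AFE_RESULTS:
        print(name, summarize(name))
```

For example, `fits("MMNR, FD, HIGH_PERF", 100, 1024, 60)` returns `False` because that configuration needs more than 1024 KB of PSRAM, while the same budgets accept every MR configuration.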

WakeNet

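Resource Consumption

Model Type                      RAM    PSRAM   Average Running Time per Frame  Frame Length
------------------------------  -----  ------  ------------------------------  ------------
Quantised WakeNet9 @ 2 channel  16 KB  324 KB  2.6 ms                          32 ms
Quantised WakeNet9 @ 3 channel  20 KB  347 KB  3.1 ms                          32 ms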
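The running time per frame and the frame length together give a rough CPU load for a model: the fraction of each frame period spent computing. A minimal Python sketch of that arithmetic follows; `frame_load_percent` is a hypothetical helper, and the result applies only to the chip and clock on which the timing was measured.

```python
def frame_load_percent(run_ms: float, frame_ms: float) -> float:
    """Share of one core used when `run_ms` of work is done every `frame_ms`."""
    if frame_ms <= 0:
        raise ValueError("frame_ms must be positive")
    return 100.0 * run_ms / frame_ms


if __name__ == "__main__":
    # Values taken from the WakeNet resource table above.
    for label, run_ms in (("2 channel", 2.6), ("3 channel", 3.1)):
        load = frame_load_percent(run_ms, 32.0)
        print(f"WakeNet9 @ {label}: {load:.1f}% of one core")
```

With the table values, the 2-channel model uses about 8.1% of one core and the 3-channel model about 9.7%.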

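Performance Test

Distance  Quiet  Stationary Noise (SNR = 4 dB)  Speech Noise (SNR = 4 dB)  AEC Interruption (-10 dB)
--------  -----  -----------------------------  -------------------------  -------------------------
1 m       98%    96%                            94%                        96%
3 m       98%    96%                            94%                        94%

False trigger rate: 1 per 12 hours

Note

The test results above are based on the ESP32-S3-Korvo V4.0 development board and the WakeNet9 (Alexa) model.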

MultiNet

Resource Consumption

Model Type  Internal RAM  PSRAM    Average Running Time per Frame  Frame Length
----------  ------------  -------  ------------------------------  ------------
MultiNet 7  18 KB         2920 KB  8 ms                            32 ms

Word Error Rate Performance Test

Model Type     aishell test
-------------  ------------
MultiNet 5_cn  9.5%
MultiNet 6_cn  5.2%

Note

For Chinese, WER is computed over pinyin units without tones.
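To illustrate what computing WER over toneless pinyin units means, here is a minimal Python sketch. It is not the evaluation script used for the numbers above, which the source does not show. It assumes both transcripts are already written as space-separated pinyin syllables with trailing tone digits (for example `kong1 tiao2`); the helper names are hypothetical.

```python
import re


def strip_tones(syllables):
    """Drop a trailing tone digit from each syllable: 'kong1' -> 'kong'."""
    return [re.sub(r"[0-5]$", "", s) for s in syllables]


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert/delete/substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = cur
    return prev[-1]


def pinyin_wer(reference: str, hypothesis: str) -> float:
    """Error rate over toneless pinyin units, relative to reference length."""
    ref_units = strip_tones(reference.split())
    hyp_units = strip_tones(hypothesis.split())
    if not ref_units:
        raise ValueError("reference must contain at least one unit")
    return edit_distance(ref_units, hyp_units) / len(ref_units)


if __name__ == "__main__":
    ref = "da3 kai1 kong1 tiao2"
    print(pinyin_wer(ref, "da4 kai1 kong1 tiao2"))  # tone-only difference
    print(pinyin_wer(ref, "da3 kai1 feng1 tiao2"))  # one substituted syllable
```

Because tones are stripped before alignment, a hypothesis that differs from the reference only in tone counts as fully correct, while a wrong syllable counts as one error.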

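Speech Commands Performance Test (Air Conditioner Control Scenario)

Model Type        Distance  Quiet  Stationary Noise (SNR = 5~10 dB)  Speech Noise (SNR = 5~10 dB)
----------------  --------  -----  --------------------------------  ----------------------------
MultiNet 5_cn     3 m       88.9%  66.1%                             67.5%
MultiNet 6_cn     3 m       98.8%  88.3%                             88.0%
MultiNet 6_cn_ac  3 m       97.1%  95.1%                             96.8%

Note

MultiNet6_cn_ac was further fine-tuned on an air conditioner scenario dataset, so it performs better in the air conditioner control scenario.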

TTS

Resource Consumption

Flash image size: 2.2 MB

RAM runtime: 20 KB

Performance Test

CPU load test (ESP32 @ 240 MHz):

Speech Rate                  0    1    2    3    4    5
---------------------------  ---  ---  ---  ---  ---  ---
Times faster than real time  4.5  3.2  2.9  2.5  2.2  1.8
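The "times faster than real time" figures can be turned into an estimated synthesis time and CPU load. The Python sketch below assumes the figure means audio duration divided by synthesis time, which is the usual reading but is not spelled out in the source. `SPEEDUP_BY_RATE` and both functions are hypothetical names for illustration.

```python
# Speed-up factors copied from the TTS table above (ESP32 @ 240 MHz),
# keyed by the speech rate setting.
SPEEDUP_BY_RATE = {0: 4.5, 1: 3.2, 2: 2.9, 3: 2.5, 4: 2.2, 5: 1.8}


def synthesis_time_s(audio_s: float, speech_rate: int) -> float:
    """Estimated time to synthesize `audio_s` seconds of speech."""
    return audio_s / SPEEDUP_BY_RATE[speech_rate]


def cpu_load_percent(speech_rate: int) -> float:
    """Estimated CPU share needed to synthesize in step with playback."""
    return 100.0 / SPEEDUP_BY_RATE[speech_rate]


if __name__ == "__main__":
    for rate in sorted(SPEEDUP_BY_RATE):
        print(
            f"rate {rate}: 10 s of audio in about "
            f"{synthesis_time_s(10.0, rate):.1f} s, "
            f"about {cpu_load_percent(rate):.0f}% CPU while streaming"
        )
```

Under that assumption, speech rate 3 (2.5 times faster than real time) corresponds to roughly 40% CPU load while audio is being streamed.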

NSNET

Performance Test

Dataset: array_onemic_nnoise_20230608 (test set recorded according to the Amazon acoustic certification standard)

Model   DNSMOS
------  ------
nsnet1  2.4
nsnet2  2.71