Espressif DSP Library Benchmarks

The table below contains benchmarks of the functions provided by the ESP-DSP library. The values are the number of CPU cycles taken to execute each function. Values in the “O2” columns were obtained with the compiler optimizing for speed, and values in the “Os” columns with the compiler optimizing for size. The “ESP32”, “ESP32S3” and “ESP32P4” columns give results for the optimized (assembly) implementations. The “ANSI Xtensa” and “ANSI RISC-V” columns give results for the non-optimized ANSI C implementations.

Function                                                   |   ESP32 | ESP32S3 | ESP32P4 |    ANSI |    ANSI |   ESP32 | ESP32S3 | ESP32P4 |    ANSI |    ANSI
                                                           |         |         |         |  Xtensa |  RISC-V |         |         |         |  Xtensa |  RISC-V
Optimization                                               |      O2 |      O2 |      O2 |      O2 |      O2 |      Os |      Os |      Os |      Os |      Os
-----------------------------------------------------------+---------+---------+---------+---------+---------+---------+---------+---------+---------+--------
Dot Product
dsps_dotprod_f32 for N=256 points                          |    1047 |     432 |    1319 |    1311 |    1563 |    1047 |     432 |    1320 |    4117 |    2336
dsps_dotprode_f32 for N=256 points with step 1             |    1307 |    1307 |    1314 |    2581 |    1819 |    1308 |    1309 |    1317 |    3609 |    2087
dsps_dotprod_s16 for N=256 points                          |     437 |     307 |     208 |    3635 |    3385 |     437 |     307 |     202 |    6708 |    4170
FIR Filters
dsps_fir_f32 1024 input samples and 256 coefficients       | 1078691 |  443671 | 1606563 | 1353122 | 1606552 | 1078973 |  443680 | 3050985 | 4497990 | 3050972
dsps_fird_f32 1024 samples, 256 coeffs and decimation 4    |  276504 |  115499 |  339796 |  350726 |  415332 |  277204 |  115477 |  339780 | 1206798 |  814168
FFTs Radix-2 32-bit Floating Point
dsps_fft2r_fc32 for 64 complex points                      |    4544 |    3970 |    4989 |    7093 |    6062 |    4545 |    3971 |    4988 |    8032 |    8144
dsps_fft2r_fc32 for 128 complex points                     |   10342 |    8999 |   11404 |   16091 |   13725 |   10342 |    8998 |   11391 |   18192 |   18622
dsps_fft2r_fc32 for 256 complex points                     |   23210 |   20139 |   25684 |   36066 |   30779 |   23210 |   20250 |   25671 |   40737 |   41955
dsps_fft2r_fc32 for 512 complex points                     |   51503 |   44594 |   57181 |   79978 |   68325 |   51503 |   44594 |   57167 |   90424 |   93483
dsps_fft2r_fc32 for 1024 complex points                    |  113205 |   97847 |  126053 |  175729 |  140083 |  113205 |   97847 |  126039 |  198336 |  186812
FFTs Radix-4 32-bit Floating Point
dsps_fft4r_fc32 for 64 complex points                      |    3004 |    2597 |    3483 |    5035 |    3761 |    3004 |    2597 |    3480 |    5681 |    3876
dsps_fft4r_fc32 for 256 complex points                     |   15348 |   13213 |   18052 |   25198 |   19266 |   15347 |   13212 |   18036 |   28714 |   19788
dsps_fft4r_fc32 for 1024 complex points                    |   75068 |   64482 |   89820 |  122592 |   95550 |   75068 |   64482 |   89766 |  140116 |   98026
FFTs 16-bit Fixed Point
dsps_fft2r_sc16 for 64 complex points                      |    8775 |     774 |     897 |    9287 |    9055 |    8775 |     775 |     894 |   10603 |    9472
dsps_fft2r_sc16 for 128 complex points                     |   20205 |    1608 |    1832 |   20798 |   20675 |   20204 |    1610 |    1835 |   24067 |   21633
dsps_fft2r_sc16 for 256 complex points                     |   45746 |    3412 |    3873 |   46164 |   46576 |   45745 |    3410 |    3876 |   54611 |   48735
dsps_fft2r_sc16 for 512 complex points                     |  102198 |    7294 |    8234 |  101611 |  103739 |  102199 |    7293 |    8237 |  119792 |  108555
dsps_fft2r_sc16 for 1024 complex points                    |  226154 |   15623 |   17523 |  221955 |  228808 |  225852 |   15624 |   17526 |  263431 |  239447
IIR Filters
dsps_biquad_f32 - biquad filter for 1024 input samples     |   17442 |   17552 |   15391 |   26651 |   21544 |   17441 |   17441 |   15403 |   36883 |   32789
Matrix Multiplication
dspm_mult_f32 - C[16;16] = A[16;16]*B[16;16]               |   24659 |    6280 |   28276 |   56915 |   31481 |   24660 |    6282 |   28239 |   66482 |   38913
dspm_mult_s16 - C[16;16] = A[16;16]*B[16;16]               |   24697 |    2004 |    2138 |   71112 |   60715 |   24696 |    1831 |    2141 |  126689 |   63865
dspm_mult_3x3x1_f32 - C[3;1] = A[3;3]*B[3;1]               |      70 |      71 |     144 |     247 |     161 |      69 |     131 |     159 |     258 |     164
dspm_mult_3x3x3_f32 - C[3;3] = A[3;3]*B[3;3]               |     200 |     200 |     338 |     560 |     309 |     201 |     199 |     332 |     559 |     351
dspm_mult_4x4x1_f32 - C[4;1] = A[4;4]*B[4;1]               |     102 |     103 |     206 |     371 |     227 |     103 |     104 |     202 |     398 |     258
dspm_mult_4x4x4_f32 - C[4;4] = A[4;4]*B[4;4]               |     395 |     175 |     658 |    1150 |     608 |     394 |     174 |     658 |    1189 |     749
Image processing prototypes
dspi_dotprod_s8/u8 - dot product of two images 16x16       |    2635 |     161 |     225 |    2633 |    2436 |    4021 |     304 |     228 |    4019 |    2373
dspi_dotprod_off_s8/u8 - dot product of two images 16x16   |    2891 |     228 |     240 |    2891 |    2736 |    4802 |     227 |     243 |    4801 |    2584
dspi_dotprod_s8/u8 - dot product of two images 64x64       |   37962 |     689 |    1662 |   37962 |   33723 |   58885 |     687 |    1664 |   58883 |   33800
dspi_dotprod_off_s8/u8 - dot product of two images 64x64   |   42059 |     995 |    1792 |   42058 |   37884 |   71187 |     995 |    1795 |   71186 |   37768
dspi_dotprod_s16/u16 - dot product of two images 8x8       |    1189 |     163 |     153 |    1190 |     994 |    1749 |     146 |     152 |    1750 |    1018
dspi_dotprod_off_s16/u16 - dot product of two images 8x8   |    1258 |     362 |     145 |    1258 |    1056 |    2647 |     179 |     143 |    2646 |    1088
dspi_dotprod_s16 - dot product of two images 32x32         |   15181 |     409 |     853 |   15181 |   13888 |   24334 |     407 |     859 |   24964 |   13831
dspi_dotprod_off_s16/u16 - dot product of two images 32x32 |   16210 |     561 |     921 |   16209 |   14874 |   28462 |     558 |     921 |   28462 |   14850

The benchmarks can be reproduced by running the test cases in /test/test_dsp.c.