Espressif DSP Library Benchmarks

The table below contains benchmarks of functions provided by the ESP-DSP library. The values are the CPU cycle counts taken to execute each function. Values in the “O2” columns were measured with compiler optimization for speed, and values in the “Os” columns with compiler optimization for size. Values in the “ESP32” and “ESP32S3” columns are for the optimized (assembly) implementations; values in the “ANSI” columns are for the non-optimized implementation.

Function                                                         ESP32   ESP32S3      ANSI     ESP32   ESP32S3      ANSI
Optimization                                                        O2        O2        O2        Os        Os        Os
------------------------------------------------------------------------------------------------------------------------

Dot Product
dsps_dotprod_f32 for N=256 points                                 1058       449      1325      1058       448      4129
dsps_dotprode_f32 for N=256 points with step 1                    1317      1326      2853      1317      1326      3621
dsps_dotprod_s16 for N=256 points                                  447       323      3647       447       322      6466

FIR Filters
dsps_fir_f32 1024 input samples and 256 coefficients           1079312    443690   2150685   1079599    443688   5147785
dsps_fird_f32 1024 samples 256 coeffs and decimation 4          350915    115494    614234    350520    115495   1317367

FFTs Radix-2 32 bit Floating Point
dsps_fft2r_fc32 for 64 complex points                             6079      5142      7037      5452      5140      8333
dsps_fft2r_fc32 for 128 complex points                           13031     11706     15907     12399     11705     19035
dsps_fft2r_fc32 for 256 complex points                           27828     26303     35562     27828     26303     42922
dsps_fft2r_fc32 for 512 complex points                           61753     58435     78705     61753     58437     95673
dsps_fft2r_fc32 for 1024 complex points                         135742    128697    172664    135742    128585    211252

FFTs Radix-4 32 bit Floating Point
dsps_fft4r_fc32 for 64 complex points                             3125      3345      5185      3247      3176      5631
dsps_fft4r_fc32 for 256 complex points                           15551     15522     26115     16056     15792     28397
dsps_fft4r_fc32 for 1024 complex points                          75547     75273    127669     77587     76554    138522

FFTs 16 bit Fixed Point
dsps_fft2r_sc16 for 64 complex points                             8786       793     14575      8786       794     15861
dsps_fft2r_sc16 for 128 complex points                           20214      1828     33121     20214      1627     36238
dsps_fft2r_sc16 for 256 complex points                           45755      3429     74290     45755      3428     81638
dsps_fft2r_sc16 for 512 complex points                          102208      7512    164803    102416      7311    181758
dsps_fft2r_sc16 for 1024 complex points                         225862     15641    362193    225861     15642    400853

IIR Filters
dsps_biquad_f32 - biquad filter for 1024 input samples           17450     17459     24613     17451     17630     36895

Matrix Multiplication
dspm_mult_f32 - C[16;16] = A[16;16]*B[16;16]                     24669      6297     51502     24670      6498     78197
dspm_mult_s16 - C[16;16] = A[16;16]*B[16;16]                     24707      1848     83699     24707      1848     99353
dspm_mult_3x3x1_f32 - C[3;1] = A[3;3]*B[3;1]                        79       275       226        80        86       271
dspm_mult_3x3x3_f32 - C[3;3] = A[3;3]*B[3;3]                       211       323       492       210       217       611
dspm_mult_4x4x1_f32 - C[4;1] = A[4;4]*B[4;1]                       112       119       334       113       121       425
dspm_mult_4x4x4_f32 - C[4;4] = A[4;4]*B[4;4]                       405       192      1008       404       194      1335

Image processing prototypes
dspi_dotprod_s8/u8 - dotproduct of two images 16x16               3827       381      3828      4010       179      4011
dspi_dotprod_off_s8/u8 - dotproduct of two images 16x16           4142       243      4142      4772       245      4774
dspi_dotprod_s8/u8 - dotproduct of two images 64x64              58069       705     58068     58825       705     58826
dspi_dotprod_off_s8/u8 - dotproduct of two images 64x64          62365      1010     62366     71233      1177     71062
dspi_dotprod_s16/u16 - dotproduct of two images 8x8               1455       162      1453      1804       162      1806
dspi_dotprod_off_s16/u16 - dotproduct of two images 8x8           1529       196      1531      2074       195      2074
dspi_dotprod_s16 - dotproduct of two images 32x32                20190       565     20029     25300       566     25301
dspi_dotprod_off_s16/u16 - dotproduct of two images 32x32        21090       576     21089     29432       743     29432
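
To illustrate how the figures can be compared, the following minimal C sketch computes the ratio of ANSI to optimized cycle counts for a few functions, using O2 values copied from the benchmarks above. The `bench_row` structure and the `speedup` and `print_speedups` helpers are hypothetical names introduced here for illustration; they are not part of ESP-DSP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One benchmark entry: cycle counts copied from the O2 columns. */
struct bench_row {
    const char *name;
    unsigned long esp32;    /* optimized implementation on ESP32   */
    unsigned long esp32s3;  /* optimized implementation on ESP32S3 */
    unsigned long ansi;     /* non-optimized (ANSI) implementation */
};

/* Ratio of ANSI cycles to optimized cycles; values above 1 mean
   the optimized implementation took fewer cycles. */
static double speedup(unsigned long ansi_cycles, unsigned long optimized_cycles)
{
    assert(optimized_cycles != 0);
    return (double)ansi_cycles / (double)optimized_cycles;
}

/* A small selection of entries (illustrative, not the full table). */
static const struct bench_row rows[] = {
    { "dsps_dotprod_s16, N=256",          447UL,    323UL,    3647UL },
    { "dsps_fir_f32, 1024 x 256",     1079312UL, 443690UL, 2150685UL },
    { "dsps_fft2r_sc16, 1024 points",  225862UL,  15641UL,  362193UL },
    { "dspm_mult_3x3x1_f32",               79UL,    275UL,     226UL },
};

/* Print the ANSI/optimized ratio for both targets, one line per entry. */
static void print_speedups(void)
{
    for (size_t i = 0; i < sizeof rows / sizeof rows[0]; i++) {
        printf("%-30s ESP32 %6.2fx  ESP32S3 %6.2fx\n",
               rows[i].name,
               speedup(rows[i].ansi, rows[i].esp32),
               speedup(rows[i].ansi, rows[i].esp32s3));
    }
}
```

For example, `speedup(3647, 447)` gives roughly 8.2 for dsps_dotprod_s16 on ESP32. A ratio below 1, as for the ESP32S3 O2 entry of dspm_mult_3x3x1_f32 (226 against 275), means the listed optimized figure is higher than the ANSI one.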

The benchmarks can be reproduced by running the test cases found in /test/test_dsp.c.
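
The test cases themselves are not reproduced here. As a rough illustration of how a cycle-count measurement of this kind can be structured, the sketch below times a plain C dot product between two reads of a counter. Everything in it is an assumption made for illustration: `counter_fn`, `host_counter`, `dotprod_ref`, and `measure_dotprod` are hypothetical names, `clock()` stands in for a CPU cycle counter so that the code also builds on a host, and none of it is taken from /test/test_dsp.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical counter hook: returns a monotonically increasing tick value. */
typedef uint32_t (*counter_fn)(void);

/* Host stand-in for a cycle counter (assumption: clock() ticks, not cycles). */
static uint32_t host_counter(void)
{
    return (uint32_t)clock();
}

/* Plain C dot product, standing in for the function under test. */
static float dotprod_ref(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) {
        acc += a[i] * b[i];
    }
    return acc;
}

/* Runs the function `repeats` times between two counter reads.
   Returns the elapsed ticks for all runs; the last result goes to *out. */
static uint32_t measure_dotprod(counter_fn now, const float *a, const float *b,
                                size_t n, unsigned repeats, float *out)
{
    assert(now != NULL && out != NULL && repeats > 0);

    volatile float sink = 0.0f;  /* keeps the calls from being optimized away */
    uint32_t start = now();
    for (unsigned r = 0; r < repeats; r++) {
        sink = dotprod_ref(a, b, n);
    }
    uint32_t end = now();

    *out = sink;
    return end - start;  /* unsigned subtraction tolerates one counter wrap */
}
```

On a target, the `now` argument would be a function that reads the CPU cycle counter, and dividing the returned value by `repeats` gives an average per call.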