Espressif DSP Library Benchmarks

The table below contains benchmarks of the functions provided by the ESP-DSP library. The values are the CPU cycle counts taken to execute each function. Values in the “O2” columns were measured with the compiler optimizing for speed, and values in the “Os” columns with the compiler optimizing for size. The “ESP32”, “ESP32S3” and “ESP32P4” columns are for the optimized (assembly) implementations; the “ANSI Xtensa” and “ANSI Risc-V” columns are for the non-optimized implementation.

=========================================================  ===========  ===========  ===========  ===========  ===========  ===========  ===========  ===========  ===========  ===========
Function                                                         ESP32      ESP32S3      ESP32P4  ANSI Xtensa  ANSI Risc-V        ESP32      ESP32S3      ESP32P4  ANSI Xtensa  ANSI Risc-V
Optimization                                                        O2           O2           O2           O2           O2           Os           Os           Os           Os           Os
=========================================================  ===========  ===========  ===========  ===========  ===========  ===========  ===========  ===========  ===========  ===========
Dot Product
dsps_dotprod_f32 for N=256 points                                 1058          449         1319         1325         1563         1058          448         1320         4129         2336
dsps_dotprode_f32 for N=256 points with step 1                    1317         1326         1314         2853         1819         1317         1326         1317         3621         2087
dsps_dotprod_s16 for N=256 points                                  447          323          208         3647         3385          447          322          202         6466         4170
FIR Filters
dsps_fir_f32 1024 input samples and 256 coefficients           1079312       443690      1606563      2150685      1606552      1079599       443688      3050985      5147785      3050972
dsps_fird_f32 1024 samples 256 coeffs and decimation 4          350915       115494       339796       614234       415332       350520       115495       339780      1317367       814168
FFTs Radix-2 32 bit Floating Point
dsps_fft2r_fc32 for 64 complex points                             6079         5142         4989         7037         6062         5452         5140         4988         8333         8144
dsps_fft2r_fc32 for 128 complex points                           13031        11706        11404        15907        13725        12399        11705        11391        19035        18622
dsps_fft2r_fc32 for 256 complex points                           27828        26303        25684        35562        30779        27828        26303        25671        42922        41955
dsps_fft2r_fc32 for 512 complex points                           61753        58435        57181        78705        68325        61753        58437        57167        95673        93483
dsps_fft2r_fc32 for 1024 complex points                         135742       128697       126053       172664       140083       135742       128585       126039       211252       186812
FFTs Radix-4 32 bit Floating Point
dsps_fft4r_fc32 for 64 complex points                             3125         3345         3483         5185         3761         3247         3176         3480         5631         3876
dsps_fft4r_fc32 for 256 complex points                           15551        15522        18052        26115        19266        16056        15792        18036        28397        19788
dsps_fft4r_fc32 for 1024 complex points                          75547        75273        89820       127669        95550        77587        76554        89766       138522        98026
FFTs 16 bit Fixed Point
dsps_fft2r_sc16 for 64 complex points                             8786          793          897        14575         9055         8786          794          894        15861         9472
dsps_fft2r_sc16 for 128 complex points                           20214         1828         1832        33121        20675        20214         1627         1835        36238        21633
dsps_fft2r_sc16 for 256 complex points                           45755         3429         3873        74290        46576        45755         3428         3876        81638        48735
dsps_fft2r_sc16 for 512 complex points                          102208         7512         8234       164803       103739       102416         7311         8237       181758       108555
dsps_fft2r_sc16 for 1024 complex points                         225862        15641        17523       362193       228808       225861        15642        17526       400853       239447
IIR Filters
dsps_biquad_f32 - biquad filter for 1024 input samples           17450        17459        15391        24613        21544        17451        17630        15403        36895        32789
Matrix Multiplication
dspm_mult_f32 - C[16;16] = A[16;16]*B[16;16]                     24669         6297        28276        51502        31481        24670         6498        28239        78197        38913
dspm_mult_s16 - C[16;16] = A[16;16]*B[16;16]                     24707         1848         2138        83699        60715        24707         1848         2141        99353        63865
dspm_mult_3x3x1_f32 - C[3;1] = A[3;3]*B[3;1]                        79          275          144          226          161           80           86          159          271          164
dspm_mult_3x3x3_f32 - C[3;3] = A[3;3]*B[3;3]                       211          323          338          492          309          210          217          332          611          351
dspm_mult_4x4x1_f32 - C[4;1] = A[4;4]*B[4;1]                       112          119          206          334          227          113          121          202          425          258
dspm_mult_4x4x4_f32 - C[4;4] = A[4;4]*B[4;4]                       405          192          658         1008          608          404          194          658         1335          749
Image processing prototypes
dspi_dotprod_s8/u8 - dotproduct of two images 16x16               3827          381          225         3828         2436         4010          179          228         4011         2373
dspi_dotprod_off_s8/u8 - dotproduct of two images 16x16           4142          243          240         4142         2736         4772          245          243         4774         2584
dspi_dotprod_s8/u8 - dotproduct of two images 64x64              58069          705         1662        58068        33723        58825          705         1664        58826        33800
dspi_dotprod_off_s8/u8 - dotproduct of two images 64x64          62365         1010         1792        62366        37884        71233         1177         1795        71062        37768
dspi_dotprod_s16/u16 - dotproduct of two images 8x8               1455          162          153         1453          994         1804          162          152         1806         1018
dspi_dotprod_off_s16/u16 - dotproduct of two images 8x8           1529          196          145         1531         1056         2074          195          143         2074         1088
dspi_dotprod_s16 - dotproduct of two images 32x32                20190          565          853        20029        13888        25300          566          859        25301        13831
dspi_dotprod_off_s16/u16 - dotproduct of two images 32x32        21090          576          921        21089        14874        29432          743          921        29432        14850
=========================================================  ===========  ===========  ===========  ===========  ===========  ===========  ===========  ===========  ===========  ===========
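
The cycle counts can be read as a speedup ratio (ANSI cycles divided by optimized cycles) or converted to execution time for a given CPU clock. The sketch below, in C, shows both calculations using the O2 values of the dsps_dotprod_s16 row: 323 cycles on ESP32S3 against 3647 cycles for ANSI Xtensa. The function names, the macro names and the 240 MHz clock are illustrative assumptions; they are not part of ESP-DSP or of the measurements.

```c
#include <assert.h>

/* O2 cycle counts copied from the dsps_dotprod_s16 row (N=256 points). */
#define CYCLES_ESP32S3_O2      323u
#define CYCLES_ANSI_XTENSA_O2  3647u

/* Assumed CPU clock in Hz, used only to turn cycles into time.
   This value is an illustrative assumption, not taken from the table. */
#define ASSUMED_CPU_HZ 240000000.0

/* Ratio of ANSI cycles to optimized cycles: how many times fewer
   cycles the optimized implementation needs. */
double speedup(unsigned ansi_cycles, unsigned optimized_cycles)
{
    assert(optimized_cycles != 0u);
    return (double)ansi_cycles / (double)optimized_cycles;
}

/* Converts a cycle count to microseconds for a CPU clock given in Hz. */
double cycles_to_us(unsigned cycles, double cpu_hz)
{
    assert(cpu_hz > 0.0);
    return (double)cycles * 1e6 / cpu_hz;
}
```

With these inputs, speedup(CYCLES_ANSI_XTENSA_O2, CYCLES_ESP32S3_O2) is about 11.3, and cycles_to_us(CYCLES_ESP32S3_O2, ASSUMED_CPU_HZ) is about 1.35 microseconds under the assumed clock.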

The benchmarks can be reproduced by running the test cases in /test/test_dsp.c.
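
The general pattern behind a cycle-count measurement is to read a counter before and after a single call and take the difference. A minimal host-side sketch of that pattern in C follows; it is not the code in test_dsp.c. The names read_counter, dotprod_ref and measure_dotprod are hypothetical, clock() stands in for the CPU cycle counter, and the plain C loop stands in for the library function under test.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for a CPU cycle counter. On a host machine it
   returns clock() ticks, which are much coarser than CPU cycles. */
unsigned long read_counter(void)
{
    return (unsigned long)clock();
}

/* Plain C dot product: the sum of a[i] * b[i] over n elements.
   Stands in for the function being benchmarked. */
float dotprod_ref(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) {
        acc += a[i] * b[i];
    }
    return acc;
}

/* Runs one call between two counter reads, stores the result in
   *result, and returns the elapsed ticks. */
unsigned long measure_dotprod(const float *a, const float *b, size_t n,
                              float *result)
{
    assert(a != NULL && b != NULL && result != NULL);
    unsigned long start = read_counter();
    *result = dotprod_ref(a, b, n);
    unsigned long end = read_counter();
    return end - start;
}
```

On a target, the CPU cycle counter and the ESP-DSP function being measured would take the place of these stand-ins. Input sizes should match the row being compared, for example N=256 points for the dot product rows.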