Audio Recorder

The Audio Recorder API is a set of functions that facilitate voice recording. It combines two main functions: the Audio Front End (AFE) and audio encoding. Users can customize the AFE's Voice Activity Detection (VAD), Automatic Gain Control (AGC), and Acoustic Echo Cancellation (AEC) settings. The encoding function lets users set up the encoding audio element, which supports formats such as AAC, AMR-NB, AMR-WB, ADPCM, WAV, OPUS, and G711. The audio_rec_evt_t event makes it easy for users to interact with the Audio Recorder software.

The data path of the Audio Recorder is shown in the diagram below.

Audio recorder data path

The area represented by the parallelogram is configurable by the user according to their needs, such as sampling frequency, whether to encode, and encoding format.

Application Example

The speech_recognition/wwe/ example demonstrates how to initialize the speech recognition model, determine the number of samples and the sample rate of the voice data to feed to the model, detect the wake-up word and command words, and encode voice to a specific audio format.

API Reference

Header File

Functions

audio_rec_handle_t audio_recorder_create(audio_rec_cfg_t *cfg)

Initialize and start up the audio recorder.

Return

  • NULL: failed

  • Others: audio recorder handle

Parameters
  • cfg: Configuration of audio recorder

esp_err_t audio_recorder_trigger_start(audio_rec_handle_t handle)

Start recording by force.

Note

If data needs to be read from the recorder when no wake word has been detected, or while wake word detection is disabled, this function can be used to force the recorder process to start.

Return

  • ESP_OK

  • ESP_FAIL

Parameters
  • handle: Audio recorder handle

esp_err_t audio_recorder_trigger_stop(audio_rec_handle_t handle)

Stop recording by force.

Note

Whether the recorder process was triggered by wake word detection or by audio_recorder_trigger_start, this function can be used to force the recorder to stop. If VAD detection is disabled, this function must be called to stop recording after audio_recorder_trigger_start.

Return

  • ESP_OK

  • ESP_FAIL

Parameters
  • handle: Audio recorder handle

esp_err_t audio_recorder_wakenet_enable(audio_rec_handle_t handle, bool enable)

Enable or suspend wake word detection.

Return

  • ESP_OK

  • ESP_FAIL

Parameters
  • handle: Audio recorder handle

  • enable: true to enable wake word detection, false to disable it

esp_err_t audio_recorder_multinet_enable(audio_rec_handle_t handle, bool enable)

Enable or suspend speech command recognition.

Return

  • ESP_OK

  • ESP_FAIL

Parameters
  • handle: Audio recorder handle

  • enable: true to enable speech command recognition, false to disable it

esp_err_t audio_recorder_vad_check_enable(audio_rec_handle_t handle, bool enable)

Enable or suspend voice duration check.

Return

  • ESP_OK

  • ESP_FAIL

Parameters
  • handle: Audio recorder handle

  • enable: true to enable the voice duration check, false to disable it

int audio_recorder_data_read(audio_rec_handle_t handle, void *buffer, int length, TickType_t ticks)

Read data from audio recorder.

Return

  • Length of data actually read

  • ESP_ERR_INVALID_ARG

Parameters
  • handle: Audio recorder handle

  • buffer: Buffer to save data

  • length: Size of buffer

  • ticks: Timeout for reading
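
Since the function returns the length actually read, a caller that needs a full buffer may have to call it more than once. The following is a minimal sketch of such a loop, not the library's own code. `rec_read_fn`, `read_exact`, and `fake_reader` are hypothetical names introduced here so the example compiles without the SDK; in a real project the call would be audio_recorder_data_read and `ticks` would be a TickType_t. Because ESP_ERR_INVALID_ARG is a positive value (0x102) in ESP-IDF, the sketch validates its arguments up front and treats a return value larger than the requested length as an error rather than a byte count.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the signature of audio_recorder_data_read().
 * The handle is opaque here and ticks is a plain unsigned (assumption). */
typedef int (*rec_read_fn)(void *handle, void *buffer, int length, unsigned ticks);

/* Keep reading until `want` bytes are collected or the source stops.
 * Returns the number of bytes collected, or -1 on invalid arguments. */
int read_exact(rec_read_fn read_fn, void *handle,
               unsigned char *dst, int want, unsigned ticks)
{
    if (read_fn == NULL || handle == NULL || dst == NULL || want <= 0) {
        return -1;
    }
    int total = 0;
    while (total < want) {
        int n = read_fn(handle, dst + total, want - total, ticks);
        /* Stop on "no data" and on any value that cannot be a byte count. */
        if (n <= 0 || n > want - total) {
            break;
        }
        total += n;
    }
    return total;
}

/* Demo source only: hands out at most 3 bytes per call until the
 * counter that `handle` points to reaches zero. */
int fake_reader(void *handle, void *buffer, int length, unsigned ticks)
{
    (void)ticks;
    int *remaining = (int *)handle;
    int n = length < 3 ? length : 3;
    if (n > *remaining) {
        n = *remaining;
    }
    memset(buffer, 0xAA, (size_t)n);
    *remaining -= n;
    return n;
}
```

With a source holding 10 bytes, asking for 8 takes three short reads; a second request for 8 collects the remaining 2 and then stops.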

esp_err_t audio_recorder_destroy(audio_rec_handle_t handle)

Destroy the audio recorder and release all resources.

Return

  • ESP_OK

  • ESP_FAIL

Parameters
  • handle: Audio recorder handle

bool audio_recorder_get_wakeup_state(audio_rec_handle_t handle)

Get the wake-up state of the audio recorder.

Return

  • true

  • false

Parameters
  • handle: Audio recorder handle

Structures

struct audio_rec_evt_t

Recorder event.

Public Types

enum [anonymous]

Audio recorder event type.

Values:

AUDIO_REC_WAKEUP_START = -100

Wakeup start

AUDIO_REC_WAKEUP_END

Wakeup stop

AUDIO_REC_VAD_START

VAD start

AUDIO_REC_VAD_END

VAD stop

AUDIO_REC_COMMAND_DECT = 0

From 0 upward, the value is the ID of the voice command detected by MultiNet

Public Members

audio_rec_evt_t::[anonymous] type

Audio recorder event type.

Event type

void *event_data

Event data:

  • For AUDIO_REC_WAKEUP_START, event data is recorder_sr_wakeup_result_t

  • For AUDIO_REC_COMMAND_DECT or higher, event data is recorder_sr_mn_result_t

  • For other events, event data is NULL

size_t data_len

Length of event data
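
The event type carries two kinds of information: negative values identify recorder state changes, while values from 0 upward are command IDs. The sketch below shows one way a callback might tell them apart. It is self-contained for illustration: the `DEMO_EVT_*` constants are a local mirror of the enum values listed above (the unnumbered entries follow from C enum rules: -99, -98, -97), and `rec_event_name` and `rec_event_command_id` are hypothetical helpers, not part of the API.

```c
/* Local mirror of the event type values listed in this reference.
 * Real code would use the AUDIO_REC_* names from the header instead. */
enum {
    DEMO_EVT_WAKEUP_START = -100,
    DEMO_EVT_WAKEUP_END,        /* -99 */
    DEMO_EVT_VAD_START,         /* -98 */
    DEMO_EVT_VAD_END,           /* -97 */
    DEMO_EVT_COMMAND_DECT = 0   /* 0 and above: command ID */
};

/* Map an event type to a short label (hypothetical helper). */
const char *rec_event_name(int type)
{
    if (type >= DEMO_EVT_COMMAND_DECT) {
        return "command";
    }
    switch (type) {
    case DEMO_EVT_WAKEUP_START: return "wakeup start";
    case DEMO_EVT_WAKEUP_END:   return "wakeup end";
    case DEMO_EVT_VAD_START:    return "vad start";
    case DEMO_EVT_VAD_END:      return "vad end";
    default:                    return "unknown";
    }
}

/* Return the command ID carried by the type, or -1 if the event
 * is not a command detection (hypothetical helper). */
int rec_event_command_id(int type)
{
    return type >= DEMO_EVT_COMMAND_DECT ? type : -1;
}
```

In an event callback, a check of this kind would typically run before event_data is cast, since the payload type depends on the event type.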

struct audio_rec_cfg_t

Audio recorder configuration.

Public Members

int pinned_core

Audio recorder task pinned to core

int task_prio

Audio recorder task priority

int task_size

Audio recorder task stack size

rec_event_cb_t event_cb

Event callback function, event type as audio_rec_evt_t shown above

void *user_data

Pointer to user data (optional)

recorder_data_read_t read

Data callback function used to feed data to the audio recorder

void *sr_handle

SR handle

recorder_sr_iface_t *sr_iface

SR interface

int wakeup_time

Unit: ms. The duration for which the wakeup state is kept when VAD is not triggered

int vad_start

Unit: ms. Consecutive speech frames lasting this long are judged as VAD start

int vad_off

Unit: ms. When the silence time exceeds this value, the state is determined to be AUDIO_REC_VAD_END

int wakeup_end

Unit: ms. When the silence time after the AUDIO_REC_VAD_END state exceeds this value, the state is determined to be AUDIO_REC_WAKEUP_END

void *encoder_handle

Encoder handle

recorder_encoder_iface_t *encoder_iface

Encoder interface
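
The four timing members (wakeup_time, vad_start, vad_off, wakeup_end) interact, and a small model can make their roles easier to see. The sketch below is an illustrative, simplified state machine written from the member descriptions above. It is an assumption about how the timings relate, not the recorder's actual implementation, and every `demo_*` name is hypothetical.

```c
#include <stdbool.h>

typedef enum { ST_IDLE, ST_AWAKE, ST_SPEAKING, ST_AFTER_SPEECH } demo_state_t;
typedef enum { EV_NONE, EV_VAD_START, EV_VAD_END, EV_WAKEUP_END } demo_event_t;

/* Mirrors the four timing members of audio_rec_cfg_t (all in ms). */
typedef struct {
    int wakeup_time; /* awake without speech before wakeup ends */
    int vad_start;   /* consecutive speech needed for VAD start */
    int vad_off;     /* silence that ends the current utterance */
    int wakeup_end;  /* silence after VAD end before wakeup ends */
} demo_timing_t;

typedef struct {
    demo_state_t state;
    int speech_ms;   /* length of the current run of speech */
    int silence_ms;  /* length of the current run of silence */
} demo_tracker_t;

/* Called when the wake word is detected (or recording is forced). */
void demo_wake(demo_tracker_t *t)
{
    t->state = ST_AWAKE;
    t->speech_ms = 0;
    t->silence_ms = 0;
}

/* Feed one frame of VAD output and return the event it causes, if any. */
demo_event_t demo_feed(demo_tracker_t *t, const demo_timing_t *cfg,
                       bool is_speech, int frame_ms)
{
    if (t->state == ST_IDLE) {
        return EV_NONE;
    }
    if (is_speech) {
        t->speech_ms += frame_ms;
        t->silence_ms = 0;
    } else {
        t->silence_ms += frame_ms;
        t->speech_ms = 0;
    }
    if (t->state == ST_SPEAKING) {
        if (t->silence_ms >= cfg->vad_off) {
            t->state = ST_AFTER_SPEECH;
            t->silence_ms = 0; /* wakeup_end is counted from VAD end */
            return EV_VAD_END;
        }
        return EV_NONE;
    }
    /* ST_AWAKE or ST_AFTER_SPEECH: wait for speech or time out. */
    if (t->speech_ms >= cfg->vad_start) {
        t->state = ST_SPEAKING;
        return EV_VAD_START;
    }
    int limit = (t->state == ST_AWAKE) ? cfg->wakeup_time : cfg->wakeup_end;
    if (t->silence_ms >= limit) {
        t->state = ST_IDLE;
        return EV_WAKEUP_END;
    }
    return EV_NONE;
}
```

In this model, a larger vad_off tolerates longer pauses inside an utterance, while a larger wakeup_end keeps the recorder awake longer for a follow-up utterance.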

Macros

AUDIO_REC_DEF_TASK_SZ

Stack size of recorder task

AUDIO_REC_DEF_TASK_PRIO

Priority of recorder task

AUDIO_REC_DEF_TASK_CORE

Pinned to core

AUDIO_REC_DEF_WAKEUP_TM

Default wake up time (ms)

AUDIO_REC_DEF_WAKEEND_TM

Duration after VAD off (ms)

AUDIO_REC_VAD_START_SPEECH_MS

Consecutive speech frames lasting this long are judged as VAD start (ms)

AUDIO_REC_DEF_VAD_OFF_TM

Default VAD off time (ms)

AUDIO_RECORDER_DEFAULT_CFG()

Type Definitions

typedef esp_err_t (*rec_event_cb_t)(audio_rec_evt_t *event, void *user_data)

Event Notification.

typedef struct __audio_recorder *audio_rec_handle_t

Audio recorder handle.

Header File

Functions

recorder_sr_handle_t recorder_sr_create(recorder_sr_cfg_t *cfg, recorder_sr_iface_t **iface)

Initialize the SR processor. SR is disabled by default.

Return

  • NULL: failed

  • Others: SR handle

Parameters
  • cfg: Configuration of SR

  • iface: User interface provided by the recorder SR

esp_err_t recorder_sr_destroy(recorder_sr_handle_t handle)

Destroy the SR processor and release all resources.

Return

  • ESP_OK

  • ESP_FAIL

Parameters
  • handle: SR processor handle

esp_err_t recorder_sr_reset_speech_cmd(recorder_sr_handle_t handle, char *command_str, char *err_phrase_id)

Reset the speech commands.

Return

  • ESP_OK

  • ESP_FAIL

Parameters
  • handle: SR processor handle

  • command_str: String of the commands. For more details, see #2reset-api-on-the-fly

  • err_phrase_id: Error string output

Structures

struct recorder_sr_cfg_t

SR processor configuration.

Note

Since the detection of command words requires a clear starting point, the moment the wake word is detected is taken as the default start of detection. Therefore, if the wake word detection is disabled, the detection will use the vad_state detected by esp-sr as the start of detection. However, due to the fluctuation of this vad_state, the effectiveness of command word detection will be limited.

Public Members

afe_config_t afe_cfg

Configuration of AFE

int8_t input_order[DAT_CH_MAX]

Channel order of the input data

bool multinet_init

Whether to enable speech command recognition

int feed_task_core

Core id of feed task

int feed_task_prio

Priority of feed task

int feed_task_stack

Stack size of feed task

int fetch_task_core

Core id of fetch task

int fetch_task_prio

Priority of fetch task

int fetch_task_stack

Stack size of fetch task

int rb_size

Ringbuffer size of recorder sr

char *partition_label

Label of the partition that stores the model data

char *mn_language

Command language for multinet to load

char *wn_wakeword

Wake word for WakeNet to load. This is useful when multiple wake words are selected in sdkconfig. Setting this to NULL will use the first model found.

Macros

FEED_TASK_STACK_SZ
FETCH_TASK_STACK_SZ
FEED_TASK_PRIO
FETCH_TASK_PRIO
FEED_TASK_PINNED_CORE
FETCH_TASK_PINNED_CORE
SR_OUTPUT_RB_SIZE
INPUT_ORDER_DEFAULT()
DEFAULT_RECORDER_SR_CFG()

Type Definitions

typedef void *recorder_sr_handle_t

SR processor handle.

Header File

Functions

recorder_encoder_handle_t recorder_encoder_create(recorder_encoder_cfg_t *cfg, recorder_encoder_iface_t **iface)

Initialize the encoder processor. The encoder is disabled by default.

Return

  • NULL: failed

  • Others: encoder handle

Parameters
  • cfg: Configuration of encoder

  • iface: User interface provided by the recorder encoder

esp_err_t recorder_encoder_destroy(recorder_encoder_handle_t handle)

Destroy the encoder processor and release all resources.

Return

  • ESP_OK

  • ESP_FAIL

Parameters
  • handle: Encoder processor handle

Structures

struct recorder_encoder_cfg_t

Recorder encoder configuration parameters.

Public Members

audio_element_handle_t resample

Handle of the resample audio element

audio_element_handle_t encoder

Handle of the encoder audio element

Type Definitions

typedef void *recorder_encoder_handle_t

Encoder handle.