Audio Recorder

The Audio Recorder API is a set of functions for voice recording. It combines two components: the Audio Front End (AFE) and audio encoding. Users can customize the AFE's Voice Activity Detection (VAD), Automatic Gain Control (AGC), and Acoustic Echo Cancellation (AEC) settings. The encoding component lets users set up an encoder audio element; supported formats include AAC, AMR-NB, AMR-WB, ADPCM, WAV, OPUS, and G711. The recorder reports its state to the application through audio_rec_evt_t events.

The data path of the Audio Recorder is presented in the diagram below.

The blocks drawn as parallelograms are user-configurable, for example the sampling frequency, whether to encode, and the encoding format.

Application Example

The speech_recognition/wwe/ example demonstrates how to initialize the speech recognition model, determine the number of samples and the sample rate of the voice data to feed to the model, detect the wake-up word and command words, and encode voice into a specific audio format.

API Reference

Header File

components/audio_recorder/include/audio_recorder.h

Functions

audio_rec_handle_t audio_recorder_create(audio_rec_cfg_t *cfg)

Initialize and start the audio recorder.

Parameters: cfg – Configuration of audio recorder
Returns: NULL on failure; otherwise the audio recorder handle

esp_err_t audio_recorder_trigger_start(audio_rec_handle_t handle)

Start recording by force.

Note

If data needs to be read from the recorder when no wake word has been detected, or while wake word detection is disabled, this function can be used to force the recording process to start.

Parameters: handle – Audio recorder handle
Returns: ESP_OK on success, ESP_FAIL on failure

esp_err_t audio_recorder_trigger_stop(audio_rec_handle_t handle)

Stop recording by force.

Note

Whether recording was triggered by wake word detection or by audio_recorder_trigger_start, this function can be used to force the recorder to stop. If VAD detection is disabled, this function must be called to stop recording after audio_recorder_trigger_start.

Parameters: handle – Audio recorder handle
Returns: ESP_OK on success, ESP_FAIL on failure

esp_err_t audio_recorder_wakenet_enable(audio_rec_handle_t handle, bool enable)

Enable or suspend wake word detection.

Parameters

handle – Audio recorder handle
enable – true: enable wake word detection; false: disable wake word detection

Returns

ESP_OK on success, ESP_FAIL on failure

esp_err_t audio_recorder_multinet_enable(audio_rec_handle_t handle, bool enable)

Enable or suspend speech command recognition.

Parameters

handle – Audio recorder handle
enable – true: enable speech command recognition; false: disable speech command recognition

Returns

ESP_OK on success, ESP_FAIL on failure

esp_err_t audio_recorder_vad_check_enable(audio_rec_handle_t handle, bool enable)

Enable or suspend voice duration check.

Parameters

handle – Audio recorder handle
enable – true: enable voice duration check; false: disable voice duration check

Returns

ESP_OK on success, ESP_FAIL on failure

int audio_recorder_data_read(audio_rec_handle_t handle, void *buffer, int length, TickType_t ticks)

Read data from audio recorder.

Parameters

handle – Audio recorder handle
buffer – Buffer to save data
length – Size of buffer
ticks – Timeout for reading

Returns

Length of data actually read; ESP_ERR_INVALID_ARG on invalid arguments
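
A common way to use audio_recorder_data_read is to pull data in fixed-size chunks until a buffer is full or the recorder has nothing more to deliver. The sketch below shows that loop in a form that builds on a host machine. The names rec_read_fn_t, app_read_chunks, and fake_read are hypothetical, and void * and unsigned stand in for audio_rec_handle_t and TickType_t. Treating a non-positive or larger-than-requested return value as a reason to stop is an assumption, since this reference does not describe how timeouts are reported.

```c
#include <stddef.h>
#include <string.h>

/* Same parameter shape as audio_recorder_data_read(); void* and unsigned are
 * stand-ins for audio_rec_handle_t and TickType_t (assumption for host builds). */
typedef int (*rec_read_fn_t)(void *handle, void *buffer, int length, unsigned ticks);

/* Read up to `capacity` bytes in chunks of at most `chunk` bytes.
 * Stops when the buffer is full or the reader returns an implausible length.
 * Returns the number of bytes stored, or -1 if the arguments are invalid. */
int app_read_chunks(rec_read_fn_t read_fn, void *handle,
                    unsigned char *dst, int capacity, int chunk, unsigned ticks)
{
    if (read_fn == NULL || dst == NULL || capacity <= 0 || chunk <= 0) {
        return -1;
    }
    int total = 0;
    while (total < capacity) {
        int want = capacity - total;
        if (want > chunk) {
            want = chunk;
        }
        int got = read_fn(handle, dst + total, want, ticks);
        if (got <= 0 || got > want) {
            break; /* not a plausible length: treat as error, timeout, or end */
        }
        total += got;
    }
    return total;
}

/* Simulated data source for host testing only: `handle` points to an int
 * holding the number of bytes still available. */
int fake_read(void *handle, void *buffer, int length, unsigned ticks)
{
    (void)ticks;
    int *remaining = (int *)handle;
    int n = (*remaining < length) ? *remaining : length;
    if (n < 0) {
        n = 0;
    }
    memset(buffer, 0xAB, (size_t)n);
    *remaining -= n;
    return n;
}
```

On target, a thin wrapper that forwards its arguments to audio_recorder_data_read could be passed as read_fn in place of fake_read.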

esp_err_t audio_recorder_destroy(audio_rec_handle_t handle)

Destroy the audio recorder and release all resources.

Parameters: handle – Audio recorder handle
Returns: ESP_OK on success, ESP_FAIL on failure

bool audio_recorder_get_wakeup_state(audio_rec_handle_t handle)

Get the wake-up state of the audio recorder.

Parameters: handle – Audio recorder handle
Returns: true if the recorder is in the wake-up state, false otherwise

Structures

struct audio_rec_evt_t

Recorder event.

Public Types

enum audio_recorder_event_type_t

Audio recorder event type.

Values:

enumerator AUDIO_REC_WAKEUP_START: Wakeup start

enumerator AUDIO_REC_WAKEUP_END: Wakeup stop

enumerator AUDIO_REC_VAD_START: VAD start

enumerator AUDIO_REC_VAD_END: VAD stop

enumerator AUDIO_REC_COMMAND_DECT: From 0 upward, the value is the ID of the voice command detected by MultiNet

Public Members

enum audio_rec_evt_t::audio_recorder_event_type_t type: Event type

void *event_data: Event data: For AUDIO_REC_WAKEUP_START, event data is recorder_sr_wakeup_result_t For AUDIO_REC_COMMAND_DECT or higher, event data is recorder_sr_mn_result_t For other events, event data is NULL

size_t data_len: Length of event data
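
Because every value at or above AUDIO_REC_COMMAND_DECT is a command ID, an event callback typically checks that range first and then handles the named events. The sketch below illustrates this dispatch order in a form that compiles without the ESP-ADF headers. The enum and struct are local stand-ins that mirror the members documented above; the value -100 is an assumption (the logic only relies on the named events being below 0), and all app_ names are hypothetical.

```c
#include <stddef.h>

/* Stand-in declarations so the sketch builds on its own. In a real project
 * these come from audio_recorder.h. Numeric values are assumptions, except
 * that command IDs start at AUDIO_REC_COMMAND_DECT (0), as documented. */
typedef enum {
    AUDIO_REC_WAKEUP_START = -100, /* assumed value */
    AUDIO_REC_WAKEUP_END,
    AUDIO_REC_VAD_START,
    AUDIO_REC_VAD_END,
    AUDIO_REC_COMMAND_DECT = 0
} audio_recorder_event_type_t;

typedef struct {
    audio_recorder_event_type_t type;
    void *event_data;
    size_t data_len;
} audio_rec_evt_t;

/* Hypothetical application-side summary of what an event means. */
typedef enum {
    APP_EVT_WAKEUP_START,
    APP_EVT_WAKEUP_END,
    APP_EVT_VAD_START,
    APP_EVT_VAD_END,
    APP_EVT_COMMAND,
    APP_EVT_INVALID
} app_evt_kind_t;

/* Classify an event. For command events, the command ID is written to
 * *command_id when that pointer is not NULL. */
app_evt_kind_t app_classify_event(const audio_rec_evt_t *event, int *command_id)
{
    if (event == NULL) {
        return APP_EVT_INVALID;
    }
    /* Check the open-ended command range before the named events. */
    if (event->type >= AUDIO_REC_COMMAND_DECT) {
        if (command_id != NULL) {
            *command_id = (int)event->type;
        }
        return APP_EVT_COMMAND;
    }
    switch (event->type) {
    case AUDIO_REC_WAKEUP_START: return APP_EVT_WAKEUP_START;
    case AUDIO_REC_WAKEUP_END:   return APP_EVT_WAKEUP_END;
    case AUDIO_REC_VAD_START:    return APP_EVT_VAD_START;
    case AUDIO_REC_VAD_END:      return APP_EVT_VAD_END;
    default:                     return APP_EVT_INVALID;
    }
}
```

A callback of type rec_event_cb_t could call such a helper and then, for example, start reading data on a VAD start and stop on a VAD stop.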

struct audio_rec_cfg_t

Audio recorder configuration.

Public Members

int pinned_core: Audio recorder task pinned to core

int task_prio: Audio recorder task priority

int task_size: Audio recorder task stack size

rec_event_cb_t event_cb: Event callback function, event type as audio_rec_evt_t shown above

void *user_data: Pointer to user data (optional)

recorder_data_read_t read: Data callback function used to feed data to the audio recorder

void *sr_handle: SR handle

recorder_sr_iface_t *sr_iface: SR interface

int wakeup_time: Unit:ms. The duration that the wakeup state remains when VAD is not triggered

int vad_start: Unit:ms. Consecutive speech frame will be judged to vad start

int vad_off: Unit:ms. When the silence time exceeds this value, it is determined as AUDIO_REC_VAD_END state

int wakeup_end: Unit:ms. When the silence time after AUDIO_REC_VAD_END state exceeds this value, it is determined as AUDIO_REC_WAKEUP_END

void *encoder_handle: Encoder handle

recorder_encoder_iface_t *encoder_iface: Encoder interface
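
The four timing fields (wakeup_time, vad_start, vad_off, wakeup_end) are easier to reason about as thresholds in a small state machine. The following is a simplified host-side model built only from the field descriptions above; it is not the library's implementation, and every type and function name in it (timing_cfg_t, timing_model_t, timing_on_wakeup, timing_step) is hypothetical. It assumes audio arrives as frames of a known duration, each already labeled as speech or silence.

```c
#include <stdbool.h>

/* Subset of audio_rec_cfg_t timing fields, all in milliseconds. */
typedef struct {
    int wakeup_time; /* max time awake without a VAD start */
    int vad_start;   /* consecutive speech needed for VAD start */
    int vad_off;     /* silence needed for VAD end */
    int wakeup_end;  /* silence after VAD end needed for wake-up end */
} timing_cfg_t;

typedef enum { EV_NONE, EV_VAD_START, EV_VAD_END, EV_WAKEUP_END } timing_event_t;
typedef enum { ST_IDLE, ST_WAKEUP, ST_SPEAKING, ST_AFTER_VAD } timing_state_t;

typedef struct {
    timing_cfg_t cfg;
    timing_state_t state;
    int speech_ms;  /* length of the current run of speech frames */
    int silence_ms; /* length of the current run of silence frames */
    int awake_ms;   /* time spent awake before the first VAD start */
} timing_model_t;

/* Enter the wake-up state, as if a wake word had just been detected. */
void timing_on_wakeup(timing_model_t *m, timing_cfg_t cfg)
{
    m->cfg = cfg;
    m->state = ST_WAKEUP;
    m->speech_ms = 0;
    m->silence_ms = 0;
    m->awake_ms = 0;
}

/* Feed one frame and return the event it causes, if any. */
timing_event_t timing_step(timing_model_t *m, bool is_speech, int frame_ms)
{
    if (m->state == ST_IDLE) {
        return EV_NONE;
    }
    if (is_speech) {
        m->speech_ms += frame_ms;
        m->silence_ms = 0;
    } else {
        m->silence_ms += frame_ms;
        m->speech_ms = 0;
    }
    switch (m->state) {
    case ST_WAKEUP:
        m->awake_ms += frame_ms;
        if (m->speech_ms >= m->cfg.vad_start) {
            m->state = ST_SPEAKING;
            return EV_VAD_START;
        }
        if (m->awake_ms >= m->cfg.wakeup_time) {
            m->state = ST_IDLE;
            return EV_WAKEUP_END;
        }
        break;
    case ST_SPEAKING:
        if (m->silence_ms >= m->cfg.vad_off) {
            m->state = ST_AFTER_VAD;
            m->silence_ms = 0; /* restart the count for wakeup_end */
            return EV_VAD_END;
        }
        break;
    case ST_AFTER_VAD:
        if (m->speech_ms >= m->cfg.vad_start) {
            m->state = ST_SPEAKING;
            return EV_VAD_START;
        }
        if (m->silence_ms >= m->cfg.wakeup_end) {
            m->state = ST_IDLE;
            return EV_WAKEUP_END;
        }
        break;
    default:
        break;
    }
    return EV_NONE;
}
```

With 10 ms frames and illustrative values of vad_start = 30, vad_off = 50, and wakeup_end = 40, this model reports a VAD start on the third consecutive speech frame, a VAD end on the fifth silence frame after that, and a wake-up end four silence frames later.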

Macros

AUDIO_REC_DEF_TASK_SZ: Stack size of recorder task

AUDIO_REC_DEF_TASK_PRIO: Priority of recorder task

AUDIO_REC_DEF_TASK_CORE: Pinned to core

AUDIO_REC_DEF_WAKEUP_TM: Default wake up time (ms)

AUDIO_REC_DEF_WAKEEND_TM: Duration after VAD off (ms)

AUDIO_REC_VAD_START_SPEECH_MS: Speech that continues for this duration is judged as VAD start (ms)

AUDIO_REC_DEF_VAD_OFF_TM: Default VAD off time (ms)

AUDIO_RECORDER_DEFAULT_CFG()

Type Definitions

typedef esp_err_t (*rec_event_cb_t)(audio_rec_evt_t *event, void *user_data): Event Notification.

typedef struct __audio_recorder *audio_rec_handle_t: Audio recorder handle.

Header File

components/audio_recorder/include/recorder_sr.h

Functions

recorder_sr_handle_t recorder_sr_create(recorder_sr_cfg_t *cfg, recorder_sr_iface_t **iface)

Initialize the SR processor. SR is disabled by default.

Parameters

cfg – Configuration of SR
iface – User interface provided by the recorder SR

Returns

NULL on failure; otherwise the SR handle

esp_err_t recorder_sr_destroy(recorder_sr_handle_t handle)

Destroy the SR processor and release all resources.

Parameters: handle – SR processor handle
Returns: ESP_OK on success, ESP_FAIL on failure

esp_err_t recorder_sr_reset_speech_cmd(recorder_sr_handle_t handle, char *command_str, char *err_phrase_id)

Reset the speech commands.

Parameters

handle – SR processor handle
command_str – String of the commands. For more details, see #2reset-api-on-the-fly
err_phrase_id – Error string output

Returns

ESP_OK on success, ESP_FAIL on failure

Structures

struct recorder_sr_cfg_t

SR processor configuration.

Note

Since the detection of command words requires a clear starting point, the moment the wake word is detected is taken as the default start of detection. Therefore, if the wake word detection is disabled, the detection will use the vad_state detected by esp-sr as the start of detection. However, due to the fluctuation of this vad_state, the effectiveness of command word detection will be limited.

Public Members

afe_config_t *afe_cfg: Configuration of AFE

bool multinet_init: Enable speech command recognition

int feed_task_core: Core id of feed task

int feed_task_prio: Priority of feed task

int feed_task_stack: Stack size of feed task

int fetch_task_core: Core id of fetch task

int fetch_task_prio: Priority of fetch task

int fetch_task_stack: Stack size of fetch task

int rb_size: Ringbuffer size of recorder sr

char *partition_label: Partition label which stored the model data

char *mn_language: Command language for multinet to load

char *wn_wakeword: Wake Word for WakeNet to load. This is useful when multiple Wake Words are selected in sdkconfig. Setting this to NULL will use the first found model.

Macros

FEED_TASK_STACK_SZ

FETCH_TASK_STACK_SZ

FEED_TASK_PRIO

FETCH_TASK_PRIO

FEED_TASK_PINNED_CORE

FETCH_TASK_PINNED_CORE

SR_OUTPUT_RB_SIZE

DEFAULT_RECORDER_SR_CFG(fmt, partition, sr_type, afe_mode)

Type Definitions

typedef void *recorder_sr_handle_t: SR processor handle.

Header File

components/audio_recorder/include/recorder_encoder.h

Functions

recorder_encoder_handle_t recorder_encoder_create(recorder_encoder_cfg_t *cfg, recorder_encoder_iface_t **iface)

Initialize the encoder processor. The encoder is disabled by default.

Parameters

cfg – Configuration of encoder
iface – User interface provided by the recorder encoder

Returns

NULL on failure; otherwise the encoder handle

esp_err_t recorder_encoder_destroy(recorder_encoder_handle_t handle)

Destroy the encoder processor and release all resources.

Parameters: handle – Encoder processor handle
Returns: ESP_OK on success, ESP_FAIL on failure

Structures

struct recorder_encoder_cfg_t

Recorder encoder configuration parameters.

Public Members

audio_element_handle_t resample: Handle of resample

audio_element_handle_t encoder: Handle of encoder

Type Definitions

typedef void *recorder_encoder_handle_t: Encoder handle.
