Audio Recorder
The Audio Recorder API is a set of functions that facilitate voice recording. It combines two functions: the Audio Front End (AFE) and audio encoding. Users can customize the AFE's Voice Activity Detection (VAD), Automatic Gain Control (AGC), and Acoustic Echo Cancellation (AEC) settings. The encoding function lets users set up the encoding audio element, which supports formats such as AAC, AMR-NB, AMR-WB, ADPCM, WAV, OPUS, and G711. The audio_rec_evt_t event gives users a simple way to interact with the Audio Recorder.
The data path of the Audio Recorder is presented in the diagram below.
The area represented by the parallelogram can be configured by the user as needed, for example the sampling frequency, whether to encode, and the encoding format.
Application Example
The speech_recognition/wwe/ example demonstrates how to initialize the speech recognition model, determine the number of samples and the sample rate of the voice data to feed to the model, detect the wake-up word and command words, and encode voice to a specific audio format.
API Reference
Header File
Functions
-
audio_rec_handle_t audio_recorder_create(audio_rec_cfg_t *cfg)
Initialize and start up audio recorder.
- Parameters
cfg – Configuration of audio recorder
- Returns
NULL on failure; otherwise the audio recorder handle
-
esp_err_t audio_recorder_trigger_start(audio_rec_handle_t handle)
Start recording by force.
Note
If you need to read from the recorder before a wake word has been detected, or while wake word detection is disabled, this interface can be used to force the recorder process to start.
- Parameters
handle – Audio recorder handle
- Returns
ESP_OK ESP_FAIL
-
esp_err_t audio_recorder_trigger_stop(audio_rec_handle_t handle)
Stop recording by force.
Note
Whether the recorder process was triggered by wake word detection or by audio_recorder_trigger_start, this function can be used to force the recorder to stop. If VAD detection is disabled, this function must be invoked to stop recording after audio_recorder_trigger_start.
- Parameters
handle – Audio recorder handle
- Returns
ESP_OK ESP_FAIL
-
esp_err_t audio_recorder_wakenet_enable(audio_rec_handle_t handle, bool enable)
Enable or suspend wake word detection.
- Parameters
handle – Audio recorder handle
enable – true: enable wake word detection; false: disable wake word detection
- Returns
ESP_OK ESP_FAIL
-
esp_err_t audio_recorder_multinet_enable(audio_rec_handle_t handle, bool enable)
Enable or suspend speech command recognition.
- Parameters
handle – Audio recorder handle
enable – true: enable speech command recognition; false: disable speech command recognition
- Returns
ESP_OK ESP_FAIL
-
esp_err_t audio_recorder_vad_check_enable(audio_rec_handle_t handle, bool enable)
Enable or suspend voice duration check.
- Parameters
handle – Audio recorder handle
enable – true: enable voice duration check; false: disable voice duration check
- Returns
ESP_OK ESP_FAIL
-
int audio_recorder_data_read(audio_rec_handle_t handle, void *buffer, int length, TickType_t ticks)
Read data from audio recorder.
- Parameters
handle – Audio recorder handle
buffer – Buffer to save data
length – Size of buffer
ticks – Timeout for reading
- Returns
Length of data actually read, or ESP_ERR_INVALID_ARG
-
esp_err_t audio_recorder_destroy(audio_rec_handle_t handle)
Destroy the audio recorder and release all resources.
- Parameters
handle – Audio recorder handle
- Returns
ESP_OK ESP_FAIL
-
bool audio_recorder_get_wakeup_state(audio_rec_handle_t handle)
Get the wake up state of audio recorder.
- Parameters
handle – Audio recorder handle
- Returns
true false
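The read function above is typically called in a loop that forwards each chunk to a consumer. The following is a minimal sketch of such a loop, written against a function pointer so that it can be exercised off-target. The names rec_read_fn_t, rec_sink_fn_t, rec_drain, fake_read, and count_sink are hypothetical and not part of the Audio Recorder API, and uint32_t stands in for TickType_t. This reference does not say how an ESP_ERR_INVALID_ARG return is told apart from a length, so the sketch checks its arguments up front and simply stops on any non-positive return value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: same shape as audio_recorder_data_read(), with uint32_t
 * standing in for TickType_t so the sketch builds without FreeRTOS. */
typedef int (*rec_read_fn_t)(void *handle, void *buffer, int length, uint32_t ticks);

/* Hypothetical consumer (file writer, network sender, ...). Returns 0 on success. */
typedef int (*rec_sink_fn_t)(void *ctx, const void *data, int length);

/* Read until the source returns a non-positive value, forwarding every chunk
 * to the sink. Returns the number of bytes forwarded, or -1 if the sink fails. */
static long rec_drain(rec_read_fn_t read_fn, void *handle,
                      rec_sink_fn_t sink, void *sink_ctx,
                      void *buffer, int length, uint32_t ticks)
{
    /* Validate arguments here so the invalid-argument path is never taken. */
    assert(read_fn != NULL && sink != NULL && buffer != NULL && length > 0);

    long total = 0;
    for (;;) {
        int n = read_fn(handle, buffer, length, ticks);
        if (n <= 0) {
            break;                      /* nothing more to read */
        }
        if (sink(sink_ctx, buffer, n) != 0) {
            return -1;                  /* consumer rejected the chunk */
        }
        total += n;
    }
    return total;
}

/* Test double: hands out `remaining` bytes of silence, then returns 0. */
struct fake_source {
    int remaining;
};

static int fake_read(void *handle, void *buffer, int length, uint32_t ticks)
{
    (void)ticks;
    struct fake_source *src = handle;
    int n = src->remaining < length ? src->remaining : length;
    memset(buffer, 0, (size_t)n);
    src->remaining -= n;
    return n;
}

/* Test double: counts the bytes it receives in a long pointed to by ctx. */
static int count_sink(void *ctx, const void *data, int length)
{
    (void)data;
    *(long *)ctx += length;
    return 0;
}
```

On a target, a thin wrapper around audio_recorder_data_read would be passed as read_fn, and the loop would normally run between the VAD start and VAD end events.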
Structures
-
struct audio_rec_evt_t
Recorder event.
Public Types
-
enum [anonymous]
Audio recorder event type.
Values:
-
enumerator AUDIO_REC_WAKEUP_START
Wakeup start
-
enumerator AUDIO_REC_WAKEUP_END
Wakeup stop
-
enumerator AUDIO_REC_VAD_START
Vad start
-
enumerator AUDIO_REC_VAD_END
Vad stop
-
enumerator AUDIO_REC_COMMAND_DECT
Values from 0 upward are the IDs of the voice commands detected by MultiNet
Public Members
-
enum audio_rec_evt_t::[anonymous] type
Audio recorder event type.
Event type
-
void *event_data
Event data:
For AUDIO_REC_WAKEUP_START, event data is recorder_sr_wakeup_result_t.
For AUDIO_REC_COMMAND_DECT or higher, event data is recorder_sr_mn_result_t.
For other events, event data is NULL.
-
size_t data_len
Length of event data
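The event structure above implies a dispatch rule: any type at or above AUDIO_REC_COMMAND_DECT is a command ID, and the remaining types are state changes. The following is a minimal sketch of that rule using stand-in types so that it builds without the ESP-ADF headers. Everything prefixed demo_ or DEMO_ is hypothetical, and the numeric value given to the first enumerator is an assumption; the only property the sketch relies on is that the command event is 0 and the other events are below it. The actions are illustrative application choices, not behavior defined by the API.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the documented event types. NOT the real header: the value
 * -100 is assumed, chosen only so that all state events sort below 0. */
enum demo_rec_evt_type {
    DEMO_REC_WAKEUP_START = -100,
    DEMO_REC_WAKEUP_END,
    DEMO_REC_VAD_START,
    DEMO_REC_VAD_END,
    DEMO_REC_COMMAND_DECT = 0,   /* 0 and up: ID of the detected command */
};

/* Stand-in for audio_rec_evt_t with the three documented members. */
struct demo_rec_evt {
    int type;
    void *event_data;
    size_t data_len;
};

/* Hypothetical application reactions. */
enum demo_action {
    DEMO_IGNORE,
    DEMO_OPEN_STREAM,
    DEMO_CLOSE_STREAM,
    DEMO_RUN_COMMAND,
    DEMO_GO_IDLE,
};

/* Map an event to an action. For command events the ID is stored in
 * *command_id when command_id is not NULL. */
static enum demo_action demo_on_event(const struct demo_rec_evt *evt, int *command_id)
{
    assert(evt != NULL);

    /* Command events are checked first: they cover every value >= 0. */
    if (evt->type >= DEMO_REC_COMMAND_DECT) {
        if (command_id != NULL) {
            *command_id = evt->type;
        }
        return DEMO_RUN_COMMAND;
    }

    switch (evt->type) {
    case DEMO_REC_VAD_START:
        return DEMO_OPEN_STREAM;    /* speech began: start consuming data */
    case DEMO_REC_VAD_END:
        return DEMO_CLOSE_STREAM;   /* speech ended: stop consuming data */
    case DEMO_REC_WAKEUP_END:
        return DEMO_GO_IDLE;        /* wakeup window closed */
    case DEMO_REC_WAKEUP_START:
    default:
        return DEMO_IGNORE;
    }
}
```

In a real callback of type rec_event_cb_t, the same branching would be applied to the type member of the event, and event_data would be cast according to the rules listed for that member.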
-
struct audio_rec_cfg_t
Audio recorder configuration.
Public Members
-
int pinned_core
Audio recorder task pinned to core
-
int task_prio
Audio recorder task priority
-
int task_size
Audio recorder task stack size
-
rec_event_cb_t event_cb
Event callback function, event type as audio_rec_evt_t shown above
-
void *user_data
Pointer to user data (optional)
-
recorder_data_read_t read
Data callback function used to feed data to the audio recorder
-
void *sr_handle
SR handle
-
recorder_sr_iface_t *sr_iface
SR interface
-
int wakeup_time
Unit:ms. The duration that the wakeup state remains when VAD is not triggered
-
int vad_start
Unit:ms. Consecutive speech frames lasting this long are judged as VAD start
-
int vad_off
Unit:ms. When the silence time exceeds this value, it is determined as AUDIO_REC_VAD_END state
-
int wakeup_end
Unit:ms. When the silence time after AUDIO_REC_VAD_END state exceeds this value, it is determined as AUDIO_REC_WAKEUP_END
-
void *encoder_handle
Encoder handle
-
recorder_encoder_iface_t *encoder_iface
Encoder interface
Macros
-
AUDIO_REC_DEF_TASK_SZ
Stack size of recorder task
-
AUDIO_REC_DEF_TASK_PRIO
Priority of recorder task
-
AUDIO_REC_DEF_TASK_CORE
Pinned to core
-
AUDIO_REC_DEF_WAKEUP_TM
Default wake up time (ms)
-
AUDIO_REC_DEF_WAKEEND_TM
Duration after vad off (ms)
-
AUDIO_REC_VAD_START_SPEECH_MS
Consecutive speech frames lasting this long are judged as VAD start (ms)
-
AUDIO_REC_DEF_VAD_OFF_TM
Default vad off time (ms)
-
AUDIO_RECORDER_DEFAULT_CFG()
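The four timing values (wakeup time, VAD start, VAD off, and wakeup end) interact as a small state machine. The following is a simplified model of that interaction, derived only from the field descriptions in this reference and not from the recorder's implementation. All demo_ and DEMO_ names are hypothetical, the frame-by-frame interface is an assumption, and the choice of "reaches" for VAD start versus "exceeds" for the silence timers follows the wording of the descriptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { DEMO_IDLE, DEMO_WOKEN, DEMO_SPEECH, DEMO_POST_SPEECH } demo_state_t;

typedef enum {
    DEMO_EV_NONE,
    DEMO_EV_WAKEUP_START,
    DEMO_EV_VAD_START,
    DEMO_EV_VAD_END,
    DEMO_EV_WAKEUP_END,
} demo_event_t;

typedef struct {
    int wakeup_time;   /* ms the wakeup state lasts without a VAD start */
    int vad_start;     /* ms of consecutive speech that counts as VAD start */
    int vad_off;       /* ms of silence that ends the speech segment */
    int wakeup_end;    /* ms after VAD end before the wakeup state ends */
    demo_state_t state;
    int speech_ms;     /* current run of consecutive speech */
    int quiet_ms;      /* per-state timer, see demo_feed() */
} demo_timing_t;

/* Switch state, restart both timers, and report the given event. */
static demo_event_t demo_enter(demo_timing_t *t, demo_state_t next, demo_event_t ev)
{
    t->state = next;
    t->speech_ms = 0;
    t->quiet_ms = 0;
    return ev;
}

/* Wake word detected (or a forced start): only has an effect when idle. */
static demo_event_t demo_wake(demo_timing_t *t)
{
    assert(t != NULL);
    if (t->state != DEMO_IDLE) {
        return DEMO_EV_NONE;
    }
    return demo_enter(t, DEMO_WOKEN, DEMO_EV_WAKEUP_START);
}

/* Feed one frame of frame_ms milliseconds, flagged as speech or silence. */
static demo_event_t demo_feed(demo_timing_t *t, int frame_ms, bool is_speech)
{
    assert(t != NULL && frame_ms > 0);
    t->speech_ms = is_speech ? t->speech_ms + frame_ms : 0;

    switch (t->state) {
    case DEMO_WOKEN:
    case DEMO_POST_SPEECH: {
        /* quiet_ms counts time spent in this state without a VAD start. */
        int limit = (t->state == DEMO_WOKEN) ? t->wakeup_time : t->wakeup_end;
        t->quiet_ms += frame_ms;
        if (t->speech_ms >= t->vad_start) {
            return demo_enter(t, DEMO_SPEECH, DEMO_EV_VAD_START);
        }
        if (t->quiet_ms > limit) {
            return demo_enter(t, DEMO_IDLE, DEMO_EV_WAKEUP_END);
        }
        return DEMO_EV_NONE;
    }
    case DEMO_SPEECH:
        /* quiet_ms counts consecutive silence inside the speech segment. */
        t->quiet_ms = is_speech ? 0 : t->quiet_ms + frame_ms;
        if (t->quiet_ms > t->vad_off) {
            return demo_enter(t, DEMO_POST_SPEECH, DEMO_EV_VAD_END);
        }
        return DEMO_EV_NONE;
    case DEMO_IDLE:
    default:
        return DEMO_EV_NONE;
    }
}
```

The model can be used to reason about tuning. A larger vad_off tolerates longer pauses inside one utterance before VAD end is reported, and a larger wakeup_end keeps the session open longer for a follow-up utterance without a new wake word.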
Type Definitions
-
typedef esp_err_t (*rec_event_cb_t)(audio_rec_evt_t *event, void *user_data)
Event Notification.
-
typedef struct __audio_recorder *audio_rec_handle_t
Audio recorder handle.
Header File
Functions
-
recorder_sr_handle_t recorder_sr_create(recorder_sr_cfg_t *cfg, recorder_sr_iface_t **iface)
Initialize the SR processor. SR is disabled by default.
- Parameters
cfg – Configuration of sr
iface – User interface provided by the recorder SR
- Returns
NULL on failure; otherwise the SR handle
-
esp_err_t recorder_sr_destroy(recorder_sr_handle_t handle)
Destroy the SR processor and release all resources.
- Parameters
handle – SR processor handle
- Returns
ESP_OK ESP_FAIL
-
esp_err_t recorder_sr_reset_speech_cmd(recorder_sr_handle_t handle, char *command_str, char *err_phrase_id)
Reset the speech commands.
- Parameters
handle – SR processor handle
command_str – String of the commands. More details on #2reset-api-on-the-fly
err_phrase_id – Error string output
- Returns
ESP_OK ESP_FAIL
Structures
-
struct recorder_sr_cfg_t
SR processor configuration.
Note
Since the detection of command words requires a clear starting point, the moment the wake word is detected is taken as the default start of detection. Therefore, if wake word detection is disabled, the detection will use the vad_state detected by esp-sr as the start of detection. However, because this vad_state fluctuates, the effectiveness of command word detection will be limited.
Public Members
-
afe_config_t afe_cfg
Configuration of AFE
-
int8_t input_order[DAT_CH_MAX]
Channel order of the input data
-
bool multinet_init
Enable speech command recognition
-
int feed_task_core
Core id of feed task
-
int feed_task_prio
Priority of feed task
-
int feed_task_stack
Stack size of feed task
-
int fetch_task_core
Core id of fetch task
-
int fetch_task_prio
Priority of fetch task
-
int fetch_task_stack
Stack size of fetch task
-
int rb_size
Ringbuffer size of recorder sr
-
char *partition_label
Label of the partition that stores the model data
-
char *mn_language
Command language for multinet to load
-
char *wn_wakeword
Wake Word for WakeNet to load. This is useful when multiple Wake Words are selected in sdkconfig. Setting this to NULL will use the first found model.
Macros
-
FEED_TASK_STACK_SZ
-
FETCH_TASK_STACK_SZ
-
FEED_TASK_PRIO
-
FETCH_TASK_PRIO
-
FEED_TASK_PINNED_CORE
-
FETCH_TASK_PINNED_CORE
-
SR_OUTPUT_RB_SIZE
-
INPUT_ORDER_DEFAULT()
-
DEFAULT_RECORDER_SR_CFG()
Type Definitions
-
typedef void *recorder_sr_handle_t
SR processor handle.
Header File
Functions
-
recorder_encoder_handle_t recorder_encoder_create(recorder_encoder_cfg_t *cfg, recorder_encoder_iface_t **iface)
Initialize the encoder processor. The encoder is disabled by default.
- Parameters
cfg – Configuration of encoder
iface – User interface provided by the recorder encoder
- Returns
NULL on failure; otherwise the encoder handle
-
esp_err_t recorder_encoder_destroy(recorder_encoder_handle_t handle)
Destroy the encoder processor and release all resources.
- Parameters
handle – Encoder processor handle
- Returns
ESP_OK ESP_FAIL
Structures
-
struct recorder_encoder_cfg_t
Recorder encoder configuration parameters.
Public Members
-
audio_element_handle_t resample
Handle of resample
-
audio_element_handle_t encoder
Handle of encoder
Type Definitions
-
typedef void *recorder_encoder_handle_t
Encoder handle.