Audio Recorder¶
The Audio Recorder API is a set of functions for voice recording. It combines two components: the Audio Front End (AFE) and audio encoding. Users can customize the AFE's Voice Activity Detection (VAD), Automatic Gain Control (AGC), and Acoustic Echo Cancellation (AEC) settings. The encoding component lets users set up an encoder audio element, which supports formats such as AAC, AMR-NB, AMR-WB, ADPCM, WAV, OPUS, and G711. The application interacts with the Audio Recorder through audio_rec_evt_t events.
The data path of the Audio Recorder is presented in the diagram below.
The area drawn as a parallelogram can be configured by the user, for example the sampling frequency, whether to encode, and the encoding format.
Application Example¶
The speech_recognition/wwe/ example demonstrates how to initialize the speech recognition model, determine the number of samples and the sample rate of the voice data to feed to the model, detect the wake-up word and command words, and encode voice to a specific audio format.
API Reference¶
Header File¶
Functions¶
-
audio_rec_handle_t
audio_recorder_create
(audio_rec_cfg_t *cfg)¶ Initialize and start up audio recorder.
- Return
NULL on failure; otherwise the audio recorder handle
- Parameters
cfg
: Configuration of audio recorder
-
esp_err_t
audio_recorder_trigger_start
(audio_rec_handle_t handle)¶ Start recording by force.
- Note
If data must be read from the recorder before a wake word has been detected, or while wake word detection is disabled, this function can be used to force the recorder process to start.
- Return
ESP_OK on success; ESP_FAIL on failure
- Parameters
handle
: Audio recorder handle
-
esp_err_t
audio_recorder_trigger_stop
(audio_rec_handle_t handle)¶ Stop recording by force.
- Note
Regardless of whether the recorder process was triggered by wake word detection or by audio_recorder_trigger_start, this function can be used to force the recorder to stop. If VAD detection is disabled, this function must be called to stop recording after audio_recorder_trigger_start.
- Return
ESP_OK on success; ESP_FAIL on failure
- Parameters
handle
: Audio recorder handle
-
esp_err_t
audio_recorder_wakenet_enable
(audio_rec_handle_t handle, bool enable)¶ Enable or suspend wake word detection.
- Return
ESP_OK on success; ESP_FAIL on failure
- Parameters
handle
: Audio recorder handle
enable
: true to enable wake word detection, false to disable it
-
esp_err_t
audio_recorder_multinet_enable
(audio_rec_handle_t handle, bool enable)¶ Enable or suspend speech command recognition.
- Return
ESP_OK on success; ESP_FAIL on failure
- Parameters
handle
: Audio recorder handle
enable
: true to enable speech command recognition, false to disable it
-
esp_err_t
audio_recorder_vad_check_enable
(audio_rec_handle_t handle, bool enable)¶ Enable or suspend voice duration check.
- Return
ESP_OK on success; ESP_FAIL on failure
- Parameters
handle
: Audio recorder handle
enable
: true to enable the voice duration check, false to disable it
-
int
audio_recorder_data_read
(audio_rec_handle_t handle, void *buffer, int length, TickType_t ticks)¶ Read data from audio recorder.
- Return
Length of data actually read, or ESP_ERR_INVALID_ARG
- Parameters
handle
: Audio recorder handle
buffer
: Buffer to save data
length
: Size of buffer
ticks
: Timeout for reading
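A read call with this signature is usually wrapped in a loop that fills a larger buffer. The following is a minimal sketch of such a loop, not part of the library: sketch_read_fn, sketch_read_exact, sketch_source_t, and sketch_fake_read are hypothetical names, and uint32_t stands in for TickType_t so that the code builds off-target. On a device, a thin wrapper around audio_recorder_data_read would take the place of the fake source. Note that ESP_ERR_INVALID_ARG is a positive value in ESP-IDF (0x102), so it cannot be told apart from a byte count by sign alone; the sketch therefore checks its arguments before reading.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_CHUNK 512 /* bytes requested per call; an arbitrary choice */

/* Hypothetical reader type that mirrors the documented read signature. */
typedef int (*sketch_read_fn)(void *handle, void *buffer, int length, uint32_t ticks);

/* Collects up to `want` bytes into `dst`. Stops early when the reader
 * returns zero or a negative value. Returns the number of bytes stored. */
size_t sketch_read_exact(sketch_read_fn read_fn, void *handle,
                         uint8_t *dst, size_t want, uint32_t ticks)
{
    if (read_fn == NULL || dst == NULL) {
        return 0; /* validate up front instead of decoding an error return */
    }
    size_t got = 0;
    while (got < want) {
        size_t remaining = want - got;
        int chunk = remaining > SKETCH_CHUNK ? SKETCH_CHUNK : (int)remaining;
        int n = read_fn(handle, dst + got, chunk, ticks);
        if (n <= 0 || n > chunk) {
            break; /* nothing available, or a value that cannot be a length */
        }
        got += (size_t)n;
    }
    return got;
}

/* Stand-in data source so the sketch can run without hardware. */
typedef struct {
    size_t remaining; /* bytes the fake source can still deliver */
} sketch_source_t;

int sketch_fake_read(void *handle, void *buffer, int length, uint32_t ticks)
{
    sketch_source_t *src = (sketch_source_t *)handle;
    (void)ticks; /* the fake source never blocks */
    size_t n = src->remaining < (size_t)length ? src->remaining : (size_t)length;
    memset(buffer, 0x5A, n);
    src->remaining -= n;
    return (int)n;
}
```

Reading in fixed-size chunks keeps each call short, which can make it easier to check a stop condition between calls.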
-
esp_err_t
audio_recorder_destroy
(audio_rec_handle_t handle)¶ Destroy the audio recorder and release all resources.
- Return
ESP_OK on success; ESP_FAIL on failure
- Parameters
handle
: Audio recorder handle
-
bool
audio_recorder_get_wakeup_state
(audio_rec_handle_t handle)¶ Get the wake up state of audio recorder.
- Return
true if the recorder is in the wake-up state, false otherwise
- Parameters
handle
: Audio recorder handle
Structures¶
-
struct
audio_rec_evt_t
¶ Recorder event.
Public Types
Public Members
-
audio_rec_evt_t::[anonymous]
type
¶ Audio recorder event type.
Event type
-
void *
event_data
¶ Event data:
For AUDIO_REC_WAKEUP_START, event data is recorder_sr_wakeup_result_t.
For AUDIO_REC_COMMAND_DECT or higher, event data is recorder_sr_mn_result_t.
For other events, event data is NULL.
-
size_t
data_len
¶ Length of event data
-
audio_rec_evt_t::[anonymous]
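The rule for interpreting event_data depends on an ordered comparison of the event type ("AUDIO_REC_COMMAND_DECT or higher"). The following is a minimal host-side sketch of that rule, under stated assumptions: the sketch_ types and enumerators are hypothetical stand-ins for the real declarations in the recorder header, and the numeric values are assumed for illustration only (the wake-up event is assumed to be negative and the command-detected event to be the lowest command value).

```c
#include <stddef.h>

/* Assumed stand-in values; use the real enumerators on target. */
enum {
    SKETCH_REC_WAKEUP_START = -100,
    SKETCH_REC_COMMAND_DECT = 0
};

/* Stand-in with the three documented members of the event struct. */
typedef struct {
    int type;
    void *event_data;
    size_t data_len;
} sketch_rec_evt_t;

typedef enum {
    PAYLOAD_NONE,           /* event_data is NULL or not meaningful */
    PAYLOAD_WAKEUP_RESULT,  /* would be a recorder_sr_wakeup_result_t */
    PAYLOAD_COMMAND_RESULT  /* would be a recorder_sr_mn_result_t */
} payload_kind_t;

/* Tells a callback how event_data should be interpreted. */
payload_kind_t sketch_payload_kind(const sketch_rec_evt_t *evt)
{
    if (evt == NULL || evt->event_data == NULL) {
        return PAYLOAD_NONE;
    }
    if (evt->type == SKETCH_REC_WAKEUP_START) {
        return PAYLOAD_WAKEUP_RESULT;
    }
    if (evt->type >= SKETCH_REC_COMMAND_DECT) {
        return PAYLOAD_COMMAND_RESULT;
    }
    return PAYLOAD_NONE;
}
```

An event callback could call such a helper first and cast event_data only after the payload kind is known.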
-
struct
audio_rec_cfg_t
¶ Audio recorder configuration.
Public Members
-
int
pinned_core
¶ Audio recorder task pinned to core
-
int
task_prio
¶ Audio recorder task priority
-
int
task_size
¶ Audio recorder task stack size
-
rec_event_cb_t
event_cb
¶ Event callback function, event type as audio_rec_evt_t shown above
-
void *
user_data
¶ Pointer to user data (optional)
-
recorder_data_read_t
read
¶ Data callback function used to feed data to the audio recorder
-
void *
sr_handle
¶ SR handle
-
recorder_sr_iface_t *
sr_iface
¶ SR interface
-
int
wakeup_time
¶ Unit: ms. How long the wake-up state is held when VAD is not triggered
-
int
vad_start
¶ Unit: ms. Consecutive speech frames lasting this long are judged as VAD start
-
int
vad_off
¶ Unit: ms. When the silence time exceeds this value, the state is determined to be AUDIO_REC_VAD_END
-
int
wakeup_end
¶ Unit: ms. When the silence time after the AUDIO_REC_VAD_END state exceeds this value, the state is determined to be AUDIO_REC_WAKEUP_END
-
void *
encoder_handle
¶ Encoder handle
-
recorder_encoder_iface_t *
encoder_iface
¶ Encoder interface
-
int
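The four timing fields of audio_rec_cfg_t (wakeup_time, vad_start, vad_off, wakeup_end) can be read as thresholds of a small state machine. The sketch below is an illustrative host-side model built only from the field descriptions, not the library's implementation. All sk_ names are hypothetical, and the handling of boundary cases (reaching a threshold versus exceeding it) is an assumption.

```c
#include <stdbool.h>

typedef enum { SK_IDLE, SK_AWAKE, SK_SPEAKING, SK_TAIL } sk_state_t;
typedef enum { SK_EVT_NONE, SK_EVT_VAD_START, SK_EVT_VAD_END, SK_EVT_WAKEUP_END } sk_event_t;

typedef struct {
    int wakeup_time; /* ms awake without speech before giving up */
    int vad_start;   /* ms of consecutive speech that counts as VAD start */
    int vad_off;     /* ms of silence that ends the utterance */
    int wakeup_end;  /* ms of silence after VAD end that ends the wake-up */
    sk_state_t state;
    int awake_ms;
    int speech_ms;
    int silence_ms;
} sk_model_t;

/* Enter the wake-up state, e.g. after a wake word or a forced start. */
void sk_wake(sk_model_t *m)
{
    m->state = SK_AWAKE;
    m->awake_ms = m->speech_ms = m->silence_ms = 0;
}

/* Tracks consecutive speech; switches to SK_SPEAKING once vad_start is reached. */
static bool sk_speech_confirmed(sk_model_t *m, int frame_ms, bool is_speech)
{
    m->speech_ms = is_speech ? m->speech_ms + frame_ms : 0;
    if (m->speech_ms < m->vad_start) {
        return false;
    }
    m->state = SK_SPEAKING;
    m->silence_ms = 0;
    return true;
}

/* Advance the model by one frame and report the event it produces, if any. */
sk_event_t sk_step(sk_model_t *m, int frame_ms, bool is_speech)
{
    switch (m->state) {
    case SK_AWAKE:
        if (sk_speech_confirmed(m, frame_ms, is_speech)) {
            return SK_EVT_VAD_START;
        }
        m->awake_ms += frame_ms;
        if (m->awake_ms >= m->wakeup_time) {
            m->state = SK_IDLE;
            return SK_EVT_WAKEUP_END;
        }
        return SK_EVT_NONE;
    case SK_SPEAKING:
        m->silence_ms = is_speech ? 0 : m->silence_ms + frame_ms;
        if (m->silence_ms > m->vad_off) {
            m->state = SK_TAIL;
            m->silence_ms = 0;
            m->speech_ms = 0;
            return SK_EVT_VAD_END;
        }
        return SK_EVT_NONE;
    case SK_TAIL:
        if (sk_speech_confirmed(m, frame_ms, is_speech)) {
            return SK_EVT_VAD_START;
        }
        if (!is_speech) {
            m->silence_ms += frame_ms;
        }
        if (m->silence_ms > m->wakeup_end) {
            m->state = SK_IDLE;
            return SK_EVT_WAKEUP_END;
        }
        return SK_EVT_NONE;
    default:
        return SK_EVT_NONE; /* idle: frames are ignored until sk_wake() */
    }
}
```

With made-up values vad_start = 60, vad_off = 90, wakeup_end = 120 and 30 ms frames, the second speech frame after sk_wake yields SK_EVT_VAD_START, the fourth consecutive silent frame yields SK_EVT_VAD_END, and the fifth silent frame after that yields SK_EVT_WAKEUP_END.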
Macros¶
-
AUDIO_REC_DEF_TASK_SZ
¶ Stack size of recorder task
-
AUDIO_REC_DEF_TASK_PRIO
¶ Priority of recorder task
-
AUDIO_REC_DEF_TASK_CORE
¶ Pinned to core
-
AUDIO_REC_DEF_WAKEUP_TM
¶ Default wake up time (ms)
-
AUDIO_REC_DEF_WAKEEND_TM
¶ Duration after vad off (ms)
-
AUDIO_REC_VAD_START_SPEECH_MS
¶ Duration of consecutive speech frames that is judged as VAD start (ms)
-
AUDIO_REC_DEF_VAD_OFF_TM
¶ Default vad off time (ms)
-
AUDIO_RECORDER_DEFAULT_CFG
()¶
Type Definitions¶
-
typedef esp_err_t (*
rec_event_cb_t
)(audio_rec_evt_t *event, void *user_data)¶ Event Notification.
-
typedef struct __audio_recorder *
audio_rec_handle_t
¶ Audio recorder handle.
Header File¶
Functions¶
-
recorder_sr_handle_t
recorder_sr_create
(recorder_sr_cfg_t *cfg, recorder_sr_iface_t **iface)¶ Initialize the SR processor. SR is disabled by default.
- Return
NULL on failure; otherwise the SR handle
- Parameters
cfg
: Configuration of SR
iface
: User interface provided by the recorder SR
-
esp_err_t
recorder_sr_destroy
(recorder_sr_handle_t handle)¶ Destroy the SR processor and release all resources.
- Return
ESP_OK on success; ESP_FAIL on failure
- Parameters
handle
: SR processor handle
-
esp_err_t
recorder_sr_reset_speech_cmd
(recorder_sr_handle_t handle, char *command_str, char *err_phrase_id)¶ Reset the speech commands.
- Return
ESP_OK on success; ESP_FAIL on failure
- Parameters
handle
: SR processor handle
command_str
: String of the commands. For more details, see #2reset-api-on-the-fly
err_phrase_id
: Error string output
Structures¶
-
struct
recorder_sr_cfg_t
¶ SR processor configuration.
- Note
Since the detection of command words requires a clear starting point, the moment the wake word is detected is taken as the default start of detection. Therefore, if wake word detection is disabled, detection uses the vad_state reported by esp-sr as its starting point. However, because this vad_state fluctuates, the effectiveness of command word detection will be limited.
Public Members
-
afe_config_t
afe_cfg
¶ Configuration of AFE
-
int8_t
input_order
[DAT_CH_MAX
]¶ Channel order of the input data
-
bool
multinet_init
¶ Whether to enable speech command recognition
-
int
feed_task_core
¶ Core id of feed task
-
int
feed_task_prio
¶ Priority of feed task
-
int
feed_task_stack
¶ Stack size of feed task
-
int
fetch_task_core
¶ Core id of fetch task
-
int
fetch_task_prio
¶ Priority of fetch task
-
int
fetch_task_stack
¶ Stack size of fetch task
-
int
rb_size
¶ Ringbuffer size of recorder sr
-
char *
partition_label
¶ Label of the partition that stores the model data
-
char *
mn_language
¶ Command language for multinet to load
-
char *
wn_wakeword
¶ Wake Word for WakeNet to load. This is useful when multiple Wake Words are selected in sdkconfig. Setting this to NULL will use the first found model.
Macros¶
-
FEED_TASK_STACK_SZ
¶
-
FETCH_TASK_STACK_SZ
¶
-
FEED_TASK_PRIO
¶
-
FETCH_TASK_PRIO
¶
-
FEED_TASK_PINNED_CORE
¶
-
FETCH_TASK_PINNED_CORE
¶
-
SR_OUTPUT_RB_SIZE
¶
-
INPUT_ORDER_DEFAULT
()¶
-
DEFAULT_RECORDER_SR_CFG
()¶
Type Definitions¶
-
typedef void *
recorder_sr_handle_t
¶ SR processor handle.
Header File¶
Functions¶
-
recorder_encoder_handle_t
recorder_encoder_create
(recorder_encoder_cfg_t *cfg, recorder_encoder_iface_t **iface)¶ Initialize the encoder processor. The encoder is disabled by default.
- Return
NULL on failure; otherwise the encoder handle
- Parameters
cfg
: Configuration of encoder
iface
: User interface provided by the recorder encoder
-
esp_err_t
recorder_encoder_destroy
(recorder_encoder_handle_t handle)¶ Destroy the encoder processor and release all resources.
- Return
ESP_OK on success; ESP_FAIL on failure
- Parameters
handle
: Encoder processor handle
Structures¶
-
struct
recorder_encoder_cfg_t
¶ Recorder encoder configuration parameters.
Public Members
-
audio_element_handle_t
resample
¶ Handle of resample
-
audio_element_handle_t
encoder
¶ Handle of encoder
-
audio_element_handle_t
Type Definitions¶
-
typedef void *
recorder_encoder_handle_t
¶ encoder handle