Voice Activity Detection (VAD)

Introduction

The Voice Activity Detection (VAD) module provides a hardware implementation of the first-stage algorithm for voice wake-up and other multimedia functions.

Additionally, it provides hardware support for low-power voice wake-up solutions.

For LP I2S documentation, see Low Power Inter-IC Sound.

Hardware State Machine

The LP VAD driver provides the lp_vad_config_t structure to configure the LP VAD module. Its threshold fields control the transitions of the VAD state machine shown below:

                                           ┌──────────────────────────────────┐
                                           │                                  │
                             ┌─────────────┤  speak-activity-listening-state  │ ◄───────────────┐
                             │             │                                  │                 │
                             │             └──────────────────────────────────┘                 │
                             │                          ▲                                       │
                             │                          │                                       │
                             │                          │                                       │
                             │                          │                                       │
                             │                          │                                       │
detected speak activity      │                          │  detected speak activity              │   detected speak activity
        >=                   │                          │          >=                           │           >=
'speak_activity_thresh'      │                          │  'min_speak_activity_thresh'          │   'max_speak_activity_thresh'
                             │                          │                                       │
                             │                          │          &&                           │
                             │                          │                                       │
                             │                          │  detected non-speak activity          │
                             │                          │          >=                           │
                             │                          │  'non_speak_activity_thresh'          │
                             │                          │                                       │
                             │                          │                                       │
                             │                          │                                       │
                             │                          │                                       │
                             │                          │                                       │
                             │              ┌───────────┴─────────────────────┐                 │
                             │              │                                 │                 │
                             └───────────►  │ speak-activity-detected-state   ├─────────────────┘
                                            │                                 │
                                            └─┬───────────────────────────────┘
                                              │
                                              │                     ▲
                                              │                     │
                                              │                     │
                                              │                     │  detected speak activity
                                              │                     │          >=
                                              │                     │  'min_speak_activity_thresh'
                                              │                     │
                                              │                     │          &&
                                              │                     │
                                              │                     │  detected non-speak activity
                                              │                     │           <
                                              └─────────────────────┘  'non_speak_activity_thresh'
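The transitions above can be modeled in a few lines of C. This is a minimal sketch, not the hardware implementation. It assumes that the counts of detected speak and non-speak activities are already available at each evaluation step (how the hardware accumulates them is not described here). The names vad_state_t, vad_thresholds_t and vad_next_state() are hypothetical. The detected-state branch follows the lp_vad_config_t member descriptions in the API Reference: the VAD returns to listening once the non-speak count reaches non_speak_activity_thresh. Comparisons use >= as drawn in the diagram.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical software model of the LP VAD state machine; not a driver API */
typedef enum {
    VAD_STATE_LISTENING, /* speak-activity-listening-state */
    VAD_STATE_DETECTED,  /* speak-activity-detected-state */
} vad_state_t;

/* Subset of lp_vad_config_t: the four fields that drive the transitions */
typedef struct {
    int speak_activity_thresh;
    int non_speak_activity_thresh;
    int min_speak_activity_thresh;
    int max_speak_activity_thresh;
} vad_thresholds_t;

/* Next state, given the counts of detected speak and non-speak activities */
static vad_state_t vad_next_state(vad_state_t state, const vad_thresholds_t *t,
                                  int speak_cnt, int non_speak_cnt)
{
    assert(t != NULL);
    if (state == VAD_STATE_LISTENING) {
        /* Enough speak activity: enter the detected state */
        return (speak_cnt >= t->speak_activity_thresh) ? VAD_STATE_DETECTED
                                                       : VAD_STATE_LISTENING;
    }
    /* Detected state: speak activity at or above the maximum returns to listening */
    if (speak_cnt >= t->max_speak_activity_thresh) {
        return VAD_STATE_LISTENING;
    }
    /* Between the minimum and the maximum, the non-speak count decides */
    if (speak_cnt >= t->min_speak_activity_thresh) {
        return (non_speak_cnt >= t->non_speak_activity_thresh) ? VAD_STATE_LISTENING
                                                               : VAD_STATE_DETECTED;
    }
    /* Below the minimum: not described by the documentation; this model stays put */
    return VAD_STATE_DETECTED;
}
```

With the thresholds from the Resource Allocation example (10, 30, 3 and 100), a unit in the detected state with 5 speak and 30 non-speak activities goes back to listening. With 5 speak and 10 non-speak activities, it stays in the detected state.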

HP Driver Functional Overview

The VAD HP driver is used to configure the LP VAD so that it works under the control of the HP core. The VAD can also wake up the HP core when voice activity is detected.

Resource Allocation

lp_vad_init_config_t is the configuration structure needed to create an LP I2S VAD unit handle. Before creating this handle, you need to create an LP I2S channel handle. See Low Power Inter-IC Sound.

Call lp_i2s_vad_new_unit() to create the handle. When the VAD unit is no longer used, release the allocated resources by calling lp_i2s_vad_del_unit().

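// rx_handle: LP I2S RX channel handle created beforehand (see Low Power Inter-IC Sound)
// vad_id: ID of the LP VAD unit to use
vad_unit_handle_t vad_handle = NULL;
lp_vad_init_config_t init_config = {
    .lp_i2s_chan = rx_handle,
    .vad_config = {
        .init_frame_num = 100,             // frames used for denoising at start-up
        .min_energy_thresh = 100,          // minimum energy for voice activity
        .speak_activity_thresh = 10,       // listening -> detected
        .non_speak_activity_thresh = 30,   // detected -> listening (non-speak count)
        .min_speak_activity_thresh = 3,
        .max_speak_activity_thresh = 100,  // detected -> listening (speak count)
    },
};
// The init configuration is passed by pointer
ESP_ERROR_CHECK(lp_i2s_vad_new_unit(vad_id, &init_config, &vad_handle));

// Release the VAD unit when it is no longer needed
ESP_ERROR_CHECK(lp_i2s_vad_del_unit(vad_handle));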

Enable and Disable the VAD

Before using a VAD unit to detect voice activity, enable it by calling lp_i2s_vad_enable(). This function switches the driver state from init to enable and also enables the VAD hardware. Calling lp_i2s_vad_disable() does the opposite: it puts the driver back to the init state and stops the hardware.
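The call-order rule for enabling and disabling can be illustrated with a small model. This is a hypothetical sketch, not the driver's code. The model_* names and error codes are stand-ins for the real esp_err_t values. The model also assumes that enabling an enabled unit, disabling an idle unit, or deleting an enabled unit is rejected with an invalid-state error. This page only documents that ESP_ERR_INVALID_STATE can be returned, not these exact cases.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the documented HP driver states (init <-> enable).
 * Error codes and names are illustrative stand-ins, not ESP-IDF definitions. */
typedef enum { MODEL_OK = 0, MODEL_ERR_INVALID_ARG, MODEL_ERR_INVALID_STATE } model_err_t;
typedef enum { MODEL_STATE_INIT, MODEL_STATE_ENABLE } model_state_t;

typedef struct {
    model_state_t state;
    int hw_running; /* stands in for the VAD hardware being active */
} model_vad_unit_t;

/* Model invariant: the hardware runs exactly when the driver is in the enable state */
static void model_check_invariant(const model_vad_unit_t *unit)
{
    assert(unit->hw_running == (unit->state == MODEL_STATE_ENABLE));
}

/* Counterpart of lp_i2s_vad_enable(): init -> enable, starts the hardware */
static model_err_t model_vad_enable(model_vad_unit_t *unit)
{
    if (unit == NULL) {
        return MODEL_ERR_INVALID_ARG;
    }
    if (unit->state != MODEL_STATE_INIT) {
        return MODEL_ERR_INVALID_STATE; /* assumed: enabling twice is rejected */
    }
    unit->state = MODEL_STATE_ENABLE;
    unit->hw_running = 1;
    model_check_invariant(unit);
    return MODEL_OK;
}

/* Counterpart of lp_i2s_vad_disable(): enable -> init, stops the hardware */
static model_err_t model_vad_disable(model_vad_unit_t *unit)
{
    if (unit == NULL) {
        return MODEL_ERR_INVALID_ARG;
    }
    if (unit->state != MODEL_STATE_ENABLE) {
        return MODEL_ERR_INVALID_STATE; /* assumed: disabling an idle unit is rejected */
    }
    unit->state = MODEL_STATE_INIT;
    unit->hw_running = 0;
    model_check_invariant(unit);
    return MODEL_OK;
}

/* Counterpart of lp_i2s_vad_del_unit(): assumed to be allowed only in the init state */
static model_err_t model_vad_can_delete(const model_vad_unit_t *unit)
{
    if (unit == NULL) {
        return MODEL_ERR_INVALID_ARG;
    }
    return (unit->state == MODEL_STATE_INIT) ? MODEL_OK : MODEL_ERR_INVALID_STATE;
}
```

Under these assumptions, the real API follows the same order: lp_i2s_vad_new_unit(), lp_i2s_vad_enable(), then lp_i2s_vad_disable() before lp_i2s_vad_del_unit().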

HP Core Wake-up

Call esp_sleep_enable_vad_wakeup() to use the VAD as an HP core wake-up source. For the VAD to keep working during sleep, the system must keep the RTC domain and the XTAL powered. See the code example below:

ESP_ERROR_CHECK(esp_sleep_enable_vad_wakeup());

LP Driver Functional Overview

The VAD LP driver is mainly used for LP core wake-up. The VAD is configured under HP core control and can then wake up the LP core when voice activity is detected.

Resource Allocation

lp_core_lp_vad_cfg_t and lp_core_lp_vad_init() are used to initialize the VAD LP driver.

lp_core_lp_vad_deinit() releases the allocated resources.

Enable and Disable the VAD

lp_core_lp_vad_enable() and lp_core_lp_vad_disable() are used to enable and disable the VAD hardware.

LP Core Wake-up

Set the wakeup_source field of ulp_lp_core_cfg_t to ULP_LP_CORE_WAKEUP_SOURCE_LP_VAD to use the VAD as the LP core wake-up source.

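// Symbols generated when the LP core app is embedded; the app name lp_core_main_vad is assumed
extern const uint8_t lp_core_main_vad_bin_start[] asm("_binary_lp_core_main_vad_bin_start");
extern const uint8_t lp_core_main_vad_bin_end[]   asm("_binary_lp_core_main_vad_bin_end");

static void load_and_start_lp_core_firmware(ulp_lp_core_cfg_t *cfg, const uint8_t *firmware_start, const uint8_t *firmware_end)
{
    // Load the LP core binary into LP memory
    ESP_ERROR_CHECK(ulp_lp_core_load_binary(firmware_start, (firmware_end - firmware_start)));

    // Start the LP core with the given configuration
    ESP_ERROR_CHECK(ulp_lp_core_run(cfg));
}

// Use the LP VAD as the LP core wake-up source
ulp_lp_core_cfg_t cfg = {
    .wakeup_source = ULP_LP_CORE_WAKEUP_SOURCE_LP_VAD,
};
load_and_start_lp_core_firmware(&cfg, lp_core_main_vad_bin_start, lp_core_main_vad_bin_end);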

API Reference

Header File

  • components/esp_driver_i2s/include/driver/lp_i2s_vad.h

  • This header file can be included with:

    #include "driver/lp_i2s_vad.h"
    
  • This header file is a part of the API provided by the esp_driver_i2s component. To declare that your component depends on esp_driver_i2s, add the following to your CMakeLists.txt:

    REQUIRES esp_driver_i2s
    

    or

    PRIV_REQUIRES esp_driver_i2s
    

Functions

esp_err_t lp_i2s_vad_new_unit(lp_vad_t vad_id, const lp_vad_init_config_t *init_config, vad_unit_handle_t *ret_unit)

Create a new LP VAD unit.

Parameters
  • vad_id -- [in] VAD id

  • init_config -- [in] Initial configurations

  • ret_unit -- [out] Unit handle

Returns

  • ESP_OK: On success

  • ESP_ERR_INVALID_ARG: Invalid argument

  • ESP_ERR_INVALID_STATE: Driver state is invalid, you shouldn't call this API at this moment

esp_err_t lp_i2s_vad_enable(vad_unit_handle_t unit)

Enable LP VAD.

Parameters

unit -- [in] VAD handle

Returns

  • ESP_OK: On success

  • ESP_ERR_INVALID_ARG: Invalid argument

  • ESP_ERR_INVALID_STATE: Driver state is invalid, you shouldn't call this API at this moment

esp_err_t lp_i2s_vad_disable(vad_unit_handle_t unit)

Disable LP VAD.

Parameters

unit -- [in] VAD handle

Returns

  • ESP_OK: On success

  • ESP_ERR_INVALID_ARG: Invalid argument

  • ESP_ERR_INVALID_STATE: Driver state is invalid, you shouldn't call this API at this moment

esp_err_t lp_i2s_vad_del_unit(vad_unit_handle_t unit)

Delete LP VAD unit.

Parameters

unit -- [in] VAD handle

Returns

  • ESP_OK: On success

  • ESP_ERR_INVALID_ARG: Invalid argument

  • ESP_ERR_INVALID_STATE: Driver state is invalid, you shouldn't call this API at this moment

Structures

struct lp_vad_config_t

LP VAD configurations.

Public Members

int init_frame_num

Number of initial frames the VAD uses for denoising, which helps reduce the false trigger rate. Note that values that are too large may cause voice activity to be missed.

int min_energy_thresh

Minimum energy threshold. Voice activity with energy higher than this value is detected.

bool skip_band_energy_thresh

Whether to skip the passband energy check. The passband energy check determines whether the proportion of passband energy within the total frequency-domain energy meets the required threshold. Note that, depending on the environment, enabling the passband energy check may reduce the false trigger rate but could also increase the rate of missed detections.

int speak_activity_thresh

When in speak-activity-listening-state, if the number of detected speak activities is higher than this value, the VAD enters speak-activity-detected-state.

int non_speak_activity_thresh

When in speak-activity-detected-state, if the number of detected speak activities is higher than min_speak_activity_thresh but lower than max_speak_activity_thresh:

  • if the number of detected non-speak activities is higher than this value, the VAD enters speak-activity-listening-state

  • if the number of detected non-speak activities is lower than this value, the VAD stays in speak-activity-detected-state

int min_speak_activity_thresh

When in speak-activity-detected-state, if the number of detected speak activities is higher than this value but lower than max_speak_activity_thresh, the next state depends on the value of non_speak_activity_thresh.

int max_speak_activity_thresh

When in speak-activity-detected-state, if the number of detected speak activities is higher than this value, the VAD enters speak-activity-listening-state.
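To make min_energy_thresh more concrete, the following sketch shows a generic frame-energy gate. It is an illustration built on assumptions, not the hardware algorithm. The energy measure (mean square of 16-bit PCM samples), the strict > comparison, and the function names frame_energy() and frame_above_min_energy() are all hypothetical. The hardware's actual energy scaling may differ, so the numbers here do not map directly to lp_vad_config_t values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed energy measure: mean square of the 16-bit PCM samples in one frame.
 * The hardware's actual energy definition and scaling are not documented here. */
static int64_t frame_energy(const int16_t *samples, size_t n)
{
    assert(samples != NULL && n > 0);
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += (int64_t)samples[i] * samples[i];
    }
    return sum / (int64_t)n;
}

/* Energy gate in the spirit of min_energy_thresh: only frames above the threshold count */
static int frame_above_min_energy(const int16_t *samples, size_t n, int min_energy_thresh)
{
    return frame_energy(samples, n) > (int64_t)min_energy_thresh;
}
```

In this model, a frame of alternating +10/-10 samples has energy 100. With the threshold of 100 used in the Resource Allocation example, that frame would not pass the gate, because the comparison is strict.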

struct lp_vad_init_config_t

LP VAD Init Configurations.

Public Members

lp_i2s_chan_handle_t lp_i2s_chan

LP I2S channel handle.

lp_vad_config_t vad_config

LP VAD config.

Type Definitions

typedef uint32_t lp_vad_t

LP VAD peripheral

typedef struct vad_unit_ctx_t *vad_unit_handle_t

Type of VAD unit handle.

Header File

  • components/ulp/lp_core/shared/include/ulp_lp_core_lp_vad_shared.h

  • This header file can be included with:

    #include "ulp_lp_core_lp_vad_shared.h"
    
  • This header file is a part of the API provided by the ulp component. To declare that your component depends on ulp, add the following to your CMakeLists.txt:

    REQUIRES ulp
    

    or

    PRIV_REQUIRES ulp
    

Functions

esp_err_t lp_core_lp_vad_init(lp_vad_t vad_id, const lp_core_lp_vad_cfg_t *init_config)

Initialize the LP VAD.

Parameters
  • vad_id -- [in] VAD ID

  • init_config -- [in] Initial configurations

Returns

  • ESP_OK: On success

  • ESP_ERR_INVALID_ARG: Invalid argument

  • ESP_ERR_INVALID_STATE: Driver state is invalid, you shouldn't call this API at this moment

esp_err_t lp_core_lp_vad_enable(lp_vad_t vad_id)

Enable LP VAD.

Parameters

vad_id -- [in] VAD ID

Returns

  • ESP_OK: On success

  • ESP_ERR_INVALID_ARG: Invalid argument

  • ESP_ERR_INVALID_STATE: Driver state is invalid, you shouldn't call this API at this moment

esp_err_t lp_core_lp_vad_disable(lp_vad_t vad_id)

Disable LP VAD.

Parameters

vad_id -- [in] VAD ID

Returns

  • ESP_OK: On success

  • ESP_ERR_INVALID_ARG: Invalid argument

  • ESP_ERR_INVALID_STATE: Driver state is invalid, you shouldn't call this API at this moment

esp_err_t lp_core_lp_vad_deinit(lp_vad_t vad_id)

Deinitialize the LP VAD.

Parameters

vad_id -- [in] VAD ID

Returns

  • ESP_OK: On success

  • ESP_ERR_INVALID_ARG: Invalid argument

  • ESP_ERR_INVALID_STATE: Driver state is invalid, you shouldn't call this API at this moment

Type Definitions

typedef lp_vad_init_config_t lp_core_lp_vad_cfg_t

LP VAD configurations.

