Overview: Introduction to Common Panics and Exceptions

Before debugging, it helps to know the common types of panic and exception, which include the following:

Watchdog Interrupt

Watchdog timeouts fall into two categories: interrupt watchdog and task watchdog. For detailed documentation, please refer to Watchdog. Both watchdogs are briefly described below:

Interrupt Watchdog

The interrupt watchdog timer (IWDT) ensures that interrupt service routines (ISRs) are not blocked for a long time; if they are, the IWDT times out. Blocking the timely execution of ISRs increases interrupt latency and also prevents task switching (because task switching is performed from an ISR). The following can prevent ISRs from running:

  • Disabling interrupts

  • Critical sections (which also disable interrupts)

  • Other ISRs of the same or higher priority, which prevent ISRs of the same or lower priority from completing

When the IWDT times out, the default operation is to call the panic handler and display the error cause (Interrupt wdt timeout on CPU0 or Interrupt wdt timeout on CPU1, depending on the situation).

When this happens, check the three causes above, and in particular whether the ISR contains too much code and therefore runs long enough to trigger the interrupt watchdog.
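One common way to keep an ISR short is to let it only record that an event happened and leave the slow work to a task. The sketch below shows the idea in portable C11 and is not taken from ESP-IDF: `sensor_isr`, `drain_events`, and `pending_events` are hypothetical names, and a real project would more likely hand the event to a task through a FreeRTOS queue or task notification.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Number of events the ISR has recorded but the task has not handled yet. */
static atomic_uint pending_events;

/* Hypothetical ISR body: no logging, no loops, no blocking calls. */
void sensor_isr(void *arg)
{
    (void)arg;
    atomic_fetch_add(&pending_events, 1);
}

/* Called from a normal task: claims all pending events and returns how many
   there were, so the slow processing happens outside interrupt context. */
unsigned drain_events(void)
{
    return atomic_exchange(&pending_events, 0);
}
```

The task that calls `drain_events` can then print logs or run longer computations without delaying other interrupts.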

Task Watchdog

The task watchdog timer (TWDT) monitors whether tasks in FreeRTOS are scheduled within a specified time. By default it watches the lowest-priority IDLE task: if IDLE does not feed the watchdog (i.e., reset the watchdog timer) within the specified time, the task watchdog is triggered. This usually indicates that a higher-priority task is occupying the CPU continuously, for example by calling an API in an infinite loop without any delay.

To illustrate how to debug this problem, the following example deliberately prints a log in an infinite loop without any delay in app_main:

#include <stdio.h>

void app_main(void)
{
    // Infinite loop without any delay: the main task never blocks
    while (1) {
        printf("Hello world!----------------------------------------------------\n");
    }
}

After the ESP chip is powered on and runs for CONFIG_ESP_TASK_WDT_TIMEOUT_S seconds, the following log appears:

E (10303) task_wdt: Task watchdog got triggered. The following tasks/users did not reset the watchdog in time:
E (10303) task_wdt:  - IDLE (CPU 0)
E (10303) task_wdt: Tasks currently running:
E (10303) task_wdt: CPU 0: main
E (10303) task_wdt: CPU 1: IDLE
E (10303) task_wdt: Print CPU 0 (current core) backtrace

The log can be read as follows:

  • E (10303) task_wdt: - IDLE (CPU 0) –> IDLE0 on CPU0 did not feed the watchdog in time

  • E (10303) task_wdt: CPU 0: main –> The task currently running on CPU0 is the main task (the main task corresponds to the app_main function)

  • E (10303) task_wdt: CPU 1: IDLE –> The task currently running on CPU1 is the IDLE1 task

The log shows that the IDLE0 task did not feed the watchdog within CONFIG_ESP_TASK_WDT_TIMEOUT_S seconds, which triggered the task watchdog. Because the task running on CPU0 at that moment was main, it can be inferred that the main task was occupying the CPU the whole time. Focus the investigation on this task and add appropriate delays so that lower-priority tasks get a chance to be scheduled.
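A minimal sketch of such a fix follows. The helper `delay_ticks_at_least_one` is a hypothetical name introduced here, and the 100 ms delay is an arbitrary choice. The helper guards against a delay that rounds down to 0 ticks, which would not block the task. The FreeRTOS part is only compiled in an ESP-IDF build, where `ESP_PLATFORM` is defined.

```c
#include <stdint.h>
#include <stdio.h>

/* Convert milliseconds to ticks, never returning 0: a 0-tick delay does not
   block the caller, so the lower-priority IDLE task would still starve.
   Assumes tick_period_ms is greater than 0. */
uint32_t delay_ticks_at_least_one(uint32_t ms, uint32_t tick_period_ms)
{
    uint32_t ticks = ms / tick_period_ms;
    return ticks > 0 ? ticks : 1;
}

#ifdef ESP_PLATFORM
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"

void app_main(void)
{
    while (1) {
        printf("Hello world!\n");
        /* Block the main task so that IDLE can run and reset the watchdog. */
        vTaskDelay(delay_ticks_at_least_one(100, portTICK_PERIOD_MS));
    }
}
#endif
```

With a 10 ms tick period, a requested delay of 5 ms would otherwise become `vTaskDelay(0)`; the helper turns it into a one-tick delay instead.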

Brownout Interrupt

The ESP chip integrates a brownout detection circuit, which is enabled by default. When the brownout detector is triggered, the following message is printed:

Brownout detector was triggered

The chip resets after the message is printed. When this happens, check whether the power supply voltage stays above the configured brownout threshold. For more information, please refer to the Brownout document.

Note

In battery-powered scenarios, such as two AA batteries supplying 3.1 V, the large instantaneous current drawn during operations such as Wi-Fi connection can make the voltage drop briefly, which triggers brownout detection and restarts the chip.

It is recommended to use a voltage regulator with a higher peak current rating, switch to a battery that can supply a larger current, or increase the capacitance on the power supply.

Assert / Abort

This type of panic is triggered by a failed assertion or a call to abort. Assertions are commonly used to check whether assumptions in the program hold. If an assertion fails (i.e., the assumption does not hold), program execution is interrupted: the program usually aborts (abort) and records error information in the log to help developers debug.

You can therefore check the log to see which API triggered the assertion or abort, and start debugging from that API. For example:

#include <assert.h>

// Function to check if input is equal to 1
int check_input(int input) {
    if (input == 1) {
        return 1; // Return 1 if input is equal to 1
    } else {
        return 0; // Return 0 if input is not equal to 1
    }
}

void app_main(void)
{
    // Test case for input not equal to 1
    int input1 = 2;
    assert(check_input(input1) == 1); // Fails: check_input(2) returns 0
}

The corresponding exception log is:

assert failed: app_main hello_world_main.c:26 (check_input(input1) == 1)

The log shows that assert(check_input(input1) == 1) failed because the condition was not satisfied. Focus the investigation on the code around this assertion.

Note

Since assertions can be disabled through the CONFIG_COMPILER_OPTIMIZATION_ASSERTION_LEVEL option in ESP-IDF menuconfig, it is not recommended to put calculations or function calls that the program depends on inside assertions, because they are not executed when assertions are disabled.
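The difference can be sketched as follows, assuming the standard C behavior where a disabled `assert` (compiled with `NDEBUG`) expands to nothing. `init_sensor`, `setup_risky`, and `setup_safe` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

static int init_calls;  /* counts how often the hypothetical init ran */

static bool init_sensor(void)
{
    init_calls++;
    return true;
}

/* Risky: when assertions are compiled out, init_sensor() is never called. */
void setup_risky(void)
{
    assert(init_sensor());
}

/* Safer: the call always runs; only the check disappears. */
void setup_safe(void)
{
    bool ok = init_sensor();
    assert(ok);
    (void)ok;  /* avoids an unused-variable warning when assert is removed */
}
```

Keeping the call outside the assertion means the program behaves the same whether assertions are enabled or not.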

Stack Overflow

A stack overflow occurs when the stack space a task actually uses exceeds the stack allocated to it. The following code reproduces the error:

#include <stdint.h>
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"

static void test_task(void *pvParameters)
{
    // 4096 x 4 bytes = 16384 bytes on the stack, far more than the task stack
    uint32_t large_array[4096] = {0};
    while (1) {
        vTaskDelay(200 / portTICK_PERIOD_MS);
    }
    vTaskDelete(NULL);
}

void app_main(void)
{
    // Only 1024 bytes of stack are allocated for test_task
    xTaskCreate(test_task, "test_task", 1024, NULL,  5, NULL);
}

The corresponding error log is as follows:

***ERROR*** A stack overflow in task test_task has been detected.

The log shows that a stack overflow occurred in the test_task task. The code allocates only 1024 bytes of task stack in the call to xTaskCreate, but the task declares an array of 4096 uint32_t elements (16384 bytes) on its stack, which causes the overflow. To fix this, reduce the array size in the task function or increase the task stack allocated by xTaskCreate.
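The arithmetic behind the fix can be sketched as below. `large_array_bytes` and `suggested_stack_bytes` are hypothetical helpers, and the 2048-byte headroom is an assumed value that should be tuned per task. The ESP-IDF part, compiled only when `ESP_PLATFORM` is defined, also prints the minimum free stack reported by `uxTaskGetStackHighWaterMark` so that the chosen size can be checked at run time.

```c
#include <stddef.h>
#include <stdint.h>

#define LARGE_ARRAY_LEN 4096

/* Bytes that a local uint32_t array of LARGE_ARRAY_LEN elements needs. */
size_t large_array_bytes(void)
{
    return LARGE_ARRAY_LEN * sizeof(uint32_t);
}

/* Hypothetical helper: largest local buffer plus headroom for call frames
   and library calls. */
size_t suggested_stack_bytes(size_t local_bytes, size_t headroom_bytes)
{
    return local_bytes + headroom_bytes;
}

#ifdef ESP_PLATFORM
#include <stdio.h>
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"

static void test_task(void *pvParameters)
{
    uint32_t large_array[LARGE_ARRAY_LEN] = {0};
    while (1) {
        large_array[0]++;
        /* Minimum free stack seen so far for the calling task. */
        printf("free stack: %u\n", (unsigned)uxTaskGetStackHighWaterMark(NULL));
        vTaskDelay(pdMS_TO_TICKS(200));
    }
}

void app_main(void)
{
    xTaskCreate(test_task, "test_task",
                suggested_stack_bytes(large_array_bytes(), 2048), NULL, 5, NULL);
}
#endif
```

If the reported free stack stays close to zero, the headroom is too small. For buffers of this size, allocating from the heap instead of the stack is another option.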

Cache Access Error

For this error, first refer to the Cache disabled but cached memory region accessed document. A more detailed explanation is given in Locating Problems Using Guru Meditation Error Printing.

Invalid Memory / Instruction Address

When the program tries to access a non-existent memory address or execute an invalid instruction, this type of exception is triggered. It is usually caused by pointer errors or memory corruption. When this type of error occurs, refer to the Locating Problems Using Backtrace & Coredump and Locating Memory Issues (Memory Stompings, Memory Leaks, etc.) chapters for further debugging.
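As a small illustration of guarding against the two causes named above, the sketch below checks pointers before dereferencing them and bounds a string copy so that it cannot write past the destination buffer. `device_t` and `device_set_name` are hypothetical names, not part of any ESP-IDF API.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    char name[16];
} device_t;

/* Returns 0 on success, -1 if either pointer is NULL. */
int device_set_name(device_t *dev, const char *name)
{
    if (dev == NULL || name == NULL) {
        return -1;  /* refuse to dereference an invalid pointer */
    }
    /* Copy at most 15 characters and always terminate the string. */
    strncpy(dev->name, name, sizeof(dev->name) - 1);
    dev->name[sizeof(dev->name) - 1] = '\0';
    return 0;
}
```

Returning an error code lets the caller log the problem and recover instead of crashing on an invalid access.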