Locating Problems Using Guru Meditation Error Printing


Note

This document is automatically translated using AI. Please excuse any detailed errors. The official English version is still in progress.

You can first refer to the Guru Meditation Errors in the ESP-IDF Programming Guide to understand common errors and their causes. This document provides further explanations of common errors and presents feasible analytical approaches.

Note

This analysis focuses on errors occurring in the RISC-V architecture, specifically the ESP32-C3. While errors in the Xtensa architecture (e.g., ESP32) may differ in naming, their underlying causes remain similar. Architecture-specific errors will be analyzed separately.

Guru Meditation errors are categorized into two types: ARCH exceptions and SOC exceptions.

Common ARCH exceptions:

  • Load access fault (LoadProhibited)

  • Store access fault (StoreProhibited)

  • Instruction access fault (InstrFetchProhibited)

  • Illegal instruction (IllegalInstruction)

Common SOC exceptions:

  • Interrupt wdt timeout on CPU0/CPU1

  • Cache disabled but cached memory region accessed

Identifying the root cause of a Guru Meditation error is often challenging, because the crash is usually a symptom of an underlying issue. The actual problem may be related to FreeRTOS, memory, flash, interrupts, etc., and requires case-by-case analysis.

Load access fault (LoadProhibited)

Test Case

This exception occurs when a load instruction (such as lw) attempts to access invalid memory. The example code demonstrates a scenario that triggers this crash:

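uint8_t *buff = NULL;            // NULL base pointer
volatile uint8_t load_fault;     // volatile so the result of the read is kept
load_fault = buff[1];            // load from address 0x00000001 faults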

Exception Phenomenon

  • RISC-V

Guru Meditation Error: Core  0 panic'ed (Load access fault). Exception was unhandled.

Core  0 register dump:
Stack dump detected
MEPC    : 0x4200696a  RA      : 0x4200696a  SP      : 0x3fc8f7c0  GP      : 0x3fc8b400
TP      : 0x3fc87d74  T0      : 0x4005890e  T1      : 0x3fc8f41c  T2      : 0x00000000
S0/FP   : 0x3c023000  S1      : 0x00000000  A0      : 0x0000000f  A1      : 0x3fc8f3f8
A2      : 0x00000000  A3      : 0x00000001  A4      : 0x3fc8c000  A5      : 0x00000000
A6      : 0x60023000  A7      : 0x0000000a  S2      : 0x00000000  S3      : 0x00000000
S4      : 0x00000000  S5      : 0x00000000  S6      : 0x00000000  S7      : 0x00000000
S8      : 0x00000000  S9      : 0x00000000  S10     : 0x00000000  S11     : 0x00000000
T3      : 0x00000000  T4      : 0x00000000  T5      : 0x00000000  T6      : 0x00000000
MSTATUS : 0x00001881  MTVEC   : 0x40380001  MCAUSE  : 0x00000005  MTVAL   : 0x00000001

  • Xtensa

Guru Meditation Error: Core  0 panic'ed (LoadProhibited). Exception was unhandled.

Core  0 register dump:
PC      : 0x42007c2f  PS      : 0x00060a30  A0      : 0x82007c42  A1      : 0x3fc97d70
A2      : 0x42011934  A3      : 0x3c023f48  A4      : 0x3c023fc8  A5      : 0x3fc97d90
A6      : 0x3fc97d70  A7      : 0x0000000c  A8      : 0x00000000  A9      : 0x3fc97d20
A10     : 0x0000000f  A11     : 0x3fc97fac  A12     : 0x3fc97d70  A13     : 0x0000000c
A14     : 0x3fc941e4  A15     : 0x00000000  SAR     : 0x00000004  EXCCAUSE: 0x0000001c

Backtrace: 0x42007c2c:0x3fc97d70 0x42007c3f:0x3fc97d90 0x42019247:0x3fc97db0 0x40379e71:0x3fc97de0

Troubleshooting

Disassembling the firmware (for the steps, see Appendix 1 - ESP Disassembly Corresponding Instructions) shows that the faulting address corresponds to the following instructions:

lw a1, 1(zero)               // RISC-V
l8ui    a11, a8, 1           // Xtensa

When this exception is triggered, MTVAL (RISC-V) or EXCVADDR (Xtensa) is automatically updated with the address that caused the fault. The RISC-V dump above shows MTVAL : 0x00000001.

An MTVAL of NULL (0x00000000) usually means a value was read from a NULL address. A value close to NULL usually means an array element or structure member was read through a NULL base pointer; the value is then the offset of that element or member.

  • For RISC-V, MEPC (the PC) holds the address of the faulting instruction and RA holds the most recent return address, usually pointing into the caller; both can be resolved to functions using the ELF file.

  • For Xtensa, a backtrace is printed with the exception, and its addresses can also be resolved to functions using the ELF file.
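The MTVAL/EXCVADDR rule of thumb above can be captured in a small helper when going through many crash logs. This is a minimal sketch: the function name classify_fault_addr and the 4 KB "near NULL" threshold are assumptions chosen for illustration, not part of ESP-IDF.

```c
#include <stdint.h>

/* Hypothetical threshold: any struct or array smaller than this gives a
 * faulting address below it when its base pointer is NULL. */
#define NEAR_NULL_LIMIT 0x1000u

typedef enum {
    FAULT_NULL_DEREF,   /* address == 0: access through a NULL pointer */
    FAULT_NULL_MEMBER,  /* small address: member/element of a NULL base */
    FAULT_OTHER         /* anything else: wild pointer, freed memory, ... */
} fault_kind_t;

/* Interprets MTVAL (RISC-V) or EXCVADDR (Xtensa) for load/store access faults.
 * For FAULT_NULL_MEMBER, *offset receives the byte offset from the NULL base. */
static fault_kind_t classify_fault_addr(uint32_t addr, uint32_t *offset)
{
    if (addr == 0u) {
        *offset = 0u;
        return FAULT_NULL_DEREF;
    }
    if (addr < NEAR_NULL_LIMIT) {
        *offset = addr;
        return FAULT_NULL_MEMBER;
    }
    *offset = 0u;
    return FAULT_OTHER;
}
```

For the RISC-V dump above, MTVAL 0x00000001 classifies as a NULL-based access at offset 1, which matches buff[1] in the test case.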

Practical Case

The following figure shows a typical Load access fault on the ESP32-C3.

../../../_images/load_access_fault.png

The figure shows that the fault occurs at lw a5, 20(s0) while s0 is 0x0, which is a NULL pointer. Tracing back to the earlier instruction that loads s0, lw s0, 24(sp), leads to the source line pxTimer = xMessage.u.xTimerParameters.pxTimer;. The question therefore becomes why pxTimer is NULL.

You can further locate the problem by adding the following debug log:

if (pxTimer == NULL) {
    ets_printf("xMessageID: %d\n", xMessage.xMessageID);
}

The root cause turned out to be that the timer had not been created successfully, so the timer pointer was NULL. Any subsequent operation on this timer dereferences the NULL pointer and causes a Load access fault (LoadProhibited) error.

gtimer_null

Precautions

Not every Load access fault (LoadProhibited) can be located this easily. Consider the case in the figure below:

load_fault_2

Here the PC points to the ROM function rom_i2c_writeReg_mask, and the backtrace does not contain enough valid information. Troubleshooting such a case requires a way to reproduce the problem reliably. If the problem is hard to reproduce, it may be caused by hardware issues or a wild pointer.

Store access fault (StoreProhibited)

Test Case

This CPU exception occurs when an application tries to write to an invalid memory location. The test code is shown below:

#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint8_t buf[10];   // offsets 0..9
    uint8_t data;      // offset 10 (0x0a)
} store_test_t;

void store_access_fault(store_test_t * store_test)
{
    store_test->data = 0x5F;   // store to NULL + 10 faults here
    printf("Data: %d\n", store_test->data);
}

void app_main(void)
{
    store_test_t * store_test = NULL;
    store_access_fault(store_test);
}

Exception Phenomenon

  • RISC-V

On RISC-V chips, the following exception is printed. The disassembly (for the steps, see Appendix 1 - ESP Disassembly Corresponding Instructions) shows that the PC points to a store instruction such as sb or sw. The value 0x0000000a in MTVAL indicates an attempt to write a member of an array or structure whose base address is NULL.

Guru Meditation Error: Core  0 panic'ed (Store access fault). Exception was unhandled.

Core  0 register dump:
Stack dump detected
MEPC    : 0x42006960  RA      : 0x42006984  SP      : 0x3fc8f7c0  GP      : 0x3fc8b400
TP      : 0x3fc87d7c  T0      : 0x4005890e  T1      : 0x20000000  T2      : 0x00000000
S0/FP   : 0x3c023000  S1      : 0x00000000  A0      : 0x00000000  A1      : 0x3fc8f3f8
A2      : 0x00000000  A3      : 0x00000001  A4      : 0x3fc8c000  A5      : 0x0000005f
A6      : 0x60023000  A7      : 0x0000000a  S2      : 0x00000000  S3      : 0x00000000
S4      : 0x00000000  S5      : 0x00000000  S6      : 0x00000000  S7      : 0x00000000
S8      : 0x00000000  S9      : 0x00000000  S10     : 0x00000000  S11     : 0x00000000
T3      : 0x00000000  T4      : 0x00000000  T5      : 0x00000000  T6      : 0x00000000
MSTATUS : 0x00001881  MTVEC   : 0x40380001  MCAUSE  : 0x00000007  MTVAL   : 0x0000000a

  • Xtensa

On Xtensa chips, the following exception is printed. The disassembly (for the steps, see Appendix 1 - ESP Disassembly Corresponding Instructions) shows that the PC points to a store instruction such as s8i or s32i. The value 0x0000000a in EXCVADDR indicates an attempt to write a member of an array or structure whose base address is NULL.

Guru Meditation Error: Core  0 panic'ed (StoreProhibited). Exception was unhandled.

Core  0 register dump:
PC      : 0x42007c21  PS      : 0x00060630  A0      : 0x82007c38  A1      : 0x3fc97d70
A2      : 0x00000000  A3      : 0x3c023f48  A4      : 0x3c023fc8  A5      : 0x3fc97d90
A6      : 0x3fc97d70  A7      : 0x0000000c  A8      : 0x0000005f  A9      : 0x3fc97d00
A10     : 0x00000031  A11     : 0x3fc97fac  A12     : 0x3fc97d70  A13     : 0x0000000c
A14     : 0x3fc941e4  A15     : 0x00000000  SAR     : 0x00000004  EXCCAUSE: 0x0000001d
EXCVADDR: 0x0000000a  LBEG    : 0x400556d5  LEND    : 0x400556e5  LCOUNT  : 0xfffffffd

Backtrace: 0x42007c1e:0x3fc97d70 0x42007c35:0x3fc97d90 0x4201923f:0x3fc97db0 0x40379e71:0x3fc97de0

Location Method

The method is similar to that for Load access fault (LoadProhibited). The disassembly (for the steps, see Appendix 1 - ESP Disassembly Corresponding Instructions) contains the following instructions:

sb      a5,10(a0)             // RISC-V
s8i     a8, a2, 10            // Xtensa

When this exception is triggered, MTVAL (RISC-V) or EXCVADDR (Xtensa) is automatically updated with the faulting address; the RISC-V dump above shows MTVAL : 0x0000000a. An MTVAL of NULL usually means the CPU tried to write to a NULL address, while a value close to NULL usually means it tried to write a member of an array or structure whose base address is NULL. Here, 0x0a is the offset of data in store_test_t.

  • For RISC-V, MEPC (the PC) holds the address of the faulting instruction and RA holds the most recent return address, usually pointing into the caller; both can be resolved to functions using the ELF file.

  • For Xtensa, a backtrace is printed with the exception, and its addresses can also be resolved to functions using the ELF file.
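To confirm which member a near-NULL address refers to, compare it with the member offsets of the suspected structure. The sketch below does this for store_test_t from the test case using the standard offsetof macro; the helper name is hypothetical. With a NULL base pointer, the faulting address equals the member offset.

```c
#include <stddef.h>
#include <stdint.h>

/* Same layout as the test case above. */
typedef struct {
    uint8_t buf[10];
    uint8_t data;
} store_test_t;

/* Compile-time check: data sits at byte offset 10 (0x0a). */
_Static_assert(offsetof(store_test_t, data) == 0x0a, "unexpected store_test_t layout");

/* Hypothetical helper: returns 1 if a fault address from MTVAL or EXCVADDR
 * matches store_test_t.data accessed through a NULL pointer. */
static int fault_is_null_store_test_data(uint32_t fault_addr)
{
    return fault_addr == (uint32_t)offsetof(store_test_t, data);
}
```

The same comparison works for any structure: list the offsets of its members and look for the one equal to MTVAL/EXCVADDR.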

Actual Case

Please refer to Test Case.

Notes

As with Load access fault (LoadProhibited), some cases caused by wild pointers cannot be located from the crash information alone and need further analysis of the actual application code.

Instruction access fault (InstrFetchProhibited)

Test Case

This CPU exception occurs when an application tries to fetch instructions from an invalid address; the PC register then usually holds that invalid address. The test code is as follows:

typedef void (*fptr_t)(void);
volatile fptr_t fptr = (fptr_t) 0x4;
fptr();

The corresponding RISC-V assembly (for the steps, see Appendix 1 - ESP Disassembly Corresponding Instructions) is as follows:

42006958:       1101                    addi    sp,sp,-32
4200695a:       ce06                    sw      ra,28(sp)
4200695c:       4791                    li      a5,4
4200695e:       c63e                    sw      a5,12(sp)
42006960:       47b2                    lw      a5,12(sp)
42006962:       9782                    jalr    a5
42006964:       40f2                    lw      ra,28(sp)
42006966:       6105                    addi    sp,sp,32
42006968:       8082                    ret

Exception Phenomenon

  • RISC-V

Guru Meditation Error: Core  0 panic'ed (Instruction access fault). Exception was unhandled.

Core  0 register dump:
Stack dump detected
MEPC    : 0x00000004  RA      : 0x42006964  SP      : 0x3fc8f7b0  GP      : 0x3fc8b400
TP      : 0x3fc87d8c  T0      : 0x4005890e  T1      : 0x20000000  T2      : 0x00000000
S0/FP   : 0x3c023000  S1      : 0x00000000  A0      : 0x00000031  A1      : 0x3fc8f3f8
A2      : 0x00000000  A3      : 0x00000001  A4      : 0x3fc8c000  A5      : 0x00000004
A6      : 0x60023000  A7      : 0x0000000a  S2      : 0x00000000  S3      : 0x00000000
S4      : 0x00000000  S5      : 0x00000000  S6      : 0x00000000  S7      : 0x00000000
S8      : 0x00000000  S9      : 0x00000000  S10     : 0x00000000  S11     : 0x00000000
T3      : 0x00000000  T4      : 0x00000000  T5      : 0x00000000  T6      : 0x00000000
MSTATUS : 0x00001881  MTVEC   : 0x40380001  MCAUSE  : 0x00000001  MTVAL   : 0x00000004

  • Xtensa

Guru Meditation Error: Core  0 panic'ed (InstrFetchProhibited). Exception was unhandled.

Core  0 register dump:
PC      : 0x00000004  PS      : 0x00060630  A0      : 0x800d539a  A1      : 0x3ffb4a30
A2      : 0x400de738  A3      : 0x3f40379c  A4      : 0x3f40381c  A5      : 0x3ffb4a60
A6      : 0x3ffb4a40  A7      : 0x0000000c  A8      : 0x800e4dbe  A9      : 0x3ffb49d0
A10     : 0x00000031  A11     : 0x3ffafc64  A12     : 0x3ffb4a40  A13     : 0x0000000c
A14     : 0x3ffb2770  A15     : 0x0000cdcd  SAR     : 0x00000004  EXCCAUSE: 0x00000014
EXCVADDR: 0x00000004  LBEG    : 0x400014fd  LEND    : 0x4000150d  LCOUNT  : 0xfffffffd

Backtrace: 0x00000001:0x3ffb4a30 0x400d5397:0x3ffb4a60 0x400e54f8:0x3ffb4a80 0x400858e9:0x3ffb4ab0

Troubleshooting Method

Such problems are usually caused by wild pointers. In this case the PC register holds no meaningful value, and whether the other registers are worth analyzing depends on how long the program kept running after the wild pointer was used. In this test code, the exception is raised immediately at the jump, so the RA register still holds the return address into the calling function (on Xtensa, use the backtrace instead), and the exception can be analyzed from there. In most real cases, however, RA has already been overwritten. You can then locate the problem with the following steps:

  1. Reproduce the problem reliably and make sure it fails the same way every time.

  2. Find the pattern of the problem, such as which operations are performed or which logs are printed before the exception occurs.

  3. Based on where the program fails, analyze the related code, add debug logs, and gradually reduce the amount of project code to check whether the problem still reproduces.
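For step 2, it helps to know what the application was doing just before the crash. One lightweight option, sketched below under our own assumptions, is a small "breadcrumb" ring that records recent operation IDs, which can then be inspected from a debugger or core dump. All names are hypothetical, IDs are assumed to be non-zero, and the ring assumes a single writer. On ESP-IDF, placing the buffer with RTC_NOINIT_ATTR is one option for keeping it across a software reset; that is not shown here and should be checked for your chip.

```c
#include <stdint.h>

#define CRUMB_COUNT 16u   /* power of two so the index can wrap with a mask */

typedef struct {
    volatile uint32_t next;               /* total crumbs written so far */
    volatile uint32_t ids[CRUMB_COUNT];   /* most recent operation IDs */
} crumb_log_t;

static crumb_log_t s_crumbs;

/* Record a (non-zero) operation ID; single-writer assumption. */
static void crumb_drop(uint32_t id)
{
    uint32_t n = s_crumbs.next;
    s_crumbs.ids[n & (CRUMB_COUNT - 1u)] = id;
    s_crumbs.next = n + 1u;
}

/* Return the k-th most recent ID (k = 0 is the latest), or 0 if unavailable. */
static uint32_t crumb_recent(uint32_t k)
{
    uint32_t n = s_crumbs.next;
    if (k >= n || k >= CRUMB_COUNT) {
        return 0u;
    }
    return s_crumbs.ids[(n - 1u - k) & (CRUMB_COUNT - 1u)];
}
```

Calling crumb_drop() at the start of each major operation (connect, send, sleep, ...) makes it visible which operations consistently precede the exception.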

Actual Case

In the figure below, the crash context has been completely destroyed, so the cause cannot be inferred from the dump. However, most registers have been overwritten with the same fixed value, which is itself a pattern. When nearly all registers are corrupted like this, the cause is usually memory corruption (memory stomping), which can be analyzed with the heap memory debugging methods.

../../../_images/InstrFetchProhibited.png
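ESP-IDF's own heap debugging features, such as heap poisoning and heap_caps_check_integrity_all(), are the usual tools for memory stomping. The idea they build on can be sketched with guard bytes around a buffer that are checked later; the sketch below is a simplified, hypothetical stand-in, not the ESP-IDF implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GUARD_LEN   4u
#define PAYLOAD_MAX 32u
#define GUARD_BYTE  0xA5u   /* arbitrary fill pattern */

/* Layout: [guard][payload][guard] inside one array. */
typedef struct {
    uint8_t raw[GUARD_LEN + PAYLOAD_MAX + GUARD_LEN];
    size_t  payload_len;
} guarded_buf_t;

/* Fill everything with the pattern and hand out the payload area. */
static uint8_t *guarded_init(guarded_buf_t *g, size_t payload_len)
{
    if (payload_len > PAYLOAD_MAX) {
        payload_len = PAYLOAD_MAX;
    }
    g->payload_len = payload_len;
    memset(g->raw, GUARD_BYTE, sizeof g->raw);
    return g->raw + GUARD_LEN;
}

/* True if both guards are intact; false means an out-of-bounds write happened. */
static bool guarded_check(const guarded_buf_t *g)
{
    const uint8_t *tail = g->raw + GUARD_LEN + g->payload_len;
    for (size_t i = 0; i < GUARD_LEN; i++) {
        if (g->raw[i] != GUARD_BYTE || tail[i] != GUARD_BYTE) {
            return false;
        }
    }
    return true;
}
```

Checking the guards at strategic points narrows down which code path overruns the buffer, long before the corruption reaches the registers.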

Illegal Instruction

Test Case

This error is printed when the CPU fetches an illegal instruction; for example, the RISC-V specification defines a 16-bit instruction whose bits are all 0 as illegal. The test code is as follows:

uint32_t instr_dram_addr = 0x0;   // data word to be executed: all bits are 0
intptr_t instr_addr = MAP_DRAM_TO_IRAM((intptr_t)&instr_dram_addr);   // same memory via the instruction bus
typedef void (*illegal_instr_t)(void);
illegal_instr_t illegal_instr = (illegal_instr_t)instr_addr; // Ensure that the memory data in instr_addr is 0x0
illegal_instr();

Note

This test case does not work on the ESP32 (Xtensa). To run it on other chips, memory protection must be disabled first.

Exception Phenomenon

  • RISC-V

Guru Meditation Error: Core  0 panic'ed (Illegal instruction). Exception was unhandled.

Core  0 register dump:
Stack dump detected
MEPC    : 0x4005f002  RA      : 0x42006962  SP      : 0x3fc8f7c0  GP      : 0x3fc8b400
0x42006962: illegal_instr_fault at /home/jacques/useful_example/crash_sim/store_access_fault/main/hello_world_main.c:79

TP      : 0x3fc87d8c  T0      : 0x4005890e  T1      : 0x20000000  T2      : 0x00000000
S0/FP   : 0x3c023000  S1      : 0x00000000  A0      : 0x3fc8f840  A1      : 0x3fc8f3f8
A2      : 0x00000000  A3      : 0x00000001  A4      : 0x3fc8c000  A5      : 0x4005f000
A6      : 0x60023000  A7      : 0x0000000a  S2      : 0x00000000  S3      : 0x00000000
S4      : 0x00000000  S5      : 0x00000000  S6      : 0x00000000  S7      : 0x00000000
S8      : 0x00000000  S9      : 0x00000000  S10     : 0x00000000  S11     : 0x00000000
T3      : 0x00000000  T4      : 0x00000000  T5      : 0x00000000  T6      : 0x00000000
MSTATUS : 0x00001881  MTVEC   : 0x40380001  MCAUSE  : 0x00000002  MTVAL   : 0x00000000

  • Xtensa

Guru Meditation Error: Core  0 panic'ed (IllegalInstruction). Exception was unhandled.
Memory dump at 0x403e298c: 00000000 00000000 00000000
Core  0 register dump:
PC      : 0x403e2990  PS      : 0x00060130  A0      : 0x820059fe  A1      : 0x3fcf2990
A2      : 0x4200c030  A3      : 0x3c0232fc  A4      : 0x3c02337c  A5      : 0x3fcf29c0
A6      : 0x3fcf29a0  A7      : 0x0000000c  A8      : 0x820059f4  A9      : 0x3fcf2930
A10     : 0x00000031  A11     : 0x3fcf2be4  A12     : 0x3fcf29a0  A13     : 0x0000000c
A14     : 0x3fc922c4  A15     : 0x00000000  SAR     : 0x00000004  EXCCAUSE: 0x00000000
EXCVADDR: 0x00000000  LBEG    : 0x400556d5  LEND    : 0x400556e5  LCOUNT  : 0xfffffffd

Backtrace: 0x403e298d:0x3fcf2990 |<-CORRUPTED

Troubleshooting

Although the ESP-IDF documentation's explanation of Illegal instruction lists three possible causes, in practice most cases are caused by abnormal instruction fetches from flash, and most of the rest by wild pointers that make the CPU fetch from invalid memory. The PC address shows whether the abnormal fetch came from IRAM or from flash.
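Checking the PC region can be scripted when reading many logs. The sketch below classifies a PC value using the ESP32-C3 instruction-bus address map as we understand it from the ESP32-C3 Technical Reference Manual: ROM 0x40000000 to 0x4005FFFF, internal SRAM 0x4037C000 to 0x403DFFFF, and flash through the cache 0x42000000 to 0x427FFFFF. These ranges are assumptions to verify for your chip; other chips, such as the ESP32, use different maps. The function name is hypothetical.

```c
#include <stdint.h>

typedef enum {
    REGION_ROM,
    REGION_IRAM,
    REGION_FLASH_TEXT,
    REGION_INVALID
} code_region_t;

/* ESP32-C3 instruction-bus ranges (assumed from the TRM; check your chip). */
static code_region_t classify_pc_esp32c3(uint32_t pc)
{
    if (pc >= 0x40000000u && pc <= 0x4005FFFFu) {
        return REGION_ROM;
    }
    if (pc >= 0x4037C000u && pc <= 0x403DFFFFu) {
        return REGION_IRAM;          /* internal SRAM on the instruction bus */
    }
    if (pc >= 0x42000000u && pc <= 0x427FFFFFu) {
        return REGION_FLASH_TEXT;    /* fetched from flash through the cache */
    }
    return REGION_INVALID;           /* no executable memory here */
}
```

For example, MEPC 0x4200696a from the Load access fault dump falls in flash, while MEPC 0x00000004 from the Instruction access fault dump is not executable memory at all.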

Scenarios that cause problems when fetching instructions from flash include: flash powered down while the chip sleeps, an unstable CS pull-up, an unstable flash power supply, and the flash suspend feature.

Actual Case

The following figure shows an abnormal case of an ESP chip in a low-power application. The log contains power-save output right before the error, which suggests that the problem is related to the chip’s sleep mode. After disabling powering down flash during sleep (CONFIG_ESP_SLEEP_POWER_DOWN_FLASH) and enabling the software CS pull-up during sleep (CONFIG_ESP_SLEEP_FLASH_LEAKAGE_WORKAROUND), the problem no longer occurred.

illegal_instr
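To notice if this configuration is later lost, for example when sdkconfig is regenerated, the application could report the two options at startup. A minimal sketch, assuming the usual sdkconfig.h convention that an enabled bool option is defined as 1 and a disabled one is left undefined; the function names are hypothetical. On a host build without sdkconfig.h, neither option is defined.

```c
/* Pull in the project configuration when building with ESP-IDF. */
#if defined(__has_include)
#  if __has_include("sdkconfig.h")
#    include "sdkconfig.h"
#  endif
#endif

/* Returns 1 if flash stays powered during sleep, 0 if it is powered down. */
static int sleep_keeps_flash_powered(void)
{
#if defined(CONFIG_ESP_SLEEP_POWER_DOWN_FLASH)
    return 0;   /* flash powered down in sleep: a known trigger in this case */
#else
    return 1;
#endif
}

/* Returns 1 if the software CS pull-up workaround is enabled. */
static int sleep_cs_pullup_enabled(void)
{
#if defined(CONFIG_ESP_SLEEP_FLASH_LEAKAGE_WORKAROUND)
    return 1;
#else
    return 0;
#endif
}
```

Logging a warning when sleep_keeps_flash_powered() returns 0 or sleep_cs_pullup_enabled() returns 0 makes a regression easy to spot before it turns into an intermittent crash.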

Cache disabled but cached memory region accessed (Cache error)

Exception Cause

During flash operations (read/write/erase), the cache is disabled for a short time. Only interrupts registered with ESP_INTR_FLAG_IRAM can still be triggered in this window. If such an ISR calls functions or reads data that must be fetched from flash through the cache (including mmap-ed regions), an exception occurs because the cache is disabled.

Test Case

The test code below disables the cache and then reads and prints data stored in flash:

static const uint32_t s_in_rodata[8] = { 0x12345678, 0xfedcba98 };
spi_flash_disable_interrupts_caches_and_other_cpu();
volatile uint32_t* src = (volatile uint32_t*) s_in_rodata;
uint32_t v1 = src[0];
uint32_t v2 = src[1];
ets_printf("%lx %lx\n", v1, v2);

Exception Phenomenon

  • RISC-V

Guru Meditation Error: Core  0 panic'ed (Cache error).
access to cache while dbus or cache is disabled

Stack dump detected
Core  0 register dump:
MEPC    : 0x40000040  RA      : 0x40381e82  SP      : 0x3fc8f7c0  GP      : 0x3fc8b400
0x40381e82: cache_access_test_func at /home/jacques/useful_example/crash_sim/store_access_fault/main/hello_world_main.c:38

TP      : 0x3fc87d5c  T0      : 0x4005890e  T1      : 0x20000000  T2      : 0x00000000
S0/FP   : 0x3c023000  S1      : 0x00000000  A0      : 0x3c025610  A1      : 0x00000000
A2      : 0x00000000  A3      : 0x00000200  A4      : 0x600c4000  A5      : 0x3c02561c
A6      : 0x60023000  A7      : 0x0000000a  S2      : 0x00000000  S3      : 0x00000000
S4      : 0x00000000  S5      : 0x00000000  S6      : 0x00000000  S7      : 0x00000000
S8      : 0x00000000  S9      : 0x00000000  S10     : 0x00000000  S11     : 0x00000000
T3      : 0x00000000  T4      : 0x00000000  T5      : 0x00000000  T6      : 0x00000000
MSTATUS : 0x00001881  MTVEC   : 0x40380001  MCAUSE  : 0x00000019  MTVAL   : 0x6944806f

  • Xtensa

Guru Meditation Error: Core  0 panic'ed (Cache disabled but cached memory region accessed).

Core  0 register dump:
PC      : 0x400830ed  PS      : 0x00060a34  A0      : 0x800d539a  A1      : 0x3ffb4a40
0x400830ed: cache_access_test_func at /home/jacques/useful_example/crash_sim/store_access_fault/main/hello_world_main.c:32

A2      : 0x400de738  A3      : 0x3f40379c  A4      : 0x3f40381c  A5      : 0x3ffb4a60
0x400de738: vprintf at /builds/idf/crosstool-NG/.build/xtensa-esp32-elf/src/newlib/newlib/libc/stdio/vprintf.c:30

A6      : 0x3ffb4a40  A7      : 0x0000000c  A8      : 0x3f4040c4  A9      : 0x3ffb4a10
A10     : 0x00000001  A11     : 0xbad00bad  A12     : 0x3ffb2608  A13     : 0x0000000c
A14     : 0x3ffb2770  A15     : 0x0000cdcd  SAR     : 0x00000004  EXCCAUSE: 0x00000007
EXCVADDR: 0x00000000  LBEG    : 0x400014fd  LEND    : 0x4000150d  LCOUNT  : 0xfffffffd

Backtrace: 0x400830ea:0x3ffb4a40 0x400d5397:0x3ffb4a60 0x400e54e4:0x3ffb4a80 0x40085915:0x3ffb4ab0

Analysis Method

In such cases, focus on the PC address of the exception. If the address is in flash, instructions were fetched from flash while the cache was disabled, and the corresponding function must be placed in IRAM with IRAM_ATTR. Note that IRAM_ATTR applies only to the function it decorates: if that function calls other functions, all of them must also be decorated with IRAM_ATTR.

If the address is in IRAM, the function itself is placed correctly, but data it uses, such as its input parameters, may still have to be read from flash.
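Both rules can be sketched together, assuming ESP-IDF's esp_attr.h macros IRAM_ATTR (place code in IRAM) and DRAM_ATTR (place data in internal RAM). The empty fallback definitions exist only so the sketch also compiles on a host; the table, the helper, and the ISR body are hypothetical. The helper called by the ISR carries its own IRAM_ATTR, and the const table read by the ISR is kept out of flash .rodata with DRAM_ATTR.

```c
#include <stdint.h>

#if defined(__has_include)
#  if __has_include("esp_attr.h")
#    include "esp_attr.h"
#  endif
#endif
/* Host-only fallbacks: on ESP-IDF these come from esp_attr.h. */
#ifndef IRAM_ATTR
#  define IRAM_ATTR
#endif
#ifndef DRAM_ATTR
#  define DRAM_ATTR
#endif

/* const alone would place this table in flash (.rodata); DRAM_ATTR keeps it
 * in internal RAM so it can be read while the cache is disabled. */
static const DRAM_ATTR uint8_t s_led_pattern[4] = {1, 0, 1, 1};

static volatile uint32_t s_step;

/* Called from the ISR, so it needs IRAM_ATTR too: callees do not inherit it. */
static IRAM_ATTR uint8_t next_led_level(void)
{
    uint8_t level = s_led_pattern[s_step & 3u];
    s_step++;
    return level;
}

/* Hypothetical timer ISR body; a real one would also drive the GPIO using
 * only IRAM-safe calls. */
static IRAM_ATTR uint8_t timer_isr_step(void)
{
    return next_led_level();
}
```

The same reasoning applies to string literals and lookup tables used in the ISR: anything it touches must live in IRAM or DRAM while flash operations are in progress.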

Practical Case

In this case, a gptimer interrupt whose handler is placed in IRAM fires during a flash read/write operation, while the cache is disabled. The gptimer ISR performs operations that read content from flash, which triggers the error:

cache_disable

From the figure above, we can draw the following conclusions:

  1. The gptimer interrupt is triggered during a flash read/write operation, and the ISR accesses content stored in flash.

  2. The function gpio_set_level that the PC points to is placed in IRAM rather than flash, so the function itself should run normally while the cache is disabled.

Further analysis showed that the problem was caused by the argument LED_matrix passed to gpio_set_level, which is declared const. const data is placed in the .rodata section in flash, so reading it requires the cache.

cache_disable2

Precautions

The cache-disabled exception is captured more reliably on Xtensa. On RISC-V, the failure to read through the cache may instead surface as a different error, such as Illegal instruction. For example:

gpio_set_level(12, 1);
spi_flash_disable_interrupts_caches_and_other_cpu();
gpio_set_level(12, 0);

Running this code on the ESP32-C3 (RISC-V) produces the following crash:

Guru Meditation Error: Core  0 panic'ed (Illegal instruction). Exception was unhandled.

Core  0 register dump:
Stack dump detected
MEPC    : 0x42006ef2  RA      : 0x40381e7e  SP      : 0x3fc8f7f0  GP      : 0x3fc8b400
0x42006ef2: gpio_set_level at /home/jacques/sdk/esp-idf/components/driver/gpio/gpio.c:233
0x40381e7e: cache_access_test_func at /home/jacques/useful_example/crash_test/main/hello_world_main.c:50
TP      : 0x3fc87d4c  T0      : 0x4005890e  T1      : 0x20000000  T2      : 0x00000000
S0/FP   : 0x3c023000  S1      : 0x00000000  A0      : 0x0000000c  A1      : 0x00000000
A2      : 0x00000200  A3      : 0x00000200  A4      : 0x600c4000  A5      : 0x00000000
A6      : 0x60023000  A7      : 0x0000000a  S2      : 0x00000000  S3      : 0x00000000
S4      : 0x00000000  S5      : 0x00000000  S6      : 0x00000000  S7      : 0x00000000
S8      : 0x00000000  S9      : 0x00000000  S10     : 0x00000000  S11     : 0x00000000
T3      : 0x00000000  T4      : 0x00000000  T5      : 0x00000000  T6      : 0x00000000
MSTATUS : 0x00001881  MTVEC   : 0x40380001  MCAUSE  : 0x00000002  MTVAL   : 0x00000000

Interrupt wdt timeout on CPU0/CPU1

Exception Principle

The interrupt watchdog uses the hardware watchdog timer in Timer group 1 and feeds it from the SysTick interrupt, so it detects when interrupts, and therefore task scheduling, are blocked for too long. If the SysTick interrupt does not feed the watchdog within the timeout (300 ms by default), the interrupt watchdog is triggered. For more information, please refer to Interrupt Watchdog Timer (IWDT).
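The mechanism can be illustrated with a simplified host-side model: a counter grows every millisecond and is reset only when the SysTick interrupt actually runs, which it cannot do while interrupts are masked. The model, its name, and the 10 ms tick period used in the examples are assumptions for illustration; the 300 ms timeout matches the default mentioned above (configurable with CONFIG_ESP_INT_WDT_TIMEOUT_MS). The model ignores the pending tick that fires as soon as interrupts are re-enabled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the modeled interrupt watchdog fires within total_ms when
 * interrupts are masked for the first masked_ms milliseconds. */
static bool iwdt_would_fire(uint32_t timeout_ms, uint32_t tick_ms,
                            uint32_t masked_ms, uint32_t total_ms)
{
    uint32_t since_feed_ms = 0;

    for (uint32_t t = 1; t <= total_ms; t++) {
        bool irq_enabled = (t > masked_ms);
        bool systick_due = (t % tick_ms) == 0u;

        if (irq_enabled && systick_due) {
            since_feed_ms = 0;      /* SysTick ISR runs and feeds the watchdog */
        } else {
            since_feed_ms++;        /* masked, or between ticks: no feed */
        }
        if (since_feed_ms >= timeout_ms) {
            return true;            /* IWDT timeout -> Guru Meditation */
        }
    }
    return false;
}
```

With a 300 ms timeout, masking interrupts for 100 ms is tolerated, while masking them for 350 ms fires the watchdog, which is what the endless critical section in the following test case does.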

Test Case

This issue can be triggered by keeping interrupts disabled, for example by looping forever inside a critical section. The test code is as follows:

static portMUX_TYPE s_init_spinlock = portMUX_INITIALIZER_UNLOCKED;
portENTER_CRITICAL(&s_init_spinlock);
while (1) {
}

Exception Phenomenon

  • RISC-V

Guru Meditation Error: Core  0 panic'ed (Interrupt wdt timeout on CPU0).

Core  0 register dump:
Stack dump detected
MEPC    : 0x42006964  RA      : 0x42006964  SP      : 0x3fc911a0  GP      : 0x3fc8b400
0x42006964: interrupt_wdt_fault_task at /home/jacques/useful_example/crash_test/main/hello_world_main.c:121 (discriminator 1)

TP      : 0x3fc8971c  T0      : 0x00000000  T1      : 0x00000000  T2      : 0x00000000
S0/FP   : 0x00000000  S1      : 0x00000000  A0      : 0x00000001  A1      : 0x00000000
A2      : 0x00000000  A3      : 0x00000004  A4      : 0x00000001  A5      : 0x3fc8c000
A6      : 0x00000000  A7      : 0x00000000  S2      : 0x00000000  S3      : 0x00000000
S4      : 0x00000000  S5      : 0x00000000  S6      : 0x00000000  S7      : 0x00000000
S8      : 0x00000000  S9      : 0x00000000  S10     : 0x00000000  S11     : 0x00000000
T3      : 0x00000000  T4      : 0x00000000  T5      : 0x00000000  T6      : 0x00000000
MSTATUS : 0x00001881  MTVEC   : 0x40380001  MCAUSE  : 0x00000018  MTVAL   : 0x0000a001

  • Xtensa

Guru Meditation Error: Core  0 panic'ed (Interrupt wdt timeout on CPU0).

Core  0 register dump:
PC      : 0x40378b06  PS      : 0x00060634  A0      : 0x82002239  A1      : 0x3fc984f0
0x40378b06: esp_cpu_wait_for_intr at /home/jacques/sdk/esp-idf/components/esp_hw_support/cpu.c:110

A2      : 0x00000000  A3      : 0x00000000  A4      : 0x3fc96c70  A5      : 0x3fc96c50
A6      : 0x42005e10  A7      : 0x00000000  A8      : 0x82009452  A9      : 0x3fc984b0
0x42005e10: timer_task at /home/jacques/sdk/esp-idf/components/esp_timer/src/esp_timer.c:470

A10     : 0x00000000  A11     : 0x00000000  A12     : 0x3fc96c50  A13     : 0x3fc96c30
A14     : 0x00000000  A15     : 0x00000000  SAR     : 0x00000000  EXCCAUSE: 0x00000005
EXCVADDR: 0x00000000  LBEG    : 0x00000000  LEND    : 0x00000000  LCOUNT  : 0x00000000


Backtrace: 0x40378b03:0x3fc984f0 0x42002236:0x3fc98510 0x4037b5a9:0x3fc98530 0x40379e71:0x3fc98550
0x40378b03: xt_utils_wait_for_intr at /home/jacques/sdk/esp-idf/components/xtensa/include/xt_utils.h:81
 (inlined by) esp_cpu_wait_for_intr at /home/jacques/sdk/esp-idf/components/esp_hw_support/cpu.c:101
0x42002236: esp_vApplicationIdleHook at /home/jacques/sdk/esp-idf/components/esp_system/freertos_hooks.c:59
0x4037b5a9: prvIdleTask at /home/jacques/sdk/esp-idf/components/freertos/FreeRTOS-Kernel/tasks.c:4269 (discriminator 1)
0x40379e71: vPortTaskWrapper at /home/jacques/sdk/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:149

Core  1 register dump:
PC      : 0x42007c32  PS      : 0x00060034  A0      : 0x80379e74  A1      : 0x3fc99f20
0x42007c32: interrupt_wdt_fault_task at /home/jacques/useful_example/crash_test/main/hello_world_main.c:121 (discriminator 1)

A2      : 0x00000000  A3      : 0x00000000  A4      : 0x00000001  A5      : 0x00000000
A6      : 0x00000001  A7      : 0x00000000  A8      : 0x82007c32  A9      : 0x3fc99ef0
A10     : 0x00000001  A11     : 0x3fc93a54  A12     : 0x00060020  A13     : 0x00060023
A14     : 0x00060323  A15     : 0x0000abab  SAR     : 0x00000000  EXCCAUSE: 0x00000005
EXCVADDR: 0x00000000  LBEG    : 0x00000000  LEND    : 0x00000000  LCOUNT  : 0x00000000


Backtrace: 0x42007c2f:0x3fc99f20 0x40379e71:0x3fc99f40
0x42007c2f: vPortEnterCritical at /home/jacques/sdk/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/include/freertos/portmacro.h:573
 (inlined by) interrupt_wdt_fault_task at /home/jacques/useful_example/crash_test/main/hello_world_main.c:120
0x40379e71: vPortTaskWrapper at /home/jacques/sdk/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:149

Analysis Method

For this type of problem, first check the PC address of the exception to find where the crash was triggered. On multi-core chips, inspect the registers of every core to determine which core, and which code path, triggered the problem.

The possible reasons and specific explanations are as follows:

  1. Long time interrupt off

  2. Blocking issue exists in ISR function

  3. Interrupt not cleared

Long time interrupt off

Disabling interrupts protects critical code sections from being interrupted. However, keeping interrupts disabled for a long time is a primary cause of interrupt watchdog timeouts.

As in the Test Case, entering a critical section disables interrupts, which prevents the SysTick interrupt from feeding the watchdog in time. This leads to an interrupt watchdog timeout and triggers the watchdog exception handler. Check the interrupt-disabling logic in the problematic code area.

Blocking in ISR function

When an interrupt occurs, the corresponding ISR is called for processing. ISR functions should perform minimal operations, deferring time-consuming tasks to non-interrupt functions. If an ISR takes too long to execute or blocks, it may trigger the interrupt watchdog.

Here is a simple example of GPIO ISR blocking:

#define GPIO_INPUT_IO_0     4

static void IRAM_ATTR gpio_isr_handler(void* arg)
{
    ets_printf("ISR\n");
    while(1){}
}

void app_main(void)
{
    gpio_config_t io_conf = {0};    // zero all fields, including pull_down_en
    io_conf.intr_type = GPIO_INTR_POSEDGE;
    io_conf.pin_bit_mask = (1ULL<<GPIO_INPUT_IO_0);
    io_conf.mode = GPIO_MODE_INPUT;
    io_conf.pull_up_en = GPIO_PULLUP_ENABLE;
    gpio_config(&io_conf);

    gpio_install_isr_service(0);
    gpio_isr_handler_add(GPIO_INPUT_IO_0, gpio_isr_handler, (void*) GPIO_INPUT_IO_0);
}

Driving GPIO 4 high triggers the GPIO interrupt and produces the following crash information:

ISR
Guru Meditation Error: Core  0 panic'ed (Interrupt wdt timeout on CPU0).
Core  0 register dump:
PC      : 0x40082580  PS      : 0x00060034  A0      : 0x80081560  A1      : 0x3ffb07c0
0x40082580: gpio_isr_handler at /home/jacques/problem/wdt/int_wdt/build/../main/hello_world_main.c:40 (discriminator 1)

A2      : 0x00000004  A3      : 0x3ff0015c  A4      : 0x800d6d82  A5      : 0x3ffb3400
A6      : 0x00000000  A7      : 0x00000001  A8      : 0x80082580  A9      : 0x3ffb0730
A10     : 0x00000004  A11     : 0x00000004  A12     : 0x00000004  A13     : 0x00000000
A14     : 0x00000000  A15     : 0x3ffb3400  SAR     : 0x0000001c  EXCCAUSE: 0x00000005
EXCVADDR: 0x00000000  LBEG    : 0x00000000  LEND    : 0x00000000  LCOUNT  : 0x00000000
Core  0 was running in ISR context:
EPC1    : 0x400d1097  EPC2    : 0x00000000  EPC3    : 0x00000000  EPC4    : 0x40082580
0x400d1097: uart_hal_write_txfifo at /home/jacques/sdk/esp-idf/components/hal/uart_hal_iram.c:35

0x40082580: gpio_isr_handler at /home/jacques/problem/wdt/int_wdt/build/../main/hello_world_main.c:40 (discriminator 1)

Backtrace:0x4008257d:0x3ffb07c0 0x4008155d:0x3ffb07e0 0x40082102:0x3ffb0800 0x400d6702:0x3ffb3450 0x400862a2:0x3ffb3470 0x40087865:0x3ffb3490

The PC and the backtrace confirm that the problem occurs in gpio_isr_handler. In addition, the log line Core 0 was running in ISR context shows that the core was executing an interrupt handler.
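A common remedy is to reduce the ISR to recording the event and to do the slow work, such as printing, in a task. In ESP-IDF this is normally done with a FreeRTOS queue and xQueueSendFromISR(); the sketch below shows the same idea with a plain single-producer, single-consumer ring buffer so that it stands alone. The names are hypothetical, and the lock-free scheme assumes one ISR producer and one task consumer on the same core; across cores it would additionally need memory barriers.

```c
#include <stdbool.h>
#include <stdint.h>

#define EVT_QUEUE_LEN 8u   /* must be a power of two */

typedef struct {
    volatile uint32_t head;                  /* written only by the ISR */
    volatile uint32_t tail;                  /* written only by the task */
    volatile uint32_t items[EVT_QUEUE_LEN];
} evt_queue_t;

static evt_queue_t s_gpio_evt_queue;

/* ISR side: constant time, never waits; drops the event if the queue is full. */
static bool evt_push_from_isr(evt_queue_t *q, uint32_t gpio_num)
{
    uint32_t head = q->head;
    if (head - q->tail == EVT_QUEUE_LEN) {
        return false;
    }
    q->items[head & (EVT_QUEUE_LEN - 1u)] = gpio_num;
    q->head = head + 1u;
    return true;
}

/* Task side: returns false when there is nothing to process. */
static bool evt_pop(evt_queue_t *q, uint32_t *gpio_num)
{
    uint32_t tail = q->tail;
    if (tail == q->head) {
        return false;
    }
    *gpio_num = q->items[tail & (EVT_QUEUE_LEN - 1u)];
    q->tail = tail + 1u;
    return true;
}

/* Non-blocking version of the handler above: record the event and return.
 * On ESP-IDF it would also carry IRAM_ATTR. */
static void gpio_isr_handler(void *arg)
{
    (void)evt_push_from_isr(&s_gpio_evt_queue, (uint32_t)(uintptr_t)arg);
}
```

A task would then loop on evt_pop() and print or process each GPIO number; with a FreeRTOS queue, the task would block in xQueueReceive() instead of polling.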

Interrupt not cleared

The interrupt source must be cleared as part of handling it. If it is not cleared, the interrupt keeps re-triggering as soon as the ISR returns, and the SysTick interrupt never gets a chance to run.

For example, take the code from Blocking in ISR function, remove the while(1) loop from gpio_isr_handler, and comment out the gpio_hal_clear_intr_status_bit call in the ESP-IDF GPIO driver. After recompiling, running the program still triggers the interrupt watchdog crash.

If the user has not modified the default driver, interrupt clearing is usually handled by the lower-layer driver. If an interrupt watchdog exception points to the internal driver, check whether the lower-layer driver properly clears the interrupt.
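The effect can be seen in a toy model, sketched below with hypothetical names: a status register whose bit stays set until the handler clears it, and a dispatcher that keeps re-entering the handler while any bit is pending. It is not the ESP-IDF GPIO driver; it only shows why a missing clear starves everything else, including SysTick.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of an interrupt status register: bit n set = interrupt n pending. */
typedef struct {
    volatile uint32_t status;
} toy_intr_reg_t;

/* Models a write-1-to-clear register: writing 1 to a bit clears that bit. */
static void toy_clear(toy_intr_reg_t *reg, uint32_t mask)
{
    reg->status &= ~mask;
}

/* Counts how many times the handler runs before the CPU could return to
 * normal code. Stops at budget to model a hang (watchdog territory). */
static uint32_t toy_dispatch(toy_intr_reg_t *reg, bool handler_clears, uint32_t budget)
{
    uint32_t runs = 0;
    while (reg->status != 0u && runs < budget) {
        uint32_t pending = reg->status;
        runs++;                      /* the handler body runs here */
        if (handler_clears) {
            toy_clear(reg, pending);
        }
    }
    return runs;
}
```

With the clear, the handler runs once and the CPU returns to normal code; without it, the dispatcher exhausts its budget, which on real hardware means the interrupt watchdog eventually fires.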

Real Case

Interrupt watchdog issues may look straightforward to analyze, but in real applications they often involve FreeRTOS. Many FreeRTOS operations run in critical sections with interrupts disabled, so if FreeRTOS gets stuck in one of them, the result appears as an interrupt watchdog timeout. Here is an example of a real crash:

Guru Meditation Error: Core  0 panic'ed (Interrupt wdt timeout on CPU0)
Core 0 register dump:
PC      : 0x400915d2  PS      : 0x00060b34  A0      : 0x800903ed  A1      : 0x3ffd4e80
A2      : 0x3ffd3598  A3      : 0x3ffd4ff0  A4      : 0x00000c9f  A5      : 0x3f408990
A6      : 0x3ffbf580  A7      : 0x00060023  A8      : 0x3ffd4ff0  A9      : 0x0000000f
A10     : 0x3ffd4ff0  A11     : 0x0000000f  A12     : 0x00060b20  A13     : 0x00000001
A14     : 0x3ffeddf0  A15     : 0x3ffea57c  SAR     : 0x00000018  EXCCAUSE: 0x00000005
EXCVADDR: 0x00000000  LBEG    : 0x400014fd  LEND    : 0x4000150d  LCOUNT  : 0xfffffffe

ELF file SHA256: eadf38922e335ed1

Backtrace: 0x400915cf:0x3ffd4e80 0x400903ea:0x3ffd4ea0 0x4008f514:0x3ffd4ec0 0x400ffb91:0x3ffd4f00 0x40136612:0x3ffd4f30

Decoding the backtrace gives the following:

~$ xtensa-esp32-elf-addr2line -afe esp32_interrupt_wdt.elf 0x400915cf:0x3ffd4e80 0x400903ea:0x3ffd4ea0 0x4008f514:0x3ffd4ec0 0x400ffb91:0x3ffd4f00 0x40136612:0x3ffd4f30
0x400915cf
vListInsert
/arm/esp-idf/components/freertos/list.c:205
0x400903ea
vTaskPlaceOnEventList
/arm/esp-idf/components/freertos/tasks.c:2901 (discriminator 2)
0x4008f514
xQueueGenericReceive
/arm/esp-idf/components/freertos/queue.c:1596
0x400ffb91
arch_os_mbox_fetch
/arm/components/esp32/arch_os.c:352 (discriminator 4)
0x40136612
msg_fetch_block
/arm/components/libs/d0/delayzero.c:631

The log shows that the problem is not a crash in an interrupt handler but inside FreeRTOS itself. Looking up vListInsert (list.c:205) shows that the code is stuck in the following for loop:

if( xValueOfInsertion == portMAX_DELAY )
{
    pxIterator = pxList->xListEnd.pxPrevious;
}
else
{
    for( pxIterator = ( ListItem_t * ) &( pxList->xListEnd ); pxIterator->pxNext->xItemValue <= xValueOfInsertion; pxIterator = pxIterator->pxNext )
    {
        /* There is nothing to do here, just iterating to the wanted
        insertion position. */
    }
}

Next, check whether the calling code has interrupts disabled at this point. vTaskPlaceOnEventList shows that it does enter a critical section, which disables interrupts, before performing the list insertion:

void vTaskPlaceOnEventList( List_t * const pxEventList, const TickType_t xTicksToWait )
{
    configASSERT( pxEventList );
    taskENTER_CRITICAL(&xTaskQueueMutex);

    vListInsert( pxEventList, &( pxCurrentTCB[xPortGetCoreID()]->xEventListItem ) );

    prvAddCurrentTaskToDelayedList( xPortGetCoreID(), xTicksToWait);
    taskEXIT_CRITICAL(&xTaskQueueMutex);
}

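This completes the analysis: the issue is a FreeRTOS hang that manifests as an interrupt watchdog timeout. The for loop stops only when it reaches an item whose xItemValue is larger than the value being inserted, which the list end marker guarantees because its xItemValue is portMAX_DELAY. If a list item's pxNext pointer has been corrupted, the iterator can cycle without ever reaching that marker, so the loop never exits while interrupts are disabled. To troubleshoot this type of issue, determine why the loop no longer exits. It is unlikely that the bug lies in FreeRTOS itself; it is typically caused by improper use in the upper-layer application code, such as memory stomping, blocking operations within interrupts, or similar issues.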
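To see why a corrupted list hangs the loop, the sketch below reproduces the iteration pattern of vListInsert on a minimal circular list whose end marker holds the maximum value (as the FreeRTOS end marker holds portMAX_DELAY), and adds a step budget so that a corrupted list is reported instead of spinning forever. The types and names are hypothetical simplifications, not FreeRTOS code; a bounded walk like this can serve as a temporary debug check.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct mini_item {
    uint32_t value;               /* like xItemValue */
    struct mini_item *next;       /* like pxNext; the list is circular */
} mini_item_t;

#define MINI_END_VALUE UINT32_MAX /* end marker value, like portMAX_DELAY */

/* Same loop condition as vListInsert, but gives up after max_steps.
 * 'value' must be below MINI_END_VALUE (FreeRTOS handles portMAX_DELAY
 * separately). Returns the item after which 'value' would be inserted,
 * or NULL if the walk never reached a larger value (corrupted list). */
static mini_item_t *find_insert_pos(mini_item_t *end, uint32_t value, size_t max_steps)
{
    mini_item_t *it = end;
    size_t steps = 0;

    while (it->next->value <= value) {
        it = it->next;
        if (++steps > max_steps) {
            return NULL;
        }
    }
    return it;
}
```

In the real crash, the corrupted pxNext plays the role of the cycle in the test below, and because the walk runs inside a critical section, the interrupt watchdog is the first thing that notices.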

Appendix 1 - ESP Disassembly Corresponding Instructions

  • Xtensa ESP32 : xtensa-esp32-elf-objdump -S xxx.elf > a.txt

  • Xtensa ESP32-S2 : xtensa-esp32s2-elf-objdump -S xxx.elf > a.txt

  • Xtensa ESP32-S3 : xtensa-esp32s3-elf-objdump -S xxx.elf > a.txt

  • RISC-V : riscv32-esp-elf-objdump -S xxx.elf > a.txt

Here xxx.elf is the project ELF file, usually located in the build folder of the project directory, and a.txt is the disassembly output containing the actual assembly instructions.