Locating Memory Issues (Memory Stomping, Memory Leaks, etc.)
Common memory issues include the following:
Heap memory leak
Heap memory stomping
Stack overflow
Stack memory stomping
Pointer used after being freed
Pointer used before being initialized
Double free
This section will explain the general debugging methods for the above memory issues one by one.
Heap Memory Leak
A heap memory leak occurs when a program allocates a block of heap memory but does not free it after use. The memory consumed by the program then grows gradually at runtime and may eventually exhaust the system's available memory.
For details, refer to the Heap Memory Debugging Document. The key points are summarized as follows:
Use heap_caps_get_per_task_info to get the heap allocation status of every task.
Use heap_caps_get_free_size to compare the free heap size at different points in the program and roughly narrow down where the leak occurs.
Enable CONFIG_HEAP_TRACING_STANDALONE or CONFIG_HEAP_TRACING_TOHOST:
STANDALONE mode requires a record buffer; the records are stored, analyzed, and printed directly on the ESP chip. On the RISC-V architecture, however, code lines cannot be located.
TOHOST mode uses app_trace over UART/JTAG to capture the data and analyzes it on the host. No extra buffer is needed, and code lines can be located.
heap_trace_init_standalone initializes the record buffer, and heap_trace_start(HEAP_TRACE_LEAKS) starts recording.
heap_trace_stop() stops recording, and heap_trace_dump() prints the analysis results.
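To make the summary fields in the logs easier to read, the following is a minimal host-side sketch of the bookkeeping that a leak-tracing mode performs: each allocation adds a record, each free removes the matching record, and whatever remains is reported as leaked. It is a conceptual model only, not the ESP-IDF implementation. All names (leak_trace_t, trace_alloc, trace_free, and so on) are hypothetical, and the size of the third block is chosen arbitrarily.

```c
#include <assert.h>
#include <stddef.h>

#define TRACE_CAPACITY 100  /* same capacity as shown in the example logs */

typedef struct {
    const void *addr;  /* address of a live allocation */
    size_t size;       /* its size in bytes */
} trace_record_t;

typedef struct {
    trace_record_t records[TRACE_CAPACITY];
    size_t count;            /* records currently live */
    size_t high_water_mark;  /* most records live at the same time */
    size_t total_allocs;
    size_t total_frees;
} leak_trace_t;

/* Record an allocation; the record is dropped when the table is full. */
void trace_alloc(leak_trace_t *t, const void *addr, size_t size)
{
    assert(t != NULL);
    t->total_allocs++;
    if (t->count == TRACE_CAPACITY) {
        return;
    }
    t->records[t->count].addr = addr;
    t->records[t->count].size = size;
    t->count++;
    if (t->count > t->high_water_mark) {
        t->high_water_mark = t->count;
    }
}

/* Remove the record that matches a freed address, if one exists. */
void trace_free(leak_trace_t *t, const void *addr)
{
    assert(t != NULL);
    t->total_frees++;
    for (size_t i = 0; i < t->count; i++) {
        if (t->records[i].addr == addr) {
            t->records[i] = t->records[t->count - 1];  /* swap-remove */
            t->count--;
            return;
        }
    }
}

/* Sum the sizes of the remaining records: the bytes reported as leaked. */
size_t trace_leaked_bytes(const leak_trace_t *t)
{
    size_t total = 0;
    for (size_t i = 0; i < t->count; i++) {
        total += t->records[i].size;
    }
    return total;
}

/* Replay three allocations and one free (hypothetical scenario). */
leak_trace_t demo_leak_trace(void)
{
    static char a[36], b[24], c[16];  /* stand-ins for heap blocks */
    leak_trace_t t = {0};
    trace_alloc(&t, a, sizeof(a));
    trace_alloc(&t, b, sizeof(b));
    trace_alloc(&t, c, sizeof(c));
    trace_free(&t, c);
    return t;
}

/* Leaked byte count of the scenario above. */
size_t demo_leaked_bytes(void)
{
    leak_trace_t t = demo_leak_trace();
    return trace_leaked_bytes(&t);
}
```

In this scenario two records survive (36 and 24 bytes), the high water mark is 3, and the totals are 3 allocations and 1 free, which is the same shape as the Xtensa summary in this section.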
A typical log printed by heap tracing looks like this:
Xtensa
====== Heap Trace: 2 records (100 capacity) ======
36 bytes (@ 0x3fc9c524, Internal) allocated CPU 0 ccount 0x02f204e0 caller 0x42008cfd:0x42008d73
0x42008cfd: zoo_create at /home/libo/test_github/idf_debug_method/main/idf_debug_method.c:68
0x42008d73: mem_leak_task at /home/libo/test_github/idf_debug_method/main/idf_debug_method.c:96 (discriminator 3)
24 bytes (@ 0x3fc9c54c, Internal) allocated CPU 0 ccount 0x02f20c00 caller 0x42008cfd:0x42008d73
0x42008cfd: zoo_create at /home/libo/test_github/idf_debug_method/main/idf_debug_method.c:68
0x42008d73: mem_leak_task at /home/libo/test_github/idf_debug_method/main/idf_debug_method.c:96 (discriminator 3)
====== Heap Trace Summary ======
Mode: Heap Trace Leaks
60 bytes 'leaked' in trace (2 allocations)
records: 2 (100 capacity, 3 high water mark)
total allocations: 3
total frees: 1
================================
RISC-V
====== Heap Trace: 3 records (100 capacity) ======
36 bytes (@ 0x3fc91574, Internal) allocated CPU 0 ccount 0x02cf6d38 caller
36 bytes (@ 0x3fc9159c, Internal) allocated CPU 0 ccount 0x02cf72cc caller
====== Heap Trace Summary ======
Mode: Heap Trace Leaks
72 bytes 'leaked' in trace (2 allocations)
records: 2 (100 capacity, 3 high water mark)
total allocations: 3
total frees: 1
================================
Heap Memory Stomping
Heap memory stomping occurs when a program reads from or writes to heap memory beyond the range allocated to it. This leads to undefined behavior and can corrupt the heap's internal structures. A typical log for this error is:
assert failed: remove_free_block tlsf.c:331 (next && "next_free field can not be null")
For details, refer to the Heap Memory Debugging Document. The key points are summarized as follows:
Locate Who and Where
Enable heap corruption detection by raising CONFIG_HEAP_CORRUPTION_DETECTION to Light impact or Comprehensive:
Basic (default): Uses the heap's own structures to detect whether it has been corrupted.
Light impact: Adds the canary values 0xABBA1234 and 0xBAAD5678 before and after each allocated block.
Comprehensive: Builds on Light impact by adding checks for uninitialized-access and use-after-free bugs. Newly allocated memory is filled with 0xce, and freed memory is filled with 0xfe.
After enabling heap corruption detection, either wait for a crash or call heap_caps_check_integrity_all before and after the suspected stomping location to check heap integrity and force the failure to surface. If the stomped address is already known, call heap_caps_check_integrity_addr directly.
Tail stomped (the current memory block was written out of bounds): CORRUPT HEAP: Bad tail at 0x3fc9ad5a. Expected 0xbaad5678 got 0x02020202
Head stomped (the preceding memory block was written out of bounds): CORRUPT HEAP: Bad head at 0x3fc9a94c. Expected 0xabba1234 got 0x00000000
Two methods can identify the blocks neighboring the corrupted memory block:
Use heap tracing: call heap_trace_start(HEAP_TRACE_ALL) to collect allocation information.
Use heap_caps_dump_all to print the heap block information (print it both before and after the memory allocation).
Note
For the specific methods of confirming the memory block status described above, refer to Heap Memory Tracking.
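The canary mechanism behind the Bad tail log can be illustrated with a minimal host-side sketch. It frames each allocation with the head and tail values listed above and reads the tail back after a write. This is a model under stated assumptions, not the ESP-IDF heap implementation: guarded_alloc, guarded_tail, guarded_free, and demo_tail_after_write are hypothetical names, and the block layout is simplified.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define HEAD_CANARY 0xABBA1234u  /* head value listed in this section */
#define TAIL_CANARY 0xBAAD5678u  /* tail value listed in this section */

/* Allocate size user bytes framed by a head and a tail canary. */
void *guarded_alloc(size_t size)
{
    uint8_t *raw = malloc(size + 2 * sizeof(uint32_t));
    if (raw == NULL) {
        return NULL;
    }
    const uint32_t head = HEAD_CANARY;
    const uint32_t tail = TAIL_CANARY;
    memcpy(raw, &head, sizeof(head));
    memcpy(raw + sizeof(head) + size, &tail, sizeof(tail));
    return raw + sizeof(head);
}

/* Read back the tail canary that follows the size user bytes. */
uint32_t guarded_tail(const void *user, size_t size)
{
    uint32_t tail;
    memcpy(&tail, (const uint8_t *)user + size, sizeof(tail));
    return tail;
}

/* Release a block obtained from guarded_alloc. */
void guarded_free(void *user)
{
    if (user != NULL) {
        free((uint8_t *)user - sizeof(uint32_t));
    }
}

/* Fill a 16-byte block with 0x02, writing overrun extra bytes past its end,
 * and return the tail canary as it reads afterwards. */
uint32_t demo_tail_after_write(size_t overrun)
{
    const size_t size = 16;
    assert(overrun <= sizeof(uint32_t));  /* stay inside the modelled tail */
    uint8_t *buf = guarded_alloc(size);
    assert(buf != NULL);
    /* The extra bytes land on the tail canary, which this model owns,
     * so the demo itself never leaves its own allocation. */
    memset(buf, 0x02, size + overrun);
    uint32_t tail = guarded_tail(buf, size);
    guarded_free(buf);
    return tail;
}
```

With no overrun the tail still reads 0xbaad5678. With a 4-byte overrun it reads 0x02020202, the same "Expected ... got ..." pattern as in the Bad tail log above. A head check would be symmetric: it reads the 4 bytes in front of the user pointer, which only a write from the preceding block can damage.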
Locate When
Set a CPU watchpoint in the code with esp_cpu_set_watchpoint(0, (void *)0x3fc9a94c, 4, ESP_CPU_WATCHPOINT_STORE);. If you do not know which core performs the write, call it on both cores.
The CPU triggers a debug exception when data is written to this address, and the code line can be located from the PC. A reference log follows:
Guru Meditation Error: Core 0 panic'ed (Unhandled debug exception).
Debug exception reason: Watchpoint 0 triggered
Core 0 register dump:
PC : 0x400570e8 PS : 0x00060c36 A0 : 0x82008d43 A1 : 0x3fc99f10
0x400570e8: memset in ROM
A2 : 0x3fc9b3ac A3 : 0x00000000 A4 : 0x000003e8 A5 : 0x3fc9b75c
A6 : 0x00000000 A7 : 0x0000003e A8 : 0x8200333d A9 : 0x3fc99ee0
A10 : 0x00000400 A11 : 0x00060c20 A12 : 0x00000000 A13 : 0x00060c23
A14 : 0xb33fffff A15 : 0xb33fffff SAR : 0x00000004 EXCCAUSE: 0x00000001
EXCVADDR: 0x00000000 LBEG : 0x400570e8 LEND : 0x400570f3 LCOUNT : 0x00000002
Backtrace: 0x400570e5:0x3fc99f10 0x42008d40:0x3fc99f20 0x4201874b:0x3fc99f50 0x4037a80d:0x3fc99f80
0x400570e5: memset in ROM
0x42008d40: app_main at /home/libo/test_github/idf_debug_method/main/idf_debug_method.c:169 (discriminator 3)
0x4201874b: main_task at /home/libo/esp/github_master/components/freertos/app_startup.c:208 (discriminator 13)
0x4037a80d: vPortTaskWrapper at /home/libo/esp/github_master/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:162
Stack Overflow
A stack overflow occurs when function calls use more stack memory than the task has been allocated, typically because of deep recursion or too many local variables. ESP-IDF provides the following stack overflow detection mechanisms:
ESP-IDF FreeRTOS enables stack overflow detection by default. If a stack overflow is detected, it triggers an assertion, printing the corresponding stack overflow information. A typical log looks like this:
***ERROR*** A stack overflow in task test_task has been detected.
For more details, refer to the Stack Overflow section.
ESP-IDF supports enabling the End of Stack Watchpoint, which triggers a breakpoint before the FreeRTOS stack overflow assertion fires.
The RISC-V platform supports enabling hardware stack overflow detection (Stack protection fault). For details, see Hardware Stack Protection.
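As a rough illustration of how a software, pattern-based stack check can work, the sketch below models a task stack on the host: the stack is pre-filled with a known byte, usage overwrites it from the top down, and the check inspects a small guard region at the stack limit. The fill byte 0xA5, the 16-byte guard length, and every function name here are assumptions made for this illustration. The sketch does not reproduce the FreeRTOS or ESP-IDF implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_SIZE 256
#define FILL_BYTE  0xA5  /* assumed fill pattern */
#define GUARD_LEN  16    /* assumed number of bytes checked at the stack limit */

/* Prepare a simulated task stack: every byte starts as the fill pattern. */
void stack_init(uint8_t *stack)
{
    memset(stack, FILL_BYTE, STACK_SIZE);
}

/* Simulate a descending stack that has used `used` bytes from the top. */
void stack_use(uint8_t *stack, size_t used)
{
    assert(used <= STACK_SIZE);
    memset(stack + STACK_SIZE - used, 0x00, used);
}

/* True when any guard byte at the low end no longer holds the fill pattern. */
bool stack_guard_touched(const uint8_t *stack)
{
    for (size_t i = 0; i < GUARD_LEN; i++) {
        if (stack[i] != FILL_BYTE) {
            return true;
        }
    }
    return false;
}

/* Count untouched bytes from the low end: the minimum free stack so far. */
size_t stack_min_free(const uint8_t *stack)
{
    size_t n = 0;
    while (n < STACK_SIZE && stack[n] == FILL_BYTE) {
        n++;
    }
    return n;
}

/* Run one scenario and report whether the guard check fires. */
bool demo_guard_touched(size_t used)
{
    uint8_t stack[STACK_SIZE];
    stack_init(stack);
    stack_use(stack, used);
    return stack_guard_touched(stack);
}

/* Run one scenario and report the remaining untouched bytes. */
size_t demo_min_free(size_t used)
{
    uint8_t stack[STACK_SIZE];
    stack_init(stack);
    stack_use(stack, used);
    return stack_min_free(stack);
}
```

In this model, using 200 of 256 bytes leaves 56 bytes untouched and the guard intact, while using 250 bytes reaches into the guard region and the check fires. A pattern check of this kind only notices the problem after the write has happened, which is why a watchpoint or hardware protection can report it earlier.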
Stack Memory Stomping
Stack memory stomping is similar to heap memory stomping but affects stack memory: the program reads or writes data beyond the stack memory range allocated to it, which may lead to program errors. Key points to note:
It may lead to a task stack overflow, which is generally detectable through the FreeRTOS stack overflow mechanisms.
It may overwrite the values of local variables, causing unexpected program behavior.
It may modify local pointer variables, causing accesses to illegal instruction or data addresses and crashing the program.
It may overwrite the function return address, causing the program to jump to an incorrect address and crash.
A simple example of erroneous code is as follows:
#include <stdio.h>

int vulnerableFunction(void)
{
    int localArray[5]; // Array allocated on the stack
    // Bug: the condition i <= 5 writes localArray[5], one element past the end
    for (int i = 0; i <= 5; ++i) {
        localArray[i] = i;
    }
    return localArray[0];
}

void app_main(void)
{
    printf("Before vulnerable function.\n");
    vulnerableFunction(); // Call the function that causes stack memory corruption
    printf("After vulnerable function.\n");
}
The ESP-IDF toolchain catches some of these errors at compile time and issues warnings (the build still succeeds). The compile-time warning log looks like this:
/home/user/github/esp-idf/rel5.1/esp-idf/examples/get-started/hello_world/main/hello_world_main.c: In function 'vulnerableFunction':
/home/user/github/esp-idf/rel5.1/esp-idf/examples/get-started/hello_world/main/hello_world_main.c:20:23: warning: iteration 5 invokes undefined behavior [-Waggressive-loop-optimizations]
20 | localArray[i] = i;
| ~~~~~~~~~~~~~~^~~
/home/user/github/esp-idf/rel5.1/esp-idf/examples/get-started/hello_world/main/hello_world_main.c:19:23: note: within this loop
19 | for (int i = 0; i <= 5; ++i) {
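One way to repair the loop is to derive the bound from the array itself instead of repeating the literal 5. The sketch below is an illustration, not code from the original project: ARRAY_LEN, set_checked, and safeFunction are hypothetical names. It also routes the write through a helper that asserts the index, so an out-of-range write fails loudly instead of silently stomping the stack.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements in a true array (not valid for pointers). */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Store value at arr[idx], asserting that idx is inside the array. */
void set_checked(int *arr, size_t len, size_t idx, int value)
{
    assert(idx < len);  /* fail loudly instead of writing out of bounds */
    arr[idx] = value;
}

/* Same idea as vulnerableFunction, but the loop bound comes from the array. */
int safeFunction(void)
{
    int localArray[5];
    for (size_t i = 0; i < ARRAY_LEN(localArray); ++i) {
        set_checked(localArray, ARRAY_LEN(localArray), i, (int)i);
    }
    /* Index 4 is the last valid element of a 5-element array. */
    return localArray[ARRAY_LEN(localArray) - 1];
}
```

Because the bound is computed from localArray, resizing the array later cannot leave a stale loop limit behind.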
Pointers Used After Release
A use-after-free occurs when a program frees a block of memory but continues to use a pointer to that memory. This accesses invalid memory and can lead to crashes or undefined behavior.
This issue can cause a variety of errors that are hard to trace back from the observed symptoms, so pay special attention to pointer usage during development. A simple example of erroneous code is as follows:
#include <stdio.h>
#include <stdlib.h>

void app_main(void)
{
    int *number = (int *)malloc(sizeof(int)); // Allocate memory for an integer
    if (number == NULL) {
        // Handle memory allocation failure
        printf("Memory allocation failed.\n");
        return; // Do not continue with a NULL pointer
    }
    *number = 42; // Assign a value to the allocated memory
    printf("Value before freeing: %d\n", *number);
    free(number); // Free the allocated memory
    // Attempt to use the pointer after freeing
    // This will result in undefined behavior
    printf("Value after freeing: %d\n", *number);
}
The ESP-IDF toolchain can detect some of these errors at compile time and issue warnings (the build may still succeed). The compile-time warning log may look like this:
/home/user/github/esp-idf/rel5.1/esp-idf/examples/get-started/hello_world/main/hello_world_main.c: In function 'app_main':
/home/user/github/esp-idf/rel5.1/esp-idf/examples/get-started/hello_world/main/hello_world_main.c:32:5: warning: pointer 'number' used after 'free' [-Wuse-after-free]
32 | printf("Value after freeing: %d\n", *number);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhengzhong/github/esp-idf/rel5.1/esp-idf/examples/get-started/hello_world/main/hello_world_main.c:28:5: note: call to 'free' here
28 | free(number); // Free the allocated memory
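The compiler only warns about cases it can see in a single function. At runtime, the Comprehensive corruption detection level described earlier fills freed memory with 0xfe, so a stale read tends to return 0xfefefefe instead of the old value. The host-side sketch below models that effect with a static buffer standing in for a heap block. model_free, looks_freed, and demo_stale_read are hypothetical names, and the model never touches memory that was actually freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FREE_FILL 0xFE  /* fill byte this document gives for freed memory */

static uint8_t pool[sizeof(uint32_t)];  /* stand-in for one heap block */

/* Model of "free": the block stays addressable but is overwritten with 0xfe. */
void model_free(void *block, size_t size)
{
    assert(block != NULL);
    memset(block, FREE_FILL, size);
}

/* True when a 32-bit value consists only of the free-fill byte. */
bool looks_freed(uint32_t value)
{
    return value == 0xFEFEFEFEu;
}

/* Write 42, "free" the block, then read it back through the stale pointer. */
uint32_t demo_stale_read(void)
{
    uint32_t value = 42;
    memcpy(pool, &value, sizeof(value));
    model_free(pool, sizeof(pool));
    memcpy(&value, pool, sizeof(value));  /* stale read, legal only in this model */
    return value;
}
```

When a crash dump or a variable shows 0xfefefefe (or an address built from 0xfe bytes), that is a strong hint that the data was read from memory that had already been freed.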
Pointers Used Before Initialization
Using a pointer before it is initialized means the program dereferences an indeterminate address, which accesses unknown memory regions and causes unstable behavior.
This issue can cause a variety of errors that are hard to trace back from the observed symptoms, so pay special attention to pointer initialization during development. A simple example of erroneous code is as follows:
#include <stdio.h>

void app_main(void)
{
    int *number; // Pointer declared but not initialized
    // Attempt to dereference the uninitialized pointer
    // This will result in undefined behavior
    printf("Value: %d\n", *number);
}
The ESP-IDF toolchain often detects these errors at compile time. In the log below, the warning is treated as an error (-Werror=uninitialized), so the build fails. The compile-time error log may look like this:
/home/user/github/esp-idf/rel5.1/esp-idf/examples/get-started/hello_world/main/hello_world_main.c: In function 'app_main':
/home/user/github/esp-idf/rel5.1/esp-idf/examples/get-started/hello_world/main/hello_world_main.c:21:5: error: 'number' is used uninitialized [-Werror=uninitialized]
21 | printf("Value: %d\n", *number);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/user/github/esp-idf/rel5.1/esp-idf/examples/get-started/hello_world/main/hello_world_main.c:17:10: note: 'number' was declared here
17 | int *number; // Pointer declared but not initialized
| ^~~~~~
cc1: some warnings being treated as errors
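A common defensive pattern is to initialize every pointer at its declaration, using NULL when no valid target exists yet, and to check it before dereferencing. The sketch below shows the idea; read_value, demo_null_guard, and demo_valid_pointer are hypothetical helpers written for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Copy *src into *out only when src points somewhere; report success. */
bool read_value(const int *src, int *out)
{
    assert(out != NULL);
    if (src == NULL) {
        return false;  /* nothing to read, and nothing is dereferenced */
    }
    *out = *src;
    return true;
}

/* The pointer is initialized to NULL, so the guard can reject it. */
bool demo_null_guard(void)
{
    int *number = NULL;  /* initialized at declaration */
    int value = 0;
    return read_value(number, &value);
}

/* Once the pointer refers to real storage, the read goes through. */
int demo_valid_pointer(void)
{
    int storage = 42;
    int *number = &storage;
    int value = 0;
    return read_value(number, &value) ? value : -1;
}
```

A NULL check cannot recognize an indeterminate pointer, which is why the initialization at the declaration is the essential part of the pattern.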
Double Free
A double free occurs when a program frees memory that has already been freed. This can corrupt the heap and lead to program crashes or other severe issues. An example of erroneous code is as follows:
#include <stdlib.h>

void app_main(void)
{
    // Allocate a block of memory
    int *data = (int *)malloc(sizeof(int));
    // Check if memory allocation is successful
    if (data != NULL) {
        // Assign a value to the allocated memory
        *data = 42;
        // First free
        free(data); // Line 26
        // Second free (double-free)
        free(data); // Line 29, this is incorrect and may lead to undefined behavior
    }
}
The ESP-IDF toolchain often detects some of these errors at compile time and issues warnings. The compile-time warning log may look like this:
/home/zhengzhong/github/esp-idf/rel5.1/esp-idf/examples/get-started/hello_world/main/hello_world_main.c: In function 'app_main':
/home/zhengzhong/github/esp-idf/rel5.1/esp-idf/examples/get-started/hello_world/main/hello_world_main.c:29:9: warning: pointer 'data' used after 'free' [-Wuse-after-free]
29 | free(data); // This is incorrect and may lead to undefined behavior
| ^~~~~~~~~~
/home/zhengzhong/github/esp-idf/rel5.1/esp-idf/examples/get-started/hello_world/main/hello_world_main.c:26:9: note: call to 'free' here
26 | free(data);
| ^~~~~~~~~~
Runtime error logs may look like this:
I (285) main_task: Calling app_main()
assert failed: tlsf_free tlsf.c:1119 (!block_is_free(block) && "block already marked as free")
Core 0 register dump:
Stack dump detected
MEPC : 0x403805d8 RA : 0x403838e8 SP : 0x3fc8f330 GP : 0x3fc8ae00
0x403805d8: panic_abort at /home/zhengzhong/github/esp-idf/rel5.1/esp-idf/components/esp_system/panic.c:452
0x403838e8: __ubsan_include at /home/zhengzhong/github/esp-idf/rel5.1/esp-idf/components/esp_system/ubsan.c:313
TP : 0x3fc87110 T0 : 0x37363534 T1 : 0x7271706f T2 : 0x33323130
S0/FP : 0x00000069 S1 : 0x00000001 A0 : 0x3fc8f36c A1 : 0x3fc8acd1
A2 : 0x00000001 A3 : 0x00000029 A4 : 0x00000001 A5 : 0x3fc8c000
A6 : 0x7a797877 A7 : 0x76757473 S2 : 0x00000009 S3 : 0x3fc8f49e
S4 : 0x3fc8acd0 S5 : 0x00000000 S6 : 0x00000000 S7 : 0x00000000
S8 : 0x00000000 S9 : 0x00000000 S10 : 0x00000000 S11 : 0x00000000
T3 : 0x6e6d6c6b T4 : 0x6a696867 T5 : 0x66656463 T6 : 0x62613938
MSTATUS : 0x00001881 MTVEC : 0x40380001 MCAUSE : 0x00000007 MTVAL : 0x00000000
0x40380001: _vector_table at ??:?
MHARTID : 0x00000000
Backtrace:
panic_abort (details=details@entry=0x3fc8f36c "assert failed: tlsf_free tlsf.c:1119 (!block_is_free(block) && \"block already marked as free\")") at /home/zhengzhong/github/esp-idf/rel5.1/esp-idf/components/esp_system/panic.c:452
452 *((volatile int *) 0) = 0; // NOLINT(clang-analyzer-core.NullDereference) should be an invalid operation on targets
#0 panic_abort (details=details@entry=0x3fc8f36c "assert failed: tlsf_free tlsf.c:1119 (!block_is_free(block) && \"block already marked as free\")") at /home/zhengzhong/github/esp-idf/rel5.1/esp-idf/components/esp_system/panic.c:452
#1 0x403838e8 in esp_system_abort (details=details@entry=0x3fc8f36c "assert failed: tlsf_free tlsf.c:1119 (!block_is_free(block) && \"block already marked as free\")") at /home/zhengzhong/github/esp-idf/rel5.1/esp-idf/components/esp_system/port/esp_system_chip.c:84
#2 0x403890e8 in __assert_func (file=file@entry=0x3c0212f3 "", line=line@entry=1119, func=<optimized out>, func@entry=0x3c021984 <__func__.6> "", expr=expr@entry=0x3c0217ec "") at /home/zhengzhong/github/esp-idf/rel5.1/esp-idf/components/newlib/assert.c:81
#3 0x40387e5e in tlsf_free (tlsf=0x3fc8c574, ptr=ptr@entry=0x3fc8ff20) at /home/zhengzhong/github/esp-idf/rel5.1/esp-idf/components/heap/tlsf/tlsf.c:1119
#4 0x40387a8e in multi_heap_free_impl (heap=0x3fc8c560, p=p@entry=0x3fc8ff20) at /home/zhengzhong/github/esp-idf/rel5.1/esp-idf/components/heap/multi_heap.c:231
#5 0x40380b98 in heap_caps_free (ptr=ptr@entry=0x3fc8ff20) at /home/zhengzhong/github/esp-idf/rel5.1/esp-idf/components/heap/heap_caps.c:388
#6 0x4038910e in free (ptr=ptr@entry=0x3fc8ff20) at /home/zhengzhong/github/esp-idf/rel5.1/esp-idf/components/newlib/heap.c:39
#7 0x4200712a in app_main () at /home/zhengzhong/github/esp-idf/rel5.1/esp-idf/examples/get-started/hello_world/main/hello_world_main.c:29
#8 0x4201498a in main_task (args=<error reading variable: value has been optimized out>) at /home/zhengzhong/github/esp-idf/rel5.1/esp-idf/components/freertos/app_startup.c:208
#9 0x40385a2c in vPortTaskWrapper (pxCode=<optimized out>, pvParameters=<optimized out>) at /home/zhengzhong/github/esp-idf/rel5.1/esp-idf/components/freertos/FreeRTOS-Kernel/portable/riscv/port.c:202
ELF file SHA256: 1df25094bc6834da
The log contains the assertion message "block already marked as free", and the backtrace points to app_main () at /home/zhengzhong/github/esp-idf/rel5.1/esp-idf/examples/get-started/hello_world/main/hello_world_main.c:29. This indicates that a double free occurred at line 29, and the second call to free needs to be removed.
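Besides removing the second call, a defensive habit is to reset the pointer to NULL immediately after freeing it, because the C standard defines free(NULL) as a no-op. The sketch below shows one possible helper for this. free_and_null and demo_double_release are hypothetical names, not part of ESP-IDF.

```c
#include <assert.h>
#include <stdlib.h>

/* Free *pp and reset it to NULL so a repeated call becomes free(NULL). */
void free_and_null(int **pp)
{
    assert(pp != NULL);
    free(*pp);   /* free(NULL) is defined to do nothing */
    *pp = NULL;
}

/* Mirror the double-free example, but release through the helper.
 * Returns 0 when the pointer ends up NULL, -1 if allocation failed. */
int demo_double_release(void)
{
    int *data = malloc(sizeof(*data));
    if (data == NULL) {
        return -1;
    }
    *data = 42;
    free_and_null(&data);  /* first release */
    free_and_null(&data);  /* harmless: data is already NULL */
    return data == NULL ? 0 : 1;
}
```

This only protects the one pointer variable that is reset. Other copies of the same address still dangle, so the pattern reduces double frees without replacing clear ownership rules.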