Locating Memory Issues (Memory Stomping, Memory Leaks, etc.)

Common memory issues include the following:

  • Heap memory leak

  • Heap memory stomping

  • Stack overflow

  • Stack memory stomping

  • Pointer used after being freed

  • Pointer used before being initialized

  • Double free

This section describes general debugging methods for each of the above memory issues in turn.

Heap Memory Leak

A heap memory leak occurs when a program allocates a block of heap memory but does not free it after use. The memory consumed by the program then grows gradually at runtime and may eventually exhaust the available memory of the system.

For details, refer to the Heap Memory Debugging Document. The key points are summarized as follows:

  • Use heap_caps_get_per_task_info to get the memory allocation status of all tasks

  • Use heap_caps_get_free_size to compare the free heap size at different points and roughly narrow down where the leak occurs

  • Enable CONFIG_HEAP_TRACING_STANDALONE or CONFIG_HEAP_TRACING_TOHOST

  • STANDALONE mode requires a record buffer to be allocated. Allocations are recorded, analyzed, and printed directly on the ESP chip, but on the RISC-V architecture the code lines cannot be located

  • TOHOST mode uses app_trace over UART/JTAG to capture the data, which is then analyzed on the host. No extra buffer is needed, and code lines can be located

  • heap_trace_init_standalone initializes the record buffer, and heap_trace_start(HEAP_TRACE_LEAKS) starts recording

  • heap_trace_stop() stops recording, and heap_trace_dump() prints the analysis results
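
The tracing calls listed above could be wrapped around a suspect function roughly as follows. This is a minimal sketch, assuming CONFIG_HEAP_TRACING_STANDALONE is enabled in the project configuration. The names trace_leaks_around, leaky_example, leaked_block, and NUM_RECORDS are hypothetical and not part of ESP-IDF. The code outside the ESP_PLATFORM guards exists only so that the sketch also compiles off-target, where it simply runs the function without tracing.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#ifdef ESP_PLATFORM
#include "esp_err.h"
#include "esp_heap_caps.h"
#include "esp_heap_trace.h"

#define NUM_RECORDS 100  /* hypothetical capacity of the record buffer */
static heap_trace_record_t trace_records[NUM_RECORDS];
#endif

/* Hypothetical leak: the block is stored but never freed. */
static void *leaked_block;

void leaky_example(void)
{
    leaked_block = malloc(36);
}

/* Runs `suspect` with leak tracing around it.
 * Returns 0 on success, -1 if tracing could not be set up. */
int trace_leaks_around(void (*suspect)(void))
{
    assert(suspect != NULL);
#ifdef ESP_PLATFORM
    static int initialized;
    if (!initialized) {
        if (heap_trace_init_standalone(trace_records, NUM_RECORDS) != ESP_OK) {
            return -1;
        }
        initialized = 1;
    }
    size_t free_before = heap_caps_get_free_size(MALLOC_CAP_DEFAULT);
    if (heap_trace_start(HEAP_TRACE_LEAKS) != ESP_OK) {
        return -1;
    }
#endif
    suspect();
#ifdef ESP_PLATFORM
    heap_trace_stop();
    heap_trace_dump();  /* prints the allocations that were not freed */
    size_t free_after = heap_caps_get_free_size(MALLOC_CAP_DEFAULT);
    printf("free heap: %u -> %u bytes\n",
           (unsigned)free_before, (unsigned)free_after);
#endif
    return 0;
}
```

On target, calling trace_leaks_around(leaky_example) from a task would be expected to produce a dump in the same format as the typical logs in this section.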

The typical log after using the above heap memory debugging method is as follows:

  1. Xtensa

====== Heap Trace: 2 records (100 capacity) ======
36 bytes (@ 0x3fc9c524, Internal) allocated CPU 0 ccount 0x02f204e0 caller 0x42008cfd:0x42008d73
0x42008cfd: zoo_create at /home/libo/test_github/idf_debug_method/main/idf_debug_method.c:68

0x42008d73: mem_leak_task at /home/libo/test_github/idf_debug_method/main/idf_debug_method.c:96 (discriminator 3)

24 bytes (@ 0x3fc9c54c, Internal) allocated CPU 0 ccount 0x02f20c00 caller 0x42008cfd:0x42008d73
0x42008cfd: zoo_create at /home/libo/test_github/idf_debug_method/main/idf_debug_method.c:68

0x42008d73: mem_leak_task at /home/libo/test_github/idf_debug_method/main/idf_debug_method.c:96 (discriminator 3)

====== Heap Trace Summary ======
Mode: Heap Trace Leaks
60 bytes 'leaked' in trace (2 allocations)
records: 2 (100 capacity, 3 high water mark)
total allocations: 3
total frees: 1
================================

  2. RISC-V

====== Heap Trace: 3 records (100 capacity) ======
36 bytes (@ 0x3fc91574, Internal) allocated CPU 0 ccount 0x02cf6d38 caller
36 bytes (@ 0x3fc9159c, Internal) allocated CPU 0 ccount 0x02cf72cc caller
====== Heap Trace Summary ======
Mode: Heap Trace Leaks
72 bytes 'leaked' in trace (2 allocations)
records: 2 (100 capacity, 3 high water mark)
total allocations: 3
total frees: 1
================================

Heap Memory Stomping

Heap memory stomping occurs when a program reads from or writes to heap memory beyond the range allocated to it. This leads to undefined behavior and can corrupt the internal structures of the heap. A typical log for this error is:

assert failed: remove_free_block tlsf.c:331 (next && "next_free field can not be null")

For details, refer to the Heap Memory Debugging Document. The key points are summarized as follows:

  1. Locate Who and Where

  • Enable memory debugging by raising the heap corruption detection level CONFIG_HEAP_CORRUPTION_DETECTION to Light impact or Comprehensive:

    • Basic (default): Uses the heap's own metadata to detect whether it has been corrupted

    • Light impact: Adds the special bytes 0xABBA1234 and 0xBAAD5678 before and after each allocated block

    • Comprehensive: In addition to the Light impact checks, adds checks for uninitialized-access and use-after-free bugs. Newly allocated memory is filled with 0xce, and freed memory is filled with 0xfe

  • After enabling memory debugging, either wait for a crash or call heap_caps_check_integrity_all before and after the suspected stomping location to check heap integrity and trigger the crash earlier. If the stomped address is already known, call heap_caps_check_integrity_addr directly.

    • Tail stomped: the current memory block was written out of bounds, for example CORRUPT HEAP: Bad tail at 0x3fc9ad5a. Expected 0xbaad5678 got 0x02020202

    • Head stomped: the previous memory block was written out of bounds, for example CORRUPT HEAP: Bad head at 0x3fc9a94c. Expected 0xabba1234 got 0x00000000

  • Two methods can identify the neighboring blocks before and after the corrupted memory block:

    • Use heap tracing: call heap_trace_start(HEAP_TRACE_ALL) to collect information

    • Use heap_caps_dump_all to print information about all heap blocks (print it both before and after the memory allocation)

Note

For the specific method of checking the memory block status described above, refer to Heap Memory Tracking.
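
To make the head and tail checks more concrete, the following is a simplified host-side model of the idea behind the Light impact level: each allocation is surrounded by two known values, and a check reports which one was overwritten. It is a sketch for illustration only, not the ESP-IDF implementation. The functions guarded_alloc, guarded_check, and guarded_free are hypothetical names; only the two canary values come from the description above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define HEAD_CANARY 0xABBA1234u
#define TAIL_CANARY 0xBAAD5678u

/* Allocates `size` user bytes with a canary word before and after them. */
void *guarded_alloc(size_t size)
{
    uint8_t *raw = malloc(sizeof(uint32_t) + size + sizeof(uint32_t));
    if (raw == NULL) {
        return NULL;
    }
    uint32_t head = HEAD_CANARY;
    uint32_t tail = TAIL_CANARY;
    memcpy(raw, &head, sizeof head);
    memcpy(raw + sizeof head + size, &tail, sizeof tail);
    return raw + sizeof head;
}

/* Returns 0 if both canaries are intact, 1 for a bad head, 2 for a bad tail. */
int guarded_check(const void *user, size_t size)
{
    assert(user != NULL);
    const uint8_t *p = user;
    uint32_t head;
    uint32_t tail;
    memcpy(&head, p - sizeof head, sizeof head);
    memcpy(&tail, p + size, sizeof tail);
    if (head != HEAD_CANARY) {
        return 1;
    }
    if (tail != TAIL_CANARY) {
        return 2;
    }
    return 0;
}

/* Frees a block returned by guarded_alloc. */
void guarded_free(void *user)
{
    if (user != NULL) {
        free((uint8_t *)user - sizeof(uint32_t));
    }
}
```

In this model, writing even one byte past the end of the user area changes the tail word, which corresponds to the Bad tail report; a bad head means that something located before the block, usually the preceding allocation, overran its own bounds.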

  2. Locate When

  • Set a CPU watchpoint in the code with esp_cpu_set_watchpoint(0, (void *)0x3fc9a94c, 4, ESP_CPU_WATCHPOINT_STORE);. If you do not know which core performs the write, call it on both cores

  • The CPU triggers a debug exception when data is written to this address, and the code line can be located from the PC. A reference log is as follows:

Guru Meditation Error: Core  0 panic'ed (Unhandled debug exception).
Debug exception reason: Watchpoint 0 triggered
Core  0 register dump:
PC      : 0x400570e8  PS      : 0x00060c36  A0      : 0x82008d43  A1      : 0x3fc99f10
0x400570e8: memset in ROM

A2      : 0x3fc9b3ac  A3      : 0x00000000  A4      : 0x000003e8  A5      : 0x3fc9b75c
A6      : 0x00000000  A7      : 0x0000003e  A8      : 0x8200333d  A9      : 0x3fc99ee0
A10     : 0x00000400  A11     : 0x00060c20  A12     : 0x00000000  A13     : 0x00060c23
A14     : 0xb33fffff  A15     : 0xb33fffff  SAR     : 0x00000004  EXCCAUSE: 0x00000001
EXCVADDR: 0x00000000  LBEG    : 0x400570e8  LEND    : 0x400570f3  LCOUNT  : 0x00000002

Backtrace: 0x400570e5:0x3fc99f10 0x42008d40:0x3fc99f20 0x4201874b:0x3fc99f50 0x4037a80d:0x3fc99f80
0x400570e5: memset in ROM
0x42008d40: app_main at /home/libo/test_github/idf_debug_method/main/idf_debug_method.c:169 (discriminator 3)
0x4201874b: main_task at /home/libo/esp/github_master/components/freertos/app_startup.c:208 (discriminator 13)
0x4037a80d: vPortTaskWrapper at /home/libo/esp/github_master/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:162
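
The watchpoint call above can be wrapped in a small helper. The sketch below is hedged: it assumes that the watched region must be a power-of-two size and aligned to that size, which is typical for hardware watchpoints, but the exact limits are chip-specific and should be confirmed in the esp_cpu.h documentation. The names watch_region_is_valid and watch_writes_to are hypothetical. Outside ESP_PLATFORM the helper only validates its arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifdef ESP_PLATFORM
#include "esp_cpu.h"
#include "esp_err.h"
#endif

/* Returns 1 if `size` is a power of two and `addr` is aligned to it
 * (assumed hardware constraint, see the note above). */
int watch_region_is_valid(uintptr_t addr, size_t size)
{
    if (size == 0 || (size & (size - 1)) != 0) {
        return 0;
    }
    return (addr & (size - 1)) == 0;
}

/* Arms watchpoint 0 for stores to [addr, addr + size) on the calling core.
 * Returns 0 on success, -1 if the watchpoint could not be set. */
int watch_writes_to(uintptr_t addr, size_t size)
{
    assert(watch_region_is_valid(addr, size));
#ifdef ESP_PLATFORM
    if (esp_cpu_set_watchpoint(0, (const void *)addr, size,
                               ESP_CPU_WATCHPOINT_STORE) != ESP_OK) {
        return -1;
    }
#endif
    return 0;
}
```

Because the helper arms the watchpoint only on the core that calls it, on a dual-core chip it would need to be called once from a task on each core, for example watch_writes_to(0x3fc9a94c, 4) for the address reported in the Bad head log.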

Stack Overflow

Stack overflow occurs when function calls use more stack memory than is available. With deep recursion or too many local variables, stack usage can exceed the allowed limit, resulting in a stack overflow. ESP-IDF provides the following stack overflow detection mechanisms:

  1. ESP-IDF FreeRTOS enables stack overflow detection by default. If a stack overflow is detected, it triggers an assertion, printing the corresponding stack overflow information. A typical log looks like this:

    ***ERROR*** A stack overflow in task test_task has been detected.
    

    For more details, refer to the Stack Overflow section.

  2. ESP-IDF supports enabling the End of Stack Watchpoint, which triggers a breakpoint before the FreeRTOS stack overflow assertion fires.

  3. The RISC-V platform supports enabling hardware stack overflow detection (Stack protection fault). For details, see Hardware Stack Protection.
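
One common way software-based detection can work is to fill the stack with a known pattern and later check whether the bytes at the far end of the stack are still intact. The following is a simplified host-side model of that idea, not the FreeRTOS implementation. The fill byte, the guard size, and all function names are assumptions made for illustration, and the model treats index 0 as the end that a downward-growing stack approaches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_FILL_BYTE 0xA5u  /* assumed fill pattern */

/* Paints a simulated stack region with the fill pattern. */
void stack_paint(uint8_t *stack, size_t size)
{
    assert(stack != NULL);
    memset(stack, STACK_FILL_BYTE, size);
}

/* Counts the untouched bytes starting from the far end (index 0). */
size_t stack_unused_bytes(const uint8_t *stack, size_t size)
{
    size_t n = 0;
    while (n < size && stack[n] == STACK_FILL_BYTE) {
        n++;
    }
    return n;
}

/* Reports a suspected overflow if fewer than `guard` bytes are untouched. */
int stack_overflow_suspected(const uint8_t *stack, size_t size, size_t guard)
{
    assert(guard <= size);
    return stack_unused_bytes(stack, size) < guard;
}
```

A check of this kind only runs at specific points in time, so it reports an overflow after the fact. This is why the watchpoint and hardware mechanisms listed above can catch the faulty write earlier.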

Stack Memory Stomping

Stack memory stomping is similar to heap memory stomping, but it occurs in stack memory. Reading or writing data beyond the bounds of a stack-allocated object may lead to program errors. Here are some key points to note:

  1. It may lead to task stack overflow, generally detectable through FreeRTOS stack overflow mechanisms.

  2. It may result in the overwriting of local variable values, causing unexpected program behavior.

  3. It may cause the modification of local pointer variables, accessing illegal instructions/data addresses, leading to program crashes.

  4. It may result in the overwriting of the function return address, causing the program to jump to an incorrect address and crash.

A simple example of erroneous code is as follows:

#include <stdio.h>

int vulnerableFunction(void) {
    int localArray[5];  // Array allocated on the stack

    // Bug: i <= 5 also writes localArray[5], one element past the end of the array
    for (int i = 0; i <= 5; ++i) {
        localArray[i] = i;
    }

    return localArray[0];
}

void app_main(void) {
    printf("Before vulnerable function.\n");

    vulnerableFunction();  // Call the function that causes stack memory corruption

    printf("After vulnerable function.\n");
}

It is worth mentioning that the compiler used by ESP-IDF catches some of these errors at compile time and issues warnings (though the compilation still passes). The compile-time warning log looks like this:

/home/user/github/esp-idf/rel5.1/esp-idf/examples/get-started/hello_world/main/hello_world_main.c: In function 'vulnerableFunction':
/home/user/github/esp-idf/rel5.1/esp-idf/examples/get-started/hello_world/main/hello_world_main.c:20:23: warning: iteration 5 invokes undefined behavior [-Waggressive-loop-optimizations]
  20 |         localArray[i] = i;
      |         ~~~~~~~~~~~~~~^~~
/home/user/github/esp-idf/rel5.1/esp-idf/examples/get-started/hello_world/main/hello_world_main.c:19:23: note: within this loop
  19 |     for (int i = 0; i <= 5; ++i) {
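
For comparison, a corrected version of the loop could derive its bound from the array itself instead of repeating the literal size. This is a minimal sketch; ARRAY_LEN, checked_store, and fill_array_safely are hypothetical names introduced here for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Stores `value` at `index` only if the index is in range.
 * Returns 0 on success, -1 if the index is out of bounds. */
int checked_store(int *array, size_t len, size_t index, int value)
{
    assert(array != NULL);
    if (index >= len) {
        return -1;
    }
    array[index] = value;
    return 0;
}

/* Fills a 5-element stack array with 0..4 and returns the sum of its elements. */
int fill_array_safely(void)
{
    int localArray[5];
    int sum = 0;

    /* i < ARRAY_LEN(...) stops at the last valid element */
    for (size_t i = 0; i < ARRAY_LEN(localArray); ++i) {
        checked_store(localArray, ARRAY_LEN(localArray), i, (int)i);
        sum += localArray[i];
    }
    return sum;
}
```

ARRAY_LEN only works on true arrays, not on pointers, which is why checked_store takes the length as an explicit parameter.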

Pointers Used After Release

A use-after-free occurs when a program frees a block of memory but continues to use a pointer that points to that memory. This results in access to invalid memory, leading to crashes or undefined behavior.

The resulting failures vary widely and are hard to trace back to their cause from the observed symptoms. Therefore, it is crucial to pay special attention to pointer usage during development. A simple example of erroneous code is as follows:

#include <stdio.h>
#include <stdlib.h>

void app_main(void)
{
    int *number = (int *)malloc(sizeof(int));  // Allocate memory for an integer

    if (number == NULL) {
        // Handle memory allocation failure
        printf("Memory allocation failed.\n");
        return;  // Do not dereference a NULL pointer below
    }

    *number = 42;  // Assign a value to the allocated memory

    printf("Value before freeing: %d\n", *number);

    free(number);  // Free the allocated memory

    // Attempt to use the pointer after freeing
    // This will result in undefined behavior
    printf("Value after freeing: %d\n", *number);
}

It is worth noting that the compiler used by ESP-IDF can detect some of these errors during compilation and issue warnings (though the compilation may still succeed). Compile-time warning logs may look like this:

/home/user/github/esp-idf/rel5.1/esp-idf/examples/get-started/hello_world/main/hello_world_main.c: In function 'app_main':
/home/user/github/esp-idf/rel5.1/esp-idf/examples/get-started/hello_world/main/hello_world_main.c:32:5: warning: pointer 'number' used after 'free' [-Wuse-after-free]
  32 |     printf("Value after freeing: %d\n", *number);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/zhengzhong/github/esp-idf/rel5.1/esp-idf/examples/get-started/hello_world/main/hello_world_main.c:28:5: note: call to 'free' here
  28 |     free(number);  // Free the allocated memory
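
One way to avoid this class of bug is to read everything that is needed before freeing the block, and to clear the pointer in the same step so that no stale copy remains in the caller. The sketch below illustrates this under the assumption that the caller owns the pointer; take_value_and_free is a hypothetical helper, not an ESP-IDF function.

```c
#include <assert.h>
#include <stdlib.h>

/* Copies the value out, frees the block, and clears the caller's pointer. */
int take_value_and_free(int **slot)
{
    assert(slot != NULL && *slot != NULL);
    int value = **slot;  /* read while the block is still valid */
    free(*slot);
    *slot = NULL;        /* a later misuse is now an obvious NULL access */
    return value;
}
```

This does not protect other copies of the same pointer held elsewhere, so it complements careful ownership rules rather than replacing them.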

Pointers Used Before Initialization

Using a pointer before it is initialized means that the program dereferences a pointer holding an indeterminate value, which results in access to unknown memory regions and unstable behavior.

The resulting failures vary widely and are hard to trace back to their cause from the observed symptoms. Therefore, it is crucial to pay special attention to pointer initialization during development. A simple example of erroneous code is as follows:

void app_main(void)
{
    int *number;  // Pointer declared but not initialized

    // Attempt to dereference the uninitialized pointer
    // This will result in undefined behavior
    printf("Value: %d\n", *number);

}

It is worth noting that the compiler often detects some of these errors during compilation. As the log below shows, this warning is treated as an error (-Werror=uninitialized), so the build fails. Compile-time error logs may look like this:

/home/user/github/esp-idf/rel5.1/esp-idf/examples/get-started/hello_world/main/hello_world_main.c: In function 'app_main':
/home/user/github/esp-idf/rel5.1/esp-idf/examples/get-started/hello_world/main/hello_world_main.c:21:5: error: 'number' is used uninitialized [-Werror=uninitialized]
  21 |     printf("Value: %d\n", *number);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/user/github/esp-idf/rel5.1/esp-idf/examples/get-started/hello_world/main/hello_world_main.c:17:10: note: 'number' was declared here
  17 |     int *number;  // Pointer declared but not initialized
      |          ^~~~~~
cc1: some warnings being treated as errors
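
A defensive pattern for this case is to initialize every pointer at its declaration, with NULL if no valid target exists yet, and to check it before dereferencing. The following is a minimal sketch; read_or_default and example_value are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Returns *p, or `fallback` when p is NULL. */
int read_or_default(const int *p, int fallback)
{
    return (p != NULL) ? *p : fallback;
}

/* Shows the pointer in both states: not yet assigned, then valid. */
int example_value(void)
{
    int storage = 42;
    int *number = NULL;  /* initialized at declaration, never indeterminate */

    /* While unassigned, the guarded read yields the fallback */
    assert(read_or_default(number, -1) == -1);

    number = &storage;
    return read_or_default(number, -1);
}
```

A NULL check cannot detect an indeterminate pointer, which is why the initialization at the declaration is the essential part of the pattern.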

Double Free

A double free occurs when a program frees memory that has already been freed. This can corrupt the heap, resulting in program crashes or other severe issues. An example of erroneous code is as follows:

void app_main(void)
{
    // Allocate a block of memory
    int *data = (int *)malloc(sizeof(int));

    // Check if memory allocation is successful
    if (data != NULL) {
        // Assign a value to the allocated memory
        *data = 42;

        // First free
        free(data); // Line 26

        // Second free (double-free)
        free(data);  // Line 29, this is incorrect and may lead to undefined behavior
    }
}

It is worth noting that the compiler often detects some of these errors during compilation and issues warnings. Compile-time warning logs may look like this:

/home/zhengzhong/github/esp-idf/rel5.1/esp-idf/examples/get-started/hello_world/main/hello_world_main.c: In function 'app_main':
/home/zhengzhong/github/esp-idf/rel5.1/esp-idf/examples/get-started/hello_world/main/hello_world_main.c:29:9: warning: pointer 'data' used after 'free' [-Wuse-after-free]
  29 |         free(data);  // This is incorrect and may lead to undefined behavior
      |         ^~~~~~~~~~
/home/zhengzhong/github/esp-idf/rel5.1/esp-idf/examples/get-started/hello_world/main/hello_world_main.c:26:9: note: call to 'free' here
  26 |         free(data);
      |         ^~~~~~~~~~

Runtime error logs may look like this:

I (285) main_task: Calling app_main()

assert failed: tlsf_free tlsf.c:1119 (!block_is_free(block) && "block already marked as free")
Core  0 register dump:
Stack dump detected
MEPC    : 0x403805d8  RA      : 0x403838e8  SP      : 0x3fc8f330  GP      : 0x3fc8ae00
0x403805d8: panic_abort at /home/zhengzhong/github/esp-idf/rel5.1/esp-idf/components/esp_system/panic.c:452

0x403838e8: __ubsan_include at /home/zhengzhong/github/esp-idf/rel5.1/esp-idf/components/esp_system/ubsan.c:313

TP      : 0x3fc87110  T0      : 0x37363534  T1      : 0x7271706f  T2      : 0x33323130
S0/FP   : 0x00000069  S1      : 0x00000001  A0      : 0x3fc8f36c  A1      : 0x3fc8acd1
A2      : 0x00000001  A3      : 0x00000029  A4      : 0x00000001  A5      : 0x3fc8c000
A6      : 0x7a797877  A7      : 0x76757473  S2      : 0x00000009  S3      : 0x3fc8f49e
S4      : 0x3fc8acd0  S5      : 0x00000000  S6      : 0x00000000  S7      : 0x00000000
S8      : 0x00000000  S9      : 0x00000000  S10     : 0x00000000  S11     : 0x00000000
T3      : 0x6e6d6c6b  T4      : 0x6a696867  T5      : 0x66656463  T6      : 0x62613938
MSTATUS : 0x00001881  MTVEC   : 0x40380001  MCAUSE  : 0x00000007  MTVAL   : 0x00000000
0x40380001: _vector_table at ??:?

MHARTID : 0x00000000


Backtrace:


panic_abort (details=details@entry=0x3fc8f36c "assert failed: tlsf_free tlsf.c:1119 (!block_is_free(block) && \"block already marked as free\")") at /home/zhengzhong/github/esp-idf/rel5.1/esp-idf/components/esp_system/panic.c:452
452         *((volatile int *) 0) = 0; // NOLINT(clang-analyzer-core.NullDereference) should be an invalid operation on targets
#0  panic_abort (details=details@entry=0x3fc8f36c "assert failed: tlsf_free tlsf.c:1119 (!block_is_free(block) && \"block already marked as free\")") at /home/zhengzhong/github/esp-idf/rel5.1/esp-idf/components/esp_system/panic.c:452
#1  0x403838e8 in esp_system_abort (details=details@entry=0x3fc8f36c "assert failed: tlsf_free tlsf.c:1119 (!block_is_free(block) && \"block already marked as free\")") at /home/zhengzhong/github/esp-idf/rel5.1/esp-idf/components/esp_system/port/esp_system_chip.c:84
#2  0x403890e8 in __assert_func (file=file@entry=0x3c0212f3 "", line=line@entry=1119, func=<optimized out>, func@entry=0x3c021984 <__func__.6> "", expr=expr@entry=0x3c0217ec "") at /home/zhengzhong/github/esp-idf/rel5.1/esp-idf/components/newlib/assert.c:81
#3  0x40387e5e in tlsf_free (tlsf=0x3fc8c574, ptr=ptr@entry=0x3fc8ff20) at /home/zhengzhong/github/esp-idf/rel5.1/esp-idf/components/heap/tlsf/tlsf.c:1119
#4  0x40387a8e in multi_heap_free_impl (heap=0x3fc8c560, p=p@entry=0x3fc8ff20) at /home/zhengzhong/github/esp-idf/rel5.1/esp-idf/components/heap/multi_heap.c:231
#5  0x40380b98 in heap_caps_free (ptr=ptr@entry=0x3fc8ff20) at /home/zhengzhong/github/esp-idf/rel5.1/esp-idf/components/heap/heap_caps.c:388
#6  0x4038910e in free (ptr=ptr@entry=0x3fc8ff20) at /home/zhengzhong/github/esp-idf/rel5.1/esp-idf/components/newlib/heap.c:39
#7  0x4200712a in app_main () at /home/zhengzhong/github/esp-idf/rel5.1/esp-idf/examples/get-started/hello_world/main/hello_world_main.c:29
#8  0x4201498a in main_task (args=<error reading variable: value has been optimized out>) at /home/zhengzhong/github/esp-idf/rel5.1/esp-idf/components/freertos/app_startup.c:208
#9  0x40385a2c in vPortTaskWrapper (pxCode=<optimized out>, pvParameters=<optimized out>) at /home/zhengzhong/github/esp-idf/rel5.1/esp-idf/components/freertos/FreeRTOS-Kernel/portable/riscv/port.c:202
ELF file SHA256: 1df25094bc6834da

The log shows the assertion message "block already marked as free", and the backtrace points to app_main () at /home/zhengzhong/github/esp-idf/rel5.1/esp-idf/examples/get-started/hello_world/main/hello_world_main.c:29. This indicates that a double free occurred at line 29, so the second call to free needs to be removed.
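
Where the same pointer may be released from more than one code path, an idempotent release helper can make a second call harmless. This is a sketch under the assumption that all release paths share the same pointer variable; release_once is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>

/* Frees *slot at most once.
 * Returns 1 if the block was freed, 0 if it had already been released. */
int release_once(int **slot)
{
    assert(slot != NULL);
    if (*slot == NULL) {
        return 0;  /* already released, nothing to do */
    }
    free(*slot);
    *slot = NULL;  /* marks the block as released for later calls */
    return 1;
}
```

With this helper, the second release in the example above would return 0 instead of reaching the heap allocator and triggering the assertion.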