Heap Memory Debugging

Overview

ESP-IDF integrates tools for requesting heap information, detecting heap corruption, and tracing memory leaks. These can help track down memory-related bugs.

For general information about the heap memory allocator, see the Heap Memory Allocation page.

Heap Information

To obtain information about the state of the heap:

  • xPortGetFreeHeapSize() is a FreeRTOS function which returns the number of free bytes in the (data memory) heap. This is equivalent to calling heap_caps_get_free_size(MALLOC_CAP_8BIT).

  • heap_caps_get_free_size() can also be used to return the current free memory for different memory capabilities.

  • heap_caps_get_largest_free_block() can be used to return the largest free block in the heap. This is the largest single allocation which is currently possible. Tracking this value and comparing to total free heap allows you to detect heap fragmentation.

  • xPortGetMinimumEverFreeHeapSize() and the related heap_caps_get_minimum_free_size() can be used to track the heap “low water mark” since boot.

  • heap_caps_get_info() returns a multi_heap_info_t structure which contains the information from the above functions, plus some additional heap-specific data (number of allocations, etc.).

  • heap_caps_print_heap_info() prints a summary to stdout of the information returned by heap_caps_get_info().

  • heap_caps_dump() and heap_caps_dump_all() will output detailed information about the structure of each block in the heap. Note that this can be large amount of output.

Heap Corruption Detection

Heap corruption detection allows you to detect various types of heap memory errors:

  • Out of bounds writes & buffer overflow.

  • Writes to freed memory.

  • Reads from freed or uninitialized memory,

Assertions

The heap implementation (multi_heap.c, etc.) includes a lot of assertions which will fail if the heap memory is corrupted. To detect heap corruption most effectively, ensure that assertions are enabled in make menuconfig under Compiler options.

If a heap integrity assertion fails, a line will be printed like CORRUPT HEAP: multi_heap.c:225 detected at 0x3ffbb71c. The memory address which is printed is the address of the heap structure which has corrupt content.

It’s also possible to manually check heap integrity by calling heap_caps_check_integrity_all() or related functions. This function checks all of requested heap memory for integrity, and can be used even if assertions are disabled. If the integrity check prints an error, it will also contain the address(es) of corrupt heap structures.

Finding Heap Corruption

Memory corruption can be one of the hardest classes of bugs to find and fix, as one area of memory can be corrupted from a totally different place. Some tips:

  • A crash with a CORRUPT HEAP: message will usually include a stack trace, but this stack trace is rarely useful. The crash is the symptom of memory corruption when the system realises the heap is corrupt, but usually the corruption happened elsewhere and earlier in time.

  • Increasing the Heap memory debugging Configuration level to “Light impact” or “Comprehensive” can give you a more accurate message with the first corrupt memory address.

  • Adding regular calls to heap_caps_check_integrity_all() or heap_caps_check_integrity_addr() in your code will help you pin down the exact time that the corruption happened. You can move these checks around to “close in on” the section of code that corrupted the heap.

  • Based on the memory address which is being corrupted, you can use JTAG debugging to set a watchpoint on this address and have the CPU halt when it is written to.

  • If you don’t have JTAG, but you do know roughly when the corruption happens, then you can set a watchpoint in software just beforehand via esp_set_watchpoint(). A fatal exception will occur when the watchpoint triggers. For example esp_set_watchpoint(0, (void *)addr, 4, ESP_WATCHPOINT_STORE. Note that watchpoints are per-CPU and are set on the current running CPU only, so if you don’t know which CPU is corrupting memory then you will need to call this function on both CPUs.

  • For buffer overflows, heap tracing in HEAP_TRACE_ALL mode lets you see which callers are allocating which addresses from the heap. See Heap Tracing To Find Heap Corruption for more details. If you can find the function which allocates memory with an address immediately before the address which is corrupted, this will probably be the function which overflows the buffer.

  • Calling heap_caps_dump() or heap_caps_dump_all() can give an indication of what heap blocks are surrounding the corrupted region and may have overflowed/underflowed/etc.

Configuration

Temporarily increasing the heap corruption detection level can give more detailed information about heap corruption errors.

In make menuconfig, under Component config there is a menu Heap memory debugging. The setting CONFIG_HEAP_CORRUPTION_DETECTION can be set to one of three levels:

Basic (no poisoning)

This is the default level. No special heap corruption features are enabled, but provided assertions are enabled (the default configuration) then a heap corruption error will be printed if any of the heap’s internal data structures appear overwritten or corrupted. This usually indicates a buffer overrun or out of bounds write.

If assertions are enabled, an assertion will also trigger if a double-free occurs (the same memory is freed twice).

Calling heap_caps_check_integrity() in Basic mode will check the integrity of all heap structures, and print errors if any appear to be corrupted.

Light Impact

At this level, heap memory is additionally “poisoned” with head and tail “canary bytes” before and after each block which is allocated. If an application writes outside the bounds of allocated buffers, the canary bytes will be corrupted and the integrity check will fail.

The head canary word is 0xABBA1234 (3412BAAB in byte order), and the tail canary word is 0xBAAD5678 (7856ADBA in byte order).

“Basic” heap corruption checks can also detect most out of bounds writes, but this setting is more precise as even a single byte overrun can be detected. With Basic heap checks, the number of overrun bytes before a failure is detected will depend on the properties of the heap.

Enabling “Light Impact” checking increases memory usage, each individual allocation will use 9 to 12 additional bytes of memory (depending on alignment).

Each time free() is called in Light Impact mode, the head and tail canary bytes of the buffer being freed are checked against the expected values.

When heap_caps_check_integrity() is called, all allocated blocks of heap memory have their canary bytes checked against the expected values.

In both cases, the check is that the first 4 bytes of an allocated block (before the buffer returned to the user) should be the word 0xABBA1234. Then the last 4 bytes of the allocated block (after the buffer returned to the user) should be the word 0xBAAD5678.

Different values usually indicate buffer underrun or overrun, respectively.

Comprehensive

This level incorporates the “light impact” detection features plus additional checks for uninitialised-access and use-after-free bugs. In this mode, all freshly allocated memory is filled with the pattern 0xCE, and all freed memory is filled with the pattern 0xFE.

Enabling “Comprehensive” detection has a substantial runtime performance impact (as all memory needs to be set to the allocation patterns each time a malloc/free completes, and the memory also needs to be checked each time.) However it allows easier detection of memory corruption bugs which are much more subtle to find otherwise. It is recommended to only enable this mode when debugging, not in production.

Crashes in Comprehensive Mode

If an application crashes reading/writing an address related to 0xCECECECE in Comprehensive mode, this indicates it has read uninitialized memory. The application should be changed to either use calloc() (which zeroes memory), or initialize the memory before using it. The value 0xCECECECE may also be seen in stack-allocated automatic variables, because in IDF most task stacks are originally allocated from the heap and in C stack memory is uninitialized by default.

If an application crashes and the exception register dump indicates that some addresses or values were 0xFEFEFEFE, this indicates it is reading heap memory after it has been freed (a “use after free bug”.) The application should be changed to not access heap memory after it has been freed.

If a call to malloc() or realloc() causes a crash because it expected to find the pattern 0xFEFEFEFE in free memory and a different pattern was found, then this indicates the app has a use-after-free bug where it is writing to memory which has already been freed.

Manual Heap Checks in Comprehensive Mode

Calls to heap_caps_check_integrity() may print errors relating to 0xFEFEFEFE, 0xABBA1234 or 0xBAAD5678. In each case the checker is expecting to find a given pattern, and will error out if this is not found:

  • For free heap blocks, the checker expects to find all bytes set to 0xFE. Any other values indicate a use-after-free bug where free memory has been incorrectly overwritten.

  • For allocated heap blocks, the behaviour is the same as for Light Impact mode. The canary bytes 0xABBA1234 and 0xBAAD5678 are checked at the head and tail of each allocated buffer, and any variation indicates a buffer overrun/underrun.

Heap Tracing

Heap Tracing allows tracing of code which allocates/frees memory.

Note

Heap tracing “standalone” mode is currently implemented, meaning that tracing does not require any external hardware but uses internal memory to hold trace data. Heap tracing via JTAG trace port is also planned.

Heap tracing can perform two functions:

  • Leak checking: find memory which is allocated and never freed.

  • Heap use analysis: show all functions that are allocating/freeing memory while the trace is running.

How To Diagnose Memory Leaks

If you suspect a memory leak, the first step is to figure out which part of the program is leaking memory. Use the xPortGetFreeHeapSize(), heap_caps_get_free_size(), or related functions to track memory use over the life of the application. Try to narrow the leak down to a single function or sequence of functions where free memory always decreases and never recovers.

Once you’ve identified the code which you think is leaking:

  • Under make menuconfig, navigate to Component settings -> Heap Memory Debugging and set CONFIG_HEAP_TRACING.

  • Call the function heap_trace_init_standalone() early in the program, to register a buffer which can be used to record the memory trace.

  • Call the function heap_trace_start() to begin recording all mallocs/frees in the system. Call this immediately before the piece of code which you suspect is leaking memory.

  • Call the function heap_trace_stop() to stop the trace once the suspect piece of code has finished executing.

  • Call the function heap_trace_dump() to dump the results of the heap trace.

An example:

#include "esp_heap_trace.h"

#define NUM_RECORDS 100
static heap_trace_record_t trace_record[NUM_RECORDS]; // This buffer must be in internal RAM

...

void app_main()
{
    ...
    ESP_ERROR_CHECK( heap_trace_init_standalone(trace_record, NUM_RECORDS) );
    ...
}

void some_function()
{
    ESP_ERROR_CHECK( heap_trace_start(HEAP_TRACE_LEAKS) );

    do_something_you_suspect_is_leaking();

    ESP_ERROR_CHECK( heap_trace_stop() );
    heap_trace_dump();
    ...
}

The output from the heap trace will look something like this:

2 allocations trace (100 entry buffer)
32 bytes (@ 0x3ffaf214) allocated CPU 0 ccount 0x2e9b7384 caller 0x400d276d:0x400d27c1
0x400d276d: leak_some_memory at /path/to/idf/examples/get-started/blink/main/./blink.c:27

0x400d27c1: blink_task at /path/to/idf/examples/get-started/blink/main/./blink.c:52

8 bytes (@ 0x3ffaf804) allocated CPU 0 ccount 0x2e9b79c0 caller 0x400d2776:0x400d27c1
0x400d2776: leak_some_memory at /path/to/idf/examples/get-started/blink/main/./blink.c:29

0x400d27c1: blink_task at /path/to/idf/examples/get-started/blink/main/./blink.c:52

40 bytes 'leaked' in trace (2 allocations)
total allocations 2 total frees 0

(Above example output is using IDF Monitor to automatically decode PC addresses to their source files & line number.)

The first line indicates how many allocation entries are in the buffer, compared to its total size.

In HEAP_TRACE_LEAKS mode, for each traced memory allocation which has not already been freed a line is printed with:

  • XX bytes is number of bytes allocated

  • @ 0x... is the heap address returned from malloc/calloc.

  • CPU x is the CPU (0 or 1) running when the allocation was made.

  • ccount 0x... is the CCOUNT (CPU cycle count) register value when the allocation was mode. Is different for CPU 0 vs CPU 1.

  • caller 0x... gives the call stack of the call to malloc()/free(), as a list of PC addresses. These can be decoded to source files and line numbers, as shown above.

The depth of the call stack recorded for each trace entry can be configured in make menuconfig, under Heap Memory Debugging -> Enable heap tracing -> Heap tracing stack depth. Up to 10 stack frames can be recorded for each allocation (the default is 2). Each additional stack frame increases the memory usage of each heap_trace_record_t record by eight bytes.

Finally, the total number of ‘leaked’ bytes (bytes allocated but not freed while trace was running) is printed, and the total number of allocations this represents.

A warning will be printed if the trace buffer was not large enough to hold all the allocations which happened. If you see this warning, consider either shortening the tracing period or increasing the number of records in the trace buffer.

Heap Tracing To Find Heap Corruption

Heap tracing can also be used to help track down heap corruption. When a region in heap is corrupted, it may be from some other part of the program which allocated memory at a nearby address.

If you have some idea at what time the corruption occurred, enabling heap tracing in HEAP_TRACE_ALL mode allows you to record all of the functions which allocated memory, and the addresses of the allocations.

Using heap tracing in this way is very similar to memory leak detection as described above. For memory which is allocated and not freed, the output is the same. However, records will also be shown for memory which has been freed.

Performance Impact

Enabling heap tracing in menuconfig increases the code size of your program, and has a very small negative impact on performance of heap allocation/free operations even when heap tracing is not running.

When heap tracing is running, heap allocation/free operations are substantially slower than when heap tracing is stopped. Increasing the depth of stack frames recorded for each allocation (see above) will also increase this performance impact.

False-Positive Memory Leaks

Not everything printed by heap_trace_dump() is necessarily a memory leak. Among things which may show up here, but are not memory leaks:

  • Any memory which is allocated after heap_trace_start() but then freed after heap_trace_stop() will appear in the leak dump.

  • Allocations may be made by other tasks in the system. Depending on the timing of these tasks, it’s quite possible this memory is freed after heap_trace_stop() is called.

  • The first time a task uses stdio - for example, when it calls printf() - a lock (RTOS mutex semaphore) is allocated by the libc. This allocation lasts until the task is deleted.

  • Certain uses of printf(), such as printing floating point numbers, will allocate some memory from the heap on demand. These allocations last until the task is deleted.

  • The Bluetooth, WiFi, and TCP/IP libraries will allocate heap memory buffers to handle incoming or outgoing data. These memory buffers are usually short lived, but some may be shown in the heap leak trace if the data was received/transmitted by the lower levels of the network while the leak trace was running.

  • TCP connections will continue to use some memory after they are closed, because of the TIME_WAIT state. After the TIME_WAIT period has completed, this memory will be freed.

One way to differentiate between “real” and “false positive” memory leaks is to call the suspect code multiple times while tracing is running, and look for patterns (multiple matching allocations) in the heap trace output.

API Reference - Heap Tracing

Functions

esp_err_t heap_trace_init_standalone(heap_trace_record_t *record_buffer, size_t num_records)

Initialise heap tracing in standalone mode.

This function must be called before any other heap tracing functions.

Note

Standalone mode is the only mode currently supported.

To disable heap tracing and allow the buffer to be freed, stop tracing and then call heap_trace_init_standalone(NULL, 0);

Return

  • ESP_ERR_NOT_SUPPORTED Project was compiled without heap tracing enabled in menuconfig.

  • ESP_ERR_INVALID_STATE Heap tracing is currently in progress.

  • ESP_OK Heap tracing initialised successfully.

Parameters
  • record_buffer: Provide a buffer to use for heap trace data. Must remain valid any time heap tracing is enabled, meaning it must be allocated from internal memory not in PSRAM.

  • num_records: Size of the heap trace buffer, as number of record structures.

esp_err_t heap_trace_start(heap_trace_mode_t mode)

Start heap tracing. All heap allocations & frees will be traced, until heap_trace_stop() is called.

Note

heap_trace_init_standalone() must be called to provide a valid buffer, before this function is called.

Note

Calling this function while heap tracing is running will reset the heap trace state and continue tracing.

Return

  • ESP_ERR_NOT_SUPPORTED Project was compiled without heap tracing enabled in menuconfig.

  • ESP_ERR_INVALID_STATE A non-zero-length buffer has not been set via heap_trace_init_standalone().

  • ESP_OK Tracing is started.

Parameters
  • mode: Mode for tracing.

    • HEAP_TRACE_ALL means all heap allocations and frees are traced.

    • HEAP_TRACE_LEAKS means only suspected memory leaks are traced. (When memory is freed, the record is removed from the trace buffer.)

esp_err_t heap_trace_stop(void)

Stop heap tracing.

Return

  • ESP_ERR_NOT_SUPPORTED Project was compiled without heap tracing enabled in menuconfig.

  • ESP_ERR_INVALID_STATE Heap tracing was not in progress.

  • ESP_OK Heap tracing stopped..

esp_err_t heap_trace_resume(void)

Resume heap tracing which was previously stopped.

Unlike heap_trace_start(), this function does not clear the buffer of any pre-existing trace records.

The heap trace mode is the same as when heap_trace_start() was last called (or HEAP_TRACE_ALL if heap_trace_start() was never called).

Return

  • ESP_ERR_NOT_SUPPORTED Project was compiled without heap tracing enabled in menuconfig.

  • ESP_ERR_INVALID_STATE Heap tracing was already started.

  • ESP_OK Heap tracing resumed.

size_t heap_trace_get_count(void)

Return number of records in the heap trace buffer.

It is safe to call this function while heap tracing is running.

esp_err_t heap_trace_get(size_t index, heap_trace_record_t *record)

Return a raw record from the heap trace buffer.

Note

It is safe to call this function while heap tracing is running, however in HEAP_TRACE_LEAK mode record indexing may skip entries unless heap tracing is stopped first.

Return

  • ESP_ERR_NOT_SUPPORTED Project was compiled without heap tracing enabled in menuconfig.

  • ESP_ERR_INVALID_STATE Heap tracing was not initialised.

  • ESP_ERR_INVALID_ARG Index is out of bounds for current heap trace record count.

  • ESP_OK Record returned successfully.

Parameters
  • index: Index (zero-based) of the record to return.

  • [out] record: Record where the heap trace record will be copied.

void heap_trace_dump(void)

Dump heap trace record data to stdout.

Note

It is safe to call this function while heap tracing is running, however in HEAP_TRACE_LEAK mode the dump may skip entries unless heap tracing is stopped first.

Structures

struct heap_trace_record_t

Trace record data type. Stores information about an allocated region of memory.

Public Members

uint32_t ccount

CCOUNT of the CPU when the allocation was made. LSB (bit value 1) is the CPU number (0 or 1).

void *address

Address which was allocated.

size_t size

Size of the allocation.

void *alloced_by[CONFIG_HEAP_TRACING_STACK_DEPTH]

Call stack of the caller which allocated the memory.

void *freed_by[CONFIG_HEAP_TRACING_STACK_DEPTH]

Call stack of the caller which freed the memory (all zero if not freed.)

Macros

CONFIG_HEAP_TRACING_STACK_DEPTH

Enumerations

enum heap_trace_mode_t

Values:

HEAP_TRACE_ALL
HEAP_TRACE_LEAKS