simonlindholm edited this page Dec 7, 2014 · 2 revisions

Librcd uses an arena-based strategy for memory management. There is a stack of temporary heaps, managed in tandem with the regular call stack. When entering a sub_heap block, a new heap is allocated (on the previous heap) and pushed onto the heap stack. Every memory allocation within that block is then done on that heap, and when the block is left, all of its memory is freed automatically. Sub heaps are cheap: entering and leaving one takes around 0.1 µs on a fast machine, assuming no segmented-stack overhead. Memory can be moved between heaps with import, import_list, escape and escape_list¹, and you can switch to another heap by entering a switch_heap block. Memory is allocated with fstr_alloc or fstr_alloc_buffer, or with the convenience macro new. It can be freed explicitly with lwt_free, but this should almost never be necessary; sub heaps are preferred instead.
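To make the heap-stack model concrete, here is a toy sketch in standard C. It is not librcd code: toy_alloc, toy_enter_sub_heap and toy_leave_sub_heap are hypothetical names, the heap stack is a fixed-size static array rather than heaps allocated on their parent heap, and, in line with the footnote, each heap is simply a linked list of malloc'd blocks that is released in one sweep.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy model, not librcd code. The union header keeps the
   user memory that follows it suitably aligned for any type. */
typedef union block { union block* next; max_align_t align_; } block_t;
typedef struct { block_t* head; } toy_heap_t;

#define TOY_MAX_DEPTH 64
static toy_heap_t toy_stack[TOY_MAX_DEPTH]; /* toy_stack[0] is the root heap */
static int toy_depth = 1;                   /* number of heaps on the stack */
static size_t toy_live_blocks = 0;          /* allocations not yet freed */

/* Allocate n bytes on the current (topmost) heap. */
static void* toy_alloc(size_t n) {
    block_t* b = malloc(sizeof(block_t) + n);
    if (b == NULL) abort();
    toy_heap_t* h = &toy_stack[toy_depth - 1];
    b->next = h->head; /* link the block into the current heap */
    h->head = b;
    toy_live_blocks++;
    return b + 1; /* user memory starts right after the header */
}

/* Rough analogue of entering a sub_heap block: push an empty heap. */
static void toy_enter_sub_heap(void) {
    assert(toy_depth < TOY_MAX_DEPTH);
    toy_stack[toy_depth++].head = NULL;
}

/* Rough analogue of leaving a sub_heap block: free every block on the
   top heap in one sweep, then pop it. */
static void toy_leave_sub_heap(void) {
    assert(toy_depth > 1);
    toy_heap_t* h = &toy_stack[--toy_depth];
    for (block_t* b = h->head; b != NULL;) {
        block_t* next = b->next;
        free(b);
        toy_live_blocks--;
        b = next;
    }
    h->head = NULL;
}

/* Temporary allocations made inside the sub heap are never freed
   individually; leaving the sub heap reclaims them all. */
static size_t toy_demo(void) {
    toy_enter_sub_heap();
    for (int i = 0; i < 10; i++) {
        char* tmp = toy_alloc(32);
        tmp[0] = 'x';
    }
    size_t inside = toy_live_blocks; /* blocks live inside the sub heap */
    toy_leave_sub_heap();
    return inside;
}
```

Unlike this toy, where forgetting toy_leave_sub_heap would leak, a sub_heap block ties the cleanup to leaving the block itself.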

Heaps should be designed to allocate only a roughly constant amount of memory (not counting sub heaps). If a heap's memory usage is unbounded and grows over time, that indicates a memory leak. For instance, in the case of a web server it makes sense to use one sub heap per request. Using a single heap for several requests in a row would leak memory, while using several sub heaps within a single request, though perfectly fine, might complicate matters unnecessarily.

Functions that return newly allocated memory should do so by returning an "owned" data type, such as fstr_mem_t*, or something carrying its own heap. These are almost always pointer types, to signal more clearly that they represent allocated memory rather than plain values. The returned memory is allocated on the parent heap, so it is common to extract data from the owned type, as in fss(fstr_upper(str)), without bothering to store the allocation anywhere. Still, the owned type is an important signal, and it can be used for things like moving the memory into another heap. The following is a typical pattern:

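```c
typedef struct {
    lwt_heap_t* heap;               // the heap that owns everything below
    fstr_t whatever_data;
    fstr_t allocated_on_the_heap;
} ret_t;

ret_t* do_something(void) {
    // A fresh heap that will own the returned structure and its data.
    lwt_heap_t* heap = lwt_alloc_heap();
    switch_heap(heap) {
        // Every allocation in this block lands on `heap`.
        ret_t* ret = new(ret_t);
        ret->heap = heap;
        ret->whatever_data = fss(fstr_cpy("hello"));
        ret->allocated_on_the_heap = fss(fstr_cpy("librcd"));
        return ret;
    }
}
```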

(In practice one would use the macro fsc, which is defined as fss(fstr_cpy(x)); copying static strings is pointless and is done here only for the sake of example.)

For local, auditable (static) functions, it can also be okay to allocate memory within the parent heap, if the alternative is too complex.

Temporary memory used within a function should generally not leak outside of that function. This is commonly accomplished by putting the entire function body within a sub_heap block and moving any returned memory out of it through escape or escape_list. Even if a function uses no additional memory, this can be a good pattern to follow, because it makes memory ownership easier to understand at a glance.
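A toy sketch of the "whole function in a sub heap, escape the result" pattern, again in standard C rather than librcd: toy_move is a hypothetical stand-in for escape, heaps are passed explicitly instead of living on a stack, and each heap is a linked list of malloc'd blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy heap: a linked list of malloc'd blocks plus a count. */
typedef union block { union block* next; max_align_t align_; } block_t;
typedef struct { block_t* head; size_t count; } toy_heap_t;

static void* toy_alloc(toy_heap_t* h, size_t n) {
    block_t* b = malloc(sizeof(block_t) + n);
    if (b == NULL) abort();
    b->next = h->head;
    h->head = b;
    h->count++;
    return b + 1;
}

static void toy_free_heap(toy_heap_t* h) {
    while (h->head != NULL) {
        block_t* next = h->head->next;
        free(h->head);
        h->head = next;
    }
    h->count = 0;
}

/* Move allocation p from heap `from` to heap `to` (toy analogue of escape). */
static void* toy_move(toy_heap_t* from, toy_heap_t* to, void* p) {
    block_t* target = (block_t*)p - 1;
    for (block_t** link = &from->head; *link != NULL; link = &(*link)->next) {
        if (*link == target) {
            *link = target->next;    /* unlink from `from` */
            from->count--;
            target->next = to->head; /* link into `to` */
            to->head = target;
            to->count++;
            return p;
        }
    }
    abort(); /* p was not allocated on `from` */
}

/* Hypothetical function following the pattern: all work happens on a
   private sub heap, and only the result is moved to the caller's heap. */
static char* toy_greeting(toy_heap_t* caller, const char* name) {
    toy_heap_t sub = {NULL, 0};
    size_t len = strlen(name);
    char* scratch = toy_alloc(&sub, len + 1); /* temporary, dies with sub */
    memcpy(scratch, name, len + 1);
    char* result = toy_alloc(&sub, len + 8);  /* "Hello, " + name + NUL */
    memcpy(result, "Hello, ", 7);
    memcpy(result + 7, scratch, len + 1);
    toy_move(&sub, caller, result);
    toy_free_heap(&sub); /* frees scratch, but not the moved result */
    return result;
}
```

The caller's heap ends up holding exactly one block, the result, no matter how much scratch memory the function used internally.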

As long as memory is used in a tree-like fashion, this works out well. Even in more complex cases, memory can almost always be assigned to some logical owner structure; that owner can then carry a heap which is switched to when needed and in which the memory is managed. In the worst case, this devolves into ordinary C-style memory management on a single, "global" heap. This is prone to memory leaks, but is still viable if everything else fails.

Sometimes you want to perform a logical transaction and keep the memory it returns if it succeeds, but throw that memory away if it fails (throws an exception). For this, sub_heap_txn can help: it creates a new heap and merges it into the parent heap if the block exits normally. Here is an example pattern:

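```c
switch_heap(target_heap) {
    // The dictionary to update; assumed to live on target_heap.
    dict(fstr_t)* target_dict = fetch_target_dict();
    // Allocations below go to a transaction heap, which is merged into
    // target_heap only if the block exits normally.
    sub_heap_txn(heap) switch_heap(heap) { // or switch_txn(heap)
        fstr_t key = fallible_get_key();
        fstr_t value = fallible_get_value();
        dict_replace(target_dict, fstr_t, key, value);
    }
}
```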
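The commit-or-discard behavior of sub_heap_txn can be modeled in standard C as follows. This is a hypothetical toy, not librcd's implementation: setjmp/longjmp stands in for librcd's exceptions, toy_merge splices the transaction heap's block list into the parent heap on success, and the failure path frees the whole transaction heap.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy heap: a linked list of malloc'd blocks plus a count. */
typedef union block { union block* next; max_align_t align_; } block_t;
typedef struct { block_t* head; size_t count; } toy_heap_t;

static void* toy_alloc(toy_heap_t* h, size_t n) {
    block_t* b = malloc(sizeof(block_t) + n);
    if (b == NULL) abort();
    b->next = h->head;
    h->head = b;
    h->count++;
    return b + 1;
}

static void toy_free_heap(toy_heap_t* h) {
    while (h->head != NULL) {
        block_t* next = h->head->next;
        free(h->head);
        h->head = next;
    }
    h->count = 0;
}

/* Splice all of child's blocks into parent, leaving child empty. */
static void toy_merge(toy_heap_t* child, toy_heap_t* parent) {
    block_t** tail = &child->head;
    while (*tail != NULL) tail = &(*tail)->next;
    *tail = parent->head; /* child's last block now links to parent's list */
    parent->head = child->head;
    parent->count += child->count;
    child->head = NULL;
    child->count = 0;
}

static jmp_buf toy_throw_target; /* stand-in for exception unwinding; no nesting */

/* Hypothetical fallible step: allocates, then "throws" if asked to fail. */
static int* toy_fallible(toy_heap_t* h, int value, int fail) {
    int* p = toy_alloc(h, sizeof *p);
    *p = value;
    if (fail) longjmp(toy_throw_target, 1);
    return p;
}

/* Returns 1 if the transaction committed, 0 if it was rolled back. */
static int toy_txn(toy_heap_t* parent, int fail_second_step) {
    /* Kept outside this stack frame so its contents stay well-defined
       after longjmp (locals modified after setjmp would not be). */
    toy_heap_t* txn = malloc(sizeof *txn);
    if (txn == NULL) abort();
    *txn = (toy_heap_t){NULL, 0};
    if (setjmp(toy_throw_target) != 0) {
        toy_free_heap(txn); /* failure: discard everything from the txn */
        free(txn);
        return 0;
    }
    toy_fallible(txn, 1, 0);
    toy_fallible(txn, 2, fail_second_step);
    toy_merge(txn, parent); /* normal exit: keep the memory */
    free(txn);
    return 1;
}
```

A failed transaction leaves the parent heap exactly as it was, even though it had already allocated memory before failing; a successful one hands all of its allocations to the parent in a single splice.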

There exists a global heap which lives for the whole lifetime of the program (even after the main fiber has died), and can be switched to by using a global_heap block.

When librcd is compiled in debug mode (with -DDEBUG), freed memory is overwritten with a 0xfefefefe pattern, and small allocated memory regions are initialized to 0xa0a0a0a0. This can be useful for tracking down invalid memory accesses. The -DVM_DEBUG_GUARD_ZONE and -DVM_DEBUG_PAGE_AND_NOREUSE_ALLOCS options can also help, but they are too slow for general use. See vm.c for more details.
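The fill patterns help because a stale or uninitialized read returns an obviously bogus value such as 0xfefefefe instead of plausible data. Below is a hypothetical standard-C sketch of the same idea, not librcd's vm.c: the 256-byte "small" cutoff is an assumption, and the freed-memory fill uses volatile writes so the compiler cannot discard it as a dead store right before free().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TOY_FRESH_BYTE 0xa0  /* fill for newly allocated small regions */
#define TOY_FREED_BYTE 0xfe  /* fill for memory that is being freed */
#define TOY_SMALL_LIMIT 256  /* hypothetical cutoff for "small" allocations */

static void* toy_debug_alloc(size_t n) {
    void* p = malloc(n);
    if (p == NULL && n != 0) abort();
    if (p != NULL && n <= TOY_SMALL_LIMIT) memset(p, TOY_FRESH_BYTE, n);
    return p;
}

/* Overwrite a region with the freed pattern; volatile keeps the stores
   from being optimized away when free() follows immediately. */
static void toy_poison(void* p, size_t n) {
    volatile unsigned char* q = p;
    for (size_t i = 0; i < n; i++) q[i] = TOY_FREED_BYTE;
}

static void toy_debug_free(void* p, size_t n) {
    if (p == NULL) return;
    toy_poison(p, n);
    free(p);
}

/* Returns 1 if every byte in p[0..n) equals `byte`. */
static int toy_all_bytes(const void* p, size_t n, unsigned char byte) {
    const unsigned char* b = p;
    for (size_t i = 0; i < n; i++)
        if (b[i] != byte) return 0;
    return 1;
}
```

In this toy the underlying malloc may reuse part of the poisoned region for its own bookkeeping after free(), so the pattern is a debugging aid rather than a guarantee.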

¹ Unfortunately, this means that heaps are no longer contiguous memory areas with zero-cost freeing; instead, they are implemented on top of malloc.
