C++ memory reclamation for concurrent data structures

Code

A full implementation of my record manager abstraction (combined allocator and reclaimer) and a few variants of epoch-based reclamation can be found here. (Technically there is also an implementation of hazard pointers, as well as an implementation of object pooling -- see this paper.)

Using the record manager

Suppose you have a data structure, such as a concurrent linked list, that allows search, insert and remove, and you want to be able to allocate and safely reclaim memory for its nodes. You will create a "record_manager" for this data structure (see record_manager.h). This record_manager will define how memory is reclaimed, allocated, and possibly pooled, as well as what type of objects it contains. A record_manager is declared as follows:

#include <record_manager.h>
record_manager<Reclaim, Alloc, Pool, ObjectType1, ObjectType2, ...> myRecManager;

In this declaration, Reclaim is a class type that specifies how objects will be reclaimed (see reclaimer_[...].h), Alloc is a class type that specifies how objects will be allocated (see alloc_[...].h), and Pool is a class type that specifies how (or if) objects will be pooled (see pool_[...].h). ObjectType1, ObjectType2, ... are the object types that the record_manager will operate on.

Suppose our list consists of nodes of type Node which contain int keys. Then reasonable default choices for Reclaim, Alloc and Pool are:

Reclaim = reclaimer_debra<int>
Alloc = alloc_new<int>
Pool = pool_none<int>

The resulting declaration looks like:

record_manager<reclaimer_debra<int>, alloc_new<int>, pool_none<int>, Node> * myRecManager
    = new record_manager<reclaimer_debra<int>, alloc_new<int>, pool_none<int>, Node>(
      maxNumberOfConcurrentThreads);

All record_manager functions take a thread ID (in the range 0 ... maxNumberOfConcurrentThreads - 1) as an argument.

To allocate a Node, invoke allocate:

auto myNode = myRecManager->template allocate<Node>(myThreadID)

At the beginning of each data structure operation, e.g., Search, invoke getGuard:

bool search(int key) {
    auto guard = myRecManager->getGuard(myThreadID);
    // rest of search ...
}

It is safe to access nodes only where guard is in scope. (If you don't want to use a guard variable for automatic RAII scoping, because you want more control over the scope where it is safe to access nodes, you can use the record_manager's startOp and endOp functions, instead of getGuard. Note: The getGuard function works by simply calling startOp, then creating a stack variable whose desctructor calls endOp.)

Whenever you unlink a node from the list, invoke retire:

bool remove(int key) {
   auto guard = myRecManager->getGuard(myThreadID);
   // your removal code
   // ...
   // unlinkedNode is now unlinked from the list
   myRecManager->retire(myThreadID, unlinkedNode);

The retire function will eventually free unlinkedNode when no thread can reach it.

For reclaimer_debra, that's all you have to do. You've allocated and safely reclaimed memory for your list. (This is also enough for reclaimer_debracap, and of course reclaimer_none. However, for reclaimer_hazardptr, you must use the record_manager's protect and unprotect functions, which are described in the paper.)

Running an example data structure in a microbenchmark

An advanced concurrent data structure (a relaxed (a,b)-tree) that allocates and reclaims memory using this approach can be found in ds/brown_ext_abtree_lf.

A microbenchmark that uses this data structures appears under "microbench/." Compile with make -j n, where n is the number of threads on your system.

Run with, e.g., "LD_PRELOAD=../lib/libjemalloc-5.0.1-25.so bin/brown_ext_abtree_lf.ubench_rdebra -i 10 -d 10 -rq 0 -rqsize 100 -k 1000000 -nrq 0 -nwork 36 -t 1000 -nprefill 36". (Parameters: -nwork is the number of threads, -k is the size of the key range, -i and -d are % insertions/deletions, -t is millis to run, -nprefill is the number of threads that will perform prefilling on the data structure. Just ignore -rq, -rqsize and -nrq. You can optionally pin threads to logical processors, in a specific order, by specifying, e.g., "-pin 0-17,72-89,18-35,90-107,36-53,108-125,54-71,126-143".)

Note that we use LD_PRELOAD to replace malloc/free (new/delete) with the scalable jemalloc allocator, since otherwise the default glibc allocator is used, and it is quite slow at high thread counts. I've included the .so file for linux/x64 in the repository (in lib/). (On other systems, you should build jemalloc or use whatever scalable allocator you like.)

In the command above, memory reclamation is done with DEBRA, an efficient EBR algorithm. A few variants of DEBRA, as well as a no-reclamation variant, may also be compiled, depending on the contents of the Makefile (binaries: *.ubench_rdebra,*.ubench_rdebracap, *.ubench_rnone). Several other data structures are likely also compiled.

Adding the record manager to your own project

Easy way: copy the entire "common" folder into your project, and add it and its subfolders to your include path.

Harder: if you want to import only the minimal required code into your project, I think you can just copy common/errors.h and common/recordmgr/* (fixing includes / include paths as necessary to allow the headers to find each other).